Context
Follow-up from epic #116 (#126, router front). The router wires RequestRouter but the single-prefill / single-decode setup is degenerate.
Problem
RouterState (src/server/router_front.rs) registers the configured --prefill-peers and --decode-peers in a NodeRegistry and calls RequestRouter::route_to_prefill for selection. With one prefill node, that call always returns the same node, and the decode node is chosen by the prefill node's own --decode-peers config rather than by the router. As a result, the load-balancing, health, and failover machinery (RequestRouter::apply_backpressure / handle_node_failure, NodeRegistry status) is never exercised. Scaling the prefill and decode pools independently is the reason to run disaggregated, so this is the feature that makes it worthwhile.
Implementation
- Router-driven decode selection: have the router pick the decode node (route_to_decode) and pass it to the prefill node in the request frame, instead of the prefill node choosing from static config, so the router balances both pools.
- Health and failover: probe peer liveness (or mark a node unreachable on a transport error) and update NodeRegistry status so route_to_prefill / route_to_decode skip dead nodes; on a node failure mid-request, re-route or fail the request cleanly via handle_node_failure.
- Backpressure: wire apply_backpressure into the admission path so the router queues or rejects when all nodes are at capacity, instead of unconditionally sending.
- Tests: a multi-prefill / multi-decode E2E (at least 2 of each) asserting requests spread across nodes and that killing one node reroutes subsequent requests.
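
To make the intended admission flow concrete, here is a minimal sketch of router-side selection over both pools. It is not the existing NodeRegistry / RequestRouter API: Registry, Node, Role, Status, Admission, admit, and mark_unreachable are hypothetical stand-ins, and the least-loaded policy with a per-node capacity is an assumption chosen only for illustration. The real routing strategy and capacity signal may differ.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration; not the real NodeRegistry types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role { Prefill, Decode }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status { Healthy, Unreachable }

#[derive(Debug)]
pub struct Node {
    pub role: Role,
    pub status: Status,
    pub in_flight: usize,
    pub capacity: usize, // assumed per-node concurrency limit
}

#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    /// The router chose both nodes; `decode` would travel in the request frame.
    Route { prefill: String, decode: String },
    /// Healthy nodes exist, but a pool is entirely at capacity: queue or reject.
    Backpressure,
    /// A pool has no healthy node at all: fail the request cleanly.
    Unavailable,
}

#[derive(Default)]
pub struct Registry { nodes: HashMap<String, Node> }

impl Registry {
    pub fn register(&mut self, addr: &str, role: Role, capacity: usize) {
        let node = Node { role, status: Status::Healthy, in_flight: 0, capacity };
        self.nodes.insert(addr.to_string(), node);
    }

    /// Called from a liveness probe or after a transport error.
    pub fn mark_unreachable(&mut self, addr: &str) {
        if let Some(n) = self.nodes.get_mut(addr) { n.status = Status::Unreachable; }
    }

    /// Least-loaded healthy node with spare capacity (ties broken by address).
    fn pick(&self, role: Role) -> Result<String, Admission> {
        let healthy: Vec<(&String, &Node)> = self.nodes.iter()
            .filter(|(_, n)| n.role == role && n.status == Status::Healthy)
            .collect();
        if healthy.is_empty() { return Err(Admission::Unavailable); }
        healthy.into_iter()
            .filter(|(_, n)| n.in_flight < n.capacity)
            .min_by_key(|(addr, n)| (n.in_flight, addr.to_string()))
            .map(|(addr, _)| addr.clone())
            .ok_or(Admission::Backpressure)
    }

    /// Admission path: select from both pools before anything is sent.
    pub fn admit(&mut self) -> Admission {
        let prefill = match self.pick(Role::Prefill) { Ok(a) => a, Err(e) => return e };
        let decode = match self.pick(Role::Decode) { Ok(a) => a, Err(e) => return e };
        // Only count the request once both pools have accepted it.
        for addr in [&prefill, &decode] {
            if let Some(n) = self.nodes.get_mut(addr) { n.in_flight += 1; }
        }
        Admission::Route { prefill, decode }
    }
}
```

The point of the sketch is the ordering: both selections happen in the router before the request is sent, so a full or dead decode pool is detected at admission time rather than inside the prefill node. A real implementation would also decrement in_flight when a request completes.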
Acceptance criteria
- With multiple prefill and decode nodes, requests distribute across them per the routing strategy.
- A node failure does not wedge the router; subsequent requests route to healthy nodes.
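
The failover criterion can be illustrated with a small retry loop. This is a sketch under stated assumptions, not the behavior of handle_node_failure: send_with_failover, the dead set, and the String error type are hypothetical, and the send closure stands in for whatever transport call the router actually makes.

```rust
use std::collections::HashSet;

/// Tries candidates in order, skipping nodes already known to be dead.
/// A transport error marks the node dead and moves on to the next one.
/// Returns the address that accepted the request, or None if none did.
pub fn send_with_failover<F>(
    candidates: &[&str],
    dead: &mut HashSet<String>,
    mut send: F,
) -> Option<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    for &addr in candidates {
        if dead.contains(addr) {
            continue; // never retry a node already marked unreachable
        }
        match send(addr) {
            Ok(()) => return Some(addr.to_string()),
            Err(_) => {
                dead.insert(addr.to_string());
            }
        }
    }
    None // every candidate failed: fail the request instead of hanging
}
```

Because the loop is bounded by the candidate list and returns None when it is exhausted, a failed node cannot wedge the caller, and later requests skip that node without attempting it. Restoring a node after it recovers is left out of this sketch.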