Disaggregated serving: multi-node routing, load balancing, and failover #201

@inureyes

Context

Follow-up from epic #116 (#126, router front). The router wires up RequestRouter, but with a single prefill node and a single decode node the routing is degenerate: there is never more than one candidate to choose from.

Problem

RouterState (src/server/router_front.rs) registers the configured --prefill-peers and --decode-peers in a NodeRegistry and calls RequestRouter::route_to_prefill for selection, but with one prefill node it always returns that node, and the decode node is chosen by the prefill node's own --decode-peers config rather than by the router. As a result, the load-balancing, health, and failover machinery (RequestRouter::apply_backpressure / handle_node_failure, NodeRegistry status) is never exercised. Scaling the prefill and decode pools independently is the main reason to run disaggregated serving, so multi-node routing is the feature that makes the setup worthwhile.
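To make the degenerate case concrete, here is a minimal sketch assuming a least-loaded selection strategy. `Node`, `pick_least_loaded`, and `demo_degenerate` are hypothetical stand-ins for illustration only; they are not the actual NodeRegistry / RequestRouter types or signatures.

```rust
// Hypothetical stand-in for a registered peer; the real NodeRegistry
// entry type is not shown in this issue and may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: String,
    pub in_flight: usize,
}

/// Pick the node with the fewest in-flight requests
/// (one possible strategy, assumed here for illustration).
pub fn pick_least_loaded(pool: &[Node]) -> Option<&Node> {
    pool.iter().min_by_key(|n| n.in_flight)
}

/// Returns the chosen id for a one-node pool and for a two-node pool.
pub fn demo_degenerate() -> (String, String) {
    // One node: the choice is forced, however loaded that node is.
    let single = vec![Node { id: "prefill-0".into(), in_flight: 100 }];
    // Two nodes: the strategy finally has something to decide.
    let multi = vec![
        Node { id: "prefill-0".into(), in_flight: 100 },
        Node { id: "prefill-1".into(), in_flight: 3 },
    ];
    (
        pick_least_loaded(&single).unwrap().id.clone(),
        pick_least_loaded(&multi).unwrap().id.clone(),
    )
}
```

With a single-node pool the selection logic runs but can never influence the outcome, which is why it stays effectively untested.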

Implementation

  • Router-driven decode selection: have the router pick the decode node (route_to_decode) and pass it to the prefill node in the request frame, instead of the prefill node choosing from static config, so the router balances both pools.
  • Health and failover: probe peer liveness (or mark a node unreachable on a transport error) and update NodeRegistry status so route_to_prefill / route_to_decode skip dead nodes; on a node failure mid-request, re-route or fail the request cleanly via handle_node_failure.
  • Backpressure: wire apply_backpressure into the admission path so the router queues or rejects when all nodes are at capacity, instead of unconditionally sending.
  • Tests: a multi-prefill / multi-decode E2E (at least 2 of each) asserting requests spread across nodes and that killing one node reroutes subsequent requests.
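The first three items could fit together roughly as follows. This is a sketch under stated assumptions, not the project's API: `SketchRouter`, `Peer`, `Status`, `Admission`, `admit`, and `mark_unreachable` are hypothetical names, the per-node `capacity` field is assumed, and completion handling (decrementing `in_flight`) is omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status { Healthy, Unreachable }

// Hypothetical per-node record; `capacity` is an assumed limit.
#[derive(Debug, Clone)]
pub struct Peer { pub status: Status, pub in_flight: usize, pub capacity: usize }

#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    /// The router chose both nodes; the decode id would travel in the request frame.
    Dispatch { prefill: String, decode: String },
    /// Every healthy node in one of the pools is at capacity.
    Backpressure,
    /// One of the pools has no healthy node left.
    NoHealthyNodes,
}

#[derive(Default)]
pub struct SketchRouter {
    pub prefill: HashMap<String, Peer>,
    pub decode: HashMap<String, Peer>,
}

impl SketchRouter {
    /// Least-loaded healthy node with spare capacity; ties broken by id.
    fn pick(pool: &HashMap<String, Peer>) -> Result<String, Admission> {
        let healthy: Vec<_> = pool.iter()
            .filter(|(_, p)| p.status == Status::Healthy)
            .collect();
        if healthy.is_empty() {
            return Err(Admission::NoHealthyNodes);
        }
        healthy.into_iter()
            .filter(|(_, p)| p.in_flight < p.capacity)
            .min_by_key(|(id, p)| (p.in_flight, id.to_string()))
            .map(|(id, _)| id.clone())
            .ok_or(Admission::Backpressure)
    }

    /// Admission path: select from both pools, or report why we cannot.
    pub fn admit(&mut self) -> Admission {
        let prefill = match Self::pick(&self.prefill) { Ok(id) => id, Err(a) => return a };
        let decode = match Self::pick(&self.decode) { Ok(id) => id, Err(a) => return a };
        self.prefill.get_mut(&prefill).unwrap().in_flight += 1;
        self.decode.get_mut(&decode).unwrap().in_flight += 1;
        Admission::Dispatch { prefill, decode }
    }

    /// Called after a failed probe or a transport error.
    pub fn mark_unreachable(&mut self, id: &str) {
        if let Some(p) = self.prefill.get_mut(id) { p.status = Status::Unreachable; }
        if let Some(p) = self.decode.get_mut(id) { p.status = Status::Unreachable; }
    }
}

/// Two prefill nodes (capacity 1 each) and two decode nodes (capacity 4 each).
pub fn demo() -> Vec<Admission> {
    let peer = |capacity| Peer { status: Status::Healthy, in_flight: 0, capacity };
    let mut r = SketchRouter::default();
    r.prefill.insert("p0".into(), peer(1));
    r.prefill.insert("p1".into(), peer(1));
    r.decode.insert("d0".into(), peer(4));
    r.decode.insert("d1".into(), peer(4));
    let mut out = vec![r.admit(), r.admit()]; // spreads across both pools
    r.mark_unreachable("p1");
    out.push(r.admit()); // p0 is full and p1 is dead: backpressure
    r.mark_unreachable("p0");
    out.push(r.admit()); // no healthy prefill node remains
    out
}
```

Whether a `Backpressure` result means queueing or rejecting is left to the caller in this sketch; the issue leaves that choice open.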

Acceptance criteria

  • With multiple prefill and decode nodes, requests distribute across them per the routing strategy.
  • A node failure does not wedge the router; subsequent requests route to healthy nodes.
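Both criteria can be phrased as assertions over per-node request counts. The sketch below assumes a plain round-robin strategy and in-memory stand-ins rather than real processes; `RoundRobin`, `kill`, `route`, and `spread` are hypothetical helpers, not part of the existing code.

```rust
use std::collections::BTreeMap;

/// Hypothetical round-robin picker over node ids with a liveness flag.
pub struct RoundRobin {
    nodes: Vec<(String, bool)>, // (id, alive)
    next: usize,
}

impl RoundRobin {
    pub fn new(ids: &[&str]) -> Self {
        Self {
            nodes: ids.iter().map(|id| (id.to_string(), true)).collect(),
            next: 0,
        }
    }

    /// Simulate a node failure.
    pub fn kill(&mut self, id: &str) {
        for n in self.nodes.iter_mut().filter(|n| n.0 == id) {
            n.1 = false;
        }
    }

    /// Return the next live node, or None if every node is dead.
    pub fn route(&mut self) -> Option<String> {
        for _ in 0..self.nodes.len() {
            let i = self.next % self.nodes.len();
            self.next = self.next.wrapping_add(1);
            if self.nodes[i].1 {
                return Some(self.nodes[i].0.clone());
            }
        }
        None
    }
}

/// Route `n` requests and count how many each node received.
pub fn spread(rr: &mut RoundRobin, n: usize) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for _ in 0..n {
        if let Some(id) = rr.route() {
            *counts.entry(id).or_insert(0) += 1;
        }
    }
    counts
}
```

A test built on this would call `spread` before and after `kill`, then assert that every live node received traffic, that the killed node received none afterwards, and that no request was dropped. A real E2E would replace the in-memory flags with actual node processes.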

Labels

area:architecture, priority:medium, status:ready, type:enhancement
