fix(inference): prevent silent truncation of large streaming responses by johntmyers · Pull Request #834 · NVIDIA/OpenShell

johntmyers · 2026-04-14T17:53:18Z

🏗️ build-from-issue-agent

Summary

Fix the L7 inference proxy silently dropping tool_calls from large streaming responses. The proxy had three interacting bugs: an aggressive 30s per-chunk idle timeout that killed reasoning model "think" pauses, a reqwest total-request timeout that capped the entire body stream at 60s, and silent truncation that wrote a valid HTTP terminator on error paths — producing correct-looking but incomplete responses.

Related Issue

Closes #829

Changes

crates/openshell-router/src/backend.rs: Extract prepare_backend_request() helper sharing auth/header/body logic; create send_backend_request_streaming() that omits the total request timeout — streaming body liveness is now enforced by the sandbox per-chunk idle timeout
crates/openshell-router/src/lib.rs: Add connect_timeout(30s) to the reqwest Client builder
crates/openshell-sandbox/src/proxy.rs: Increase CHUNK_IDLE_TIMEOUT from 30s to 120s; inject SSE error events before chunked terminator on all truncation paths; wrap streaming relay in BufWriter to reduce per-chunk TLS flush overhead; bump OCSF severity from Low to Medium for truncation events
crates/openshell-sandbox/src/l7/inference.rs: Add format_sse_error() helper for generating parseable SSE error events
crates/openshell-router/tests/backend_integration.rs: Add tests verifying streaming proxy completes without total timeout and buffered proxy still enforces it
architecture/inference-routing.md: Document timeout model, SSE error signaling, and BufWriter behavior
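
The timeout model in the changes above (no total deadline on the streaming body, liveness enforced per chunk) can be sketched with standard-library channels. This is a minimal illustration, not the code in proxy.rs: `relay_with_idle_timeout`, `RelayOutcome`, and the channel that stands in for the upstream body stream are hypothetical names chosen for this example.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// How the relay loop ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum RelayOutcome {
    /// Upstream closed the stream normally.
    Completed,
    /// No chunk arrived within the idle window; the response is truncated.
    IdleTimeout,
}

/// Relay chunks from `rx` into `out`. There is no deadline on the whole
/// stream: the timer restarts after every chunk, so a long response
/// survives as long as each individual gap stays below `idle`.
fn relay_with_idle_timeout(
    rx: &Receiver<Vec<u8>>,
    idle: Duration,
    out: &mut Vec<u8>,
) -> RelayOutcome {
    loop {
        match rx.recv_timeout(idle) {
            // A chunk arrived in time: forward it and wait again.
            Ok(chunk) => out.extend_from_slice(&chunk),
            // Sender dropped: the upstream stream ended cleanly.
            Err(RecvTimeoutError::Disconnected) => return RelayOutcome::Completed,
            // Gap exceeded the idle window: the caller should signal truncation.
            Err(RecvTimeoutError::Timeout) => return RelayOutcome::IdleTimeout,
        }
    }
}
```

With the idle window set to 120s, a reasoning model that pauses for 45s between chunks would complete under this scheme, while a 60s cap on the whole body would have cut it off.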

Deviations from Plan

None — implemented as planned

Testing

cargo test --package openshell-router --package openshell-sandbox passes (499 tests)
cargo fmt --all -- --check passes
Unit tests added for format_sse_error() (valid SSE format, JSON escaping)
Integration tests added for streaming/buffered timeout behavior

Tests added:

Unit: format_sse_error_produces_valid_sse_json, format_sse_error_escapes_quotes_in_reason in l7/inference.rs
Integration: streaming_proxy_completes_despite_exceeding_route_timeout, buffered_proxy_enforces_route_timeout in backend_integration.rs

Checklist

Follows Conventional Commits
Architecture docs updated

Documentation updated:

architecture/inference-routing.md: Updated timeout model, response streaming, and truncation signaling sections

The L7 inference proxy silently dropped tool_calls from large streaming responses due to an aggressive 30s per-chunk idle timeout and a reqwest total-request timeout that capped the entire body stream. Reasoning models that pause during "thinking" phases triggered these timeouts, producing valid-looking but truncated HTTP responses with no client-visible error.

- Extract prepare_backend_request() helper and create a streaming variant that omits the total request timeout; body stream liveness is now enforced solely by the per-chunk idle timeout
- Add 30s connect_timeout to the reqwest Client builder
- Increase CHUNK_IDLE_TIMEOUT from 30s to 120s for reasoning models
- Inject SSE error events (proxy_stream_error) before the HTTP chunked terminator on all truncation paths so clients can detect data loss
- Wrap the streaming relay in BufWriter to reduce per-chunk TLS flush overhead
- Bump OCSF severity for streaming truncation from Low to Medium

Closes #829
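
The truncation signaling described above relies on HTTP/1.1 chunked framing: the error event has to travel as one more chunk before the zero-length terminator. The sketch below shows that ordering under stated assumptions. The name `format_chunk` appears in the follow-up commit, but its signature here is a guess, and `truncation_tail` and `CHUNKED_TERMINATOR` are hypothetical names for this example.

```rust
/// Frame `payload` as a single HTTP/1.1 chunk: hex length, CRLF, data, CRLF.
/// (Assumed signature; the real helper may differ.)
fn format_chunk(payload: &[u8]) -> Vec<u8> {
    let mut out = format!("{:x}\r\n", payload.len()).into_bytes();
    out.extend_from_slice(payload);
    out.extend_from_slice(b"\r\n");
    out
}

/// Zero-length chunk that ends a chunked body.
const CHUNKED_TERMINATOR: &[u8] = b"0\r\n\r\n";

/// Bytes written on a truncation path: the SSE error event goes out as its
/// own chunk first, and only then is the body terminated. Writing the
/// terminator alone would yield a well-formed but silently incomplete response.
fn truncation_tail(sse_error_event: &str) -> Vec<u8> {
    let mut out = format_chunk(sse_error_event.as_bytes());
    out.extend_from_slice(CHUNKED_TERMINATOR);
    out
}
```

A client that parses SSE sees the error event as the last frame of the stream and can treat the response as incomplete instead of trusting a clean end of body.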

copy-pr-bot · 2026-04-14T17:53:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

The BufWriter introduced in the previous commit buffered SSE frames until the 16KB capacity filled or the stream ended, defeating incremental token delivery and potentially reintroducing client-visible timeouts on healthy streams. Revert to per-chunk write_all+flush but keep the single format_chunk() call that coalesces framing into one write.

Also fix the streaming integration test: add a 3s mock delay that exceeds the 1s route timeout so the test actually validates that the streaming path omits the total request timeout. Previously the mock responded immediately, passing regardless of timeout behavior.
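
The buffering problem described in this commit message can be reproduced with the standard library alone. The sketch below uses a `Vec<u8>` as a stand-in for the client socket and compares how many bytes have reached it after a buffered write versus after write_all followed by flush. Both function names are hypothetical, and the 16KB capacity is taken from the message above.

```rust
use std::io::{BufWriter, Write};

/// Bytes that have reached the underlying sink after writing one small SSE
/// frame through a 16KB BufWriter without flushing.
fn delivered_without_flush(frame: &[u8]) -> usize {
    let mut w = BufWriter::with_capacity(16 * 1024, Vec::new());
    w.write_all(frame).unwrap();
    // The frame is still held in the BufWriter's internal buffer.
    w.get_ref().len()
}

/// Bytes that have reached the underlying sink after write_all plus flush,
/// which is the per-chunk behavior the commit reverts to.
fn delivered_with_flush(frame: &[u8]) -> usize {
    let mut w = BufWriter::with_capacity(16 * 1024, Vec::new());
    w.write_all(frame).unwrap();
    w.flush().unwrap();
    w.get_ref().len()
}
```

For a frame well under 16KB, the first function returns 0 and the second returns the frame length, which is why a token-by-token stream appeared stalled to the client until the buffer filled or the stream ended.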

format_sse_error only escaped `\` and `"`, leaving two problems:

1. Control characters (`\n`, `\r`, `\t`, and all `\u0000-\u001F`) in `reason` produce output that fails `serde_json::from_str`, defeating #834's goal of giving clients a parseable SSE truncation signal.
2. An unescaped `\n\n` inside `reason` splits the single error event into two SSE frames, letting a misbehaving upstream inject a forged frame (e.g. a fake tool_calls delta) into the client's stream.

Latent today since all in-tree callers pass static strings, but a footgun for any future caller passing upstream error text, and the function's docstring already invites dynamic reasons.

Replace the manual escape with `serde_json::to_writer` (already a workspace dep of `openshell-sandbox`). Add unit tests for control character escaping and SSE event-boundary injection.

Closes #840

Signed-off-by: mjamiv <michael.commack@gmail.com>
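
The frame-splitting problem above is easy to demonstrate. The sketch below is a std-only illustration, not the actual fix (which delegates to `serde_json::to_writer`): `naive_escape` mimics the original two-character escape, `escape_json` is a hand-written stand-in for a real JSON encoder, and the event payload shape in `sse_event` is an assumption, since only the `proxy_stream_error` name is given in the PR.

```rust
/// Mimics the original behavior: only `\` and `"` are escaped.
fn naive_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Escapes `s` as the contents of a JSON string literal, including every
/// control character below U+0020 (illustrative stand-in for serde_json).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Wraps an already-escaped reason in one SSE `data:` event.
/// The JSON field names are assumed for this sketch.
fn sse_event(escaped_reason: &str) -> String {
    format!(
        "data: {{\"error\":{{\"type\":\"proxy_stream_error\",\"message\":\"{}\"}}}}\n\n",
        escaped_reason
    )
}

/// Counts SSE frames by splitting on the blank-line delimiter.
fn count_sse_frames(stream: &str) -> usize {
    stream.split("\n\n").filter(|f| !f.is_empty()).count()
}
```

Given a reason such as `"boom\n\ndata: {\"forged\":true}"`, the naive escape yields two frames, the second of which the client would parse as an independent event, while the full escape keeps everything inside one frame.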

## Summary

Bumps the pinned OpenShell version range from `0.0.29` → `0.0.32` so fresh NemoClaw installs pick up sandbox hardening and TLS improvements from the last three OpenShell releases.

## Notable upstream changes

**0.0.30** ([NVIDIA/OpenShell@v0.0.29...v0.0.30](NVIDIA/OpenShell@v0.0.29...v0.0.30))

- Network policy deny rules ([OpenShell#822](NVIDIA/OpenShell#822))
- Preserve ownership on existing `read_write` paths ([OpenShell#827](NVIDIA/OpenShell#827))
- Disable child core dumps ([OpenShell#821](NVIDIA/OpenShell#821))
- Escape control characters in SSE error formatting ([OpenShell#842](NVIDIA/OpenShell#842))
- Fix silent truncation of large streaming inference responses ([OpenShell#834](NVIDIA/OpenShell#834))

**0.0.31** ([NVIDIA/OpenShell@v0.0.30...v0.0.31](NVIDIA/OpenShell@v0.0.30...v0.0.31))

- Inference routed-request header allowlist ([OpenShell#826](NVIDIA/OpenShell#826))

**0.0.32** ([NVIDIA/OpenShell@v0.0.31...v0.0.32](NVIDIA/OpenShell@v0.0.31...v0.0.32))

- **Load system CA certificates for upstream TLS connections** ([OpenShell#862](NVIDIA/OpenShell#862))
- Publish standalone `openshell-gateway` binaries ([OpenShell#853](NVIDIA/OpenShell#853))

## Changes

- `nemoclaw-blueprint/blueprint.yaml`: `min_openshell_version` and `max_openshell_version` → `0.0.32`
- `scripts/install-openshell.sh`: `MIN_VERSION` and `MAX_VERSION` → `0.0.32` (`PIN_VERSION` follows `MAX`)
- `scripts/brev-launchable-ci-cpu.sh`: default `OPENSHELL_VERSION` → `v0.0.32`
- `src/lib/onboard.ts`: blueprint-fallback min version → `0.0.32`
- `test/onboard.test.ts`, `test/install-openshell-version-check.test.ts`: fixtures updated; "above MAX" test case moved from `0.0.30` to `0.0.33`

Historical `m-dev` comments referencing `0.0.29` left in place; they describe a self-report quirk the sidecar fallback still handles.

## Why not 0.0.33+?

`0.0.34` introduced incremental sandbox policy updates and L7 request-target canonicalization, changes with larger surface area against how NemoClaw delivers policy via gRPC. Worth a follow-up PR rather than bundling here. `0.0.35` released hours before this PR was cut, so it is too fresh.

## Type of Change

- [x] Code change for a new feature, bug fix, or refactor.

## Testing

- [x] `npx vitest run test/install-openshell-version-check.test.ts`: 9 passed
- [x] pre-commit hooks (prek) clean: shellcheck, commitlint, gitleaks, YAML validator, CLI test suite
- [ ] Nightly E2E on this branch (will be kicked off after PR opens)

## Notes

- No user-facing CLI behavior changes, just the pinned version range.
- Two pre-existing failures in `test/onboard.test.ts` reproduce on clean `main` and are unrelated to this bump.

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Chores**
  * Updated OpenShell version constraints and default pinned version to v0.0.32 across configuration, install, and onboarding flows.
* **Tests**
  * Updated test fixtures and expectations to match the new OpenShell version (v0.0.32).

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

johntmyers requested a review from a team as a code owner April 14, 2026 17:53

johntmyers self-assigned this Apr 14, 2026

johntmyers mentioned this pull request Apr 14, 2026

L7 inference proxy silently drops tool_calls chunks on large streaming responses #829

Closed

3 tasks

johntmyers added the test:e2e Requires end-to-end coverage label Apr 14, 2026

pimlock previously approved these changes Apr 14, 2026

johntmyers dismissed pimlock’s stale review via 3fe2093 April 14, 2026 20:12

johntmyers merged commit 355d845 into main Apr 14, 2026
11 checks passed

johntmyers deleted the fix/829-streaming-proxy-tool-calls/johntmyers branch April 14, 2026 20:34

mjamiv mentioned this pull request Apr 15, 2026

inference proxy: format_sse_error escapes are incomplete (control chars + SSE event injection) #840

Closed

3 tasks

This was referenced Apr 15, 2026

fix(sandbox): escape control characters in format_sse_error #842

Merged

inference proxy: no end-to-end test coverage for the three truncation → SSE-error paths in route_inference_request #846

Open

vnicolici mentioned this pull request Apr 16, 2026

feat(sandbox): make L7 inference proxy CHUNK_IDLE_TIMEOUT configurable per route #866

Open

2 tasks

miyoungc mentioned this pull request Apr 16, 2026

docs: refresh user-facing docs for recent sandbox and inference changes #868

Merged

7 tasks

prekshivyas mentioned this pull request Apr 22, 2026

chore(install): bump OpenShell version to 0.0.32 NVIDIA/NemoClaw#2307

Merged

4 tasks

johntmyers merged 2 commits into main from fix/829-streaming-proxy-tool-calls/johntmyers
