[pull] main from danny-avila:main by pull[bot] · Pull Request #112 · innFactory/agents

pull · 2026-06-17T22:43:13Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* feat: support Langfuse trace metadata config
* fix: ignore empty Langfuse trace attributes
* fix: satisfy Langfuse config lint
* chore: import order in langfuse-config.test.ts

---------

Co-authored-by: Danny Avila <danacordially@gmail.com>

* ⚡ feat: Single Tail Prompt-Cache Breakpoint

Replace the rolling "last two user messages" prompt-cache strategy with a single breakpoint anchored on the conversation tail, mirroring the approach used by Claude Code. Anthropic/OpenRouter now place exactly one ephemeral cache_control marker on the last cacheable block of the final non-synthetic message; Bedrock places a single cachePoint via the new addBedrockTailCacheControl. Because the marker always rides the true tail, the whole prefix is written once and read back as history grows append-only, instead of re-writing large spans every step.

- Add addTailCacheControl / addBedrockTailCacheControl (single tail marker), skipping thinking blocks and synthetic skill/meta messages as anchors and stripping all stale markers in one pass.
- Wire Graph (Anthropic, OpenRouter, Bedrock), AgentContext system-runnable body path, and summarization to the tail strategy by default.
- Keep legacy addCacheControl / addBedrockCacheControl exported for compatibility; update affected tests and add cache.tail.test.ts.

* 🩹 fix: Hoist Bedrock cachePoint out of toolResult body for tail breakpoint

The single tail prompt-cache breakpoint frequently anchors on a tool result, since agent-loop conversations end with a tool turn before the next model call. addBedrockTailCacheControl writes the cachePoint into the tool message content, but the Converse converter wrapped the entire content (cachePoint included) inside toolResult.content. A cachePoint is a message-level ContentBlock, not a ToolResultContentBlock. Bedrock does not reject the nested form; it silently drops the breakpoint (verified live: cache_creation/cache_read both stay 0), so the tail strategy produced ZERO caching for the most common agent-loop shape.

Hoist any cachePoint out of toolResult.content to a message-level sibling after the toolResult block, the only position Bedrock honors.
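The hoist described above can be illustrated with a minimal sketch. It is not the code in convertToolMessageToConverseMessage; the type names and the `hoistCachePoints` helper are hypothetical, and the block shapes are simplified stand-ins for the Bedrock Converse types.

```typescript
// Simplified, hypothetical shapes; the real Converse types come from the AWS SDK.
type CachePointBlock = { cachePoint: { type: "default" } };
type TextBlock = { text: string };
type ToolResultInner = TextBlock | CachePointBlock;
type ToolResultBlock = {
  toolResult: { toolUseId: string; content: TextBlock[] };
};
type ConverseBlock = ToolResultBlock | CachePointBlock;

function isCachePoint(block: ToolResultInner): block is CachePointBlock {
  return "cachePoint" in block;
}

/**
 * Build the content array for one tool-result message. Any cachePoint found
 * in the tool content is removed from toolResult.content and emitted as a
 * message-level sibling placed right after the toolResult block.
 */
function hoistCachePoints(
  toolUseId: string,
  content: ToolResultInner[],
): ConverseBlock[] {
  const inner: TextBlock[] = [];
  const hoisted: CachePointBlock[] = [];
  for (const block of content) {
    if (isCachePoint(block)) {
      hoisted.push(block);
    } else {
      inner.push(block);
    }
  }
  return [{ toolResult: { toolUseId, content: inner } }, ...hoisted];
}
```

Given tool content of one text block followed by one cachePoint, the sketch returns two message-level blocks: the toolResult (text only) and then the cachePoint.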
Live Bedrock Converse now shows the tool-result tail writing the prefix on turn 1 (cache_creation) and reading it back on turn 2 (cache_read), matching the Anthropic-direct behavior.

- Hoist cachePoint(s) in convertToolMessageToConverseMessage.
- Add toolResultCachePoint.test.ts (converter hoist + end-to-end).
- Add cache.tail.test.ts case for a trailing string tool-result tail.

* 🩹 fix: Keep tail cache breakpoint on a block that survives conversion

Two edge cases dropped the single tail breakpoint before the model call, silently regressing to zero message caching (legacy marked human messages, which avoided both paths):

1. Foreign reasoning tail (Anthropic/OpenRouter): isTailCacheableBlock only excluded native `thinking`/`redacted_thinking`, so on a cross-provider handoff the marker could anchor on a `reasoning_content`/`reasoning`/`think` block, which _convertMessagesToAnthropicPayload drops on assistant turns. The only breakpoint vanished. Now exclude foreign reasoning types from tail anchoring so the marker lands on a surviving text/tool block.
2. Thinking-fold ordering: the tail marker was placed before ensureThinkingBlockInMessages, which folds a trailing non-thinking AI→Tool chain into a `[Previous agent context]` HumanMessage whose builder copies text but not cache_control/cachePoint. Move the provider-specific tail cache insertion (Anthropic, Bedrock, OpenRouter) to run LAST (after thinking normalization and orphan sanitization) so it anchors on the final message list.

Verified by inspecting the final _convertMessagesToAnthropicPayload output: the breakpoint now survives in both cases (and a guard test asserts the old mark-before-fold order loses it).

- Exclude reasoning_content/reasoning/think in isTailCacheableBlock.
- Reorder tail cache insertion after ensureThinkingBlock/sanitizeOrphan in Graph.
- Add tailCacheConversion.test.ts and foreign-reasoning cases in cache.tail.test.ts.
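As a rough illustration of the tail-anchoring rule described in the commits above, the sketch below strips stale markers and places one ephemeral marker on the last anchorable block of the final non-synthetic message. It is not the actual addTailCacheControl: the `Block` and `Message` shapes, the `synthetic` flag, and `placeTailMarker` are hypothetical. Falling back to an earlier message when the final one has no anchorable block is a choice made for this sketch only.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Block = {
  type: string;
  text?: string;
  cache_control?: { type: "ephemeral" };
};
type Message = {
  role: "user" | "assistant" | "tool";
  content: Block[];
  synthetic?: boolean; // assumed flag for skill/meta messages
};

// Block types the commit messages name as unsafe anchors.
const NON_ANCHORABLE_BLOCK_TYPES = new Set<string>([
  "thinking",
  "redacted_thinking",
  "reasoning_content",
  "reasoning",
  "think",
  "input_json_delta",
]);

// Return a copy of the block without any cache_control marker.
function stripMarker(block: Block): Block {
  const copy: Block = { ...block };
  delete copy.cache_control;
  return copy;
}

function placeTailMarker(messages: Message[]): Message[] {
  // Copy every message and drop stale markers so exactly one can remain.
  const cleaned: Message[] = messages.map((m) => ({
    ...m,
    content: m.content.map((b) => stripMarker(b)),
  }));
  // Walk backwards from the tail, skipping synthetic messages.
  for (let i = cleaned.length - 1; i >= 0; i--) {
    const message = cleaned[i];
    if (message === undefined || message.synthetic) continue;
    // Within the message, pick the last block that survives conversion.
    for (let j = message.content.length - 1; j >= 0; j--) {
      const block = message.content[j];
      if (block !== undefined && !NON_ANCHORABLE_BLOCK_TYPES.has(block.type)) {
        block.cache_control = { type: "ephemeral" };
        return cleaned;
      }
    }
  }
  return cleaned; // nothing anchorable: no marker is added
}
```

Because the function works on copies, the input messages keep whatever markers they had. Per the ordering fix above, a helper like this would need to run after thinking normalization and orphan sanitization.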
* 🩹 fix: Harden tail prompt-cache anchor against dropped/stripped tails

Three more cases where the single tail breakpoint failed to reach the model; all stem from anchoring on a volatile tail that a later stage drops/rewrites.

1. input_json_delta anchor (Anthropic/OpenRouter): persisted partial tool-input deltas are dropped by _convertMessagesToAnthropicPayload (input is restored onto the tool_use block). Anchoring the marker there lost it. Excluded input_json_delta from tail anchoring (joins the reasoning types), renaming the set to NON_ANCHORABLE_BLOCK_TYPES.
2. toolOutputReferences annotation (functional regression): prompt caching rewrites a string ToolMessage tail into a text-block array to host its marker; annotateMessagesForLLM only applied the live `[ref: …]` annotation to STRING tool content, so the common tool-result tail silently lost its reference marker once cached. annotateMessagesForLLM now projects the live ref (and unresolved warning) onto array tool content too.
3. assistant-prefill strip (Claude 4.6+): stripUnsupportedAssistantPrefill pops a trailing assistant prefill right before the API call; if the only tail breakpoint rode it, message caching was lost. It now re-anchors the breakpoint onto the new tail (only when one was actually removed, so caching-off requests stay untouched), reusing addTailCacheControl to honor the same exclusions.

Tests: stripPrefillCache.test.ts (re-anchor); array live-ref cases in annotateMessagesForLLM.test.ts; input_json_delta is covered by the NON_ANCHORABLE_BLOCK_TYPES exclusion. tsc + lint clean.

* 🩹 fix: Hoist Anthropic tool_result cache_control onto the top-level block

The single tail breakpoint frequently anchors on a tool result. For a string ToolMessage tail, addTailCacheControl rewrites it to a text-block array carrying cache_control, and _ensureMessageContents nests that block inside tool_result.content.
The Anthropic API currently honors that nested marker (verified live with an isolated, system-prompt-free large tool result: control no-marker => cache_creation 0; nested marker => 10232 written then read), so it is not broken today. But Anthropic documents the top-level messages.content block as the cacheable position and does not document sub-content caching, so relying on the nested form is fragile.

Hoist any cache_control off the inner tool-result content onto the generated tool_result block itself (mirrors the Bedrock cachePoint hoist). Live-verified end to end: control no-marker => cache_creation 0; hoisted marker => 12354 written on turn 1, read on turn 2.

- Add hoistToolResultCacheControl; apply it in _ensureMessageContents.
- tailCacheConversion.test.ts now asserts the marker lands on the tool_result block, not nested.

* 🩹 fix: Keep orphan sanitization enabled for prompt-cached sends

Moving the tail cache marker to run after sanitizeOrphanToolBlocks (so the marker survives the thinking fold) had a side effect: the marker no longer reassigns finalMessages before the `needsOrphanSanitize` gate is evaluated. For a prompt-cached Anthropic/Bedrock send whose pruner returned the context unchanged (finalMessages === messagesToUse), the gate went false and orphaned AI/tool pairs from persisted history could reach the provider and fail structural validation, whereas the pre-move code always reassigned first.

Compute the prompt-cache strategy up front and add `willAddTailCache` to the sanitize gate, so cached sends are cleaned before the marker is applied (restoring the pre-move guarantee). Collapses the cache-insertion branch to the same up-front booleans.

* 🩹 fix: Orphan-sanitize system-runnable prompt-cached sends too

The previous gate used "this node will add the marker" (which excludes the system-runnable path via !systemRunnable).
But when a system runnable owns the system prompt, AgentContext still adds the body cache marker, so those are cached sends that must be orphan-sanitized as well. With prompt caching + system runnable + a pruner that returned the context unchanged, orphaned AI/tool pairs from persisted history could still reach the provider.

Track two separate facts: `providerPromptCacheEnabled` (caching is on for the provider at all; drives orphan cleanup, system-runnable included) vs. the node-adds-the-marker condition (Anthropic/OpenRouter minus systemRunnable, or Bedrock; drives the insertion). The sanitize gate now uses the former.

* 🩹 fix: Break import cycle from the prefill re-anchor

The P3-1 re-anchor imported addTailCacheControl from @/messages/cache into the Anthropic converter, closing a cycle:

    messages/format.ts -> llm/anthropic/utils/message_inputs.ts -> messages/cache.ts -> messages/format.ts

which the bundler's circular-dependency check (npm run build:dev) flags. Replace the cross-module reuse with a small local re-anchor that operates on the already-converted Anthropic payload. This is also more correct: at that stage the converter has already dropped foreign-reasoning / input_json_delta blocks, so only native thinking blocks need excluding, and the post-strip tail is always a user message. Live-reverified: turn1 cache_creation=6264, turn2 read=6264.

* 📊 test: Live reproducible prompt-cache benchmark (tail vs legacy)

Add a committed, live benchmark that empirically justifies the single tail breakpoint over the legacy "last two user messages" strategy, plus a doc with representative results.

bench-prompt-cache.ts replays three realistic harness shapes (agent tool loop, multi-turn chat, realistic agent) under BOTH strategies over the same conversations in separate cache namespaces, against a real provider, and reports per-call cache token breakdowns.
`fresh` (uncached, full-price input) is derived provider-agnostically from total_tokens-output_tokens minus the cache buckets, since Anthropic folds cache tokens into input_tokens while Bedrock reports them separately.

Result (live, claude-sonnet-4-5): the tail strategy is cheaper in every scenario on both Anthropic and Bedrock. Legacy reprocesses tens of thousands of full-price tokens in any tool-bearing conversation (its lone user-message marker leaves the growing transcript uncached); tail reduces that to ~0 and reads the prefix back. Effective cost −30..−38% (Anthropic), −9..−15% (Bedrock); even in legacy's best case (frequent user messages), tail ties or wins.

- src/scripts/bench-prompt-cache.ts (excluded from build/CI; real paid calls)
- npm run bench:cache [-- --provider bedrock|anthropic --rounds N --model id]
- docs/prompt-cache-benchmark.md

* 📊 test: Add post-compaction scenario to the prompt-cache benchmark

Covers the two transcript-mutating harness behaviors raised in review:

- Tool truncation: a non-issue for caching. It is applied once at tool-exec with a model-fixed (turn-invariant) cap by the already-tested, deterministic truncateToolResultContent, so a truncated result is a stable prefix block. Documented; no separate scenario needed (existing tool-loop already exercises tool results in the cached prefix).
- Compaction (summarization): add a post-compaction scenario consisting of a few pre-compaction tool rounds, a head→summary swap (one-time cache miss for any strategy), then continued tool rounds. Confirms the tail strategy re-establishes append-only caching on the new summary-headed prefix.

Live result (claude-sonnet-4-5): tail wins 4/4 scenarios on BOTH Anthropic and Bedrock. Post-compaction is among the largest wins (Anthropic effective −41%, read +76%) because after compaction the summary is the only user message, so legacy re-sends all continued tool work uncached (fresh 63k → 42).
docs/prompt-cache-benchmark.md updated with the 4-scenario tables and a truncation/compaction section.
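
The `fresh` derivation described in the benchmark commit (total_tokens minus output_tokens, minus the cache buckets) can be sketched as follows. The `CacheUsage` field names and the `freshInputTokens` helper are assumptions for this sketch, not the names used in bench-prompt-cache.ts, and the clamp to zero is an added safeguard rather than something the commit message states.

```typescript
// Hypothetical normalized usage record; field names are assumptions.
interface CacheUsage {
  totalTokens: number; // total_tokens as reported for the call
  outputTokens: number; // output_tokens
  cacheCreationTokens: number; // tokens written to the cache
  cacheReadTokens: number; // tokens read back from the cache
}

/**
 * Uncached, full-price input tokens for one call. Starting from
 * total - output avoids depending on whether a provider folds cache
 * tokens into its input_tokens figure or reports them separately.
 */
function freshInputTokens(usage: CacheUsage): number {
  const allInput = usage.totalTokens - usage.outputTokens;
  const fresh = allInput - usage.cacheCreationTokens - usage.cacheReadTokens;
  return Math.max(0, fresh); // clamp: safeguard added in this sketch
}

// Illustrative numbers only, not benchmark data:
// (10500 - 500) - 2000 - 7000 = 1000 fresh tokens.
const exampleFresh = freshInputTokens({
  totalTokens: 10_500,
  outputTokens: 500,
  cacheCreationTokens: 2_000,
  cacheReadTokens: 7_000,
});
```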

upman and others added 2 commits June 17, 2026 13:34

pull Bot locked and limited conversation to collaborators Jun 17, 2026

pull Bot added the ⤵️ pull label Jun 17, 2026

pull Bot merged commit f32a9aa into innFactory:main Jun 17, 2026
1 check passed

pull[bot] merged 2 commits into innFactory:main from danny-avila:main
