Recover the ReAct loop from MAX_TOKENS truncation mid-tool-call by AAraKKe · Pull Request #24260 · DataDog/integrations-core

AAraKKe · 2026-06-30T13:48:12Z

What does this PR do?

Makes the ReAct loop recover when a model turn stops on MAX_TOKENS while a tool call is still pending.

Previously the loop only fed tool results back when stop_reason == TOOL_USE. If a turn was truncated by the output token limit mid-tool-call, the loop exited leaving a dangling tool_use block in the agent's history. The next send() replayed that history and the provider rejected the request with a 400 (tool_use ids found without a following tool_result).

Changes:

The loop now continues while a tool call is pending (keying off tool_calls rather than the stop reason).
On a truncated turn, each pending tool_use is answered with a synthetic failure tool_result instead of being executed — the call's input is incomplete and unsafe to run. This keeps the conversation valid and prompts the model to retry with a smaller change.
A guard aborts after MAX_CONSECUTIVE_TRUNCATIONS back-to-back truncated turns to avoid an unrecoverable loop.
edit_file tool description nudges toward small, targeted edits to reduce the chance of hitting the limit in the first place.

Motivation

A ddev meta ai flow run failed with 400 ... tool_use ids were found without tool_result blocks. Root cause was a turn truncated at the 8192-output-token limit while emitting a large edit_file call: the unexecuted, dangling tool_use then poisoned the next request (the memory step). This is a harness-level bug independent of any particular flow.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

When a turn stops on MAX_TOKENS while a tool call is pending, the tool_use block is truncated and never executed. The loop previously exited on any non-TOOL_USE stop reason, leaving a dangling tool_use in history; the next send() then replays it and the provider rejects the request with a 400 (tool_use without a following tool_result). The loop now continues while a tool call is pending and, on a truncated turn, answers each pending tool_use with a synthetic failure result instead of executing it (the call's input is incomplete and unsafe to run). This repairs the conversation and prompts the model to retry with a smaller change. A guard aborts after repeated consecutive truncations to avoid an unrecoverable loop. Also nudges toward smaller edits in the edit_file tool description.

datadog-datadog-prod-us1 · 2026-06-30T13:51:21Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 98.13%
• Overall Coverage: 88.45%

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: c237938 | Docs | Datadog PR Page | Give us feedback!}

native_tool_names returns an immutable tuple; assert against a tuple instead of a list.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6927671c05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-30T14:26:19Z

+            while response.tool_calls:
+                truncated = response.stop_reason == StopReason.MAX_TOKENS


Preserve pending max-token tool calls before compacting

When a follow-up response contains tool_calls with StopReason.MAX_TOKENS, this new loop keeps it pending, but the auto-compaction block below can still run before the synthetic tool_results are sent. If that response is over the compaction threshold, compact(response) treats it as a normal non-TOOL_USE turn and sends a summary request after an unresolved tool_use, which recreates the provider 400 this change is trying to avoid; this occurs on truncated retries or any tool-result turn that truncates while requesting another tool in a high-context conversation.

Useful? React with 👍 / 👎.

luisorofino

Thank you for the changes!

luisorofino · 2026-07-01T07:59:11Z

+    "maximum output token limit, so the tool call is incomplete. Retry with a smaller, more "
+    "targeted change — edit a single small unique region instead of rewriting a whole file, or "
+    "split the work across several sequential tool calls. For a full-file rewrite prefer "
+    "create_file over one huge edit_file."


Currently, we don't have any way of easily rewriting an entire file. The only way is using edit_file, since create_file only creates a file if it doesn't exist. Maybe, as a follow up, we could include a flag in create_file to allow overwriting if it already exists.

The problem still persist, it is that if it wants to edit (or create) a file that is too big it will run into max_tokens error. We could add an overwrite option to write a file but if we do not mention the token limit that might be an issue.

Maybe, since we know the max tokens set, we could include that into the prompt itself? Inejct it as a guideline for the agent not know not to do weird things?

Makes sense, saw your last commit and looks good. Thank you!

…d pending calls - compact() now checks for pending tool_calls instead of stop_reason == TOOL_USE, so a MAX_TOKENS turn with a dangling tool_use also preserves the last turn through compaction instead of replaying it unresolved and reproducing the provider 400 this recovery path exists to avoid - Add BaseTool.truncated_call_hint so edit_file, create_file, and append_file each give guidance suited to what they do when truncated, instead of one generic message; ToolRegistry.get() looks up the failing tool by name - Type tool_calls as list[ToolCall] instead of bare list

…gistry fixtures ToolRegistry has no external dependencies, so mocking it added nothing but a second, drifting reimplementation of its dispatch logic. Three files each had their own ad hoc FakeTool/MockToolRegistry variants; consolidate into one FakeTool plus fake_tool/tool_registry factory fixtures in tests/ai/conftest.py, used with the real ToolRegistry everywhere. - Add tests/ai/conftest.py: FakeTool double, fake_tool and tool_registry factory fixtures. FakeTool is not meant to be imported directly by tests. - test_process.py: drop MockToolRegistry/RaisingToolRegistry/PerToolRegistry, migrate every test to fake_tool/tool_registry with a real ToolRegistry - test_registry.py: drop its own FakeTool, use the shared fixture - test_anthropic_client.py: drop its own FakeTool, use the shared fixture - test_spawn_subagent.py: drop the MockToolRegistry(ToolRegistry) subclass; FakeProcessFactory now takes a real tool_registry instead of a canned result - Add truncated_call_hint tests for the real EditFileTool/CreateFileTool/ AppendFileTool, and ToolRegistry.get() tests - Add type annotations to every touched test signature

…ed factory ToolRegistry([...]) is a one-liner; wrapping it in a "tool_registry" fixture added a layer that did nothing. Call ToolRegistry directly wherever a registry is needed. fake_tool is now a typed factory function (not the bare class) with real default values, so fake_tool() alone produces a usable tool. Its callable signature is expressed as a Protocol (FakeToolFactory) so tests get proper named/defaulted parameter types instead of Callable[..., FakeTool]. Also fixes a real bug this refactor surfaced: tests annotating parameters with TYPE_CHECKING-only imports (FakeTool/FakeToolFactory) need `from __future__ import annotations`, otherwise the annotation is evaluated eagerly at def time and raises NameError since the import never executes at runtime. Added it to every file using this pattern, matching the existing precedent in anthropic_client.py. Also fixed three tests in test_anthropic_client.py that still referenced FakeTool directly in the test body instead of the fake_tool fixture.

dd-octo-sts · 2026-07-01T14:38:19Z

Validation Report

All 21 validations passed.

Show details

Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and code coverage settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅

View full run

AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label Jun 30, 2026

dd-octo-sts Bot added ddev and removed qa/skip-qa Automatically skip this PR for the next QA labels Jun 30, 2026

AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label Jun 30, 2026

Fix test_from_names_multiple_native comparing a Sequence to a list

6927671

native_tool_names returns an immutable tuple; assert against a tuple instead of a list.

AAraKKe force-pushed the aarakke/react-truncated-tool-call branch from 78a32fd to 6927671 Compare June 30, 2026 14:20

AAraKKe marked this pull request as ready for review June 30, 2026 14:22

AAraKKe requested a review from a team as a code owner June 30, 2026 14:22

dd-octo-sts Bot added the team/agent-integrations label Jun 30, 2026

chatgpt-codex-connector Bot reviewed Jun 30, 2026

View reviewed changes

luisorofino approved these changes Jul 1, 2026

View reviewed changes

AAraKKe added 3 commits July 1, 2026 14:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Recover the ReAct loop from MAX_TOKENS truncation mid-tool-call#24260

Recover the ReAct loop from MAX_TOKENS truncation mid-tool-call#24260
AAraKKe wants to merge 5 commits into
loa/openmetrics-ai-genfrom
aarakke/react-truncated-tool-call

AAraKKe commented Jun 30, 2026

Uh oh!

datadog-datadog-prod-us1 Bot commented Jun 30, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 30, 2026

Uh oh!

luisorofino left a comment

Uh oh!

luisorofino Jul 1, 2026

Uh oh!

AAraKKe Jul 1, 2026

Uh oh!

luisorofino Jul 1, 2026

Uh oh!

dd-octo-sts Bot commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		while response.tool_calls:
		truncated = response.stop_reason == StopReason.MAX_TOKENS

Uh oh!

Conversation

AAraKKe commented Jun 30, 2026

What does this PR do?

Motivation

Review checklist (to be filled by reviewers)

Uh oh!

datadog-datadog-prod-us1 Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

luisorofino left a comment

Choose a reason for hiding this comment

Uh oh!

luisorofino Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

AAraKKe Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

luisorofino Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

dd-octo-sts Bot commented Jul 1, 2026

Validation Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

datadog-datadog-prod-us1 Bot commented Jun 30, 2026 •

edited

Loading