Stop the AI flow when a worker or reviewer turn is incomplete #24257

Open
luisorofino wants to merge 1 commit into loa/openmetrics-ai-gen from loa/incomplete-turn

@luisorofino luisorofino commented Jun 30, 2026

Contributor

What does this PR do?

Adds an IncompleteResponseError exception and a require_complete flag to ReActProcess.start(). When require_complete=True, the method raises if the terminal stop reason is anything other than END_TURN (i.e. MAX_TOKENS or OTHER). The flag is opted in at the worker and reviewer call sites in AgenticPhase._start_task, _run_reviewer_once, and the worker-retry path in _drive_goal_loop. Subagents and the memory step are deliberately left on the default require_complete=False.
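The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Response` dataclass, the `finish_turn` helper, and the string values of the `StopReason` members are assumptions made for the example. Only the names `IncompleteResponseError`, `require_complete`, `END_TURN`, `MAX_TOKENS`, and `OTHER` come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class StopReason(Enum):
    # Member names come from the PR text; the string values are assumed.
    END_TURN = "end_turn"
    MAX_TOKENS = "max_tokens"
    OTHER = "other"


class IncompleteResponseError(Exception):
    """Raised when a turn that must be complete ends for another reason."""

    def __init__(self, message, *, stop_reason):
        super().__init__(message)
        self.stop_reason = stop_reason


@dataclass
class Response:
    # Hypothetical stand-in for the terminal model response.
    text: str
    stop_reason: StopReason


def finish_turn(response, owner_id, require_complete=False):
    """Return the response text, raising if completeness is required but missing."""
    if require_complete and response.stop_reason != StopReason.END_TURN:
        raise IncompleteResponseError(
            f"{owner_id} finished with stop_reason={response.stop_reason.value!r} "
            "(expected END_TURN); the turn is incomplete.",
            stop_reason=response.stop_reason,
        )
    return response.text
```

With the default `require_complete=False`, a truncated response is returned unchanged, which matches the opt-in design: only call sites that pass `require_complete=True` fail fast.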

Motivation

During a demo run, a worker agent hit max_tokens mid-task — its output was cut off. Because AgenticPhase ignored the stop reason, the truncated result was handed straight to the goal reviewer. The reviewer (correctly) rejected the incomplete work, and the retry path then crashed the run.

The root issue is that a truncated turn is not a valid basis for goal validation or for advancing the flow — the model wanted to produce more output and was stopped. Rather than silently proceeding into the reviewer with half-finished work, we now fail fast with a clear, deliberate error.

Subagents keep their existing soft behavior ([SUBAGENT HIT MAX_TOKENS — RESPONSE MAY BE TRUNCATED] prefix) because a truncated subagent is a localized, recoverable degradation, not a reason to abort the whole run.
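For contrast, the soft subagent behavior might look something like the sketch below. The prefix string is quoted from the description; the `label_subagent_output` helper and the single space separating the prefix from the text are assumptions for illustration only.

```python
# Prefix text quoted from the PR description.
SUBAGENT_TRUNCATION_PREFIX = "[SUBAGENT HIT MAX_TOKENS — RESPONSE MAY BE TRUNCATED]"


def label_subagent_output(text, hit_max_tokens):
    """Hypothetical helper: flag a truncated subagent result instead of raising."""
    if hit_max_tokens:
        # Assumed format: prefix, one space, then the (possibly cut-off) text.
        return f"{SUBAGENT_TRUNCATION_PREFIX} {text}"
    return text
```

The caller still receives a usable string, so the degradation stays local to the subagent rather than aborting the run.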

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@dd-octo-sts dd-octo-sts Bot added the ddev label Jun 30, 2026
@luisorofino luisorofino changed the title from "Add require complete to ReActProcess.start()" to "Stop the AI flow when a worker or reviewer turn is incomplete" Jun 30, 2026
datadog-prod-us1-4 Bot commented Jun 30, 2026

Contributor

⚠️ Warnings

🚦 9 Pipeline jobs failed

PR | test / test (linux, ubuntu-22.04, ddev, ddev on Linux) / ddev on Linux

PR | test / test (windows, windows-2022, ddev, ddev on Windows) / ddev on Windows

PR | test / test-minimum-base-package (linux, ubuntu-22.04, ddev, ddev on Linux) / minimum-base-package-ddev on Linux

View all 9 failed jobs.

🧪 2 Tests failed in 1 job

PR | run

test_from_names_multiple_native from test_registry.py
assert ('web_search', 'web_fetch') == ['web_search', 'web_fetch']
  
  Full diff:
  - [
  + (
        'web_search',
        'web_fetch',
  - ]
  + )
test_from_names_multiple_native from test_registry.py
assert ('web_search', 'web_fetch') == ['web_search', 'web_fetch']
  
  Full diff:
  - [
  + (
        'web_search',
        'web_fetch',
  - ]
  + )
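
The assertion above compares a tuple with a list. In Python, a tuple and a list never compare equal, even when their elements match, so the check fails regardless of content. A small illustration, using hypothetical variable names that are not taken from test_registry.py:

```python
# Values mirroring the two sides of the reported assertion.
left = ("web_search", "web_fetch")
right = ["web_search", "web_fetch"]

# A tuple and a list are different types, so equality is False.
tuple_equals_list = left == right

# Converting both sides to the same type compares the elements only.
equal_after_normalizing = list(left) == right
```

Whether the test or the code under test should change depends on which container type the registry is meant to return, which this report does not say.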

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 88.50%

🔗 Commit SHA: b71179f

@luisorofino luisorofino added the qa/skip-qa label (Automatically skip this PR for the next QA) Jun 30, 2026
@luisorofino

Contributor Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b71179f44e

Comment on lines +152 to +157
    if require_complete and response.stop_reason != StopReason.END_TURN:
        raise IncompleteResponseError(
            f"{self._scope.owner_id} finished with stop_reason="
            f"{response.stop_reason.value!r} (expected END_TURN); the turn is incomplete.",
            stop_reason=response.stop_reason,
        )

[P2] Preserve token accounting for incomplete required turns

When require_complete=True and the terminal response is MAX_TOKENS/OTHER, this raises before returning a ReActResult or attaching the accumulated total_input/total_output. In the orchestrated phase path, _start_task only increments phase totals after process.start(...) returns, so a truncated worker/reviewer turn produces a failed checkpoint that omits the tokens spent on that API call; during goal retries it can also drop already accumulated reviewer/compaction tokens. Please carry the token totals on the exception or record them before raising so failure checkpoints remain accurate.
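
One way to follow this suggestion is to carry the totals on the exception so the caller can record them before re-raising. The sketch below is only an illustration under stated assumptions: the extra `total_input` and `total_output` keyword arguments, the `PhaseTotals` accumulator, the `run_and_account` wrapper, and the dict-shaped result are all hypothetical and do not reflect the actual `ReActProcess` or `AgenticPhase` code.

```python
from dataclasses import dataclass


class IncompleteResponseError(Exception):
    """Assumed shape: the PR's exception extended with token totals."""

    def __init__(self, message, *, stop_reason, total_input=0, total_output=0):
        super().__init__(message)
        self.stop_reason = stop_reason
        self.total_input = total_input
        self.total_output = total_output


@dataclass
class PhaseTotals:
    # Hypothetical accumulator for phase-level token counts.
    total_input: int = 0
    total_output: int = 0

    def add(self, total_input, total_output):
        self.total_input += total_input
        self.total_output += total_output


def run_and_account(start, totals):
    """Call start() and record its tokens whether it returns or raises."""
    try:
        result = start()
    except IncompleteResponseError as exc:
        # Record the tokens spent on the truncated turn, then propagate.
        totals.add(exc.total_input, exc.total_output)
        raise
    totals.add(result["total_input"], result["total_output"])
    return result
```

The error still aborts the flow, but a failure checkpoint built from `totals` would then include the tokens spent on the incomplete turn.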

@luisorofino luisorofino marked this pull request as ready for review June 30, 2026 14:26
@luisorofino luisorofino requested a review from a team as a code owner June 30, 2026 14:26
dd-octo-sts Bot commented Jun 30, 2026

Contributor

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and code coverage settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |

Labels: ddev, qa/skip-qa, team/agent-integrations