
Fix Claude Code provider response capture for subtype=success messages #263

Closed
santoshkumarradha wants to merge 3 commits into main from feature/281ab230-claude-harness-response-fix

santoshkumarradha (Member) commented Mar 13, 2026

Summary

Changes

  • Modified (line 92)
  • Added support for Claude Agent SDK's message format
  • Maintains backward compatibility with existing 'type=result' format

Test plan

  1. Run all provider tests:

    sss............................................s........................ [ 9%]
    ........................................................................ [ 19%]
    ........................................................................ [ 29%]
    ........................................................................ [ 39%]
    ........................................................................ [ 49%]
    ........................................................................ [ 58%]
    ........................................................................ [ 68%]
    ........................................................................ [ 78%]
    ........................................................................ [ 88%]
    ........................................................................ [ 98%]
    ............. [100%]
    =============================== warnings summary ===============================
    tests/test_harness_schema.py:23
      /workspaces/agentfield/sdk/python/tests/test_harness_schema.py:23: PytestCollectionWarning: cannot collect test class 'TestSchema' because it has a __init__ constructor (from: tests/test_harness_schema.py)
        class TestSchema(BaseModel):

    tests/test_agent_field_handler.py::test_register_with_agentfield_applies_discovery_payload
      /usr/local/lib/python3.12/site-packages/pydantic/v1/json.py:12: RuntimeWarning: coroutine 'test_call_function_async.<locals>.async_func' was never awaited
        from pydantic.v1.color import Color
      Enable tracemalloc to get traceback where the object was allocated.
      See https://docs.pytest.org/en/stable/how-to/capture-warnings.html#resource-warnings for more info.

    tests/test_agent_field_handler.py::test_register_with_agentfield_applies_discovery_payload
    tests/test_agent_integration.py::test_agent_reasoner_routing_and_workflow
    tests/test_agent_integration.py::test_callback_url_precedence_and_env
    tests/test_agent_integration.py::test_callback_url_precedence_and_env
    tests/test_agent_networking.py::test_build_callback_discovery_payload_marks_container
      /workspaces/agentfield/sdk/python/agentfield/agent.py:1503: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        "submitted_at": datetime.utcnow().isoformat() + "Z",

    tests/test_client.py: 4 warnings
    tests/test_client_auth.py: 5 warnings
    tests/test_client_execution_paths.py: 12 warnings
      /workspaces/agentfield/sdk/python/agentfield/client.py:1020: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        metadata["timestamp"] = datetime.datetime.utcnow().isoformat()

    tests/test_client_unit.py::test_generate_id_prefix_and_uniqueness
    tests/test_client_unit.py::test_generate_id_prefix_and_uniqueness
      /workspaces/agentfield/sdk/python/agentfield/client.py:169: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        timestamp = datetime.datetime.utcnow().strftime("%Y%m%d_%H%M%S")

    tests/test_did_manager.py::test_create_execution_context
      /workspaces/agentfield/sdk/python/agentfield/did_manager.py:195: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        timestamp=datetime.utcnow(),

    tests/test_http_connection_manager.py::test_connection_manager_start_close
    tests/test_http_connection_manager.py::test_connection_manager_context_manager
    tests/test_http_connection_manager.py::test_connection_manager_double_start
    tests/test_http_connection_manager.py::test_connection_manager_start_after_close
    tests/test_http_connection_manager.py::test_connection_manager_get_session
    tests/test_http_connection_manager.py::test_connection_manager_request_timeout
    tests/test_http_connection_manager.py::test_connection_manager_batch_request
    tests/test_http_connection_manager.py::test_connection_manager_health_check
    tests/test_http_connection_manager.py::test_connection_manager_properties
      /usr/local/lib/python3.12/site-packages/aiohttp/connector.py:993: DeprecationWarning: enable_cleanup_closed ignored because python/cpython#118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=13, releaselevel='final', serial=0)
        super().__init__(

    tests/test_vc_generator.py::test_generate_execution_vc_success
      /workspaces/agentfield/sdk/python/tests/test_vc_generator.py:37: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        "created_at": datetime.utcnow().isoformat() + "Z",

    tests/test_vc_generator.py::test_generate_execution_vc_success
    tests/test_vc_generator.py::test_generate_execution_vc_disabled
      /workspaces/agentfield/sdk/python/tests/test_vc_generator.py:16: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        timestamp=datetime.utcnow(),

    tests/test_vc_generator.py::test_create_workflow_vc
      /workspaces/agentfield/sdk/python/tests/test_vc_generator.py:82: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        "start_time": datetime.utcnow().isoformat() + "Z",

    tests/test_vc_generator.py::test_create_workflow_vc
      /workspaces/agentfield/sdk/python/tests/test_vc_generator.py:83: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
        "end_time": datetime.utcnow().isoformat() + "Z",

    -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

    ---------- coverage: platform linux, python 3.12.13-final-0 ----------
    Name                               Stmts   Miss  Cover   Missing
    ----------------------------------------------------------------
    agentfield/agent_field_handler.py    163     47    71%   142-148, 161, 193-221, 229, 234, 282-284, 333, 529-554
    agentfield/client.py                 788    263    67%   72, 76, 83-86, 98-99, 200, 202, 226, 239, 244, 249, 262-263, 285-286, 291-293, 334, 349, 352, 354, 361, 363, 365, 367, 421, 441, 445-453, 462, 464, 498, 506-509, 518, 522, 542, 548, 556-567, 585, 637, 677, 692-693, 696, 700-702, 777, 781, 785, 789, 811-812, 826-827, 854-855, 865-873, 906, 930-932, 949, 973-984, 993, 1016, 1018, 1028-1029, 1045-1048, 1099-1100, 1150-1151, 1194, 1235, 1251-1252, 1255-1268, 1273-1282, 1293-1303, 1333-1382, 1404-1423, 1449-1484, 1508-1528, 1551-1570, 1593-1614, 1631-1647, 1664-1679, 1723, 1734-1735, 1775-1776, 1847-1854
    agentfield/execution_context.py      119      1    99%   79
    agentfield/execution_state.py        213     24    89%   113, 117, 230, 262, 279-280, 348-349, 353, 450, 454-457, 461-464, 468-470, 474, 478, 482
    agentfield/memory.py                 254     35    86%   116-119, 127, 272, 284, 338-339, 342, 413, 424, 462, 491, 504, 608, 690-720, 844, 856, 870, 882, 894
    agentfield/result_cache.py           207     36    83%   41, 47, 73, 143-144, 162, 173-174, 194-195, 229, 238, 265-269, 273-276, 310-311, 325-331, 404-405, 420-421, 429-430, 434-435
    ----------------------------------------------------------------
    TOTAL                               1849    406    78%

    1 file skipped due to complete coverage.

    =========================== short test summary info ============================
    SKIPPED [1] tests/integration/test_agentfield_end_to_end.py:41: AgentField server sources not available in this checkout
    SKIPPED [1] tests/integration/test_agentfield_end_to_end.py:74: AgentField server sources not available in this checkout
    SKIPPED [1] tests/integration/test_agentfield_end_to_end.py:114: AgentField server sources not available in this checkout
    SKIPPED [1] tests/test_agent_cli.py:286: Complex argparse mocking - functionality tested in integration
    729 passed, 4 skipped, 12 deselected, 45 warnings in 11.40s
  2. Verify 6 tests pass including 2 new test cases:
    • subtype='success' extraction
    • Mixed message format handling
  3. Manual verification:
    • Confirm grep shows both conditions in the detection logic

Related Issues

Closes #252


🤖 Built with AgentField SWE-AF
🔌 Powered by AgentField


📋 PRD (Product Requirements Document)

PRD: Fix Issue #252 - Claude Code Provider Does Not Capture Response Text

Validated Description

The Claude Code Provider in sdk/python/agentfield/harness/providers/claude.py fails to capture the response text because it checks for msg_type == "result", but the Claude Agent SDK sends messages with subtype == "success" instead (no type field present). This causes HarnessResult.result to always be None because the extraction condition never matches.

Current State Analysis

  • File: /workspaces/agentfield/sdk/python/agentfield/harness/providers/claude.py
  • Line 91: Current logic checks if msg_type == "result":
  • Root Cause: The actual Claude Agent SDK message structure uses subtype: "success" to indicate the final result message, not type: "result"
  • Impact: All Claude Code provider executions return result: None in the RawResult, even when the SDK successfully completes

Required Changes

The message type detection logic at line 91 must be expanded to recognize both:

  1. Legacy format: type == "result"
  2. Actual SDK format: subtype == "success"

The result extraction logic (lines 92-106) must work correctly when triggered by either condition.
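A minimal sketch of the expanded detection, pulled into a predicate so the edge cases are easy to see. The name is_result_message is hypothetical; the PRD only requires changing the inline if at line 91.

```python
def is_result_message(msg_dict: dict) -> bool:
    """Return True if msg_dict is a final result message in either format.

    Legacy format: {"type": "result", ...}
    SDK format:    {"subtype": "success", ...}  (no "type" key)
    """
    # .get() with a default means a message lacking either key does not raise.
    msg_type = str(msg_dict.get("type", ""))
    msg_subtype = str(msg_dict.get("subtype", ""))
    return msg_type == "result" or msg_subtype == "success"


print(is_result_message({"type": "result", "result": "x"}))        # True
print(is_result_message({"subtype": "success", "result": "x"}))    # True
print(is_result_message({"type": "assistant", "content": "..."}))  # False
```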

Test Coverage Requirements

Existing test at /workspaces/agentfield/sdk/python/tests/test_harness_provider_claude.py uses type: "result" mock. New test cases must verify:

  • SDK-style subtype: "success" messages correctly extract result
  • Legacy type: "result" messages continue working
  • Result fields (text, session_id, cost_usd, num_turns) are extracted from both formats
  • Mixed message streams with both formats work correctly

Scope Definitions

Must Have

  • Modify line 91 condition to check: if msg_type == "result" or msg_dict.get("subtype") == "success":
  • Add test case in test_harness_provider_claude.py covering subtype: "success" message format
  • Add test case verifying result extraction from SDK-style messages with all metadata fields
  • Ensure backward compatibility with existing type: "result" format
  • All existing tests pass without modification

Nice to Have

  • Add logging when result message is detected via subtype instead of type
  • Document the two message formats in code comments

Out of Scope

  • Changes to other providers (opencode, codex, gemini)
  • Changes to the RawResult or HarnessResult data structures
  • Adding retry logic or error handling for malformed messages
  • Performance optimizations

Assumptions

  1. The Claude Agent SDK consistently sends subtype: "success" for final result messages
  2. Result text is stored in either the result or the text field of the message dict
  3. Session ID, cost, and turn count fields use the same field names in both formats
  4. The fix requires minimal changes - only the condition check needs modification
  5. Both formats may coexist in the same message stream

Risks

  1. Risk: Claude Agent SDK may have other message variations not captured by this fix

    • Mitigation: Accept both type: "result" and subtype: "success" to maximize compatibility
    • Acceptance: Document this assumption; if SDK changes format later, new issue required
  2. Risk: Field names for result data may differ between formats

    • Mitigation: The current extraction code already checks multiple field names (result, text, session_id, cost_usd, num_turns)
    • Verification: Test cases validate all fields are extracted correctly
  3. Risk: Messages with subtype: "success" may not contain the expected result data structure

    • Mitigation: Existing null-safe extraction logic continues to work; if fields are missing, result remains None rather than raising

Success Metrics

  • Claude Code provider executions return non-None result when SDK completes successfully
  • All test assertions pass including new subtype: "success" test cases
  • Zero regressions in existing test suite

🏗️ Architecture

Architecture Document: Claude Code Provider Message Detection Fix

Summary

Fix message type detection in the Claude Code Provider to recognize both the legacy type: "result" format and the SDK-native subtype: "success" format. This ensures result extraction works correctly regardless of which message format the Claude Agent SDK returns.

Context

The Claude Code Provider (sdk/python/agentfield/harness/providers/claude.py) uses the native claude_agent_sdk to communicate with Claude. The provider streams messages from the SDK and extracts result text when it encounters a message indicating completion.

Currently, the provider only checks for msg_type == "result" (line 91). However, the Claude Agent SDK sends completion messages with subtype == "success" instead, causing the result extraction logic to never execute and HarnessResult.result to remain None.

Component Structure

Component: ClaudeCodeProvider (Modified)

File: sdk/python/agentfield/harness/providers/claude.py

Responsibility:
Execute prompts via Claude Code SDK and extract result text from streaming messages. Modified to support both message format variants.

Key Interface:

from agentfield.harness._result import RawResult


class ClaudeCodeProvider:
    async def execute(self, prompt: str, options: dict[str, object]) -> RawResult:
        """Execute a prompt via Claude Code SDK.

        Args:
            prompt: The prompt text to send to Claude
            options: Configuration options (model, cwd, max_turns, etc.)

        Returns:
            RawResult containing the extracted result and metadata
        """

Dependencies:

  • agentfield.harness._result.RawResult - Return type
  • agentfield.harness._result.Metrics - Metrics data structure
  • claude_agent_sdk (lazy import) - External SDK for Claude communication

Message Detection Logic (Line 91 modification):

Current condition:

msg_type = str(msg_dict.get("type", ""))
if msg_type == "result":

New condition:

msg_type = str(msg_dict.get("type", ""))
msg_subtype = str(msg_dict.get("subtype", ""))
if msg_type == "result" or msg_subtype == "success":

Result Extraction Block (Lines 92-106):

The existing extraction logic remains unchanged but now triggers on either condition:

  • Extracts result or text field as result_text
  • Extracts session_id for session tracking
  • Extracts cost_usd or total_cost_usd for billing
  • Extracts num_turns for turn counting
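The fallbacks listed above can be sketched as a pure function. This is a hedged illustration: extract_result_fields is a hypothetical name, the returned dict stands in for the provider's RawResult and Metrics objects (whose constructors are not shown in this PR), and defaulting num_turns to 0 is an assumption based on the "None/0 values" note under Error Handling.

```python
from typing import Any


def extract_result_fields(msg_dict: dict[str, Any]) -> dict[str, Any]:
    """Pull result metadata from a final result message, tolerating missing keys."""
    # result falls back to text; either may be absent.
    result_text = msg_dict.get("result")
    if result_text is None:
        result_text = msg_dict.get("text")
    # cost_usd falls back to total_cost_usd.
    cost = msg_dict.get("cost_usd")
    if cost is None:
        cost = msg_dict.get("total_cost_usd")
    return {
        "result": str(result_text) if result_text is not None else None,
        "session_id": msg_dict.get("session_id"),
        "total_cost_usd": cost,
        "num_turns": msg_dict.get("num_turns", 0),  # assumed default
    }


fields = extract_result_fields(
    {"subtype": "success", "text": "Done", "total_cost_usd": 0.02}
)
print(fields)
# {'result': 'Done', 'session_id': None, 'total_cost_usd': 0.02, 'num_turns': 0}
```

Because the function never indexes a key directly, a sparse success message degrades to None/0 values instead of raising.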

Data Flow Example:

Scenario A: Legacy format (type: "result")

Input: msg_dict = {
    "type": "result",
    "result": "The answer is 42",
    "session_id": "sess-123",
    "cost_usd": 0.05,
    "num_turns": 3
}

Processing:
  msg_type = "result"
  msg_subtype = ""
  Condition: "result" == "result" OR "" == "success" => True
  
Output: RawResult(
    result="The answer is 42",
    session_id="sess-123",
    total_cost_usd=0.05,
    num_turns=3
)

Scenario B: SDK format (subtype: "success")

Input: msg_dict = {
    "subtype": "success",
    "result": "The answer is 42",
    "session_id": "sess-123",
    "cost_usd": 0.05,
    "num_turns": 3
}

Processing:
  msg_type = ""
  msg_subtype = "success"
  Condition: "" == "result" OR "success" == "success" => True
  
Output: RawResult(
    result="The answer is 42",
    session_id="sess-123",
    total_cost_usd=0.05,
    num_turns=3
)

Scenario C: Mixed messages

Messages: [
    {"type": "assistant", "content": "..."},
    {"subtype": "success", "result": "Final answer", "session_id": "s1"},
]

Result: RawResult(result="Final answer", session_id="s1", ...)
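The three scenarios can be replayed in a few lines. This sketch assumes messages arrive as plain dicts and that a later result message overwrites an earlier one; collect_result is hypothetical and only tracks result and session_id for brevity.

```python
from typing import Any


def collect_result(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Scan a message stream and keep fields from the last final result message."""
    out: dict[str, Any] = {"result": None, "session_id": None}
    for msg_dict in messages:
        msg_type = str(msg_dict.get("type", ""))
        msg_subtype = str(msg_dict.get("subtype", ""))
        if msg_type == "result" or msg_subtype == "success":
            out["result"] = msg_dict.get("result") or msg_dict.get("text")
            out["session_id"] = msg_dict.get("session_id")
    return out


legacy = [{"type": "result", "result": "The answer is 42", "session_id": "sess-123"}]
sdk = [{"subtype": "success", "result": "The answer is 42", "session_id": "sess-123"}]
mixed = [
    {"type": "assistant", "content": "..."},
    {"subtype": "success", "result": "Final answer", "session_id": "s1"},
]

for name, stream in [("A", legacy), ("B", sdk), ("C", mixed)]:
    print(name, collect_result(stream))
```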

Test Coverage

Component: ClaudeProviderTests (Extended)

File: sdk/python/tests/test_harness_provider_claude.py

Responsibility:
Verify Claude Code Provider behavior including new message format support.

New Test Cases:

  1. test_execute_extracts_result_from_subtype_success

    • Mocks SDK returning subtype='success' message
    • Asserts raw.result equals expected text
    • Verifies session_id, cost_usd, num_turns extraction
  2. test_execute_handles_mixed_message_formats

    • Mocks SDK returning both type: 'result' and subtype: 'success' messages
    • Asserts extraction works for both formats
    • Validates backward compatibility

Test Pattern (SDK message mock):

from typing import Any


# _AsyncStream: assumed to be the test module's async-iterator helper (not shown here).
def fake_query(*, prompt: str, options: Any):
    return _AsyncStream([
        {"type": "assistant", "content": [{"type": "text", "text": "Working..."}]},
        {
            "subtype": "success",  # Note: no "type" field
            "result": "Final result text",
            "session_id": "session-abc-123",
            "cost_usd": 0.15,
            "num_turns": 5,
        },
    ])

Architectural Decisions

Decision 1: OR-based condition rather than separate handlers

Decision: Use a single conditional with OR logic rather than separate if blocks for each format.

Rationale:

  • Both formats share identical extraction logic (result, session_id, cost, turns)
  • Reduces code duplication and maintenance burden
  • Future format variants can be added by extending the condition

Alternative rejected: Separate if msg_type == "result": and if msg_subtype == "success": blocks would duplicate the extraction code.
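If more variants appear, one way to extend the single condition without duplicating the extraction code is a table of recognized (key, value) markers. This is an alternative sketch, not what the PR implements; RESULT_MARKERS is a hypothetical name.

```python
from typing import Any

# Each entry is a (key, value) pair that marks a final result message.
RESULT_MARKERS: tuple[tuple[str, str], ...] = (
    ("type", "result"),      # legacy format
    ("subtype", "success"),  # Claude Agent SDK format
)


def is_result_message(msg_dict: dict[str, Any]) -> bool:
    # Equivalent to: type == "result" or subtype == "success".
    return any(str(msg_dict.get(key, "")) == value for key, value in RESULT_MARKERS)


print(is_result_message({"subtype": "success"}))  # True
print(is_result_message({"type": "assistant"}))   # False
```

Adding a variant then means appending one tuple; the trade-off is that the accepted formats are less visible at the call site than in an explicit or chain.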

Decision 2: Preserve backward compatibility

Decision: Keep existing type: "result" detection while adding subtype: "success" support.

Rationale:

  • Existing tests and integrations depend on current behavior
  • Provider may receive messages from different SDK versions
  • No breaking changes to public API

Decision 3: No changes to message data structures

Decision: Do not modify _result.py or _schema.py.

Rationale:

  • RawResult structure already supports all needed fields
  • Issue is detection logic, not data structure
  • Minimal surface area reduces regression risk

Decision 4: Subtype checked via .get() with default

Decision: Use msg_dict.get("subtype", "") rather than direct key access.

Rationale:

  • Messages may not have subtype field (backward compatibility)
  • Consistent with existing type field access pattern
  • Prevents KeyError on malformed messages
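The difference shows up on any legacy message, since those carry no subtype key. A minimal demonstration:

```python
legacy_msg = {"type": "result", "result": "ok"}  # no "subtype" key

# Direct indexing raises on messages that omit the key.
try:
    legacy_msg["subtype"]
    raised = False
except KeyError:
    raised = True

# .get() with a default returns "" instead, so the comparison is simply False.
msg_subtype = str(legacy_msg.get("subtype", ""))

print(raised, repr(msg_subtype), msg_subtype == "success")  # True '' False
```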

Error Handling

Error Path 1: Neither format detected

  • If neither type == "result" nor subtype == "success" matches
  • Result extraction block skipped
  • result_text remains None
  • Provider continues streaming and may fall back to extracting text from an "assistant" message

Error Path 2: SDK throws exception

  • The SDK call is wrapped in a try/except block (lines 71-149)
  • Returns RawResult with is_error=True and error message
  • Metrics populated with duration_api_ms

Error Path 3: Missing extraction fields

  • Uses .get() with sensible defaults
  • result falls back to text field
  • cost_usd falls back to total_cost_usd
  • Missing fields result in None/0 values, not exceptions

Performance Considerations

Budget: < 1ms additional overhead per message

Breakdown:

  • Additional .get("subtype") call: ~0.1μs (dict lookup)
  • Additional string comparison: ~0.01μs
  • Total overhead per message: < 1μs (negligible)

Optimization: None needed; the change adds negligible overhead to the existing hot path.
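The per-message estimate can be checked locally with timeit. This is a rough sketch: absolute numbers depend on the machine and interpreter, and old_check/new_check are hypothetical reductions of the before/after conditions.

```python
import timeit

msg_dict = {"type": "assistant", "content": "..."}  # typical non-result message


def old_check() -> bool:
    return str(msg_dict.get("type", "")) == "result"


def new_check() -> bool:
    return (
        str(msg_dict.get("type", "")) == "result"
        or str(msg_dict.get("subtype", "")) == "success"
    )


n = 200_000
old_s = timeit.timeit(old_check, number=n)
new_s = timeit.timeit(new_check, number=n)
overhead_us = (new_s - old_s) / n * 1e6
print(f"old: {old_s / n * 1e6:.3f} µs/msg, new: {new_s / n * 1e6:.3f} µs/msg, "
      f"delta: {overhead_us:.3f} µs/msg")
```

Non-result messages are the worst case here, since both sides of the or are evaluated.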

Module Dependency Graph

sdk/python/tests/test_harness_provider_claude.py
    ↓ imports
sdk/python/agentfield/harness/providers/claude.py
    ↓ imports
sdk/python/agentfield/harness/_result.py (RawResult, Metrics)
    ↓ imports (lazy)
claude_agent_sdk (external)

File Changes Summary

  1. sdk/python/agentfield/harness/providers/claude.py (Modified)

    • Line 91: Change condition from if msg_type == "result": to if msg_type == "result" or msg_dict.get("subtype") == "success":
    • No other logic changes
  2. sdk/python/tests/test_harness_provider_claude.py (Extended)

    • Add test_execute_extracts_result_from_subtype_success (~40 lines)
    • Add test_execute_handles_mixed_message_formats (~50 lines)
    • Existing 151 lines unchanged

Verification Checklist

  • pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_maps_options_and_extracts_result passes
  • pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_returns_error_result_on_query_failure passes
  • pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_extracts_result_from_subtype_success passes (new)
  • pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_handles_mixed_message_formats passes (new)
  • grep -n 'subtype.*success\|type.*result' sdk/python/agentfield/harness/providers/claude.py shows both conditions
  • Module imports correctly: python -c "from agentfield.harness.providers.claude import ClaudeCodeProvider"
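The grep alternation prints any line matching either pattern, so it confirms presence visually rather than asserting both. A stricter check could look for each condition separately; the sketch below is hypothetical and takes the file contents as a string so it runs without the repository.

```python
import re


def has_both_conditions(source: str) -> bool:
    """Return True if source contains both the legacy and SDK result checks."""
    legacy = re.search(r'type.*==\s*"result"', source)
    sdk = re.search(r'subtype.*==\s*"success"', source)
    return legacy is not None and sdk is not None


fixed = (
    'msg_type = str(msg_dict.get("type", ""))\n'
    'if msg_type == "result" or msg_dict.get("subtype") == "success":\n'
)
old = 'msg_type = str(msg_dict.get("type", ""))\nif msg_type == "result":\n'

print(has_both_conditions(fixed), has_both_conditions(old))  # True False
```

Against the checkout, the source string would come from reading sdk/python/agentfield/harness/providers/claude.py.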

CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


SWE-AF seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

santoshkumarradha (Member, Author)

⚠️ Ignore testing updates to https://github.com/Agent-Field/SWE-AF/

github-actions (Contributor)

Performance

SDK       Memory    Δ      Latency   Δ      Tests   Status
Python    9.3 KB    +4%    0.34 µs   -3%

✓ No regressions detected

Development

Successfully merging this pull request may close these issues.

Claude Code Provider Does Not Capture Response Text — HarnessResult.result is always 'None' due to Incorrect Message Type Check
