Fix Claude Code provider response capture for subtype=success messages by santoshkumarradha · Pull Request #263 · Agent-Field/agentfield

santoshkumarradha · 2026-03-13T03:33:15Z

Summary

Fixes issue #252: Claude Code Provider Does Not Capture Response Text (HarnessResult.result is always None due to an incorrect message type check)
Updates ClaudeCodeProvider.execute() to recognize both 'type': 'result' and 'subtype': 'success' message formats
Resolves HarnessResult.result always being None due to incorrect message type check

Changes

Modified sdk/python/agentfield/harness/providers/claude.py (line 92)
Added support for the Claude Agent SDK's subtype='success' message format
Maintains backward compatibility with existing 'type=result' format

Test plan

1. Run all provider tests:
=============================== warnings summary ===============================
tests/test_harness_schema.py:23
/workspaces/agentfield/sdk/python/tests/test_harness_schema.py:23: PytestCollectionWarning: cannot collect test class 'TestSchema' because it has a __init__ constructor (from: tests/test_harness_schema.py)
class TestSchema(BaseModel):

tests/test_agent_field_handler.py::test_register_with_agentfield_applies_discovery_payload
/usr/local/lib/python3.12/site-packages/pydantic/v1/json.py:12: RuntimeWarning: coroutine 'test_call_function_async.<locals>.async_func' was never awaited
from pydantic.v1.color import Color
Enable tracemalloc to get traceback where the object was allocated.
See https://docs.pytest.org/en/stable/how-to/capture-warnings.html#resource-warnings for more info.

tests/test_agent_field_handler.py::test_register_with_agentfield_applies_discovery_payload
tests/test_agent_integration.py::test_agent_reasoner_routing_and_workflow
tests/test_agent_integration.py::test_callback_url_precedence_and_env
tests/test_agent_integration.py::test_callback_url_precedence_and_env
tests/test_agent_networking.py::test_build_callback_discovery_payload_marks_container
/workspaces/agentfield/sdk/python/agentfield/agent.py:1503: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
"submitted_at": datetime.utcnow().isoformat() + "Z",

tests/test_client.py: 4 warnings
tests/test_client_auth.py: 5 warnings
tests/test_client_execution_paths.py: 12 warnings
/workspaces/agentfield/sdk/python/agentfield/client.py:1020: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
metadata["timestamp"] = datetime.datetime.utcnow().isoformat()

tests/test_client_unit.py::test_generate_id_prefix_and_uniqueness
tests/test_client_unit.py::test_generate_id_prefix_and_uniqueness
/workspaces/agentfield/sdk/python/agentfield/client.py:169: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp = datetime.datetime.utcnow().strftime("%Y%m%d_%H%M%S")

tests/test_did_manager.py::test_create_execution_context
/workspaces/agentfield/sdk/python/agentfield/did_manager.py:195: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp=datetime.utcnow(),

tests/test_http_connection_manager.py::test_connection_manager_start_close
tests/test_http_connection_manager.py::test_connection_manager_context_manager
tests/test_http_connection_manager.py::test_connection_manager_double_start
tests/test_http_connection_manager.py::test_connection_manager_start_after_close
tests/test_http_connection_manager.py::test_connection_manager_get_session
tests/test_http_connection_manager.py::test_connection_manager_request_timeout
tests/test_http_connection_manager.py::test_connection_manager_batch_request
tests/test_http_connection_manager.py::test_connection_manager_health_check
tests/test_http_connection_manager.py::test_connection_manager_properties
/usr/local/lib/python3.12/site-packages/aiohttp/connector.py:993: DeprecationWarning: enable_cleanup_closed ignored because python/cpython#118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=13, releaselevel='final', serial=0)
super().__init__(

tests/test_vc_generator.py::test_generate_execution_vc_success
/workspaces/agentfield/sdk/python/tests/test_vc_generator.py:37: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
"created_at": datetime.utcnow().isoformat() + "Z",

tests/test_vc_generator.py::test_generate_execution_vc_success
tests/test_vc_generator.py::test_generate_execution_vc_disabled
/workspaces/agentfield/sdk/python/tests/test_vc_generator.py:16: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp=datetime.utcnow(),

tests/test_vc_generator.py::test_create_workflow_vc
/workspaces/agentfield/sdk/python/tests/test_vc_generator.py:82: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
"start_time": datetime.utcnow().isoformat() + "Z",

tests/test_vc_generator.py::test_create_workflow_vc
/workspaces/agentfield/sdk/python/tests/test_vc_generator.py:83: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
"end_time": datetime.utcnow().isoformat() + "Z",

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.12.13-final-0 ----------
Name Stmts Miss Cover Missing

agentfield/agent_field_handler.py 163 47 71% 142-148, 161, 193-221, 229, 234, 282-284, 333, 529-554
agentfield/client.py 788 263 67% 72, 76, 83-86, 98-99, 200, 202, 226, 239, 244, 249, 262-263, 285-286, 291-293, 334, 349, 352, 354, 361, 363, 365, 367, 421, 441, 445-453, 462, 464, 498, 506-509, 518, 522, 542, 548, 556-567, 585, 637, 677, 692-693, 696, 700-702, 777, 781, 785, 789, 811-812, 826-827, 854-855, 865-873, 906, 930-932, 949, 973-984, 993, 1016, 1018, 1028-1029, 1045-1048, 1099-1100, 1150-1151, 1194, 1235, 1251-1252, 1255-1268, 1273-1282, 1293-1303, 1333-1382, 1404-1423, 1449-1484, 1508-1528, 1551-1570, 1593-1614, 1631-1647, 1664-1679, 1723, 1734-1735, 1775-1776, 1847-1854
agentfield/execution_context.py 119 1 99% 79
agentfield/execution_state.py 213 24 89% 113, 117, 230, 262, 279-280, 348-349, 353, 450, 454-457, 461-464, 468-470, 474, 478, 482
agentfield/memory.py 254 35 86% 116-119, 127, 272, 284, 338-339, 342, 413, 424, 462, 491, 504, 608, 690-720, 844, 856, 870, 882, 894
agentfield/result_cache.py 207 36 83% 41, 47, 73, 143-144, 162, 173-174, 194-195, 229, 238, 265-269, 273-276, 310-311, 325-331, 404-405, 420-421, 429-430, 434-435

TOTAL 1849 406 78%

1 file skipped due to complete coverage.

=========================== short test summary info ============================
SKIPPED [1] tests/integration/test_agentfield_end_to_end.py:41: AgentField server sources not available in this checkout
SKIPPED [1] tests/integration/test_agentfield_end_to_end.py:74: AgentField server sources not available in this checkout
SKIPPED [1] tests/integration/test_agentfield_end_to_end.py:114: AgentField server sources not available in this checkout
SKIPPED [1] tests/test_agent_cli.py:286: Complex argparse mocking - functionality tested in integration
729 passed, 4 skipped, 12 deselected, 45 warnings in 11.40s
2. Verify 6 tests pass including 2 new test cases:

subtype='success' extraction
Mixed message format handling

3. Manual verification: confirm that grep shows both conditions in the detection logic

Related Issues

Closes #252

🤖 Built with AgentField SWE-AF
🔌 Powered by AgentField

📋 PRD (Product Requirements Document)

PRD: Fix Issue #252 - Claude Code Provider Does Not Capture Response Text

Validated Description

The Claude Code Provider in sdk/python/agentfield/harness/providers/claude.py fails to capture the response text because it checks for msg_type == "result", but the Claude Agent SDK sends messages with subtype == "success" instead (no type field present). This causes HarnessResult.result to always be None because the extraction condition never matches.

Current State Analysis

File: /workspaces/agentfield/sdk/python/agentfield/harness/providers/claude.py
Line 91: Current logic checks if msg_type == "result":
Root Cause: The actual Claude Agent SDK message structure uses subtype: "success" to indicate the final result message, not type: "result"
Impact: All Claude Code provider executions return result: None in the RawResult, even when the SDK successfully completes

Required Changes

The message type detection logic at line 91 must be expanded to recognize both:

Legacy format: type == "result"
Actual SDK format: subtype == "success"

The result extraction logic (lines 92-106) must work correctly when triggered by either condition.
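As a minimal sketch of the expanded check, assuming each streamed message has already been normalized to a plain dict (`msg_dict`, as in the provider code), the detection can be isolated in a small predicate. The name `is_result_message` is hypothetical and is not part of the AgentField codebase.

```python
from typing import Any, Mapping


def is_result_message(msg_dict: Mapping[str, Any]) -> bool:
    """Return True if msg_dict marks the final result message.

    Hypothetical helper: accepts the legacy ``type == "result"`` shape and
    the Claude Agent SDK ``subtype == "success"`` shape.
    """
    msg_type = str(msg_dict.get("type", ""))
    msg_subtype = str(msg_dict.get("subtype", ""))
    return msg_type == "result" or msg_subtype == "success"


print(is_result_message({"type": "result", "result": "ok"}))      # True
print(is_result_message({"subtype": "success", "result": "ok"}))  # True
print(is_result_message({"type": "assistant", "content": []}))    # False
```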

Test Coverage Requirements

Existing test at /workspaces/agentfield/sdk/python/tests/test_harness_provider_claude.py uses type: "result" mock. New test cases must verify:

SDK-style subtype: "success" messages correctly extract result
Legacy type: "result" messages continue working
Result fields (text, session_id, cost_usd, num_turns) are extracted from both formats
Mixed message streams with both formats work correctly

Scope Definitions

Must Have

Modify line 91 condition to check: if msg_type == "result" or msg_dict.get("subtype") == "success":
Add test case in test_harness_provider_claude.py covering subtype: "success" message format
Add test case verifying result extraction from SDK-style messages with all metadata fields
Ensure backward compatibility with existing type: "result" format
All existing tests pass without modification

Nice to Have

Add logging when result message is detected via subtype instead of type
Document the two message formats in code comments

Out of Scope

Changes to other providers (opencode, codex, gemini)
Changes to the RawResult or HarnessResult data structures
Adding retry logic or error handling for malformed messages
Performance optimizations

Assumptions

The Claude Agent SDK consistently sends subtype: "success" for final result messages
Result text is stored in either the result or the text field of the message dict
Session ID, cost, and turn count fields use the same field names in both formats
The fix requires minimal changes - only the condition check needs modification
Both formats may coexist in the same message stream

Risks

Risk: Claude Agent SDK may have other message variations not captured by this fix
- Mitigation: Accept both type: "result" and subtype: "success" to maximize compatibility
- Acceptance: Document this assumption; if SDK changes format later, new issue required
Risk: Field names for result data may differ between formats
- Mitigation: The current extraction code already checks multiple field names (result, text, session_id, cost_usd, num_turns)
- Verification: Test cases validate all fields are extracted correctly
Risk: Messages with subtype: "success" may not contain the expected result data structure
- Mitigation: Existing null-safe extraction logic continues to work; if fields are missing, the result remains None instead of raising an exception

Success Metrics

Claude Code provider executions return non-None result when SDK completes successfully
All test assertions pass including new subtype: "success" test cases
Zero regressions in existing test suite

🏗️ Architecture

Architecture Document: Claude Code Provider Message Detection Fix

Summary

Fix message type detection in the Claude Code Provider to recognize both the legacy type: "result" format and the SDK-native subtype: "success" format. This ensures result extraction works correctly regardless of which message format the Claude Agent SDK returns.

Context

The Claude Code Provider (sdk/python/agentfield/harness/providers/claude.py) uses the native claude_agent_sdk to communicate with Claude. The provider streams messages from the SDK and extracts result text when it encounters a message indicating completion.

Currently, the provider only checks for msg_type == "result" (line 91). However, the Claude Agent SDK sends completion messages with subtype == "success" instead, causing the result extraction logic to never execute and HarnessResult.result to remain None.

Component Structure

Component: ClaudeCodeProvider (Modified)

File: sdk/python/agentfield/harness/providers/claude.py

Responsibility:
Execute prompts via Claude Code SDK and extract result text from streaming messages. Modified to support both message format variants.

Key Interface:

from agentfield.harness._result import RawResult

class ClaudeCodeProvider:
    async def execute(self, prompt: str, options: dict[str, object]) -> RawResult:
        """Execute a prompt via Claude Code SDK.

        Args:
            prompt: The prompt text to send to Claude
            options: Configuration options (model, cwd, max_turns, etc.)

        Returns:
            RawResult containing the extracted result and metadata
        """

Dependencies:

agentfield.harness._result.RawResult - Return type
agentfield.harness._result.Metrics - Metrics data structure
claude_agent_sdk (lazy import) - External SDK for Claude communication

Message Detection Logic (Line 91 modification):

Current condition:

msg_type = str(msg_dict.get("type", ""))
if msg_type == "result":

New condition:

msg_type = str(msg_dict.get("type", ""))
msg_subtype = str(msg_dict.get("subtype", ""))
if msg_type == "result" or msg_subtype == "success":

Result Extraction Block (Lines 92-106):

The existing extraction logic remains unchanged but now triggers on either condition:

Extracts result or text field as result_text
Extracts session_id for session tracking
Extracts cost_usd or total_cost_usd for billing
Extracts num_turns for turn counting
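The field fallbacks listed above can be sketched as follows. This is an illustration, not the provider's actual code: `extract_result_fields` and its returned dict are hypothetical, and the fallback order (`result` before `text`, `cost_usd` before `total_cost_usd`) is taken from the description above. In this sketch missing fields become None.

```python
from typing import Any, Mapping


def _first_present(msg: Mapping[str, Any], *keys: str) -> Any:
    """Return the value of the first key that is present and not None."""
    for key in keys:
        value = msg.get(key)
        if value is not None:
            return value
    return None


def extract_result_fields(msg: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch of the extraction block; never raises on missing keys."""
    return {
        "result": _first_present(msg, "result", "text"),
        "session_id": msg.get("session_id"),
        "total_cost_usd": _first_present(msg, "cost_usd", "total_cost_usd"),
        "num_turns": msg.get("num_turns"),
    }


print(extract_result_fields({"subtype": "success", "text": "hi", "total_cost_usd": 0.01}))
# {'result': 'hi', 'session_id': None, 'total_cost_usd': 0.01, 'num_turns': None}
```

Because the same function runs for both message shapes, the trigger condition is the only thing that differs between the legacy and SDK formats.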

Data Flow Example:

Scenario A: Legacy format (type: "result")

Input: msg_dict = {
    "type": "result",
    "result": "The answer is 42",
    "session_id": "sess-123",
    "cost_usd": 0.05,
    "num_turns": 3
}

Processing:
  msg_type = "result"
  msg_subtype = ""
  Condition: "result" == "result" OR "" == "success" => True
  
Output: RawResult(
    result="The answer is 42",
    session_id="sess-123",
    total_cost_usd=0.05,
    num_turns=3
)

Scenario B: SDK format (subtype: "success")

Input: msg_dict = {
    "subtype": "success",
    "result": "The answer is 42",
    "session_id": "sess-123",
    "cost_usd": 0.05,
    "num_turns": 3
}

Processing:
  msg_type = ""
  msg_subtype = "success"
  Condition: "" == "result" OR "success" == "success" => True
  
Output: RawResult(
    result="The answer is 42",
    session_id="sess-123",
    total_cost_usd=0.05,
    num_turns=3
)

Scenario C: Mixed messages

Messages: [
    {"type": "assistant", "content": "..."},
    {"subtype": "success", "result": "Final answer", "session_id": "s1"},
]

Result: RawResult(result="Final answer", session_id="s1", ...)
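The three scenarios can be reproduced with a small, self-contained model of the streaming loop. `SketchResult` and `collect_result` are hypothetical stand-ins for `RawResult` and the loop inside `ClaudeCodeProvider.execute()`; the sketch assumes that a later result message overwrites an earlier one.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional


@dataclass
class SketchResult:
    # Hypothetical stand-in for RawResult; only the fields used in the scenarios.
    result: Optional[str] = None
    session_id: Optional[str] = None
    total_cost_usd: Optional[float] = None
    num_turns: Optional[int] = None


def collect_result(messages: Iterable[Mapping[str, Any]]) -> SketchResult:
    out = SketchResult()
    for msg in messages:
        msg_type = str(msg.get("type", ""))
        msg_subtype = str(msg.get("subtype", ""))
        if msg_type == "result" or msg_subtype == "success":
            # Assumption: a later result message overwrites an earlier one.
            text = msg.get("result")
            out.result = text if text is not None else msg.get("text")
            out.session_id = msg.get("session_id")
            cost = msg.get("cost_usd")
            out.total_cost_usd = cost if cost is not None else msg.get("total_cost_usd")
            out.num_turns = msg.get("num_turns")
    return out


scenario_c = [
    {"type": "assistant", "content": "..."},
    {"subtype": "success", "result": "Final answer", "session_id": "s1"},
]
print(collect_result(scenario_c))
# SketchResult(result='Final answer', session_id='s1', total_cost_usd=None, num_turns=None)
```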

Test Coverage

Component: ClaudeProviderTests (Extended)

File: sdk/python/tests/test_harness_provider_claude.py

Responsibility:
Verify Claude Code Provider behavior including new message format support.

New Test Cases:

test_execute_extracts_result_from_subtype_success
- Mocks SDK returning subtype='success' message
- Asserts raw.result equals expected text
- Verifies session_id, cost_usd, num_turns extraction
test_execute_handles_mixed_message_formats
- Mocks SDK returning both type: 'result' and subtype: 'success' messages
- Asserts extraction works for both formats
- Validates backward compatibility

Test Pattern (SDK message mock):

from typing import Any

# _AsyncStream is assumed to be an async-iterator helper defined in the test module.
def fake_query(*, prompt: str, options: Any):
    return _AsyncStream([
        {"type": "assistant", "content": [{"type": "text", "text": "Working..."}]},
        {
            "subtype": "success",  # Note: no "type" field
            "result": "Final result text",
            "session_id": "session-abc-123",
            "cost_usd": 0.15,
            "num_turns": 5,
        },
    ])

Architectural Decisions

Decision 1: OR-based condition rather than separate handlers

Decision: Use a single conditional with OR logic rather than separate if blocks for each format.

Rationale:

Both formats share identical extraction logic (result, session_id, cost, turns)
Reduces code duplication and maintenance burden
Future format variants can be added by extending the condition

Alternative rejected: Separate if msg_type == "result": and if msg_subtype == "success": blocks would duplicate the extraction code.
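If more format variants appear, one way to keep a single extraction path while extending the condition is a table of accepted (key, value) markers. This is an alternative sketch, not what the PR implements; `RESULT_MARKERS` and `matches_result_marker` are hypothetical names.

```python
from typing import Any, Mapping

# Hypothetical registry of (key, value) pairs that mark a final result message.
RESULT_MARKERS: frozenset[tuple[str, str]] = frozenset({
    ("type", "result"),      # legacy format
    ("subtype", "success"),  # Claude Agent SDK format
})


def matches_result_marker(msg_dict: Mapping[str, Any]) -> bool:
    # .get() with a default keeps messages without the key from raising KeyError.
    return any(str(msg_dict.get(key, "")) == value for key, value in RESULT_MARKERS)
```

The trade-off is indirection: for two formats the inline `or` in the PR is easier to read, while a registry only pays off if the list keeps growing.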

Decision 2: Preserve backward compatibility

Decision: Keep existing type: "result" detection while adding subtype: "success" support.

Rationale:

Existing tests and integrations depend on current behavior
Provider may receive messages from different SDK versions
No breaking changes to public API

Decision 3: No changes to message data structures

Decision: Do not modify _result.py or _schema.py.

Rationale:

RawResult structure already supports all needed fields
Issue is detection logic, not data structure
Minimal surface area reduces regression risk

Decision 4: Subtype checked via .get() with default

Decision: Use msg_dict.get("subtype", "") rather than direct key access.

Rationale:

Messages may not have subtype field (backward compatibility)
Consistent with existing type field access pattern
Prevents KeyError on malformed messages
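A short illustration of why `.get()` with a default is used: legacy messages have no `subtype` key, so direct indexing would raise `KeyError`, while `.get()` yields an empty string that simply fails the comparison.

```python
legacy_msg = {"type": "result", "result": "The answer is 42"}

# Direct key access fails on messages without a "subtype" field.
try:
    legacy_msg["subtype"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'subtype'

# .get() with a default never raises; the comparison is just False.
msg_subtype = str(legacy_msg.get("subtype", ""))
print(repr(msg_subtype), msg_subtype == "success")  # '' False
```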

Error Handling

Error Path 1: Neither format detected

If neither type == "result" nor subtype == "success" matches
Result extraction block skipped
result_text remains None
Provider continues streaming and may fall back to text from "assistant" messages
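The document does not specify how the "assistant" fallback works. Assuming it joins the `text` blocks of `"assistant"` messages (the content shape used in the test pattern), a hypothetical `result_with_fallback` could look like this; the real provider may behave differently.

```python
from typing import Any, Iterable, Mapping, Optional


def result_with_fallback(messages: Iterable[Mapping[str, Any]]) -> Optional[str]:
    """Hypothetical: prefer the result message; otherwise use assistant text blocks."""
    result_text: Optional[str] = None
    assistant_chunks: list[str] = []
    for msg in messages:
        if msg.get("type") == "result" or msg.get("subtype") == "success":
            result_text = msg.get("result") or msg.get("text")
        elif msg.get("type") == "assistant":
            content = msg.get("content")
            if isinstance(content, list):
                for block in content:
                    # Assumed block shape: {"type": "text", "text": "..."}
                    if isinstance(block, dict) and block.get("type") == "text":
                        assistant_chunks.append(str(block.get("text", "")))
    if result_text is None and assistant_chunks:
        return "\n".join(assistant_chunks)
    return result_text
```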

Error Path 2: SDK throws exception

Wrapped in try/except block (lines 71-149)
Returns RawResult with is_error=True and error message
Metrics populated with duration_api_ms

Error Path 3: Missing extraction fields

Uses .get() with sensible defaults
result falls back to text field
cost_usd falls back to total_cost_usd
Missing fields result in None/0 values, not exceptions

Performance Considerations

Budget: < 1ms additional overhead per message

Breakdown:

Additional .get("subtype") call: ~0.1μs (dict lookup)
Additional string comparison: ~0.01μs
Total overhead per message: < 1μs (negligible)

Optimization: None needed - change adds minimal overhead to existing hot path.
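The per-message estimates above can be checked locally with `timeit`. This sketch compares the old and new condition on a typical assistant message; `old_check`, `new_check`, and `per_call_ns` are illustrative names, and absolute numbers depend on hardware and Python version, so no expected output is given.

```python
import timeit

msg_dict = {"type": "assistant", "content": [{"type": "text", "text": "Working..."}]}


def old_check(m: dict) -> bool:
    return str(m.get("type", "")) == "result"


def new_check(m: dict) -> bool:
    return str(m.get("type", "")) == "result" or str(m.get("subtype", "")) == "success"


def per_call_ns(fn, number: int = 100_000) -> float:
    # Best of 5 repeats in ns per call; includes lambda overhead, which
    # largely cancels out when comparing old and new.
    best = min(timeit.repeat(lambda: fn(msg_dict), number=number, repeat=5))
    return best / number * 1e9


if __name__ == "__main__":
    old_ns, new_ns = per_call_ns(old_check), per_call_ns(new_check)
    print(f"old: {old_ns:.1f} ns/call, new: {new_ns:.1f} ns/call, delta: {new_ns - old_ns:.1f} ns")
```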

Module Dependency Graph

sdk/python/tests/test_harness_provider_claude.py
    ↓ imports
sdk/python/agentfield/harness/providers/claude.py
    ├── imports → sdk/python/agentfield/harness/_result.py (RawResult, Metrics)
    └── imports (lazy) → claude_agent_sdk (external)

File Changes Summary

sdk/python/agentfield/harness/providers/claude.py (Modified)
- Line 91: Change condition from if msg_type == "result": to if msg_type == "result" or msg_dict.get("subtype") == "success":
- No other logic changes
sdk/python/tests/test_harness_provider_claude.py (Extended)
- Add test_execute_extracts_result_from_subtype_success (~40 lines)
- Add test_execute_handles_mixed_message_formats (~50 lines)
- Existing 151 lines unchanged

Verification Checklist

pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_maps_options_and_extracts_result passes
pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_returns_error_result_on_query_failure passes
pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_extracts_result_from_subtype_success passes (new)
pytest -xvs sdk/python/tests/test_harness_provider_claude.py::test_execute_handles_mixed_message_formats passes (new)
grep -n 'subtype.*success\|type.*result' sdk/python/agentfield/harness/providers/claude.py shows both conditions
Module imports correctly: python -c "from agentfield.harness.providers.claude import ClaudeCodeProvider"


CLAassistant · 2026-03-13T03:33:26Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

SWE-AF seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

santoshkumarradha · 2026-03-13T03:34:22Z

⚠️ Ignore testing updates to https://github.com/Agent-Field/SWE-AF/

github-actions · 2026-03-13T03:34:36Z

Performance

SDK	Memory	Δ	Latency	Δ	Tests	Status
Python	9.3 KB	+4%	0.34 µs	-3%	✓	✓

✓ No regressions detected

SWE-AF added 3 commits March 13, 2026 03:25

issue/claude-provider-subtype-fix: Add support for subtype='success' message format in ClaudeCodeProvider
93a9228

Merge issue/281ab230-01-claude-provider-subtype-fix: claude-provider-subtype-fix
b7ba8e5

chore: finalize repo for handoff
9acda5a

santoshkumarradha closed this Mar 13, 2026

santoshkumarradha wants to merge 3 commits into main from feature/281ab230-claude-harness-response-fix