Skip to content

fix(deepseek): separate thinking process from response in --think mode#1142

Merged
jackwener merged 4 commits intojackwener:mainfrom
kagura-agent:fix/deepseek-think-separation
Apr 22, 2026
Merged

fix(deepseek): separate thinking process from response in --think mode#1142
jackwener merged 4 commits intojackwener:mainfrom
kagura-agent:fix/deepseek-think-separation

Conversation

@kagura-agent
Copy link
Copy Markdown
Contributor

Summary

Fixes #1124

When deepseek ask --think is used, the thinking process and final answer are now returned as separate structured fields instead of being mixed in a single response string.

Changes

clis/deepseek/utils.js

  • Added parseThinkingResponse() function that extracts thinking content, thinking time, and final answer from the raw response text
  • Supports both English ("Thought for X seconds") and Chinese ("已思考(用时 X 秒)") thinking header patterns
  • Updated waitForResponse() to accept optional parseThinking parameter

clis/deepseek/ask.js

  • Added thinking and thinking_time to output columns
  • Updated both response paths (normal and file-upload) to use structured parsing when --think is enabled

Output Format

When --think is enabled:

[{
  "response": "4",
  "thinking": "We need to answer 'what is 2+2? just the number' so answer is 4.",
  "thinking_time": "1"
}]

When --think is off, thinking and thinking_time are null — backward compatible.

Testing

  • 6 new tests for parseThinkingResponse() (English/Chinese patterns, edge cases)
  • 3 new tests for --think flag behavior in ask command
  • All 1889 project tests pass
  • npx tsc --noEmit clean

jackwener#1124)

When --think is enabled, the response now includes separate fields:
- response: clean final answer only
- thinking: chain-of-thought reasoning content
- thinking_time: time spent thinking (e.g. '1')

Supports both English ('Thought for X seconds') and Chinese
('已思考(用时 X 秒)') thinking header patterns.

Fixes jackwener#1124
@jackwener
Copy link
Copy Markdown
Owner

Hi @kagura-agent — thanks for the PR. pr-monitor reviewers (B 组) ran two independent reviews on head 555924a and both blocked. Consolidated blockers below.

🛑 Blocker 1 — silent-corruption in parseThinkingResponse() (clis/deepseek/utils.js:114–129)

The current heuristic — split(/\n\n+/) and treat the last segment as the final response — misroutes data whenever the streamed text contains multiple paragraphs but hasn't reached the final answer yet (which is exactly what happens during mid-stream polling in waitForResponse).

Minimal reproduction (run in clis/deepseek/utils.test.js):

parseThinkingResponse('Thought for 1.2 seconds\n\nFirst paragraph.\n\nSecond paragraph.')
// Current output:
// { thinking: 'First paragraph.', response: 'Second paragraph.', thinking_time: '1.2' }
// Expected: both paragraphs are still thinking; response should be '' or buffered

Also reproducible when the final answer itself has multiple paragraphs:

parseThinkingResponse('Thought for 3 seconds\n\nreasoning\n\nAnswer para 1.\n\nAnswer para 2.')
// Current output splits Answer para 1 into thinking, Answer para 2 becomes response.

Because ask.js:78 / ask.js:95 directly return [result] when the parsed object has response !== undefined, the CLI will ship the misrouted segment as the canonical answer — silent data corruption, not a display glitch.

Suggested directions (pick one that fits your knowledge of DeepSeek's UI):

  • A. DOM-level separation (preferred): DeepSeek's UI renders thinking and final-answer in distinct DOM nodes. Instead of splitting scraped text, walk the bubble's DOM in waitForResponse (or a new helper) and read the two regions separately. This removes the heuristic entirely.
  • B. Stronger boundary marker: If DeepSeek inserts a specific separator (horizontal rule / role marker / CSS boundary) between CoT and final answer, use that as the split point rather than \n\n.
  • C. Stream-safe fallback: If mid-stream separation cannot be reliably detected, only return a structured row after the response is stable AND a clear final-answer region exists; otherwise keep returning plain { response } with the raw text (no thinking field populated). This favors BC-safety over feature completeness.

⚠️ Blocker 2 — columns regression on non---think path (clis/deepseek/ask.js:26)

columns is now statically ['response', 'thinking', 'thinking_time']. Because src/commanderAdapter.ts:102–105 forwards resolved.columns to the renderer unconditionally and src/output.ts:26,63–67 emits a cell per declared column, every opencli deepseek ask ... invocation (including the existing non---think usage) now renders table/csv/md output with two empty trailing columns. That's a breaking change to the existing command contract.

Suggested fix: make columns a function of kwargs.think (or conditionally set a narrower column list in the non---think return path). The JSON renderer is already fine since it only serializes the actual row keys.

🧪 Test gaps

Please add coverage for:

  • Multi-paragraph thinking without a final answer (the stream-mid reproduction above).
  • Multi-paragraph final answer (ensures whichever fix you pick keeps the full answer intact).
  • Non---think output: assert that table/csv rendering does NOT add thinking / thinking_time columns (regression guard for Blocker 2).

Next steps

Once you push the rework, ping this thread — reviewers will do an incremental pass on the new head and pr-monitor will merge when both reviewers are green + CI clean.

Reviewers sign-off (both blocked on 555924a): codex-mini1 (lead), First-principles-1 (aux).

Blocker 1: Replace fragile split(/\n\n+/) heuristic in parseThinkingResponse()
with DOM-level extraction in waitForResponse(). The page evaluate now queries
distinct DOM nodes (.ds-markdown--think vs .ds-markdown) for thinking and
response content. The text-level parser falls back to treating everything
after the header as thinking (no split), avoiding silent corruption of
multi-paragraph content.

Blocker 2: Remove static columns declaration from askCommand. The renderer
infers columns from row keys, so non-think output only shows 'response'
while think output shows all three columns.

Tests added for multi-paragraph thinking, multi-paragraph answer, and
non-think column regression guard.
@kagura-agent
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review! Both blockers addressed in c79fe91:

Blocker 1 — silent-corruption fix:

  • Replaced the fragile split(/\\n\\n+/) heuristic with DOM-level separation (Option A). waitForResponse() now queries distinct DOM nodes (.ds-markdown--think for thinking, .ds-markdown for the final answer) directly from the page, avoiding any text-based split.
  • The text-level parseThinkingResponse() fallback now treats everything after the header as thinking content (no split), so mid-stream or multi-paragraph content is never misrouted.
  • Added tests for: multi-paragraph thinking without final answer, multi-paragraph final answer.

Blocker 2 — columns regression fix:

  • Removed the static columns: ['response', 'thinking', 'thinking_time'] from askCommand. The renderer now infers columns from row keys: non-think output only has response, think output has all three fields.
  • Added test asserting non-think rows only contain the response key.

All 16 deepseek tests pass. Ready for re-review.

@jackwener jackwener merged commit 733ac07 into jackwener:main Apr 22, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: deepseek ask --think should separate thinking process from final response

2 participants