## Overview
The Repository Quality Improvement Agent (`repository-quality-improver.md`) consumed 1,784,366 tokens in 30 turns on 2026-04-30, despite performing zero write actions. The run was flagged `resource_heavy_for_domain` (high severity) and `partially_reducible` (50% data-gathering turns). With 27 of 30 turns triggering blocked requests, the agent spent most of its budget on exploratory reads that could be replaced with deterministic pre-agent steps.
Analysis period: 2026-04-24 to 2026-04-30 · Runs audited: 1 · Run ID: 25167483584
## Token Profile

| Metric | Value |
| --- | --- |
| Total tokens | 1,784,366 |
| Effective tokens | 1,987,258 |
| Input tokens | 1,773,062 |
| Output tokens | 11,304 |
| Cache read tokens | 1,689,800 |
| Cache efficiency | 48.8% |
| Turns | 30 |
| Avg tokens/turn | ~59,100 |
| Write actions | 0 |
| Blocked requests | 27 / 30 turns |
| Duration | 9.1 min |
| Conclusion | success |
## Ranked Recommendations
### 1. 🔴 Move data-gathering phases to deterministic pre-agent bash steps (~450K tokens saved)
Estimated savings: ~450,000 tokens/run
Evidence: `agentic_fraction=0.50`, assessment `partially_reducible` — "About 50% of this run's turns appear to be data-gathering". With 30 turns × ~59K tokens/turn, ~15 turns are pure shell reads (cache check, metrics collection, directory stats) that produce no LLM-only value.
Action: Extract Phase 0 (cache history check) and the Phase 1 analysis commands into a frontmatter `steps:` block (pre-agent). Write the collected metrics to `/tmp/gh-aw/agent/analysis-context.md`. The agent then reads the pre-computed file in turn 1 instead of running shell commands itself over multiple turns.
```yaml
steps:
  - name: Collect quality metrics
    run: |
      mkdir -p /tmp/gh-aw/agent
      {
        echo "## Cache History"
        cat /tmp/gh-aw/cache-memory-focus-areas/history.json 2>/dev/null || echo "{}"
        echo "## Code Metrics"
        find . -type f -name "*.go" ! -name "*_test.go" ! -path "./.git/*" | xargs wc -l 2>/dev/null | sort -rn | head -20
        echo "## Test Ratio"
        ...
      } > /tmp/gh-aw/agent/analysis-context.md
```
### 2. 🔴 Trim the oversized prompt — extract report template to a shared import (~300K tokens saved)
Estimated savings: ~300,000 tokens/run
Evidence: The workflow prompt is ~850 lines. It embeds a full Markdown report template (~150 lines), multiple bash code blocks for every analysis category, and an exhaustive task-generation template. This entire prompt is re-sent every turn, contributing to the ~59K tokens/turn × 30 turns.
Action: Extract the report template into `shared/repository-quality-report-template.md` and import it. Collapse the per-category bash examples into a single compact reference table instead of full shell snippets. Target: reduce the prompt to ≤300 lines (~65% reduction in repeated context).
```yaml
imports:
  - shared/repository-quality-report-template.md
```
### 3. 🟡 Cap turns with `max-turns` to enforce a ceiling (~100K tokens saved)
Estimated savings: ~100,000 tokens/run (guards against regressions)
Evidence: 30 turns for a read-only reporting task is high. After applying recommendations 1–2, the expected turn count drops to ~12–15. Adding an explicit ceiling prevents future prompt changes from silently ballooning costs.
Action: Add `max-turns: 18` to the frontmatter (generous enough not to truncate a normal run, tight enough to catch regressions).
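A minimal frontmatter sketch of the ceiling, assuming the workflow configures its engine via a gh-aw `engine:` block (the `id` value is illustrative; verify where `max-turns` belongs in your gh-aw version):

```yaml
engine:
  id: copilot
  # Ceiling well above the ~12–15 turns expected after recommendations 1–2,
  # so a normal run is never truncated but a regression back to 30 turns fails fast.
  max-turns: 18
```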
### 4. 🟡 Remove unused Serena MCP from cold-path invocations (~50K tokens saved)
Estimated savings: ~50,000 tokens/run
Evidence: `tool_breadth: narrow`, `tool_types: 0`. The workflow imports `shared/mcp/serena-go.md` and conditionally uses Serena for "deeper analysis". The single audited run performed no write actions and tool breadth was narrow, suggesting Serena was never invoked. Its ambient toolset description still inflates every turn's context.
Action: Gate the Serena import behind a `workflow_dispatch` input flag (`serena: false` by default). Scheduled runs skip Serena unless opted in.
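One way to sketch the opt-in flag, assuming the workflow already triggers on `schedule` (the cron line and input name are illustrative; if gh-aw's `imports:` list cannot be made conditional on an input directly, the fallback is a separate opt-in workflow variant that adds the Serena import):

```yaml
on:
  schedule:
    - cron: "0 6 * * 1"   # existing weekly schedule (illustrative)
  workflow_dispatch:
    inputs:
      serena:
        description: "Enable Serena MCP for deeper analysis"
        type: boolean
        default: false
```

Scheduled runs then never pay the ambient Serena toolset cost; an operator opts in by dispatching manually with `serena: true`.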
## Summary of Expected Savings

| Recommendation | Tokens saved/run | Confidence |
| --- | --- | --- |
| Pre-agent data-gathering steps | ~450,000 | High |
| Prompt trimming / template extraction | ~300,000 | High |
| `max-turns` ceiling | ~100,000 | Medium (regression guard) |
| Conditional Serena import | ~50,000 | Medium |
| Total | ~900,000 | |
## Caveats
- Only 1 run was available for analysis; savings estimates are proportional projections based on token-per-turn averages.
- The 48.8% cache efficiency indicates the large prompt is partially cached; prompt reduction will also improve cache hit ratio for remaining turns.
- Serena recommendation is based on absence of tool use in the single audited run — verify over 3+ runs before removing.
## Run detail

| Field | Value |
| --- | --- |
| Run ID | 25167483584 |
| Date | 2026-04-30 |
| Engine | GitHub Copilot CLI (claude-sonnet-4.6) |
| Event | schedule |
| Execution style | exploratory |
| Tool breadth | narrow |
| Actuation style | read_only |
| Resource profile | heavy |
| Agentic fraction | 0.50 |
| Blocked requests | 27 |
Generated by Copilot Token Usage Optimizer