feat(buzz-acp,buzz-agent): publish NIP-AM kind 44200 turn metrics from goose and buzz-agent harnesses #1446

Draft
wpfleger96 wants to merge 7 commits into paul/nip-am-agent-turn-metrics from duncan/nip-am-goose-adapter


wpfleger96 commented Jul 1, 2026

Collaborator

Harness-side implementation of NIP-AM kind:44200 agent turn metric publishing.
Stacks on #1441 (NIP doc + relay/core).

Stack: #1441 → this PR

What this does

buzz-acp — goose adapter (crates/buzz-acp)

Extends the existing GooseUsageTracker (added in earlier commits) with an emit hook in pool.rs:

  • At turn completion in run_prompt_task, drains agent.acp.take_turn_usage() across all exit paths (Ok, AgentExited, IdleTimeout, HardTimeout, general error).
  • Builds AgentTurnMetricPayload (harness: "goose", session id, turn seq, per-turn and cumulative token/cost counts, deltaReliable, stop reason, RFC 3339 timestamp).
  • Encrypts via buzz_core::agent_turn_metric::encrypt_agent_turn_metric (NIP-44 v2, agent key → owner pubkey).
  • Signs and submits a kind:44200 event tagged ["p", owner_pubkey] + ["agent", agent_pubkey] via the existing rest_client.submit_event().
  • All of this is best-effort: errors and timeouts (3 s cap) log WARN, never fail the turn.
  • No-op when agent_owner_pubkey is unconfigured or goose emitted no usage notification.
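The best-effort contract in the list above (skip when there is nothing to publish, cap the wait, downgrade every failure to a warning) can be sketched roughly as follows. This is a minimal illustration, not the code in pool.rs: `PublishOutcome`, `publish_best_effort`, and the `submit` closure are hypothetical names, and a thread plus channel stands in for whatever timeout mechanism the real hook uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Result of one best-effort publish attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum PublishOutcome {
    /// The submit closure returned Ok within the cap.
    Published,
    /// No owner configured, or the harness reported no usage for this turn.
    Skipped,
    /// The submit closure returned an error or ended without a result.
    Failed(String),
    /// The cap elapsed before the submit closure finished.
    TimedOut,
}

/// Runs `submit` on a helper thread and waits at most `cap` for it.
/// The outcome is only reported; the caller's turn result is never affected.
pub fn publish_best_effort<F>(
    owner_pubkey: Option<&str>,
    turn_tokens: Option<u64>,
    cap: Duration,
    submit: F,
) -> PublishOutcome
where
    F: FnOnce(String, u64) -> Result<(), String> + Send + 'static,
{
    // No-op path: nothing is built or sent without an owner and usage.
    let (Some(owner), Some(tokens)) = (owner_pubkey, turn_tokens) else {
        return PublishOutcome::Skipped;
    };
    let owner = owner.to_string();

    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(submit(owner, tokens));
    });

    match rx.recv_timeout(cap) {
        Ok(Ok(())) => PublishOutcome::Published,
        Ok(Err(e)) => {
            eprintln!("WARN: metric publish failed: {e}");
            PublishOutcome::Failed(e)
        }
        Err(RecvTimeoutError::Timeout) => {
            eprintln!("WARN: metric publish exceeded {cap:?}");
            PublishOutcome::TimedOut
        }
        Err(RecvTimeoutError::Disconnected) => {
            eprintln!("WARN: metric publish task ended without a result");
            PublishOutcome::Failed("submit task ended without a result".to_string())
        }
    }
}
```

A caller would pass `Duration::from_secs(3)` as the cap to mirror the 3 s limit described above and ignore the returned outcome apart from logging.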

buzz-agent — native adapter (crates/buzz-agent)

  • Adds output_tokens: Option<u64> to LlmResponse; populated from output_tokens (Anthropic, Responses API) and completion_tokens (OpenAI chat/Databricks) via the existing sum_usage helper.
  • Adds per-turn turn_input_tokens / turn_output_tokens accumulators to RunCtx, reset at turn start and summed across all LLM rounds.
  • Adds MetricPublisher (new src/metric.rs) built from BUZZ_PRIVATE_KEY / BUZZ_RELAY_URL / BUZZ_AGENT_OWNER_PUBKEY env vars; silent no-op when any are absent.
  • Adds a per-session monotonically increasing turn_seq counter; incremented in acquire_session before the prompt fires.
  • At session/prompt completion, publishes a kind:44200 metric event with the accumulated per-turn counts. deltaReliable is always true (all rounds tracked in-process); cumulative is None because buzz-agent has no cross-turn session totals.
  • NIP-98 HTTP auth is implemented inline in MetricPublisher; no relay WebSocket dependency added.
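The per-turn accounting described above can be illustrated with a small sketch. It assumes the accumulators are `Option<u64>` (so "no usage observed" stays distinguishable from zero) and that `turn_seq` starts at 0 and becomes 1 on the first turn. The `begin_turn`, `record_round`, and `add_opt` helpers are hypothetical, and the real `RunCtx` and `LlmResponse` carry more fields than shown here.

```rust
/// Reduced stand-in for the provider response; only the usage fields are modeled.
pub struct LlmResponse {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

/// Reduced stand-in for RunCtx with the per-turn counters and the turn sequence.
#[derive(Debug, Default)]
pub struct RunCtx {
    pub turn_seq: u64,
    pub turn_input_tokens: Option<u64>,
    pub turn_output_tokens: Option<u64>,
}

impl RunCtx {
    /// Called before the prompt fires: bump the sequence and reset the counters.
    pub fn begin_turn(&mut self) {
        self.turn_seq += 1;
        self.turn_input_tokens = None;
        self.turn_output_tokens = None;
    }

    /// Called once per LLM round within the turn.
    pub fn record_round(&mut self, resp: &LlmResponse) {
        self.turn_input_tokens = add_opt(self.turn_input_tokens, resp.input_tokens);
        self.turn_output_tokens = add_opt(self.turn_output_tokens, resp.output_tokens);
    }
}

/// Adds a round's count to the accumulator; a missing count leaves it unchanged.
fn add_opt(acc: Option<u64>, round: Option<u64>) -> Option<u64> {
    match (acc, round) {
        (Some(a), Some(b)) => Some(a.saturating_add(b)),
        (Some(a), None) => Some(a),
        (None, b) => b,
    }
}
```

Because every round is recorded in-process, the summed values are exact for the turn, which is why this adapter can report `deltaReliable` as true without comparing cumulative snapshots.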

Tests added

buzz-acp:

  • test_acp_stop_to_core_maps_all_variants — all 5 ACP → NIP-AM stop reason mappings.
  • test_publish_agent_turn_metric_noop_on_no_usage — returns immediately when usage is None.
  • test_publish_agent_turn_metric_noop_on_no_owner — returns immediately when owner is unconfigured.
  • test_publish_agent_turn_metric_encrypts_with_owner — encrypt/sign path executes without panic.

buzz-agent:

  • parse_anthropic_output_tokens: output_tokens field extracted.
  • parse_anthropic_output_tokens_missing_usage_is_none — absent usage → None.
  • parse_openai_output_tokens_from_completion_tokens: completion_tokens mapped to output_tokens.
  • parse_openai_output_tokens_missing_usage_is_none — absent usage → None.
  • parse_responses_output_tokens: output_tokens field extracted.
  • parse_responses_output_tokens_missing_usage_is_none — absent usage → None.
  • test_metric_publisher_noop_when_env_absent — publisher is no-op when env vars absent.
  • test_metric_publisher_configured_when_all_vars_present — publisher is active when all vars set.

buzz-acp: 424 tests, buzz-agent: 130 tests. All green.

wpfleger96 and others added 2 commits July 1, 2026 16:57
…edded-PG fetch (#1443)

Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
…e 2 Task B)

Advertise `clientCapabilities._meta.goose.customNotifications: true` at
initialize so goose emits `_goose/unstable/session/update` notifications
carrying session-cumulative token counts at turn completion.

Add `GooseUsageTracker` (new `goose_usage.rs`) that:
- Deserializes the `_goose/unstable/session/update` wire payload
- Stores per-session cumulative state (`sessionId`, `turnSeq`, last snapshot)
- Computes per-turn deltas per NIP-AM rules: first-turn no-prior → null +
  deltaReliable:false; counter decrease → null + false; session restart
  (new sessionId) → treated as first turn
- Exposes a `GooseTurnUsage` record via `take()` for consumption by the
  TurnCompletionGuard emit hook (sequential next task)
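
The delta rules listed above might look roughly like this in isolation. The sketch is deliberately reduced: `Snapshot`, `TurnDelta`, and `DeltaTracker` are hypothetical names, only token counts are modeled, and the real `GooseUsageTracker` also tracks `turnSeq` and cost.

```rust
use std::collections::HashMap;

/// Session-cumulative counts as reported by one notification.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Snapshot {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Per-turn values derived from two consecutive snapshots.
#[derive(Debug, PartialEq)]
pub struct TurnDelta {
    pub turn_input: Option<u64>,
    pub turn_output: Option<u64>,
    pub delta_reliable: bool,
}

/// Null deltas with deltaReliable:false, used for every unreliable case.
fn unreliable() -> TurnDelta {
    TurnDelta { turn_input: None, turn_output: None, delta_reliable: false }
}

/// Computes the per-turn delta from the previous and current snapshots.
pub fn turn_delta(prev: Option<Snapshot>, current: Snapshot) -> TurnDelta {
    // First turn: no prior snapshot to subtract from.
    let Some(prev) = prev else { return unreliable() };
    // checked_sub yields None when a counter went backwards.
    match (
        current.input_tokens.checked_sub(prev.input_tokens),
        current.output_tokens.checked_sub(prev.output_tokens),
    ) {
        (Some(i), Some(o)) => TurnDelta {
            turn_input: Some(i),
            turn_output: Some(o),
            delta_reliable: true,
        },
        _ => unreliable(),
    }
}

/// Keeps the last snapshot per session id.
#[derive(Default)]
pub struct DeltaTracker {
    last: HashMap<String, Snapshot>,
}

impl DeltaTracker {
    /// Stores the new snapshot and returns the delta against the previous one.
    /// A session id never seen before has no prior, so it counts as a first turn.
    pub fn record(&mut self, session_id: &str, current: Snapshot) -> TurnDelta {
        let prev = self.last.insert(session_id.to_string(), current);
        turn_delta(prev, current)
    }
}
```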

Wire both dispatch arms (`read_until_response` and
`read_until_response_with_idle_timeout`) to handle the new method,
mirroring the existing `session/update` pattern. Non-goose harnesses are
unaffected: no capability advertised, no dispatch, no state kept.

References #1441 (NIP-AM spec)

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
npub1mn7jgtj4w2pd0g0zeuhxsa6jy6p0rewxz4kujt98my82ahfmp72sxjexk7 and others added 2 commits July 1, 2026 18:50
…unreliable gap

Two Thufir-flagged IMPORTANT fixes for PR #1446.

Turn scoping (setup usage misattributed to zero-update turn):
- Add in_flight_session: Option<String> field to GooseUsageTracker.
- Add begin_turn(session_id) method: sets in_flight_session and clears
  pending. Must be called before session/prompt is sent.
- record() now only sets pending when in_flight_session matches session_id.
  It ALWAYS updates the sessions baseline so the next real turn gets a
  correct delta even from setup notifications.
- take() clears in_flight_session after draining pending.
- Call goose_usage.begin_turn(session_id) at the top of
  session_prompt_blocks_with_idle_timeout, before sending the prompt.
- Setup notifications that arrive during session/new now correctly update
  the baseline without polluting the first real turn's pending record.
- New tests: setup_notification_before_begin_turn_returns_none (verifies
  baseline still feeds next delta), record_outside_in_flight_does_not_
  clobber_pending.
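
A reduced sketch of the turn-scoping flow described above, assuming a single cumulative counter per session. `ScopedTracker` and `PendingTurn` are hypothetical stand-ins; the field and method names `in_flight_session`, `pending`, `begin_turn`, `record`, and `take` follow the commit message, but their real signatures may differ.

```rust
use std::collections::HashMap;

/// The record handed to the emit hook for one completed turn.
#[derive(Debug, PartialEq)]
pub struct PendingTurn {
    /// Per-turn delta; None when there was no baseline or the counter decreased.
    pub delta: Option<u64>,
}

#[derive(Default)]
pub struct ScopedTracker {
    in_flight_session: Option<String>,
    pending: Option<PendingTurn>,
    /// Last cumulative total seen per session, updated by every notification.
    baselines: HashMap<String, u64>,
}

impl ScopedTracker {
    /// Must run before session/prompt is sent: marks the session as in flight
    /// and discards any stale pending record.
    pub fn begin_turn(&mut self, session_id: &str) {
        self.in_flight_session = Some(session_id.to_string());
        self.pending = None;
    }

    /// Always refreshes the baseline; only sets `pending` for the in-flight session.
    pub fn record(&mut self, session_id: &str, cumulative: u64) {
        let prev = self.baselines.insert(session_id.to_string(), cumulative);
        if self.in_flight_session.as_deref() == Some(session_id) {
            let delta = prev.and_then(|p| cumulative.checked_sub(p));
            self.pending = Some(PendingTurn { delta });
        }
    }

    /// Drains the pending record and ends the in-flight window.
    pub fn take(&mut self) -> Option<PendingTurn> {
        let drained = self.pending.take();
        self.in_flight_session = None;
        drained
    }
}
```

With this shape, a notification that arrives during session setup leaves `take()` returning `None`, yet its total still serves as the baseline for the first real turn's delta.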

Cost counter decrease -> deltaReliable:false (Fix 2):
- When both snapshots have cost and current_cost < prev_cost, the computed
  delta would be negative — NIP-AM requires delta_reliable: false and all
  turn fields nulled (same as token-decrease path).
- The match arm now returns (None, false) for cost decrease; the outer
  if/else then overrides delta_reliable=false and nulls turn_input/output.
- Cost merely absent on either side stays as-is (null cost, reliable tokens).
- turn_seq still increments on cost-decrease turns (Thufir-endorsed).
- New tests: cost_decrease_sets_delta_unreliable_and_nulls_all_turn_fields,
  cost_absent_on_one_side_leaves_tokens_reliable.
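
The cost rule above can be sketched as a match arm plus an outer override. The sketch assumes cost is carried as `Option<f64>`, which the commit message does not state; `cost_delta`, `TurnFields`, and `finish_turn` are hypothetical names.

```rust
/// Returns (turn cost, reliable flag) from two cumulative cost snapshots.
pub fn cost_delta(prev_cost: Option<f64>, current_cost: Option<f64>) -> (Option<f64>, bool) {
    match (prev_cost, current_cost) {
        // A decrease would give a negative delta: flag the turn as unreliable.
        (Some(prev), Some(cur)) if cur < prev => (None, false),
        (Some(prev), Some(cur)) => (Some(cur - prev), true),
        // Cost absent on either side: null cost, token deltas stay reliable.
        _ => (None, true),
    }
}

#[derive(Debug, PartialEq)]
pub struct TurnFields {
    pub turn_input: Option<u64>,
    pub turn_output: Option<u64>,
    pub turn_cost: Option<f64>,
    pub delta_reliable: bool,
}

/// Combines already-computed token deltas with the cost rule.
pub fn finish_turn(
    turn_input: Option<u64>,
    turn_output: Option<u64>,
    tokens_reliable: bool,
    prev_cost: Option<f64>,
    current_cost: Option<f64>,
) -> TurnFields {
    let (turn_cost, cost_reliable) = cost_delta(prev_cost, current_cost);
    if tokens_reliable && cost_reliable {
        TurnFields { turn_input, turn_output, turn_cost, delta_reliable: true }
    } else {
        // Any unreliable input nulls every per-turn field.
        TurnFields { turn_input: None, turn_output: None, turn_cost: None, delta_reliable: false }
    }
}
```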

Existing goose_usage unit tests and acp.rs integration tests updated to call
begin_turn() before record(), matching the real call flow.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
Pure formatting pass with no logic changes. Fixes the `just fmt-check`
failure in CI (Rust Lint job 84654119247): line-length wrapping in acp.rs
and goose_usage.rs (record signature, assert! calls).

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
wpfleger96 changed the base branch from main to paul/nip-am-agent-turn-metrics on July 2, 2026 14:25
npub1mn7jgtj4w2pd0g0zeuhxsa6jy6p0rewxz4kujt98my82ahfmp72sxjexk7 and others added 2 commits July 2, 2026 10:25
…e-adapter

Bring in consolidated #1441 base (NIP-AM doc + relay/core) so the goose
adapter and emit hook can build against buzz_core::agent_turn_metric.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>

* origin/paul/nip-am-agent-turn-metrics:
  chore(fmt): run rustfmt on NIP-AM kind 44200 relay changes
  fix(relay/core): plug COUNT existence-leak and StopReason forward-compat for NIP-AM
  fix(relay/core): close result-level read gate for kind:44200 (NIP-AM)
  feat(core/relay): add NIP-AM kind 44200 (agent turn metrics) with relay plumbing
  docs(nips): harden NIP-AM read gate and delta ordering semantics
  docs(nips): add NIP-AM draft for durable agent turn metrics
Wire emit hook into buzz-acp pool.rs: at turn completion, drain
take_turn_usage() and publish a kind 44200 NIP-AM metric event via
publish_agent_turn_metric(). Covers all exit paths (Ok, AgentExited,
IdleTimeout, HardTimeout, general error). Best-effort — failures log
WARN and never fail the turn.

Add native buzz-agent adapter: track per-turn input/output token
accumulators in RunCtx (summed across all LLM rounds), parse
output_tokens from all provider response formats (Anthropic, OpenAI,
Responses API), build MetricPublisher from BUZZ_PRIVATE_KEY /
BUZZ_RELAY_URL / BUZZ_AGENT_OWNER_PUBKEY env vars with NIP-98 auth,
publish at session/prompt completion.

Tests: acp_stop_to_core mapping, publish no-op on missing usage/owner,
encrypt+sign path executes; output_tokens parsing for all three
providers; MetricPublisher from_env noop/configured.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
wpfleger96 changed the title from "feat(buzz-acp): add goose usage adapter for NIP-AM turn metrics" to "feat(buzz-acp,buzz-agent): publish NIP-AM kind 44200 turn metrics from goose and buzz-agent harnesses" on Jul 2, 2026
…commits

Three IMPORTANT correctness fixes and one MINOR test-isolation fix:

1. Control-cancel paths in pool.rs now drain take_turn_usage() and call
   publish_agent_turn_metric before every send_prompt_result that returns
   early from the control-signal select arm. Covers all four cancel outcome
   variants (Ok/AgentExited/Timeout/Err) and the completed-before-control
   race. Uses Cancelled for the Ok arm and Error for all error variants;
   EndTurn for the race-1 completion path.

2. MetricPublisher::publish now returns early when both input_tokens and
   output_tokens are None, preventing all-null events that violate the
   NIP-AM prohibition on publishing turns with no observed usage.

3. buzz-agent MetricPublisher now mirrors the platform relay/auth contract:
   - Owner derived from BUZZ_AUTH_TAG via buzz_sdk::nip_oa::verify_auth_tag,
     falling back to BUZZ_AGENT_OWNER_PUBKEY only when absent.
   - BUZZ_RELAY_URL ws/wss normalized to http/https before use as HTTP URL.
   - Raw BUZZ_AUTH_TAG JSON forwarded as x-auth-tag header on /events so
     attested agents pass relay membership checks.
   - buzz-sdk added to buzz-agent dependencies (lightweight, no transport deps).

4. Tests rewritten to use injected MetricConfig instead of process-env
   mutation, eliminating the parallel test race flagged as a MINOR. New
   tests cover: ws/wss URL normalization, x-auth-tag config storage,
   no-usage early-return, and the Cancelled stop-reason path in pool.rs.
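
Two of the fixes above are simple enough to sketch directly: the ws/wss to http/https normalization from item 3 and the no-usage guard from item 2. `normalize_relay_url` and `has_observed_usage` are hypothetical helper names, and the real `MetricPublisher` may handle more URL shapes than these two prefixes.

```rust
/// Rewrites a WebSocket relay URL to its HTTP equivalent so it can be used
/// as the base for an HTTP request. Other schemes pass through unchanged.
pub fn normalize_relay_url(raw: &str) -> String {
    if let Some(rest) = raw.strip_prefix("wss://") {
        format!("https://{rest}")
    } else if let Some(rest) = raw.strip_prefix("ws://") {
        format!("http://{rest}")
    } else {
        raw.to_string()
    }
}

/// True when at least one token count was observed for the turn.
/// A publisher would return early when this is false, so that no
/// all-null metric event is ever emitted.
pub fn has_observed_usage(input_tokens: Option<u64>, output_tokens: Option<u64>) -> bool {
    input_tokens.is_some() || output_tokens.is_some()
}
```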

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>