feat(buzz-acp,buzz-agent): publish NIP-AM kind 44200 turn metrics from goose and buzz-agent harnesses #1446
Draft
wpfleger96 wants to merge 7 commits into
…edded-PG fetch (#1443) Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
…e 2 Task B)

Advertise `clientCapabilities._meta.goose.customNotifications: true` at initialize so goose emits `_goose/unstable/session/update` notifications carrying session-cumulative token counts at turn completion.

Add `GooseUsageTracker` (new `goose_usage.rs`) that:
- Deserializes the `_goose/unstable/session/update` wire payload
- Stores per-session cumulative state (`sessionId`, `turnSeq`, last snapshot)
- Computes per-turn deltas per NIP-AM rules: first-turn no-prior → null + deltaReliable:false; counter decrease → null + false; session restart (new sessionId) → treated as first turn
- Exposes a `GooseTurnUsage` record via `take()` for consumption by the TurnCompletionGuard emit hook (sequential next task)

Wire both dispatch arms (`read_until_response` and `read_until_response_with_idle_timeout`) to handle the new method, mirroring the existing `session/update` pattern.

Non-goose harnesses are unaffected: no capability advertised, no dispatch, no state kept.

References #1441 (NIP-AM spec)
Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
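The delta rules listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `goose_usage.rs`: `Snapshot`, `TurnDelta`, `DeltaTracker`, and `turn_delta` are hypothetical names, and the sketch tracks only input and output token counters.

```rust
use std::collections::HashMap;

/// Session-cumulative counters from one usage notification (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Snapshot {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Per-turn deltas; both are `None` whenever the delta cannot be trusted.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TurnDelta {
    pub turn_input: Option<u64>,
    pub turn_output: Option<u64>,
    pub delta_reliable: bool,
}

const UNRELIABLE: TurnDelta = TurnDelta {
    turn_input: None,
    turn_output: None,
    delta_reliable: false,
};

/// Delta between the previous and current cumulative snapshots.
pub fn turn_delta(prev: Option<Snapshot>, cur: Snapshot) -> TurnDelta {
    // No prior snapshot: first turn of a session, nothing to subtract from.
    let Some(prev) = prev else { return UNRELIABLE };
    // `checked_sub` yields `None` when a counter went backwards.
    match (
        cur.input_tokens.checked_sub(prev.input_tokens),
        cur.output_tokens.checked_sub(prev.output_tokens),
    ) {
        (Some(i), Some(o)) => TurnDelta {
            turn_input: Some(i),
            turn_output: Some(o),
            delta_reliable: true,
        },
        _ => UNRELIABLE,
    }
}

/// Keeps the last snapshot per session id (assumed storage layout).
#[derive(Default)]
pub struct DeltaTracker {
    last: HashMap<String, Snapshot>,
}

impl DeltaTracker {
    pub fn record(&mut self, session_id: &str, cur: Snapshot) -> TurnDelta {
        // `insert` returns the previous snapshot; a new session id has none,
        // so a restarted session is handled like a first turn.
        let prev = self.last.insert(session_id.to_string(), cur);
        turn_delta(prev, cur)
    }
}
```

Keying the map by session id means a restart needs no special case: the new id simply has no baseline yet.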
…unreliable gap

Two Thufir-flagged IMPORTANT fixes for PR #1446.

Turn scoping (setup usage misattributed to zero-update turn):
- Add in_flight_session: Option<String> field to GooseUsageTracker.
- Add begin_turn(session_id) method: sets in_flight_session and clears pending. Must be called before session/prompt is sent.
- record() now only sets pending when in_flight_session matches session_id. It ALWAYS updates the sessions baseline so the next real turn gets a correct delta even from setup notifications.
- take() clears in_flight_session after draining pending.
- Call goose_usage.begin_turn(session_id) at the top of session_prompt_blocks_with_idle_timeout, before sending the prompt.
- Setup notifications that arrive during session/new now correctly update the baseline without polluting the first real turn's pending record.
- New tests: setup_notification_before_begin_turn_returns_none (verifies baseline still feeds next delta), record_outside_in_flight_does_not_clobber_pending.

Cost counter decrease -> deltaReliable:false (Fix 2):
- When both snapshots have cost and current_cost < prev_cost, the computed delta would be negative; NIP-AM requires delta_reliable: false and all turn fields nulled (same as token-decrease path).
- The match arm now returns (None, false) for cost decrease; the outer if/else then overrides delta_reliable=false and nulls turn_input/output.
- Cost merely absent on either side stays as-is (null cost, reliable tokens).
- turn_seq still increments on cost-decrease turns (Thufir-endorsed).
- New tests: cost_decrease_sets_delta_unreliable_and_nulls_all_turn_fields, cost_absent_on_one_side_leaves_tokens_reliable.

Existing goose_usage unit tests and acp.rs integration tests updated to call begin_turn() before record(), matching the real call flow.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
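The turn-scoping behaviour described in this commit might look roughly like the sketch below. It is an illustration under assumptions, not the actual `GooseUsageTracker`: `ScopedTracker` and `PendingUsage` are hypothetical names, a single `u64` stands in for the full snapshot, and the sketch captures the baseline at `begin_turn` (an assumption that keeps the delta correct even if more than one notification arrives during a turn).

```rust
use std::collections::HashMap;

/// Usage captured for the turn that is currently in flight.
#[derive(Debug, PartialEq)]
pub struct PendingUsage {
    pub cumulative_tokens: u64,
    /// `None` when there was no baseline or the counter decreased.
    pub turn_tokens: Option<u64>,
}

#[derive(Default)]
pub struct ScopedTracker {
    /// Last cumulative total seen per session, updated on every notification.
    baselines: HashMap<String, u64>,
    /// Baseline captured when the current turn began (assumption of this sketch).
    turn_start: Option<u64>,
    in_flight_session: Option<String>,
    pending: Option<PendingUsage>,
}

impl ScopedTracker {
    /// Call before the prompt is sent: opens the turn and drops stale usage.
    pub fn begin_turn(&mut self, session_id: &str) {
        self.turn_start = self.baselines.get(session_id).copied();
        self.in_flight_session = Some(session_id.to_string());
        self.pending = None;
    }

    /// Always refresh the baseline; only expose usage for the in-flight session.
    pub fn record(&mut self, session_id: &str, cumulative_tokens: u64) {
        self.baselines.insert(session_id.to_string(), cumulative_tokens);
        if self.in_flight_session.as_deref() == Some(session_id) {
            self.pending = Some(PendingUsage {
                cumulative_tokens,
                turn_tokens: self
                    .turn_start
                    .and_then(|start| cumulative_tokens.checked_sub(start)),
            });
        }
    }

    /// Drain the pending record and close the turn.
    pub fn take(&mut self) -> Option<PendingUsage> {
        self.in_flight_session = None;
        self.pending.take()
    }
}
```

A setup notification recorded before `begin_turn` only moves the baseline, so a later `take()` on a turn that produced no updates returns `None` instead of reporting setup usage.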
Pure formatting pass, no logic changes. Fixes the `just fmt-check` failure in CI (Rust Lint job 84654119247). Line-length wrapping in acp.rs and goose_usage.rs (record signature, assert! calls).

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
…e-adapter

Bring in consolidated #1441 base (NIP-AM doc + relay/core) so the goose adapter and emit hook can build against buzz_core::agent_turn_metric.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>

* origin/paul/nip-am-agent-turn-metrics:
  chore(fmt): run rustfmt on NIP-AM kind 44200 relay changes
  fix(relay/core): plug COUNT existence-leak and StopReason forward-compat for NIP-AM
  fix(relay/core): close result-level read gate for kind:44200 (NIP-AM)
  feat(core/relay): add NIP-AM kind 44200 (agent turn metrics) with relay plumbing
  docs(nips): harden NIP-AM read gate and delta ordering semantics
  docs(nips): add NIP-AM draft for durable agent turn metrics
Wire emit hook into buzz-acp pool.rs: at turn completion, drain take_turn_usage() and publish a kind 44200 NIP-AM metric event via publish_agent_turn_metric(). Covers all exit paths (Ok, AgentExited, IdleTimeout, HardTimeout, general error). Best-effort: failures log WARN and never fail the turn.

Add native buzz-agent adapter: track per-turn input/output token accumulators in RunCtx (summed across all LLM rounds), parse output_tokens from all provider response formats (Anthropic, OpenAI, Responses API), build MetricPublisher from BUZZ_PRIVATE_KEY / BUZZ_RELAY_URL / BUZZ_AGENT_OWNER_PUBKEY env vars with NIP-98 auth, publish at session/prompt completion.

Tests: acp_stop_to_core mapping, publish no-op on missing usage/owner, encrypt+sign path executes; output_tokens parsing for all three providers; MetricPublisher from_env noop/configured.

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
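As a rough illustration of the per-turn accumulators mentioned in this commit, the sketch below sums optional token counts across LLM rounds. `TurnTotals` and `add_opt` are hypothetical names; how the real `RunCtx` treats a round that reports no usage is not stated in the commit, so keeping the running total unchanged in that case is an assumption.

```rust
/// Per-turn token accumulators (hypothetical stand-in for the `RunCtx` fields).
#[derive(Debug, Default, PartialEq)]
pub struct TurnTotals {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

/// Add one round's count to an accumulator. The result stays `None` only
/// while no round has reported that counter (assumed policy).
fn add_opt(acc: Option<u64>, round: Option<u64>) -> Option<u64> {
    match (acc, round) {
        (Some(a), Some(r)) => Some(a.saturating_add(r)),
        (a, r) => a.or(r),
    }
}

impl TurnTotals {
    /// Reset at turn start so counts never leak between turns.
    pub fn reset(&mut self) {
        *self = TurnTotals::default();
    }

    /// Fold in the usage reported by one LLM round.
    pub fn add_round(&mut self, input: Option<u64>, output: Option<u64>) {
        self.input_tokens = add_opt(self.input_tokens, input);
        self.output_tokens = add_opt(self.output_tokens, output);
    }
}
```

Using `Option<u64>` instead of a bare `u64` keeps "no usage observed" distinguishable from "zero tokens used".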
…commits
Three IMPORTANT correctness fixes and one MINOR test-isolation fix:
1. Control-cancel paths in pool.rs now drain take_turn_usage() and call
publish_agent_turn_metric before every send_prompt_result that returns
early from the control-signal select arm. Covers all four cancel outcome
variants (Ok/AgentExited/Timeout/Err) and the completed-before-control
race. Uses Cancelled for the Ok arm and Error for all error variants;
EndTurn for the race-1 completion path.
2. MetricPublisher::publish now returns early when both input_tokens and
output_tokens are None, preventing all-null events that violate the
NIP-AM prohibition on publishing turns with no observed usage.
3. buzz-agent MetricPublisher now mirrors the platform relay/auth contract:
- Owner derived from BUZZ_AUTH_TAG via buzz_sdk::nip_oa::verify_auth_tag,
falling back to BUZZ_AGENT_OWNER_PUBKEY only when absent.
- BUZZ_RELAY_URL ws/wss normalized to http/https before use as HTTP URL.
- Raw BUZZ_AUTH_TAG JSON forwarded as x-auth-tag header on /events so
attested agents pass relay membership checks.
- buzz-sdk added to buzz-agent dependencies (lightweight, no transport deps).
4. Tests rewritten to use injected MetricConfig instead of process-env
mutation, eliminating the parallel test race flagged as a MINOR. New
tests cover: ws/wss URL normalization, x-auth-tag config storage,
no-usage early-return, and the Cancelled stop-reason path in pool.rs.
Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
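Fixes 2 and 3 in this commit describe two small checks that can be sketched as below. Both function names are hypothetical and the code is an illustration, not the `MetricPublisher` implementation; in particular, the scheme match here is case-sensitive, which may differ from the real code.

```rust
/// Map a WebSocket relay URL to its HTTP equivalent; other URLs pass through.
pub fn normalize_relay_url(url: &str) -> String {
    if let Some(rest) = url.strip_prefix("wss://") {
        format!("https://{}", rest)
    } else if let Some(rest) = url.strip_prefix("ws://") {
        format!("http://{}", rest)
    } else {
        url.to_string()
    }
}

/// A turn is worth publishing only if at least one token count was observed.
pub fn has_observed_usage(input_tokens: Option<u64>, output_tokens: Option<u64>) -> bool {
    input_tokens.is_some() || output_tokens.is_some()
}
```

For example, `normalize_relay_url("wss://relay.example/events")` yields `https://relay.example/events`, and `has_observed_usage(None, None)` is `false`, which is the case where publishing would return early.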
Harness-side implementation of NIP-AM kind:44200 agent turn metric publishing.
Stacks on #1441 (NIP doc + relay/core).
Stack: #1441 → this PR
What this does
buzz-acp: goose adapter (`crates/buzz-acp`)

Extends the existing `GooseUsageTracker` (added in earlier commits) with an emit hook in `pool.rs`:
- In `run_prompt_task`, drains `agent.acp.take_turn_usage()` across all exit paths (Ok, AgentExited, IdleTimeout, HardTimeout, general error).
- Builds an `AgentTurnMetricPayload` (harness: `"goose"`, session id, turn seq, per-turn and cumulative token/cost counts, `deltaReliable`, stop reason, RFC 3339 timestamp).
- Encrypts it via `buzz_core::agent_turn_metric::encrypt_agent_turn_metric` (NIP-44 v2, agent key → owner pubkey).
- Publishes a `kind:44200` event tagged `["p", owner_pubkey]` + `["agent", agent_pubkey]` via the existing `rest_client.submit_event()`.
- No-op when `agent_owner_pubkey` is unconfigured or goose emitted no usage notification.

buzz-agent: native adapter (`crates/buzz-agent`)

- Adds `output_tokens: Option<u64>` to `LlmResponse`; populated from `output_tokens` (Anthropic, Responses API) and `completion_tokens` (OpenAI chat/Databricks) via the existing `sum_usage` helper.
- Adds `turn_input_tokens` / `turn_output_tokens` accumulators to `RunCtx`, reset at turn start and summed across all LLM rounds.
- `MetricPublisher` (new `src/metric.rs`) built from `BUZZ_PRIVATE_KEY` / `BUZZ_RELAY_URL` / `BUZZ_AGENT_OWNER_PUBKEY` env vars; silent no-op when any are absent.
- `turn_seq` counter; incremented in `acquire_session` before the prompt fires.
- At `session/prompt` completion, publishes a `kind:44200` metric event with the accumulated per-turn counts. `deltaReliable` is always `true` (all rounds tracked in-process); `cumulative` is `None` because buzz-agent has no cross-turn session totals.
- Publishing goes through `MetricPublisher`; no relay WebSocket dependency added.

Tests added

buzz-acp:
- `test_acp_stop_to_core_maps_all_variants`: all 5 ACP → NIP-AM stop reason mappings.
- `test_publish_agent_turn_metric_noop_on_no_usage`: returns immediately when usage is None.
- `test_publish_agent_turn_metric_noop_on_no_owner`: returns immediately when owner is unconfigured.
- `test_publish_agent_turn_metric_encrypts_with_owner`: encrypt/sign path executes without panic.

buzz-agent:
- `parse_anthropic_output_tokens`: `output_tokens` field extracted.
- `parse_anthropic_output_tokens_missing_usage_is_none`: absent usage → None.
- `parse_openai_output_tokens_from_completion_tokens`: `completion_tokens` mapped to `output_tokens`.
- `parse_openai_output_tokens_missing_usage_is_none`: absent usage → None.
- `parse_responses_output_tokens`: `output_tokens` field extracted.
- `parse_responses_output_tokens_missing_usage_is_none`: absent usage → None.
- `test_metric_publisher_noop_when_env_absent`: publisher is no-op when env vars absent.
- `test_metric_publisher_configured_when_all_vars_present`: publisher is active when all vars set.

buzz-acp: 424 tests, buzz-agent: 130 tests. All green.
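The "no-op when any variable is absent" behaviour described for the buzz-agent publisher could be sketched as follows. This is a hedged illustration only: `MetricConfig` here is a hypothetical three-field struct and `config_from_lookup` / `config_from_env` are invented names. The sketch follows the three environment variables named in this description and does not cover the `BUZZ_AUTH_TAG` owner derivation added in a later commit.

```rust
/// Settings the publisher needs (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct MetricConfig {
    pub private_key: String,
    pub relay_url: String,
    pub owner_pubkey: String,
}

/// Build a config from any name -> value lookup. Returns `None` as soon as
/// one variable is missing, which callers can treat as "publishing disabled".
pub fn config_from_lookup<F>(lookup: F) -> Option<MetricConfig>
where
    F: Fn(&str) -> Option<String>,
{
    Some(MetricConfig {
        private_key: lookup("BUZZ_PRIVATE_KEY")?,
        relay_url: lookup("BUZZ_RELAY_URL")?,
        owner_pubkey: lookup("BUZZ_AGENT_OWNER_PUBKEY")?,
    })
}

/// Production entry point: read from the process environment.
pub fn config_from_env() -> Option<MetricConfig> {
    config_from_lookup(|name| std::env::var(name).ok())
}
```

Taking the lookup as a parameter lets tests inject values without mutating the process environment, which avoids races between tests that run in parallel.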