feat(trace-sampler): add shared datadog-agent-trace-sampler crate #141
Draft
lucaspimentel wants to merge 1 commit into
Add a dependency-free leaf crate that ports the Go trace agent error sampler (ScoreSampler targeting ErrorTPS) so serverless agents can rescue error traces from an agent-side P0 drop, keeping error visibility under aggressive sampling. The crate takes primitives in (SpanView/TraceView) and returns a SampleDecision out, exposing no protobuf Span type, so consumers pinning different libdatadog revisions can share it. bottlecap consumes it via the existing serverless-components git dependency; SCL wiring follows later. Ports FNV-1a signatures, the 6x5s rolling-bucket TPS budget with cascade and 20% rate-increase cap, deterministic sample-by-rate, and cardinality shrink, with unit tests mirroring the Go table tests. APMSVLS-469 🤖
Pull request overview
Adds a new dependency-free leaf crate, datadog-agent-trace-sampler, intended to provide a shared Rust port of the Datadog Agent’s error “rescue” sampler so serverless agents can retain error visibility under aggressive sampling while keeping the public API protobuf-free.
Changes:
- Introduces the new crate with a minimal public API (`SpanView`, `TraceView`, `ErrorsSampler`, `SampleDecision`, `ErrorSamplerConfig`).
- Implements trace signature computation + deterministic `sample_by_rate`, and ports the rolling-bucket TPS-driven `ErrorsSampler` logic with unit tests.
- Wires the crate into the workspace (via `crates/*`) and updates `Cargo.lock`.
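To make the "primitives in, decision out" shape concrete, here is a minimal sketch of what such an API could look like. The `TraceView` field names and the `SampleDecision` variants are taken from the snippets quoted in this review; the `SpanView` fields and the `root()` helper are assumptions for illustration, not the crate's actual definitions.

```rust
/// Hypothetical primitive view of one span. The real `SpanView` fields are
/// not shown in this PR page, so these are illustrative assumptions.
pub struct SpanView<'a> {
    pub service: &'a str,
    pub name: &'a str,
    pub resource: &'a str,
    pub error: bool,
}

/// Primitive view of a trace chunk. Field names follow the quoted snippets.
pub struct TraceView<'a> {
    pub trace_id: u64,
    pub root_index: usize,
    pub root_global_sample_rate: f64,
    pub spans: &'a [SpanView<'a>],
}

/// Outcome of the rescue sampler for one trace.
#[derive(Debug, PartialEq)]
pub enum SampleDecision {
    /// Keep the trace; `errors_sr` is the rate the sampler applied.
    Keep { errors_sr: f64 },
    Drop,
}

impl<'a> TraceView<'a> {
    /// Hypothetical helper: returns the root span, or `None` when the chunk
    /// is empty or `root_index` points past the end.
    pub fn root(&self) -> Option<&SpanView<'a>> {
        self.spans.get(self.root_index)
    }
}
```

Because nothing here references a protobuf type, a consumer would only need to copy a few fields out of whatever `pb::Span` revision it pins.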
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/datadog-agent-trace-sampler/src/lib.rs | Defines the public API and crate-level documentation/doctest. |
| crates/datadog-agent-trace-sampler/src/signature.rs | Implements trace signature hashing and deterministic sample-by-rate helper + tests. |
| crates/datadog-agent-trace-sampler/src/score_sampler.rs | Implements the rolling TPS budget sampler and ErrorsSampler + tests. |
| crates/datadog-agent-trace-sampler/README.md | Documents crate purpose and basic usage. |
| crates/datadog-agent-trace-sampler/Cargo.toml | Adds the new crate manifest (no dependencies). |
| Cargo.lock | Adds the new crate entry to the lockfile. |
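As a rough illustration of the signature hashing attributed to `signature.rs`, the sketch below applies 32-bit FNV-1a to a few span fields and folds the per-span hashes into one order-independent value. The `SpanKey` type, the choice of hashed fields, the use of `resource` only for the root, and the `u64` return type are all assumptions; the crate's real field set and combination rule may differ.

```rust
/// Standard 32-bit FNV-1a parameters.
pub const FNV_OFFSET_BASIS: u32 = 0x811c_9dc5;
pub const FNV_PRIME: u32 = 0x0100_0193;

/// Feeds `bytes` into a running FNV-1a hash state and returns the new state.
pub fn fnv1a_write(mut hash: u32, bytes: &[u8]) -> u32 {
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical per-span input; not the crate's `SpanView`.
#[derive(Clone, Copy)]
pub struct SpanKey<'a> {
    pub service: &'a str,
    pub name: &'a str,
    pub resource: &'a str,
    pub is_error: bool,
}

/// Hashes one span. The resource is mixed in only when `with_resource` is
/// set (assumed here to distinguish the root hash from plain span hashes).
fn span_hash(span: &SpanKey<'_>, env: &str, with_resource: bool) -> u32 {
    let mut h = FNV_OFFSET_BASIS;
    h = fnv1a_write(h, env.as_bytes());
    h = fnv1a_write(h, span.service.as_bytes());
    h = fnv1a_write(h, span.name.as_bytes());
    h = fnv1a_write(h, &[u8::from(span.is_error)]);
    if with_resource {
        h = fnv1a_write(h, span.resource.as_bytes());
    }
    h
}

/// Combines the root hash with the sorted, de-duplicated span hashes by XOR,
/// so span order and repeated identical spans do not change the result.
/// Returns `None` for a malformed chunk (empty, or root index out of range).
pub fn trace_signature(spans: &[SpanKey<'_>], root_index: usize, env: &str) -> Option<u64> {
    let root = spans.get(root_index)?;
    let mut hashes: Vec<u32> = spans.iter().map(|s| span_hash(s, env, false)).collect();
    hashes.sort_unstable();
    hashes.dedup();
    let combined = hashes
        .iter()
        .fold(span_hash(root, env, true), |acc, h| acc ^ h);
    Some(u64::from(combined))
}
```

Sorting and de-duplicating before combining is what keeps the signature stable when the same spans arrive in a different order.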
Comment on lines +256 to +261

    // A malformed chunk (empty, or root_index past the end) cannot be scored;
    // do not rescue it. Guards the slice indexing in the signature computation.
    if self.disabled || trace.root_index >= trace.spans.len() {
        return SampleDecision::Drop;
    }

Comment on lines +273 to +280

    fn apply_sample_rate(&self, trace: &TraceView, rate: f64) -> SampleDecision {
        let new_rate = trace.root_global_sample_rate * rate;
        if sample_by_rate(trace.trace_id, new_rate) {
            SampleDecision::Keep { errors_sr: rate }
        } else {
            SampleDecision::Drop
        }
    }

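The snippet above relies on `sample_by_rate` being deterministic in the trace ID. A minimal sketch of one way to get that property is shown below, using Knuth-style multiplicative hashing of the trace ID against a rate-scaled threshold. The constant and the comparison are assumptions modelled on how the Go agent is commonly described, not a copy of this crate's code.

```rust
/// Multiplier used to scramble trace IDs (assumed value).
const KNUTH_FACTOR: u64 = 1_111_111_111_111_111_111;

/// Deterministically decides whether `trace_id` is kept at `rate`.
/// The same (trace_id, rate) pair always yields the same answer, so
/// independent components agree without sharing state.
pub fn sample_by_rate(trace_id: u64, rate: f64) -> bool {
    if rate >= 1.0 {
        return true;
    }
    // Float-to-int `as` casts saturate in Rust, so negative or NaN rates
    // become a threshold of 0 and nothing is kept.
    let threshold = (rate * u64::MAX as f64) as u64;
    trace_id.wrapping_mul(KNUTH_FACTOR) < threshold
}

/// The decision rate is the product of the root's global sample rate and
/// the sampler's rate, mirroring `new_rate` in the quoted snippet.
pub fn combined_rate(root_global_sample_rate: f64, sampler_rate: f64) -> f64 {
    root_global_sample_rate * sampler_rate
}
```

Note that in the quoted code the combined rate drives the keep/drop decision, while only the sampler's own `rate` is reported in `errors_sr`.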
Comment on lines +127 to +132

    fn get_signature_sample_rate(&self, sig: Signature) -> f64 {
        match self.rates.get(&sig) {
            Some(&rate) => rate * self.extra_rate,
            None => self.default_rate(),
        }
    }

Comment on lines +290 to +293

    let sampler = &self.sampler;
    let allow_list = self
        .shrink_allow_list
        .get_or_insert_with(|| sampler.rates.keys().copied().collect());
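For context on what an allow list like this is typically used for, here is a hedged sketch of cardinality shrinking: once too many distinct signatures are tracked, signatures already known keep their identity and new ones are folded into a small fixed set. The `Shrinker` type, the limit of 200, and the modulo folding are assumptions for illustration and may not match the crate.

```rust
use std::collections::HashSet;

/// Assumed cardinality limit; the crate's actual value is not shown here.
const SHRINK_CARDINALITY: u64 = 200;

/// Hypothetical holder for the lazily built allow list.
#[derive(Default)]
pub struct Shrinker {
    allow_list: Option<HashSet<u64>>,
}

impl Shrinker {
    /// Returns the signature to account against. `tracked` stands in for the
    /// set of signatures the sampler currently holds rates for.
    pub fn shrink(&mut self, sig: u64, tracked: &HashSet<u64>) -> u64 {
        // Below half the limit there is no pressure: reset and pass through.
        if (tracked.len() as u64) < SHRINK_CARDINALITY / 2 {
            self.allow_list = None;
            return sig;
        }
        // Snapshot the tracked signatures once, the first time we shrink.
        let allow = self.allow_list.get_or_insert_with(|| tracked.clone());
        if allow.contains(&sig) {
            sig
        } else {
            // Fold unknown signatures into a bounded number of buckets.
            sig % (SHRINK_CARDINALITY / 2)
        }
    }
}
```

The snapshot is taken lazily with `get_or_insert_with`, the same pattern as the quoted lines, so the allow list costs nothing until cardinality actually becomes a problem.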
TL;DR
New dependency-free crate `datadog-agent-trace-sampler`: a 1:1 Rust port of the Go trace agent's error sampler. It rescues error traces that would otherwise be dropped when the agent computes stats and drops P0s, keeping error visibility under aggressive sampling. Pure leaf crate (no new deps, no protobuf types in the API), shared by bottlecap now and SCL later. This PR is the crate only; consumer wiring is separate.

What does this PR do?
Adds a new dependency-free leaf crate, `datadog-agent-trace-sampler`, that ports the Go trace agent's error sampler (`ScoreSampler` targeting `ErrorTPS`) from `DataDog/datadog-agent`.

The error sampler is a rescue sampler: after an agent decides to drop a trace, the sampler gets a second look, and if the trace contains an error it is kept, up to a budget of `target_tps` (default 10) error traces/sec distributed fairly across distinct trace signatures. This preserves error visibility even under aggressive sampling.

The public API takes primitives in (`SpanView`/`TraceView`) and returns a `SampleDecision` out. It never exposes a protobuf `Span` type, so consumers pinning different `libdatadog` revisions can share it without compiling incompatible `pb::Span` types into their build graphs.

The port includes:
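- FNV-1a trace signatures (`signature.rs`)
- Deterministic sample-by-rate (`SampleByRate`)
- The rolling-bucket TPS budget sampler (`score_sampler.rs`)
- Unit tests mirroring the Go table tests (`computeTPSPerSig`, `zeroAndGetMax`, rate increase/eviction, default rate, disable, shrink, target-TPS effectiveness)

This crate is added up front so both serverless agents can consume it:

- bottlecap consumes it via the existing `serverless-components` git dependency and wires the rescue into its `lambda_extension_compute_stats` P0-drop path (separate PR in `datadog-lambda-extension`).
- SCL wiring follows later.

No new third-party dependencies, so `LICENSE-3rdparty.csv` is unchanged.

Motivation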
Closes the "Error sampler (ScoreSampler)" agent-side sampling parity gap for the serverless agents: when the agent computes stats and drops P0 chunks, error chunks are currently lost. See APMSVLS-469 (bottlecap) and APMSVLS-472 (SCL).
The rare-sampler half of those tickets is intentionally not ported: it is deprecated in the Go trace agent and no longer enabled by default.
Additional Notes
- … `datadog-trace-agent` (which pulls in the full hyper/reqwest HTTP stack) or `libdatadog`. This also sidesteps the `libdatadog` rev drift between consumers (bottlecap pins `48da0d82`, SCL pins `a8206994`), which would break a shared `pb::Span` API.
- `now_unix_secs` is passed into `sample()` rather than read from a clock, keeping the crate dependency-free and the rolling-window logic deterministically testable.
- `weight_root` uses only the root's global sample rate (there is no agent pre-sampler in serverless, so the pre-sampler rate is always 1.0).

Describe how to test/QA your changes
Automated unit + doc tests in the new crate:
20 tests pass (19 unit + 1 doctest); clippy and rustfmt are clean. The unit tests are 1:1 ports of the Go trace agent's sampler table tests, validating signature stability/collisions, the TPS budget cascade, bucket rotation/decay, rate-increase capping, cardinality shrink, and that survivors are stamped with `_dd.errors_sr`. 🤖
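To illustrate why passing `now_unix_secs` in makes bucket rotation and decay testable without a clock, here is a simplified sketch of a 6x5s rolling counter for a single signature. `RollingCounter` and its methods are hypothetical names; the real sampler tracks many signatures and derives rates from these counts, which this sketch leaves out.

```rust
const NUM_BUCKETS: usize = 6; // six buckets...
const BUCKET_SECS: u64 = 5; // ...of five seconds each

/// Hypothetical rolling window of weighted trace counts for one signature.
#[derive(Default)]
pub struct RollingCounter {
    buckets: [f64; NUM_BUCKETS],
    last_bucket_id: u64,
}

impl RollingCounter {
    /// Adds `weight` to the bucket that contains `now_unix_secs`.
    pub fn record(&mut self, now_unix_secs: u64, weight: f64) {
        let bucket_id = now_unix_secs / BUCKET_SECS;
        self.expire(bucket_id);
        self.buckets[(bucket_id % NUM_BUCKETS as u64) as usize] += weight;
    }

    /// Zeroes every bucket slot that was entered since the last call, so
    /// stale counts from a previous lap of the ring are never reused.
    fn expire(&mut self, bucket_id: u64) {
        if bucket_id <= self.last_bucket_id {
            return;
        }
        let steps = (bucket_id - self.last_bucket_id).min(NUM_BUCKETS as u64);
        for i in 0..steps {
            let id = bucket_id - i;
            self.buckets[(id % NUM_BUCKETS as u64) as usize] = 0.0;
        }
        self.last_bucket_id = bucket_id;
    }

    /// Returns the highest per-second rate seen in any live bucket.
    pub fn max_tps(&mut self, now_unix_secs: u64) -> f64 {
        self.expire(now_unix_secs / BUCKET_SECS);
        let max = self.buckets.iter().cloned().fold(0.0_f64, f64::max);
        max / BUCKET_SECS as f64
    }
}
```

With time injected, a test can record at t=1000, query at t=1025 (the count is still inside the 30-second window), and query again at t=1030 (the count has rotated out), all without sleeping.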