Fix assumptions about what a top_level span is #3916

Merged
bwoebi merged 3 commits into master from bob/fix-span-stats on Jun 5, 2026

@bwoebi bwoebi commented May 25, 2026

Collaborator

Without this fix, span stats are broken for nested services.

This also handles version propagation properly: until now, the version was removed too aggressively under UST (Unified Service Tagging).

@bwoebi bwoebi requested review from a team as code owners May 25, 2026 15:26

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 531586746b

Comment thread tracer/span_stats.c

datadog-prod-us1-3 Bot commented May 25, 2026

⚠️ Warnings

🚦 7 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.3]

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.5]

DataDog/apm-reliability/dd-trace-php | pecl tests: [7.0]

Only 3 of the 7 failed jobs are listed here.

ℹ️ Info

No other issues found

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 54.12% (+0.00%)

🔗 Commit SHA: c01e428


pr-commenter Bot commented May 25, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-05 17:17:48

Comparing candidate commit c01e428 in PR branch bob/fix-span-stats with baseline commit 87f1683 in branch master.

Found 1 performance improvement and 13 performance regressions! Performance is the same for 180 metrics; 0 metrics are unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because changes are often small:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
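
The significance rule described above can be sketched in C. This is a minimal illustration, not the benchmarking platform's actual code: the names `change_t` and `classify_execution_time` are hypothetical, the threshold is assumed to be symmetric around 0%, and the 1% value used in the usage note is taken from the diagram only as an example.

```c
#include <assert.h>

/* Hypothetical result type: how one metric changed versus the baseline. */
typedef enum { CHANGE_NONE, CHANGE_BETTER, CHANGE_WORSE } change_t;

/*
 * Classify an execution-time metric from the confidence interval (CI) of the
 * relative difference of means, expressed in percent.
 *
 * Assumption: "entirely outside the threshold" means the whole CI lies above
 * +threshold (slower, worse) or below -threshold (faster, better).
 */
static change_t classify_execution_time(double ci_lower, double ci_upper,
                                        double threshold)
{
    assert(ci_lower <= ci_upper);
    assert(threshold >= 0.0);

    if (ci_lower > threshold) {
        return CHANGE_WORSE;   /* whole CI above +threshold */
    }
    if (ci_upper < -threshold) {
        return CHANGE_BETTER;  /* whole CI below -threshold */
    }
    return CHANGE_NONE;        /* CI overlaps the threshold band */
}
```

With an assumed threshold of 1%, the interval [1.3%; 3.1%] from the second diagram classifies as worse, while the interval [-0.6%; 1.2%] from the first diagram is not significant.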

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟩 execution_time [-4.065µs; -2.915µs] or [-3.672%; -2.632%]

scenario:PDOBench/benchPDOOverhead

  • 🟥 execution_time [+21.406µs; +25.573µs] or [+8.735%; +10.435%]

scenario:PDOBench/benchPDOOverhead-opcache

  • 🟥 execution_time [+13.592µs; +16.120µs] or [+5.569%; +6.604%]

scenario:PDOBench/benchPDOOverheadWithDBM

  • 🟥 execution_time [+20.536µs; +22.821µs] or [+8.334%; +9.261%]

scenario:PDOBench/benchPDOOverheadWithDBM-opcache

  • 🟥 execution_time [+12.712µs; +15.459µs] or [+5.201%; +6.326%]

scenario:PHPRedisBench/benchRedisOverhead

  • 🟥 execution_time [+97.153µs; +111.274µs] or [+10.056%; +11.518%]

scenario:PHPRedisBench/benchRedisOverhead-opcache

  • 🟥 execution_time [+61.094µs; +70.621µs] or [+6.133%; +7.089%]

scenario:SpanBench/benchDatadogAPI

  • 🟥 execution_time [+4.476µs; +6.742µs] or [+6.808%; +10.254%]

scenario:SpanBench/benchDatadogAPI-opcache

  • 🟥 execution_time [+4.176µs; +5.971µs] or [+6.368%; +9.106%]

scenario:SpanBench/benchOpenTelemetryInteroperability

  • 🟥 execution_time [+7.798µs; +10.794µs] or [+4.171%; +5.773%]

scenario:TraceFlushBench/benchFlushTrace

  • 🟥 execution_time [+61.074µs; +94.726µs] or [+5.075%; +7.872%]

scenario:TraceFlushBench/benchFlushTrace-opcache

  • 🟥 execution_time [+1.161ms; +1.191ms] or [+96.192%; +98.670%]

scenario:TraceSerializationBench/benchSerializeTrace

  • 🟥 execution_time [+72.321µs; +84.179µs] or [+17.213%; +20.035%]

scenario:TraceSerializationBench/benchSerializeTrace-opcache

  • 🟥 execution_time [+87.126µs; +98.774µs] or [+24.745%; +28.053%]

@bwoebi bwoebi force-pushed the bob/fix-span-stats branch 3 times, most recently from 9883729 to 2dc3d7b on May 26, 2026 12:28
@bwoebi bwoebi force-pushed the bob/fix-span-stats branch from 2dc3d7b to e3541c3 on June 5, 2026 14:48

github-actions Bot commented Jun 5, 2026

Contributor

Snapshots difference summary

The following differences have been observed in committed snapshots. This summary is meant to help the reviewer.
The diff is simplistic, so please still check some files yourself while we improve it.

If you need to update snapshots, please refer to CONTRIBUTING.md

2 occurrences of:

- "span.kind": "client"
+ "span.kind": "client"
+ "version": "1.0"

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
@bwoebi bwoebi force-pushed the bob/fix-span-stats branch from e3541c3 to 94ebf39 on June 5, 2026 15:09
@bwoebi bwoebi merged commit af16d47 into master Jun 5, 2026
2115 of 2133 checks passed
@bwoebi bwoebi deleted the bob/fix-span-stats branch June 5, 2026 16:50
@github-actions github-actions Bot added this to the 1.21.0 milestone Jun 5, 2026

2 participants