Alpha release#6
Merged
Merged
Conversation
# Conflicts: # src/main/java/com/datadoghq/trace/impl/DDTracer.java
Gpolaert/dev
…ice name in tracer
… is streaming JSON (optim)
…amplers (+tests & examples)
# Conflicts: # dd-trace/src/main/java/com/datadoghq/trace/resolver/DDTracerFactory.java
jordan-wong
added a commit
that referenced
this pull request
Mar 27, 2026
Generated by apm-instrumentation-toolkit using add-apm-integrations skill v2. This validates skill v2 fixes all Run #0 failures (R1-R5). Note: Run #0 used feign-core 10.0.0, this uses 10.8 (AsyncClient exists in both). The minor version difference doesn't affect the rule validation comparison. Layer 1 results: 6/6 pass (after codeNarc fix) - compileJava: PASS - spotlessCheck: PASS - muzzle: PASS (40+ versions validated) - test: PASS - latestDepTest: PASS - codenarcTest: PASS (after renaming EmptyPlaceholder1/2) Critical rules validation (vs Run #0 failures): ✓ R1 (no lambdas): Uses AsyncCompletionHandler class instead of lambda ✓ R2 (async future wrapping): @Advice.Return(readOnly=false) + assigns wrapped future back ✓ R3 (single module): One @autoservice with typeInstrumentations() list ✓ R4 (module naming): feign-10.8 (not feign-core) ✓ R5 (thread safety): CallDepthThreadLocalMap.reset() on same thread (not in async handler) ✓ R12 (reentrancy): CallDepthThreadLocalMap pattern present Files: - FeignClientModule: Single @autoservice with sync+async instrumentations (R3) - SyncClientInstrumentation: Sync Client.execute() instrumentation - AsyncClientInstrumentation: Async AsyncClient.execute() with proper future wrapping (R2, R5) - AsyncCompletionHandler: Completion handler class (no lambdas, R1) - FeignClientDecorator: HttpClientDecorator - RequestInjectAdapter: Header injection for context propagation Agent metrics: ~254 min, unknown turns, unknown cost 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
jandro996
added a commit
that referenced
this pull request
May 14, 2026
- ScaReachabilitySmokeTest: fix find() to look for the entry with reachability metadata — the same dep appears twice (once from DependencyService without metadata, once from SCA with CVE data) - TelemetryRequestBody.writeDependency(): write metadata:[] even when list is empty — null means SCA disabled, empty list means SCA active but no CVEs detected yet (RFC: all deps get metadata:[] at startup) - sca_cves.json: remove class-level symbol from snakeyaml — Spring Boot loads Yaml at startup causing registerCve+recordHit to fire in the same request, preventing the reached:[] heartbeat from being observed - ScaReachabilityPeriodicActionTest: add rfcFullHeartbeatFlow test covering Heartbeats #2-#6 from the RFC spec - TelemetryRequestBodyDependencyMetadataTest: update to reflect that metadata:[] is written (not suppressed) when list is empty
bm1549
added a commit
that referenced
this pull request
May 15, 2026
dougqh
added a commit
that referenced
this pull request
May 20, 2026
#4: PropertyCardinalityHandler and TagCardinalityHandler are only consumed within this package; drop `public` from the class declarations, constructors, and methods. They're package-private now. #6: Add tests that lock down the EMPTY-on-null contract that the rest of the package depends on: - CardinalityHandlerTest covers both handlers: register(null) -> EMPTY, and registering null repeatedly doesn't consume the cardinality budget. - AggregateEntryTest covers the entry shape: optional fields built from a snapshot with null inputs resolve to EMPTY; populated optional fields carry their value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jandro996
added a commit
that referenced
this pull request
Jun 6, 2026
- ScaReachabilitySmokeTest: fix find() to look for the entry with reachability metadata — the same dep appears twice (once from DependencyService without metadata, once from SCA with CVE data) - TelemetryRequestBody.writeDependency(): write metadata:[] even when list is empty — null means SCA disabled, empty list means SCA active but no CVEs detected yet (RFC: all deps get metadata:[] at startup) - sca_cves.json: remove class-level symbol from snakeyaml — Spring Boot loads Yaml at startup causing registerCve+recordHit to fire in the same request, preventing the reached:[] heartbeat from being observed - ScaReachabilityPeriodicActionTest: add rfcFullHeartbeatFlow test covering Heartbeats #2-#6 from the RFC spec - TelemetryRequestBodyDependencyMetadataTest: update to reflect that metadata:[] is written (not suppressed) when list is empty
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
Jun 9, 2026
…es and callsites via telemetry (#11352) Implement SCA Reachability: detect vulnerable library classes at runtime Adds a new SCA Reachability subsystem that reports which vulnerable library classes were actually loaded at runtime, reducing false positives from static dependency scanning. Gated on DD_APPSEC_SCA_ENABLED. Key components: - Gradle task downloads GHSA enrichments from sca-reachability-database and generates sca_cves.json bundled in the agent jar at build time - ClassFileTransformer (observation-only) detects when vulnerable classes are loaded, resolves JAR versions via pom.properties, and checks semver ranges using ComparableVersion (Maven semantics) - ScaReachabilityCollector bridges the transformer and telemetry without circular dependencies, following the WafMetricCollector pattern - ScaReachabilityPeriodicAction reports hits on each app-dependencies-loaded heartbeat by adding reachability metadata to existing dependency entries Commit sca_cves.json as versioned resource; update generateScaCvesJson task The Gradle task now writes to src/main/resources/ and runs only when -PrefreshSca is passed or the file is absent, so CI builds never need network access to the private sca-reachability-database repo. Fix Path B classpath scan for Java 9+: fall back to java.class.path On Java 9+, the system classloader (jdk.internal.loader.ClassLoaders$AppClassLoader) no longer extends URLClassLoader, so the URLClassLoader chain walk misses all main classpath entries. Add a fallback that reads java.class.path to cover this case, deduplicating with a HashSet<URL> to avoid scanning the same JAR twice. Add Java 9+ test for Path B classpath fallback; make method package-private Test verifies: (1) system classloader is not URLClassLoader on Java 9+, and (2) findArtifactVersionInClasspath finds artifacts via java.class.path fallback. Applies to Java 9 and all subsequent JDKs (permanent JDK design change). Implement method-level symbol detection with ASM bytecode injection When sca_cves.json contains symbols with method != null, the transformer injects a static callback at method entry using ASM. The callback fires the first time the method is called and reports via ScaReachabilityCallback (bootstrap classloader, accessible from any application class). Key changes: - ScaReachabilityCallback in agent-bootstrap: bootstrap-visible callback with runtime dedup (vulnId|artifact|methodName) and handler registration - ScaReachabilityTransformer: injectMethodCallbacks() uses ByteBuddy ASM to inject INVOKESTATIC at first line number of each target method; processClass() routes class-level vs method-level symbols separately - ScaReachabilityHit: adds symbolName + line fields; existing constructor defaults to <clinit>/1 for class-level hits (backward compatible) - ScaReachabilityPeriodicAction: buildMetadataValue() now uses hit.symbolName() and hit.line() instead of hardcoded values - 6 tests: ASM injection, callback fires on method call only, dedup, multiple methods, safe method not reported, class-level unaffected Retransform classes for method-level detection: already-loaded and version-unresolved Two cases required deferred retransformation: 1. Classes already loaded at startup (before transformer registered): the bytecode callback cannot be injected without retransformClasses() 2. Classes where DependencyResolver returned empty deps at load time (version not yet resolvable): empty results are now not cached to allow retries ScaReachabilityTransformer now stores Instrumentation and exposes performPendingRetransforms() called on each telemetry heartbeat via a Runnable callback in ScaReachabilityCollector.periodicWorkCallback. Classes are queued via: - pendingRetransform (Class<?> queue) from checkAlreadyLoadedClasses - pendingRetransformNames (String set) from processClass on empty deps Fix: remove incorrect dedup from injectCallbacks; update invariants retransformClasses() always starts from the ORIGINAL class file bytes, not from the previously-transformed bytes. A dedup check in injectCallbacks() that blocked re-injection on the second pass caused the callback to be removed (the class was returned to its original, un-instrumented state). The authoritative dedup for method-level hits is ScaReachabilityCallback.reported (bootstrap-side), which persists across retransformations regardless of how many times transform() is called on the same class. Also update .claude-invariants.md: retransformClasses is now used (for method-level only), the cache constraint clarified, and the dedup invariant documents the two-level approach (transformer for class-level, bootstrap for method-level). pr-review: fix null guard, encapsulate periodicWorkCallback, update Javadoc, add retransform tests - performPendingRetransforms(): early return when instrumentation is null (unit test safety) - ScaReachabilityCollector: encapsulate periodicWorkCallback as private with getter/setter - ScaReachabilityTransformer class Javadoc: update dedup description from (vulnId,artifact) pair to (vulnId,artifact,symbolName) tuple; document two-level dedup strategy - Add 3 tests for performPendingRetransforms(): no-op with null inst, retransformClasses called for pending queue, no-op when both queues empty Fix two Codex review issues: java.nio in premain and transitive JAR resolution P1: Replace StandardCharsets.UTF_8 with "UTF-8" string in ScaCveDatabase.load(). java.nio.* is forbidden during premain (bootstrap_design_guidelines.md) because it can trigger premature provider initialization before the app configures the runtime. P2: Add classpath fallback in resolveVersionForArtifact() for entries where the vulnerable artifact is an aggregator/starter POM whose watched classes live in a transitive dependency JAR (e.g., spring-boot-starter-web watches @controller but @controller is defined in spring-context.jar, not the starter). The new helper first checks the class's own JAR, then falls back to findArtifactVersionInClasspath with a hit cache (classpathArtifactCache). processPathA uses the same helper for consistency. Refactor: extract CLASS_LEVEL_SYMBOL constant and reportClassLevelHitIfPresent helper - Add CLASS_LEVEL_SYMBOL = "<clinit>" constant to avoid magic string repetition (appeared 3 times in the same class; a typo would silently produce wrong symbol names) - Extract reportClassLevelHitIfPresent(entry, version, internalClassName) helper to unify identical class-level symbol matching loops in processPathA, processPathB, and processClass — all three now delegate to the single helper Move CLASS_LEVEL_SYMBOL to ScaReachabilityHit; fix misleading comment Move CLASS_LEVEL_SYMBOL = "<clinit>" to ScaReachabilityHit (internal-api) as a public constant so both the transformer (appsec) and the telemetry payload builder share the canonical definition without cross-module string duplication. The convenience constructor also uses the constant now. ScaReachabilityTransformer delegates to ScaReachabilityHit.CLASS_LEVEL_SYMBOL. Fix misleading comment in processClass: "We enqueue via classBeingRedefined is null here" → explains that classBeingRedefined is null on first class load, preventing direct Class<?> queuing, so scheduleRetransformByName is used instead. Move java.nio comment to usage site; add tests for transitive JAR fallback - ScaCveDatabase: move "java.nio.* forbidden in premain" comment from the imports block to inline at the InputStreamReader construction site (comments in imports are unusual and smola flags verbose placement) - ScaReachabilityTransformer.resolveVersionForArtifact: make package-private for testing; add 4 tests covering the two-step fallback: (1) version from classJarDeps directly (2) classpath fallback when classJarDeps is empty (transitive JAR case) (3) classpathArtifactCache hit on second call (4) null for absent artifact Remove dead visitCode() override and redundant CLASS_LEVEL_SYMBOL alias - Remove empty visitCode() in MethodEntryInjector: the method only called super.visitCode() and its comment was misleading — the actual no-debug-info fallback injection is handled by ensureInjected() in the visitInsn/visitVarInsn/ visitMethodInsn/visitFieldInsn overrides, not here - Remove private CLASS_LEVEL_SYMBOL alias in ScaReachabilityTransformer: the constant is used in exactly one place (reportClassLevelHitIfPresent) and ScaReachabilityHit.CLASS_LEVEL_SYMBOL is self-documenting at that site; the alias added a private field with no benefit after the constant was moved to ScaReachabilityHit in a previous commit Capture callsite for method-level hits (mirrors Python tracer) Per the RFC and Python implementation (dd-trace-py#17156), the telemetry payload path/symbol/line for method-level hits must report the APPLICATION FRAME that called the vulnerable method (the callsite), not the vulnerable method itself. ScaReachabilityCallback.onMethodHit() now walks Thread.getStackTrace() to find the first non-agent, non-JDK frame after the vulnerable class: ScaReachabilityCallback.onMethodHit (skip - us) com.foo.VulnerableClass.method (skip - vulnerable class) com.myapp.UserService.processRequest (CALLSITE - report this) The dotClassName/methodName params are still baked into the bytecode and used only for deduplication (vulnId|artifact|methodName key). The handler receives the callsite's class/method/line for telemetry. Fallback: if no application frame is found (e.g. called from JDK internals), reports the vulnerable symbol itself so the backend knows it was reached. Class-level hits (<clinit>) are unchanged — no callsite at class load time. Move callsite detection from bootstrap to ScaReachabilitySystem handler ScaReachabilityCallback (bootstrap) must stay minimal — complex logic does not belong there. Move findCallsite() to ScaReachabilitySystem which has access to internal-api utilities. The handler runs synchronously so the full call stack is still present: ScaReachabilitySystem handler ScaReachabilityCallback.onMethodHit <vulnerable method> <application callsite> ← reported Uses the same class-prefix predicate as AbstractStackWalker. isNotDatadogTraceStackElement (package-private, so replicated inline) to skip agent/JDK frames, consistent with the IAST trie-based filtering infrastructure used elsewhere in the codebase. Use AbstractStackWalker.isNotDatadogTraceStackElement for callsite filtering Make isNotDatadogTraceStackElement public in AbstractStackWalker so SCA Reachability can use the existing predicate directly rather than duplicating the 3 class-prefix conditions inline. Add tests for ScaReachabilitySystem.findCallsite(); document fallback path ScaReachabilitySystemCallsiteTest covers: - findCallsite returns null when vulnerable class is not on the stack (triggers fallback: handler reports the vulnerable symbol itself) - findCallsite skips the vulnerable class frame and returns the first non-agent frame above it (using java.lang.Thread as a non-agent class guaranteed to be at the top of getStackTrace()) Note on the method-level integration test: TargetClass is in com.datadog.appsec.sca.* (agent namespace) so AbstractStackWalker filters it as agent code and findCallsite() returns null. The test now documents this fallback behaviour explicitly. In production the vulnerable class is always a 3rd-party library (e.g. com.fasterxml.jackson.*) and the happy path fires correctly — verified by ScaReachabilitySystemCallsiteTest. Update ScaReachabilityHit Javadoc to reflect dual callsite/symbol semantics Move findCallsite() after start() — helpers after main public method Use ConcurrentHashMap.newKeySet() instead of verbose newSetFromMap idiom Lazy entryHasMethodLevelSymbol check — avoid stream alloc on normal path The stream().anyMatch() for detecting method-level symbols was computed for every entry unconditionally. It is only needed when version == null (deps not yet resolvable). Moving the check inside the version==null branch eliminates the stream allocation on the common path where the version resolves successfully. Remove Path B from startup scan — JDK symbols are false positive indicators JDK classes (e.g. java.sql.PreparedStatement, protectionDomain==null) are loaded by ANY app that uses JDBC, regardless of which driver is present. Using their presence to infer that a specific library (e.g. PostgreSQL) is "reachable" produces classpath-presence false positives, not runtime reachability signals. Entries that list JDK symbols (e.g. the PostgreSQL advisory) also include library-specific classes (e.g. org.postgresql.ds.PGSimpleDataSource) that Path A correctly detects when those classes are actually loaded. In checkAlreadyLoadedClasses(), classes with no code source (JDK/bootstrap) are now skipped silently. The invariants and KB are updated accordingly. Remove dead processPathB() — never called after Path B removal Fix dedup key to include class name for method-level hits The key vulnId|artifact|methodName collapses hits for different classes in the same artifact that share a method name (e.g. ClassA.parse and ClassB.parse would both map to the same key, suppressing the second). Fix: add dotClassName to the key → vulnId|artifact|dotClassName|methodName so each class+method combination is tracked independently. Add regression test that verifies both hits are reported when the same method name exists in two different vulnerable classes of the same artifact. Implement stateful RFC heartbeat model for SCA telemetry Replace stateless ScaReachabilityCollector (simple hit queue) with ScaReachabilityDependencyRegistry (stateful per-dependency CVE tracking) to comply with the RFC heartbeat specification: 1. When a class from a vulnerable version loads: registerCve() creates a CVE entry with reached=[] and marks the dependency as pending, so the next heartbeat reports metadata:[{"type":"reachability","value":"...reached:[]"}] — signalling the backend that SCA is monitoring this CVE before any symbol is called. 2. When a vulnerable method is called: recordHit() stores the first callsite (RFC: single occurrence sufficient) and marks the dependency as pending. 3. On each heartbeat: ScaReachabilityPeriodicAction drains pending dependencies and re-reports ALL CVEs for each dependency together (both with and without callsites), then clears pending. Empty heartbeat otherwise. Key invariant: whenever any CVE state changes, ALL CVEs for the same dependency are re-reported together so the backend has a complete picture. Add smoke test for SCA Reachability telemetry (APPSEC-62260) Verifies the RFC stateful heartbeat model end-to-end: - jackson-databind:2.6.0 (vulnerable, range < 2.6.7.3) appears in app-dependencies-loaded with metadata reachability entries - GHSA identifier present in the metadata value - reached[] contains a callsite after ObjectMapper is loaded at startup Uses the existing springboot smoke test app (already has jackson-databind:2.6.0) with DD_APPSEC_SCA_ENABLED=true added to JVM args. Add method-level symbols for jackson-databind deserialization CVEs For the 7 jackson-databind CVE entries, adds method-level symbols for the deserialization entry points that actually trigger gadget chains when polymorphic typing is enabled with untrusted input: ObjectMapper.readValue — primary deserialization entry point ObjectMapper.readValues — multiple-value deserialization ObjectReader.readValue — reader-based deserialization variant ObjectReader.readValues — reader-based multiple values Class-level symbols (method=null) are kept alongside the new method-level ones: class load detection signals the library is present; method detection signals the vulnerable code path was actually invoked. 26 method-level symbols added across 7 entries (ObjectMapper + ObjectReader × readValue + readValues × 7 GHSA entries). Add method-level symbols for xstream, log4j, snakeyaml, jackson-mapper-asl Adds method-level detection for 24 entries across 4 libraries where the deserialization/injection entry point is 100% certain: XStream (17 entries): fromXML — THE entry point for all XStream CVEs; triggers gadget chains when deserializing untrusted XML log4j-core (4 entries): info, error, warn, debug, trace, fatal, log — Log4Shell (GHSA-jfh8-c2jp-5v3q) triggers JNDI lookup when log messages contain ${jndi:...} patterns; any Logger method is an entry point snakeyaml (1 entry): load, loadAll — unsafe YAML deserialization; instantiates arbitrary Java classes from untrusted YAML input jackson-mapper-asl (1 entry): readValue, readValues — same deserialization pattern as jackson-databind, applies to the legacy 1.x mapper 56 method-level symbols added. Class-level symbols (method=null) are kept alongside the new method-level ones for dual detection coverage. Fix SCA smoke test, RFC compliance and add heartbeat flow tests - ScaReachabilitySmokeTest: fix find() to look for the entry with reachability metadata — the same dep appears twice (once from DependencyService without metadata, once from SCA with CVE data) - TelemetryRequestBody.writeDependency(): write metadata:[] even when list is empty — null means SCA disabled, empty list means SCA active but no CVEs detected yet (RFC: all deps get metadata:[] at startup) - sca_cves.json: remove class-level symbol from snakeyaml — Spring Boot loads Yaml at startup causing registerCve+recordHit to fire in the same request, preventing the reached:[] heartbeat from being observed - ScaReachabilityPeriodicActionTest: add rfcFullHeartbeatFlow test covering Heartbeats #2-#6 from the RFC spec - TelemetryRequestBodyDependencyMetadataTest: update to reflect that metadata:[] is written (not suppressed) when list is empty fix(smoke): add braces to if statement to satisfy CodeNarc IfStatementBraces rule fix(spotbugs): make periodicWorkCallback private, expose via getter refactor: replace Map<?,?> casts with typed Moshi DTOs in ScaCveDatabase cleanup: remove stale Path A/B terminology after Path B was removed chore: remove .claude-invariants.md from tracking, add to .gitignore fix(forbiddenapis): replace String#split() with pre-compiled Pattern.split() fix: remove class-level symbols from all xstream entries Same issue as snakeyaml: class-level hit fires when XStream loads and registers <clinit> as callsite. First-hit-wins then blocks fromXML() method-level callbacks. Keeping only fromXML gives real application callsite info. feat: emit metadata:[] for all deps in DependencyPeriodicAction when SCA enabled Per RFC: when DD_APPSEC_SCA_ENABLED=true every dependency must be reported with metadata:[] as the initial SCA-active signal. ScaReachabilityPeriodicAction later re-emits the same dependency with actual CVE metadata. The smoke test already handles the resulting duplicate entries by filtering for the entry that carries reachability metadata. fix(spotbugs): replace URL collections with URI to avoid DNS lookups (DMI_COLLECTION_OF_URLS) chore: remove dead ScaReachabilityCollector, fix stale Javadoc, drop redundant parameter refactor: unify dep reporting into ScaReachabilityPeriodicAction when SCA enabled - DependencyPeriodicAction reverts to original (SCA-unaware) - ScaReachabilityPeriodicAction now accepts DependencyService and takes over all app-dependencies-loaded emission when SCA is enabled: 1. Drains registry → map by artifact@version 2. Drains DependencyService → merges with CVE state or emits metadata:[] 3. Emits remaining registry snapshots (existing deps with state changes) - TelemetrySystem registers ScaReachabilityPeriodicAction instead of DependencyPeriodicAction when SCA is enabled — no duplicates possible - Tests cover all three merge paths fix(techdebt): static imports, remove inline java.util refs, replace em-dashes with hyphens fix: restore ScaReachabilityPeriodicActionTest; fix raw type Class[0] to Class<?>[0] fix(techdebt): move pendingRetransformNames to field section; json-escape buildMetadataValue; deduplicate recordHit fix(techdebt): extract depKey helper; delete empty test stub ScaReachabilityTransformerTest fix(thread-safety): use AtomicReference.compareAndSet for first-hit-wins in CveState Replace volatile ScaReachabilityHit hit with AtomicReference<ScaReachabilityHit>. The volatile check-then-assign (if hit == null -> hit = newHit) is not atomic: two threads calling recordHit for the same CVE via different methods could both observe null and both write, with the second overwriting the first. compareAndSet guarantees exactly one callsite is stored regardless of thread interleaving. Add ScaReachabilityDependencyRegistryTest with a 20-thread concurrency test that verifies exactly one callsite is recorded under simultaneous competing writes. fix: remove stale .claude-invariants.md reference from ScaReachabilityTransformer Javadoc fix: wrap onMethodHit in try/catch to prevent exception propagation to application code; fix assertTrue usage in test fix: use knownDeps to enrich CVE snapshots with source/hash from prior DependencyService drains When a CVE fires (class loaded + method called) and DependencyService has not yet resolved the dep in the same heartbeat, Step 3 previously emitted the dep without source/hash. The backend requires source/hash to correlate the entry with a known dependency, so it would silently ignore the CVE data. Fix: ScaReachabilityPeriodicAction now maintains a persistent knownDeps map across heartbeats. Each dep drained from DependencyService is stored there. Step 3 looks up knownDeps to enrich CVE snapshots; if the dep is not yet known (JAR still resolving), re-marks it as pending and retries next heartbeat instead of emitting without source/hash. Add ScaReachabilityDependencyRegistry.markPending() to re-mark a dep as pending without resetting CVE state, used by the retry logic. Tests cover: CVE-after-dep-resolved (uses knownDeps), CVE-before-dep-resolved (retries until dep known), simultaneous dep+CVE (Step 2 merge still works). fix: emit CVE data immediately in Step 3, use knownDeps only for source/hash enrichment The retry-until-resolved approach blocked emission entirely when DependencyService had not yet resolved a JAR, causing system tests to timeout (25s window). The backend can correlate entries by name:version alone, so blocking is not needed. New behavior: always emit in Step 3. Use knownDeps to enrich with source/hash when available (dep was resolved in a prior heartbeat); otherwise emit without source/hash. Subsequent emissions (e.g., after a method hit) will be enriched once knownDeps is populated from DependencyService, giving the backend the correlation data it needs. fix(sca): force snakeyaml class load in smoke test via PostConstruct The smoke test app uses application.properties (not application.yml), so Spring Boot never triggers snakeyaml loading. ScaReachabilityInit instantiates Yaml in @PostConstruct to ensure class-level CVE detection at startup. Also removes the unused ScaReachabilityCollector draft and reverts TelemetrySystem to register both DependencyPeriodicAction and ScaReachabilityPeriodicAction (restoring dual-action mode). Smoke test now passes in ~4s with the default 30s timeout. ci: retrigger CI refactor(sca): remove dead markPending, inline scheduleRetransformByName, fix recordHit TOCTOU - Remove unused markPending() - the retry-via-re-mark flow was superseded by the emit-immediately approach in Step 3 of ScaReachabilityPeriodicAction - Inline private single-statement scheduleRetransformByName() at its only call site - Fix non-atomic containsKey+get in recordHit: use computeIfAbsent directly, which eliminates the race and avoids the unintended pendingReport=true side effect that unconditional registerCve() would have introduced - Update stale Javadoc in ScaReachabilityPeriodicAction referencing the old re-mark-pending retry strategy fix(sca): register only ScaReachabilityPeriodicAction when SCA enabled (Option C) When DD_APPSEC_SCA_ENABLED=true, ScaReachabilityPeriodicAction handles all app-dependencies-loaded emission by merging DependencyService drains with CVE registry state. DependencyPeriodicAction is skipped to avoid: - Duplicate entries per dep per heartbeat - DependencyService queue being drained before ScaReachabilityPeriodicAction can see new JARs (which prevented knownDeps from being populated and caused CVE entries to always emit without source/hash) The previous attempt (commit b0421ab) failed system-tests due to a markPending retry in Step 3 that could delay CVE emission by one heartbeat cycle. That was fixed in 6925358 (immediate emit). The current code is safe. revert: restore pre-existing em dashes in GatewayBridge, ObjectIntrospection, StandardizedLogging fix: hoist dotClassName conversion outside inner loop in processClass fix(sca): deduplicate index entries per class when entry has multiple symbols for same class An entry with symbols [Yaml.load, Yaml.loadAll] indexed once per symbol causing the same ScaEntry to appear twice for org/yaml/snakeyaml/Yaml. processClass then iterated the duplicate and injected bytecode callbacks twice per method, resulting in redundant ScaReachabilityCallback.onMethodHit() calls on every invocation. Fix: track seen class names per entry with a HashSet; add regression test. fix(sca): include version in hit dedup keys to isolate multi-version classloader scenarios ScaReachabilityCallback.reported and ScaReachabilityTransformer.reportedHits keyed hits without the artifact version. In a multi-classloader setup (multiple WARs, OSGi) where two versions of the same library coexist, the first version's hit suppressed all subsequent versions for the same CVE and symbol. Both keys now include version: vulnId|artifact|version|class|method. fix(sca): skip intermediate library frames in callsite detection When the call chain is client -> lib_wrapper -> vulnerable_method, findCallsite was reporting lib_wrapper instead of the client frame. Add sca_stack_exclusion.trie (generated as ScaStackExclusionTrie) and apply it after the vulnerable-class boundary to skip known library packages, matching the approach used by IAST SinkModuleBase. Refactor findCallsite to a testable overload that accepts an explicit StackTraceElement[], and rewrite tests with synthetic stacks. fix(sca): add TODO for inner-class format in GhsaEnrichmentParser Document that replace('.', '/') would produce wrong internal names for inner classes if GHSA uses dot notation (e.g. Outer.Inner instead of Outer$Inner). Current database has no inner class entries so safe for now; track for when database team defines method-symbol format. refactor(sca): defer class processing off class-loading thread; cleanup Addresses Manuel's review comment: all heavyweight processing now runs on the telemetry heartbeat thread instead of inline during class loading. Changes in ScaReachabilityTransformer: - transform() on first load (classBeingRedefined == null) only enqueues a PendingClass(className, jarURL) record and returns null immediately. No JAR I/O (DependencyResolver.resolve), no version lookup, no hit reporting on the class-loading thread. Matches the pattern used by LocationsCollectingTransformer. - New processPendingClassEvents(): drains the queue on the telemetry heartbeat, performs all heavyweight work (JAR reads, version resolution, class-level hit reporting), and populates pendingRetransformNames for method-level symbols. Must run before performPendingRetransforms() so classes queued in one heartbeat are retransformed in the same cycle. - transform() on retransform (classBeingRedefined != null) takes the existing inline path (processClass 4-arg) to inject method-level bytecode - required by the ClassFileTransformer contract. - Tradeoff vs old inline approach: ~10s window at startup during which method calls go undetected. No behavioral difference for class-level (the only active symbols today). - performPendingRetransforms(): use contains()+removeAll() instead of remove() inside the getAllLoadedClasses() loop. Spring Boot fat JARs create multiple LaunchedURLClassLoader instances; the old remove() only matched the first instance, so the application classloader was never retransformed and method-level callbacks never fired. All classloader instances of the same class name now get the callback bytecode injected. Additional cleanups in the same files: - Rename void processClass(3-arg) to reportClassLevelHits to disambiguate it from the byte[]-returning processClass(4-arg) that injects bytecode. - Extract duplicated method-level symbol stream expression into private static helper hasMethodLevelSymbolForClass(List<ScaEntry>, String). - ScaReachabilitySystem.start() periodic work callback updated to call processPendingClassEvents() before performPendingRetransforms(). - ASM injection: replace SIPUSH with visitLdcInsn(line) so line numbers > 32767 (valid in generated code) produce correct bytecode instead of a truncated signed-short value. - Defer dotClassName allocation to the method-level symbol path where it is actually needed. - Update processClass(4-arg) Javadoc to reflect two-phase design: only called on retransform for method-level symbols; class-level hits were already reported by reportClassLevelHits in processPendingClassEvents. - Two new tests in ScaReachabilityMethodLevelTest verify the structural invariants: first load enqueues and returns null; retransform does not re-enqueue. refactor(sca): remove unused 4-arg convenience constructor from ScaReachabilityHit test(sca): add regression tests for multi-classloader retransform fix ScaReachabilityRetransformTest verifies that performPendingRetransforms() retransforms ALL classloader instances of a class, not just the first one. Simulates Spring Boot's multiple LaunchedURLClassLoader instances by returning the same Class<?> twice from getAllLoadedClasses() — with the old remove() approach only one entry was passed to retransformClasses(); with the fix (contains()+removeAll()) both are collected. Also tests the re-queue path: on retransformClasses() failure, all collected classes must be added back to pendingRetransform for the next heartbeat retry. Adds mockito to appsec test dependencies. Makes pendingRetransformNames package-private (consistent with pendingRetransform and pendingClassEvents). fix(sca): widen catch to Throwable in performPendingRetransforms InternalError and other JVM Errors from retransformClasses() would escape a catch(Exception) block and kill the telemetry thread silently. Matches the existing catch(Throwable) in transform() for the same reason. ci: retrigger pipeline fix(sca): log swallowed exceptions in ScaReachabilityCallback at debug level refactor(sca): move generateScaCvesJson into ScaEnrichmentsPlugin in buildSrc Move the inline generateScaCvesJson task from appsec/build.gradle into a proper Gradle plugin (ScaEnrichmentsPlugin) registered as 'dd-trace-java.sca-enrichments'. The plugin is simpler to test and removes ~90 lines of scripting from the build script. The plugin and committed sca_cves.json are a temporary solution — the symbol database will be delivered via Remote Config in a future iteration, at which point both will be removed. test(sca): add ScaEnrichmentsPluginTest; fix processResources wiring Use pluginManager.withPlugin("java") to defer the processResources dependency until after the java plugin has registered the task, avoiding a configuration-time failure when the plugin is applied before java. Add 4 integration tests using GradleFixture/TestKit: - task is SKIPPED when file exists and -PrefreshSca is absent - task attempts to run (not SKIPPED) when -PrefreshSca is set - task attempts to run (not SKIPPED) when file is absent - processResources depends on generateScaCvesJson fix(sca): scope processResources JSON minification to sca_cves.json only fix(sca): address jpbempel review comments - Replace FQN types with proper imports (Set, ConcurrentHashMap in ScaReachabilityCallback; Reader in ScaCveDatabase; Map.Entry in ScaReachabilityDependencyRegistry) - Narrow catch (Throwable) to catch (Exception) in onMethodHit: OOMError and StackOverflowError are JVM conditions that must propagate to the application; catch (Throwable) in performPendingRetransforms stays because retransformClasses() can throw InternalError by spec - Use Strings.getClassName() instead of manual replace('/', '.') - Guard getAllLoadedClasses() loops against null items (class unloading race) nit(sca): use project VisibleForTesting annotation in ScaReachabilityTransformer datadog.trace.api.internal.VisibleForTesting is available transitively via components:annotations (already on classpath via internal-api). Replace package-private comments with the annotation. nit(sca): add @VisibleForTesting to remaining package-private methods resolveVersionForArtifact and injectMethodCallbacks were still using the comment form; align with the annotation applied to the other testing-only members. perf(sca): use StackWalkerFactory for lazy stack evaluation in findCallsite Replace Thread.currentThread().getStackTrace() with StackWalkerFactory.INSTANCE.walk() so that on JDK9+ the stack is evaluated lazily: frames are walked one by one and evaluation stops as soon as the application callsite is found, without materializing the full stack array. Agent frame filtering is now handled by the walker itself (AbstractStackWalker.doFilterStack) rather than manually in the loop. The StackTraceElement[] overload used in tests is preserved; it delegates to the shared findCallsiteInStream helper with the agent filter applied manually on the array. fix(sca): address bric3 review comments on PR #11352 - Extract ASM injection into ScaMethodCallbackInjector (separate concerns) - Replace URLClassLoader walk with java.class.path scan (dead code on agent threads where AgentThreadFactory sets null context classloader; also not a URLClassLoader on Java 9+); add unit tests documenting both constraints - Replace stream with plain loop in hasMethodLevelSymbolForClass (avoids spliterator allocation on each class-load check) - Normalize class-level vs method-level symbol coexistence in ScaCveDatabase.toScaEntry(): drop class-level (method=null) symbol when method-level symbols exist for the same class in the same entry, preventing first-hit-wins hitRef from discarding the more specific callsite - Guard performPendingRetransforms with isModifiableClass() to avoid infinite retry loop for non-modifiable classes (JDK classes, primitive wrappers, etc.) - Disable SCA when telemetry or dependency collection is disabled (Agent.java): ScaReachabilityPeriodicAction is never registered in those cases, so pendingClassEvents would grow unbounded with no drain - Fix generateScaCvesJson up-to-date check: use upToDateWhen to force re-run when -PrefreshSca is set (onlyIf alone can be bypassed by Gradle's up-to-date check) - Add custom URL support to ScaEnrichmentsPlugin via -PscaEnrichmentsUrl - Add upper bound for ScaReachabilityDependencyRegistry (DD_APPSEC_SCA_MAX_TRACKED_DEPENDENCIES, default 1000) with warning log when cap is reached nit(sca): extract isCapExceeded helper, fix resetForTesting, use Strings.getClassName - Extract duplicated cap-exceeded guard into private isCapExceeded(String key) helper, called from both registerCve and recordHit - Add periodicWorkCallback = null to resetForTesting() to prevent test state leakage - Replace inline replace('/', '.') with Strings.getClassName for consistency with processClass test(sca): add registry unit tests; add @VisibleForTesting to addKnownDepForTesting - ScaReachabilityDependencyRegistryTest: add registerCve, drain semantics, cap enforcement (two tests: new keys rejected, existing keys still updated), and resetForTesting clears periodicWorkCallback - ScaReachabilityPeriodicAction: annotate addKnownDepForTesting with @VisibleForTesting fix(sca): prevent metadata:[] overwrite when JAR resolves after CVE was drained When DependencyService resolves a JAR in heartbeat N+1 but the CVE was already drained in heartbeat N (pendingReport=false), Step 2 previously emitted metadata:[] for that dep. If the backend uses last-write-wins semantics, this would silently erase the CVE state reported in HB N. Fix: add peekSnapshot() to ScaReachabilityDependencyRegistry (reads CVE state without clearing pendingReport), and call it in Step 2 when the JAR is newly resolved but no pending snapshot exists for that dep. Class-level-only CVEs are permanently affected (no method hit to re-trigger pendingReport), so the fix is necessary for correctness. Adds regression test: cveRegisteredBeforeDepResolved_step2PeeksCveStateInLaterHeartbeat nit(sca): remove dead isEmpty() guards and add @VisibleForTesting to resetForTesting entriesForClass() always returns a non-empty list or null (never an empty list), so the || entries.isEmpty() checks in transform(), checkAlreadyLoadedClasses(), and processPendingClassEvents() are dead code that could mislead readers. Also annotate resetForTesting() with @VisibleForTesting, consistent with the rest of the project's convention for package-private test helpers. Merge branch 'master' into alejandro.gonzalez/sca-reachability nit(sca): add TODO for future trie consideration in ScaCveDatabase.entriesForClass Co-authored-by: devflow.devflow-routing-intake <devflow.devflow-routing-intake@kubernetes.us1.ddbuild.io>
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
Jul 1, 2026
Use cached span.kind ordinal in metrics producer; drop tag-map lookup
JFR profiling showed ~21% of producer CPU time spent in tag-map lookups
during ClientStatsAggregator.publish. One of those lookups -- span.kind --
is redundant because DDSpanContext already caches the kind as a byte
ordinal that resolves to a String via a small array.
- Add CoreSpan.getSpanKindString() with a default that falls back to the
tag map for non-DDSpan impls; DDSpan overrides to delegate to the
context's cached resolution.
- Hoist schema.names array out of the capturePeerTagValues loop.
- Avoid an unnecessary toString() in isSynthetic by declaring
SYNTHETICS_ORIGIN as String and using contentEquals.
Benchmark (ClientStatsAggregatorDDSpanBenchmark):
before: 2.410 us/op
after: 1.995 us/op (~17% improvement)
vs. master baseline (6.428 us/op): now ~3.2x faster.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add client metrics pipeline design doc
Captures the producer/consumer split, the canonical-key trick that makes
cardinality-blocking actually save space, the once-per-trace peer-tag
schema sync, the role of each file in datadog.trace.common.metrics, and
the rationale behind the redesign from ConflatingMetricsAggregator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add unit tests for Hashtable and LongHashingUtils
LongHashingUtilsTest (14 cases):
- hashCodeX null sentinel + non-null pass-through
- all primitive hash() overloads match the boxed Java hashCodes
- hash(Object...) 2/3/4/5-arg overloads match the chained addToHash
formula they are documented to constant-fold to
- addToHash(long, primitive) overloads match the Object-version
- linear-accumulation invariant (31 * h + v) holds across a sequence
- iterable / deprecated int[] / deprecated Object[] variants match
chained addToHash
- intHash treats null as 0 (observable via hash(null, "x"))
HashtableTest (24 cases across 5 nested classes):
- D1: insert/get/remove/insertOrReplace/clear/forEach, in-place value
mutation, null-key handling, hash-collision chaining with disambig-
uating equals, remove-from-collided-chain leaves siblings intact
- D2: pair-key identity, remove(pair), insertOrReplace matches on
both parts, forEach
- Support: capacity rounds up to a power of two, bucketIndex stays
in range across a wide hash sample, clear nulls every slot
- BucketIterator: walks only matching-hash entries in a chain, throws
NoSuchElementException when exhausted
- MutatingBucketIterator: remove from head-of-chain unlinks, replace
swaps the entry while preserving chain, remove() without prior
next() throws IllegalStateException
Tests live in internal-api/src/test/java/datadog/trace/util and use the
already-present JUnit 5 setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply spotless formatting to Hashtable and LongHashingUtils
Bring the new util/ files in line with google-java-format
(tabs → spaces, line wrapping, javadoc list markup) so
spotlessCheck passes in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add JMH benchmarks for Hashtable.D1 and D2
Compares Hashtable.D1 and Hashtable.D2 against equivalent HashMap
usage for add, update, and iterate operations. Each benchmark thread
owns its own map (Scope.Thread), but @Threads(8) is used so the
allocation/GC pressure that Hashtable is designed to avoid surfaces
in the throughput numbers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add benchmark results to HashtableBenchmark header
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on Hashtable
- Guard Support.sizeFor against overflow and use Integer.highestOneBit;
reject capacities above 1 << 30 instead of looping forever.
- Add braces around single-statement while bodies in BucketIterator.
- Split HashtableBenchmark into HashtableD1Benchmark / HashtableD2Benchmark.
- Add regression tests for Support.sizeFor bounds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix dropped argument in HashingUtils 5-arg Object hash
The 5-arg Object overload was forwarding only obj0..obj3 to the int
overload, silently dropping obj4. Also align LongHashingUtils.hash 3-arg
signature with its 2/4/5-arg siblings (int parameters) and strengthen
the 5-arg HashingUtilsTest to detect the missing-arg regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on Hashtable
- Split D1Tests and D2Tests into HashtableD1Test and HashtableD2Test;
extract shared test entry classes into HashtableTestEntries.
- Reduce visibility of LongHashingUtils.hash(int...) chaining overloads
to package-private; they are internal building blocks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop reflection in iterator tests via package-private D1.buckets
The iterator tests need a populated Hashtable.Entry[] to drive
Support.bucketIterator / mutatingBucketIterator. Relaxing D1.buckets
from private to package-private lets the same-package tests read it
directly, removing the reflection helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resize previousCounts for inbox-full health metric
The new reason:inbox_full reportIfChanged call advances countIndex to 51,
but previousCounts was still sized for 51 counters (max index 50), so the
metric never emitted and the resize warning fired every flush. Bump the
array to 52 and add a regression test that exercises the flush path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold AggregateMetric into AggregateEntry
The label fields and the mutable counters/histograms are 1:1 with each
entry; carrying them on a separate object meant one extra allocation per
unique key plus an indirection on every hot-path update. Merging them
puts the counters directly on AggregateEntry, drops the entry.aggregate
hop, and consolidates ERROR_TAG / TOP_LEVEL_TAG onto the same class the
consumer uses to decode them.
AggregateTable.findOrInsert now returns AggregateEntry. Callers in
Aggregator and SerializingMetricWriter updated. Migrated
AggregateMetricTest.groovy to AggregateEntryTest.java per project policy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoid capturing lambda in Aggregator.report
Add a context-passing forEach(T, BiConsumer) overload to AggregateTable,
mirroring TagMap's pattern. Aggregator.report now hands the writer in as
context to a static BiConsumer so no fresh Consumer is allocated each
report cycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add context-passing forEach to Hashtable.D1 and D2
Mirrors the TagMap pattern: pairs the existing forEach(Consumer) with a
forEach(T context, BiConsumer<T, TEntry>) overload so callers can hand
side-band state to a non-capturing lambda and avoid the
fresh-Consumer-per-call allocation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move forEach loop body to Support helper
Factors the unchecked (TEntry) cast out of D1.forEach / D2.forEach (and
the BiConsumer variants) into Support.forEach(buckets, ...). The cast
now lives in one place, mirroring how Entry.next() handles it, and the
D1/D2 methods become one-liners. Downstream higher-arity tables built
on Support gain the same helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Delegate AggregateTable.forEach to Support.forEach
Now that Hashtable.Support exposes the parameterized forEach helpers,
AggregateTable's own forEach methods can drop their duplicated loop body
and the (AggregateEntry) cast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move bucket-head cast to Support.bucket helper
Adds Support.bucket(buckets, keyHash) which returns the bucket head
already cast to the caller's concrete entry type. D1.get and D2.get
now drop the raw-Entry intermediate variable and walk the chain via
Entry.next() directly. The unchecked cast lives in one place,
consistent with Entry.next() and Support.forEach.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Use Support.bucket and type chain walks as AggregateEntry
- findOrInsert: walks via Support.bucket(buckets, keyHash) instead of
Hashtable.Entry + intermediate cast; bucketIndex is only computed on
the miss path now.
- evictOneStale / expungeStaleAggregates: chain variables typed as
AggregateEntry from the head down, leveraging Entry.next()'s generic
inference, so the per-iteration getHitCount() checks drop their
(AggregateEntry) cast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop d1_/d2_ prefix from per-table benchmark methods
Holdover from when both lived in a shared HashtableBenchmark; redundant
now that each lives in its own class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add DDAgentFeaturesDiscovery.peerTagsRevision()
Monotonically increases each time the discovered peerTags Set differs from
the previous one. Lets callers detect peer-tag config changes with a long
compare instead of a Set.equals (or leaning on Set-identity, which was an
implementation accident, not part of the public contract).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move peer-tag schema cache from PeerTagSchema statics to ClientStatsAggregator
PeerTagSchema previously held its current schema + last-synced-Set in static
volatile fields with a synchronized rebuild. The "is it stale?" signal was
an identity check on the Set instance returned by features.peerTags() -- a
correct but indirect reading of a DDAgentFeaturesDiscovery invariant.
Replace that with:
- ClientStatsAggregator keeps its own (volatile PeerTagSchema, volatile long
cachedPeerTagsRevision) cache pair, rebuilt under synchronized when the
revision returned by features.peerTagsRevision() doesn't match.
- PeerTagSchema becomes a pure data holder: static factory PeerTagSchema.of,
the INTERNAL singleton, and an instance resetCardinalityHandlers(). The
static CURRENT, LAST_SYNCED_INPUT, and the synchronized rebuild block are
gone.
- Aggregator gains a Runnable onResetCardinality hook fired right after
AggregateEntry.resetCardinalityHandlers(). ClientStatsAggregator wires it
to reset its cached schema's handlers each report cycle.
- AggregateEntry.resetCardinalityHandlers() resets PeerTagSchema.INTERNAL
directly instead of the removed PeerTagSchema.resetAll().
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Hashtable.Support helpers: MAX_RATIO, insertHeadEntry, MutatingTableIterator
Three consumer-facing helpers that callers building higher-arity tables on
top of Hashtable.Support kept open-coding:
- MAX_RATIO_NUMERATOR / _DENOMINATOR: the 4/3 multiplier for sizing a
bucket array from a target working-set under a 75% load factor.
- insertHeadEntry(buckets, bucketIndex, entry): the (setNext + array-store)
pair for splicing a new entry at the head of a bucket chain.
- MutatingTableIterator + Support.mutatingTableIterator(buckets): walks
every entry in the table (not filtered by hash) with remove() support,
for sweeps like eviction and expunge that aren't keyed to a specific
hash. Sibling of MutatingBucketIterator.
Tests cover the table-wide iterator at head-of-bucket and mid-chain
removal, empty buckets between live entries, exhaustion, and
remove-without-next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Simplify AggregateTable via new Hashtable.Support helpers
- Constructor sizing now uses Support.MAX_RATIO_NUMERATOR / _DENOMINATOR
instead of an open-coded * 4 / 3.
- findOrInsert delegates the chain-head splice to Support.insertHeadEntry.
- evictOneStale and expungeStaleAggregates both rewritten in terms of
Support.mutatingTableIterator. Drops the bespoke head-vs-mid-chain
branching that read as more complicated than the operation actually is.
Net -28 lines in AggregateTable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swap MAX_RATIO numerator/denominator pair for a single float + scaled create()
Replace Support.MAX_RATIO_NUMERATOR / _DENOMINATOR with a single float
MAX_RATIO constant, and add a Support.create(int, float) overload that
takes a scale factor. Callers now write Support.create(n, MAX_RATIO)
instead of stitching together the int arithmetic at the call site.
The scaled size is truncated (not ceiled) before going through sizeFor.
sizeFor already rounds up to the next power of two, so truncation just
absorbs float fuzz that would otherwise push a result like 12 * 4/3 =
16.0000005f past 16 and double the bucket array size for no reason.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Address second-round review on AggregateTable / Aggregator
- AggregateTable: switch to Support.create(maxAggregates, Support.MAX_RATIO)
now that the load-factor scaling is a Support concern.
- AggregateTable: replace open-coded "keyHash == X && matches(s)" with a
new AggregateEntry.matches(long keyHash, SpanSnapshot) overload that
bundles the hash gate.
- AggregateTable: rename local iterator var "it" -> "iter".
- Aggregator: drop WRITE_AND_CLEAR static field, inline as a non-capturing
lambda; the JIT reuses non-capturing lambdas, no need for the static
until a profile says otherwise.
- Aggregator: comment the ClearSignal branch with the thread-safety
rationale (single-writer invariant for AggregateTable).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten Hashtable docs + rename MAX_CAPACITY to MAX_BUCKETS
Five small cleanups from a design re-review pass:
1. Support javadoc: drop the stale "methods are package-private" sentence;
most of them were made public in earlier commits for higher-arity
callers. Also drop the "nested BucketIterator" framing (iterators are
peers of Support inside Hashtable, not nested inside Support).
2. MAX_RATIO javadoc: drop the Math.ceil recommendation; create(int, float)
deliberately truncates and is the canonical pathway.
3. Document the null-hash treatment on D1.Entry.hash and D2.Entry.hash so
the behavior difference is explicit: D1 uses Long.MIN_VALUE as a
sentinel that's collision-free against any int-valued hashCode(); D2
has no such sentinel and relies on matches() to resolve null/null vs
hash-0 collisions.
4. Rename Support.MAX_CAPACITY -> MAX_BUCKETS and sizeFor's parameter to
requestedSize. The cap is on the bucket-array length, not entry count;
the new name reflects that. Error messages updated to match.
5. Drop the `abstract` modifier on Hashtable in favor of `final` with a
private constructor. Nothing actually subclasses Hashtable -- the
abstract was a namespace device that read as "intended for extension."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dedupe chain-head splice in D1/D2 via keyHash insertHeadEntry overload
- Add Support.insertHeadEntry(buckets, long keyHash, entry) overload that
derives the bucket index itself. Callers that already have a hash but
not the index (the common case) now avoid the redundant bucketIndex(...)
hop.
- D1.insert, D1.insertOrReplace, D2.insert, D2.insertOrReplace: use the
new overload, drop the (thisBuckets local, bucketIndex compute,
setNext, store) sequence at each call site.
- D2.buckets: drop the `private` modifier to match D1.buckets. Both are
package-private so iterator tests in the same package can drive
Support.bucketIterator against the table's bucket array. Added a short
comment on both fields documenting the rationale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten Entry.next encapsulation; doc hasNext; add D1/D2 getOrCreate
Three follow-ups from the design review:
- Make Hashtable.Entry.next private. All same-package readers
(BucketIterator) already had a next() accessor; the leftover direct
field reads now route through it. Closes the "mixed encapsulation"
gap where some readers used the accessor and same-package ones
reached for the field.
- BucketIterator and MutatingBucketIterator now document that chain-walk
work happens in next() (and the constructor for the first match);
hasNext() is an O(1) field read.
- Add D1.getOrCreate(K, Function) and D2.getOrCreate(K1, K2, BiFunction).
Both reuse the lookup hash for the insert on miss, avoiding the
double-hash that "get; if null then insert" callers would otherwise
pay.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Use keyHash insertHeadEntry overload in AggregateTable.findOrInsert
Picks up the Support.insertHeadEntry(buckets, long keyHash, entry)
overload added on the util-hashtable branch; saves the redundant
Support.bucketIndex(buckets, keyHash) hop at the call site.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge dougqh/optimize-metric-key into dougqh/control-tag-cardinality
Pulls in the util-hashtable design pass (MAX_RATIO, insertHeadEntry
keyHash overload, MutatingTableIterator, Entry.next privatization,
getOrCreate, MAX_BUCKETS rename, doc tightening, etc.) and the
AggregateTable simplifications that came with it.
Reconciliation notes:
- AggregateEntry / AggregateMetric: this branch keeps the split design
(AggregateEntry holds labels + an AggregateMetric counter ref). The
optimize-metric-key branch had collapsed them into a single
AggregateEntry. Resolution: keep the split (HEAD's design) -- it's
more recent and supports the cardinality-canonicalization layer.
- AggregateEntryTest.java (new in optimize-metric-key, exercises the
collapsed design) deleted; AggregateMetricTest.groovy stays as the
counter-side coverage for the split design.
- AggregateTable: apply the optimize-metric-key cleanups on top of the
Canonical-pattern findOrInsert -- Support.create(n, MAX_RATIO),
Support.bucket for the chain head, Support.insertHeadEntry(keyHash),
Support.mutatingTableIterator for evictOneStale and
expungeStaleAggregates, Support.forEach for forEach. Also add the
context-passing forEach overload to match the BiConsumer the
Aggregator already uses.
- Aggregator.report: keep the BiConsumer + context lambda
(non-capturing); body adapted to entry.aggregate.clear() for the
split design.
- Aggregator.Drainer: keep AggregateMetric as the findOrInsert return
type (matches the table's actual signature).
- SerializingMetricWriter, SerializingMetricWriterTest,
ClientStatsAggregatorTest, AggregateTableTest, SpanSnapshot,
MetricWriter: restore HEAD's versions where the auto-merge had taken
the optimize-metric-key shape (counters via entry.* vs
entry.aggregate.*) -- HEAD's versions match this branch's design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold AggregateMetric into AggregateEntry
Adopts the optimize-metric-key design choice: one entry type that holds
both the canonical label fields and the counter / histogram state. The
prior split (AggregateMetric for counters, AggregateEntry for labels)
required every counter read to hop through entry.aggregate -- ~30 sites
across SerializingMetricWriter, the Aggregator, and the test suites.
- AggregateEntry now owns ERROR_TAG, TOP_LEVEL_TAG, the okLatencies and
errorLatencies histograms, hitCount/errorCount/topLevelCount/duration
counters, and the recordOneDuration / recordDurations / clear methods
that used to live on AggregateMetric.
- AggregateMetric.java and AggregateMetricTest.groovy deleted.
- AggregateTable.findOrInsert now returns AggregateEntry (not the inner
AggregateMetric); Canonical.toEntry no longer takes an AggregateMetric
arg.
- Aggregator.Drainer reverts to AggregateEntry; the report lambda calls
entry.clear() directly.
- SerializingMetricWriter, ClientStatsAggregator imports, and all three
test files updated to read counters from entry.* (not entry.aggregate.*).
- AggregateEntryTest.java added with the recordOneDuration /
recordDurations / clear coverage that AggregateMetricTest.groovy used
to provide.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove accidentally-staged .claude/worktrees entries
Replace // nullable comments with @Nullable annotations on AggregateEntry
Use javax.annotation.Nullable (the codebase's convention -- see DDSpan,
TagInterceptor, ScopeContext, etc.) on the four nullable label fields
(serviceSource, httpMethod, httpEndpoint, grpcStatusCode), their
getters, and the corresponding parameters of AggregateEntry.of.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop redundant load-factor comment from AggregateTable ctor
Support.MAX_RATIO and the scaled create(int, float) overload already
convey the 75% load-factor intent at the call site -- the inline
comment was duplicating their self-documentation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Import java.util.Objects in AggregateEntry instead of fully qualifying
Style nit -- the equals() method had eight fully-qualified references
to java.util.Objects.equals; add the import and drop the qualifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document evictOneStale cost and disable() best-effort offer
Two design-review trade-offs that won't change in this PR but should be
explicit at the call sites:
- AggregateTable.evictOneStale: O(N) scan per call (vs LRUCache's O(1)),
acceptable because the new policy drops the *new* key on cap-overrun
rather than evicting an established one -- so eviction is expected to
be rare. Cursor-caching is the future optimization if a workload runs
persistently at cap.
- ConflatingMetricsAggregator.disable: single inbox.offer(CLEAR) is
best-effort. If the inbox is full the clear is dropped, but the
system self-heals (supportsMetrics() is already false, the next
report-sink-rejection retries disable). Worst case is one extra cycle
of stale data, not a leak.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skip SpanSnapshot allocation when the inbox is already at capacity
publish() previously did all of the tag extraction (peer-tag pairs,
HTTP method/endpoint, span kind, gRPC status) and the SpanSnapshot
allocation before calling inbox.offer; on a full inbox the offer
failed and everything became garbage.
Early-out with an approximate size() vs capacity() check up front. The
jctools MPSC queue's size() is best-effort but that's fine: under-
estimation falls through to the existing offer-as-source-of-truth
path, over-estimation drops a snapshot that would have fit (and
onStatsInboxFull was about to fire on the next span anyway).
error is computed first so the force-keep return is correct whether
or not the snapshot is built.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review on AggregateEntry nullables + PeerTagSchema revision
- Replace `// nullable` comments on AggregateEntry's 4 nullable label
fields (entry + Canonical scratch buffer) with `@Nullable`
annotations. Also annotate the matching getters and of(...) factory
parameters.
- Move the cache revision into PeerTagSchema as a final field
(peerTagsRevision), built via PeerTagSchema.of(names, revision). One
field on the schema carries the cache key, so the hot path is a
single volatile read + long compare against schema.peerTagsRevision
-- no separate cachedPeerTagsRevision field on ClientStatsAggregator.
When peer tags are unconfigured the cache stores an empty schema
(size 0) carrying the revision rather than null, so subsequent
publishes still short-circuit on the fast path. peerTagSchemaFor
treats `schema.size() == 0` as "skip peer-agg processing" for
client/producer/consumer kinds.
INTERNAL is built with a -1L sentinel revision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate cardinality-handler reset behind one entry point
Reset was split between two owners: Aggregator.report called
AggregateEntry.resetCardinalityHandlers (static handlers + INTERNAL)
then ran a separate onResetCardinality callback that ClientStats
wired up to reset its cached non-INTERNAL peer-agg schema. Anyone
adding a new handler had to know which side to put it on.
Make the callback the only entry point. ClientStatsAggregator.
resetCardinalityHandlers (renamed from resetCachedPeerAggSchema) now
calls AggregateEntry.resetCardinalityHandlers() itself plus the
cached peer-agg schema reset. Aggregator.report just runs the
callback -- it no longer knows about AggregateEntry's static state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parameterize PropertyCardinalityHandler on T extends CharSequence
Each handler is now typed to its SpanSnapshot field type, so the
HashMap's key class has well-defined equals/hashCode rather than the
abstract CharSequence interface. For String-typed fields (service,
spanKind, httpMethod, httpEndpoint, grpcStatusCode) the cache hits
reliably. For CharSequence-typed fields (resource, operationName,
serviceSource, type) consistency still depends on the producer
returning a single concrete class per field -- a pre-existing runtime
contract -- but the type system now prevents call sites from
accidentally passing a different shape.
registerOrEmpty is now generic so it threads T through.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add long-lived LRU cache to PropertyCardinalityHandler
Previously, reset() cleared the only cache, so every reporting cycle
re-allocated UTF8BytesString instances for every property value seen
again. Sustained allocations on the aggregator thread proportional to
the sum of per-field cardinality limits, ~bytes/sec, on every reset.
Split the state in two:
- seenThisCycle (HashSet<T>): consumed-budget tracking, cleared on
reset().
- utf8Cache (LinkedHashMap in access-order, 2x cardinalityLimit):
long-lived; survives reset; LRU eviction once full.
Workloads with stable value sets pay zero UTF8 allocations after the
first cycle. The reused instances also short-circuit downstream equals
to identity comparisons.
Drops the TODO at the prior allocation site.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Centralize per-field cardinality limits in MetricCardinalityLimits
The 9 property limits and the peer-tag value limit were sprinkled
inline. Pull them into a single class with per-field javadoc so the
sizing rationale lives in one place. Six values change from the
DDCache-inherited defaults based on workload analysis:
- RESOURCE 32 -> 128 (highest-cardinality field; tight today)
- HTTP_ENDPOINT 32 -> 64 (same shape as RESOURCE for HTTP-heavy)
- TYPE 8 -> 16 (DDSpanTypes catalogue is ~30)
- HTTP_METHOD 8 -> 16 (WebDAV/custom verbs push past 8)
- SPAN_KIND 16 -> 8 (OTel defines exactly 5 standard kinds)
- GRPC_STATUS 32 -> 24 (gRPC spec has exactly 17 codes)
SERVICE, OPERATION, SERVICE_SOURCE, and PEER_TAG_VALUE keep their
current values. Net worst-case memory delta: roughly +90 KB driven by
the RESOURCE and HTTP_ENDPOINT bumps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reimplement cardinality handlers as open-addressed flat arrays
Replaces the previous LinkedHashMap-based design for PropertyCardinality
Handler (and the HashMap-based TagCardinalityHandler) with parallel
Object[] / UTF8BytesString[] arrays and linear-probing open addressing.
Two tables per handler, "current cycle" and "prior cycle":
- Capacity is the next power of two >= 2 * cardinalityLimit, so the
linear-probing load factor stays <= 0.5 even when the budget is full.
- Current tracks values that have consumed a slot of the cardinality
budget this cycle.
- Prior holds the just-completed cycle's entries verbatim. A first-time-
this-cycle value that hits in prior reuses its UTF8BytesString
instance -- no re-allocation. Implements the cross-reset reuse that
the prior commit's LinkedHashMap LRU provided, with less overhead.
Reset swaps the table pointers (just-completed cycle -> prior; the
2-cycles-ago tables get nulled out and become the new empty current).
One O(capacity) pass, half the work of a copy-then-null.
Wins:
- No per-entry Node allocations (HashMap / LinkedHashMap) and no
access-order linked-list maintenance per get.
- Smaller working set: two Object[] + two UTF8BytesString[] per handler
vs HashMap + HashSet + LinkedHashMap heap shapes.
- Stable workloads pay zero UTF8BytesString allocations after the first
cycle and produce identical references across cycles, so downstream
equals short-circuits to ==.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop parallel keys array in PropertyCardinalityHandler
The stored UTF8BytesString can serve as the slot's identity on its own:
its hashCode() returns the underlying String.hashCode (content-stable
with whatever shape the input takes), and equality is checked via
stored.toString().contentEquals(value) -- the JDK's content-equality
routine that fast-paths to String.equals when the input is a String.
Halves the per-handler array footprint: one UTF8BytesString[] per
cycle (current + prior) instead of one Object[] + one
UTF8BytesString[] per cycle. No behavior change.
TagCardinalityHandler keeps the parallel-arrays shape because its
stored UTF8 is "tag:value" and cannot be compared directly against the
bare incoming value.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop type parameter from PropertyCardinalityHandler
The type parameter was load-bearing when slot identity went through a
parallel Object[] keys array (where T determined the runtime class
whose equals/hashCode the HashMap used). The single-array shape probes
via UTF8BytesString.hashCode() (content-stable with the underlying
String) and stored.toString().contentEquals(value), so any CharSequence
input -- String, UTF8BytesString, anything else with a content-stable
hash -- collapses to the right slot.
register(CharSequence value) is enough. AggregateEntry's 9 static
handler declarations and the registerOrEmpty helper lose their type
parameters too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guard cardinality-handler ctor against pathological inputs
- Both handlers now reject cardinalityLimit > 2^29 to prevent overflow
in the (cardinalityLimit * 2 - 1) capacity calc. Practical limits are
8..512 so this is well beyond any realistic configuration.
- TagCardinalityHandler's keys array is now String[] (was Object[]) to
match the actual contract -- minor clarity win.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make EMPTY the universal absent sentinel for AggregateEntry UTF8 fields
PropertyCardinalityHandler.register(null) now returns UTF8BytesString
.EMPTY. All AggregateEntry UTF8 fields are non-null. Callers stop
checking for null at every site.
- AggregateEntry: drop @Nullable on serviceSource/httpMethod/
httpEndpoint/grpcStatusCode (both the entry fields and the
Canonical scratch buffer). Drop @Nullable on getters and on the of
factory parameters. Drop the unused registerOrEmpty helper.
- Canonical.populate: each field is now this.field = HANDLER.register
(s.field) -- no inline conditionals.
- of() factory: drop the value == null ? null : createUtf8(value)
pattern; createUtf8 already returns EMPTY on null.
- SerializingMetricWriter: switch the four presence checks from !=
null to != EMPTY (identity comparison on the singleton).
Net win: nine identically-shaped call sites in Canonical.populate
and a smaller null surface across the package.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use EMPTY consistently for absent values in peer-tag canonicalization
- TagCardinalityHandler.register now mirrors PropertyCardinalityHandler:
null input returns UTF8BytesString.EMPTY.
- Canonical.populatePeerTags now calls register for every schema slot
and tests the result against EMPTY rather than the input against null.
The wire-format buffer still holds only present peer tags (EMPTY is
elided), but the check is now consistent with how AggregateEntry's
scalar UTF8 fields handle absence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten handler visibility + add tests for EMPTY-on-null contract
#4: PropertyCardinalityHandler and TagCardinalityHandler are only
consumed within this package; drop `public` from the class declarations,
constructors, and methods. They're package-private now.
#6: Add tests that lock down the EMPTY-on-null contract that the rest
of the package depends on:
- CardinalityHandlerTest covers both handlers: register(null) -> EMPTY,
and registering null repeatedly doesn't consume the cardinality
budget.
- AggregateEntryTest covers the entry shape: optional fields built
from a snapshot with null inputs resolve to EMPTY; populated
optional fields carry their value.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Notify on peer-tag cardinality blocks
Adds a per-cycle one-shot warn log + HealthMetrics counter
(`stats.tag_cardinality_blocked` with `tag:<name>`) when a peer-tag
value gets collapsed to the `blocked_by_tracer` sentinel because its
cardinality budget is exhausted. Implemented as a `register(int i,
String value)` method on `PeerTagSchema` that does the post-block
notification work; `TagCardinalityHandler` exposes `blockedSentinel()`
so the schema can identity-compare and stays free of logger / health
metric coupling. Warn-once gating uses a `Set<String>` of names seen
this cycle, cleared by `resetCardinalityHandlers()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address PR #11387 review: dual-role docs, rename, @Nullable, consumer-side reconcile
- PropertyCardinalityHandler / TagCardinalityHandler: header comment
explaining the limiter-and-cache dual role and the prior-cycle reuse
trick that preserves UTF8 caching across resets.
- ClientStatsAggregator: rename peerAggSchema -> peerTagSchema across
field, method, and parameter; disambiguate the inner per-span local as
spanPeerTagSchema (return of peerTagSchemaFor).
- SpanSnapshot: replace prose "or null" docstrings with
javax.annotation.@Nullable on peerTagSchema/peerTagValues fields and
their constructor params.
- Consumer-side peer-tag reconciliation:
* DDAgentFeaturesDiscovery: drop State.peerTagsRevision + bump logic +
peerTagsRevision() accessor. Expose getLastTimeDiscovered().
* PeerTagSchema: rename peerTagsRevision -> lastTimeDiscovered, drop
final (consumer-thread-only mutation), add hasSameTagsAs(Set).
* ClientStatsAggregator: producer hot path is now a single volatile
read with a one-time synchronized bootstrap; resetCardinalityHandlers
runs reconcilePeerTagSchema first, which fast-paths on timestamp
equality and either bumps in place (preserving warm handlers when
the tag set is unchanged) or swaps in a fresh schema. The schema's
timestamp field no longer needs to be volatile because mutation is
confined to the aggregator thread.
Note: the @Nullable annotations on AggregateEntry's errorLatencies and
related fields only apply after the downstream lazy-init / Canonical
buffer work; those land in a separate commit on the downstream branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lock in cardinality-handler prior-cycle UTF8 reuse with explicit tests
Addresses PR #11387 review (test coverage gap):
- Fix misleading comment in propertyResetRefreshesBudget ("the previous
instances aren't reused") -- they ARE reused; the test only passed
because it asserted on .toString() content rather than identity.
- Add propertyPriorCycleInstancesAreReusedAcrossReset: explicit
assertSame check that registering the same value after a reset returns
the SAME UTF8BytesString instance from the prior cycle. This is the
"dual role as cache" property the canonical-key lookup depends on.
- Add propertyPriorCycleReuseSurvivesOneResetButNotTwo: nails down the
reuse window depth (one cycle, not two).
- Add tagPriorCycleInstancesAreReusedAcrossReset mirroring the property
handler test for the tag handler (cached "tag:value" UTF8BytesString).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hashtable: add missing braces and detach removed/replaced entries
Addresses PR #11409 review comments:
- #3267164119 / #3267165525: wrap every single-line if/break body in
braces (7 sites across BucketIterator, MutatingBucketIterator, and the
full-table Iterator).
- #3275947761 / #3275948108 (sarahchen6): null out the removed/replaced
entry's next pointer after splicing it out of the chain in
MutatingBucketIterator.remove / .replace. Applied the same fix to the
full-table Iterator.remove for consistency.
Rationale: detaching prevents accidental traversal through a removed
entry via a stale reference and lets the GC reclaim a chain tail that
the removed entry was the last referrer to.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Add Hashtable and LongHashingUtils to datadog.trace.util
Two general-purpose utilities used by the client-side stats aggregator
work (PR #11382 and follow-ups), extracted into their own change so the
metrics-specific PRs can build on a smaller, reviewable foundation.
- Hashtable: a generic open-addressed-ish bucket table abstraction
keyed by a 64-bit hash, with a public abstract Entry type so client
code can subclass it for higher-arity keys. The metrics aggregator
uses it to back its AggregateTable.
- LongHashingUtils: chained 64-bit hash combiners with primitive
overloads (boolean, short, int, long, Object). Used in place of
varargs combiners to avoid Object[] allocation and boxing on the
hot path.
No callers within internal-api itself yet -- the metrics aggregator PR
will introduce the first usages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add unit tests for Hashtable and LongHashingUtils
LongHashingUtilsTest (14 cases):
- hashCodeX null sentinel + non-null pass-through
- all primitive hash() overloads match the boxed Java hashCodes
- hash(Object...) 2/3/4/5-arg overloads match the chained addToHash
formula they are documented to constant-fold to
- addToHash(long, primitive) overloads match the Object-version
- linear-accumulation invariant (31 * h + v) holds across a sequence
- iterable / deprecated int[] / deprecated Object[] variants match
chained addToHash
- intHash treats null as 0 (observable via hash(null, "x"))
HashtableTest (24 cases across 5 nested classes):
- D1: insert/get/remove/insertOrReplace/clear/forEach, in-place value
mutation, null-key handling, hash-collision chaining with disambig-
uating equals, remove-from-collided-chain leaves siblings intact
- D2: pair-key identity, remove(pair), insertOrReplace matches on
both parts, forEach
- Support: capacity rounds up to a power of two, bucketIndex stays
in range across a wide hash sample, clear nulls every slot
- BucketIterator: walks only matching-hash entries in a chain, throws
NoSuchElementException when exhausted
- MutatingBucketIterator: remove from head-of-chain unlinks, replace
swaps the entry while preserving chain, remove() without prior
next() throws IllegalStateException
Tests live in internal-api/src/test/java/datadog/trace/util and use the
already-present JUnit 5 setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply spotless formatting to Hashtable and LongHashingUtils
Bring the new util/ files in line with google-java-format
(tabs → spaces, line wrapping, javadoc list markup) so
spotlessCheck passes in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add JMH benchmarks for Hashtable.D1 and D2
Compares Hashtable.D1 and Hashtable.D2 against equivalent HashMap
usage for add, update, and iterate operations. Each benchmark thread
owns its own map (Scope.Thread), but @Threads(8) is used so the
allocation/GC pressure that Hashtable is designed to avoid surfaces
in the throughput numbers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add benchmark results to HashtableBenchmark header
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on Hashtable
- Guard Support.sizeFor against overflow and use Integer.highestOneBit;
reject capacities above 1 << 30 instead of looping forever.
- Add braces around single-statement while bodies in BucketIterator.
- Split HashtableBenchmark into HashtableD1Benchmark / HashtableD2Benchmark.
- Add regression tests for Support.sizeFor bounds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix dropped argument in HashingUtils 5-arg Object hash
The 5-arg Object overload was forwarding only obj0..obj3 to the int
overload, silently dropping obj4. Also align LongHashingUtils.hash 3-arg
signature with its 2/4/5-arg siblings (int parameters) and strengthen
the 5-arg HashingUtilsTest to detect the missing-arg regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on Hashtable
- Split D1Tests and D2Tests into HashtableD1Test and HashtableD2Test;
extract shared test entry classes into HashtableTestEntries.
- Reduce visibility of LongHashingUtils.hash(int...) chaining overloads
to package-private; they are internal building blocks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop reflection in iterator tests via package-private D1.buckets
The iterator tests need a populated Hashtable.Entry[] to drive
Support.bucketIterator / mutatingBucketIterator. Relaxing D1.buckets
from private to package-private lets the same-package tests read it
directly, removing the reflection helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add context-passing forEach to Hashtable.D1 and D2
Mirrors the TagMap pattern: pairs the existing forEach(Consumer) with a
forEach(T context, BiConsumer<T, TEntry>) overload so callers can hand
side-band state to a non-capturing lambda and avoid the
fresh-Consumer-per-call allocation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move forEach loop body to Support helper
Factors the unchecked (TEntry) cast out of D1.forEach / D2.forEach (and
the BiConsumer variants) into Support.forEach(buckets, ...). The cast
now lives in one place, mirroring how Entry.next() handles it, and the
D1/D2 methods become one-liners. Downstream higher-arity tables built
on Support gain the same helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move bucket-head cast to Support.bucket helper
Adds Support.bucket(buckets, keyHash) which returns the bucket head
already cast to the caller's concrete entry type. D1.get and D2.get
now drop the raw-Entry intermediate variable and walk the chain via
Entry.next() directly. The unchecked cast lives in one place,
consistent with Entry.next() and Support.forEach.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop d1_/d2_ prefix from per-table benchmark methods
Holdover from when both lived in a shared HashtableBenchmark; redundant
now that each lives in its own class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Hashtable.Support helpers: MAX_RATIO, insertHeadEntry, MutatingTableIterator
Three consumer-facing helpers that callers building higher-arity tables on
top of Hashtable.Support kept open-coding:
- MAX_RATIO_NUMERATOR / _DENOMINATOR: the 4/3 multiplier for sizing a
bucket array from a target working-set under a 75% load factor.
- insertHeadEntry(buckets, bucketIndex, entry): the (setNext + array-store)
pair for splicing a new entry at the head of a bucket chain.
- MutatingTableIterator + Support.mutatingTableIterator(buckets): walks
every entry in the table (not filtered by hash) with remove() support,
for sweeps like eviction and expunge that aren't keyed to a specific
hash. Sibling of MutatingBucketIterator.
Tests cover the table-wide iterator at head-of-bucket and mid-chain
removal, empty buckets between live entries, exhaustion, and
remove-without-next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swap MAX_RATIO numerator/denominator pair for a single float + scaled create()
Replace Support.MAX_RATIO_NUMERATOR / _DENOMINATOR with a single float
MAX_RATIO constant, and add a Support.create(int, float) overload that
takes a scale factor. Callers now write Support.create(n, MAX_RATIO)
instead of stitching together the int arithmetic at the call site.
The scaled size is truncated (not ceiled) before going through sizeFor.
sizeFor already rounds up to the next power of two, so truncation just
absorbs float fuzz that would otherwise push a result like 12 * 4/3 =
16.0000005f past 16 and double the bucket array size for no reason.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten Hashtable docs + rename MAX_CAPACITY to MAX_BUCKETS
Five small cleanups from a design re-review pass:
1. Support javadoc: drop the stale "methods are package-private" sentence;
most of them were made public in earlier commits for higher-arity
callers. Also drop the "nested BucketIterator" framing (iterators are
peers of Support inside Hashtable, not nested inside Support).
2. MAX_RATIO javadoc: drop the Math.ceil recommendation; create(int, float)
deliberately truncates and is the canonical pathway.
3. Document the null-hash treatment on D1.Entry.hash and D2.Entry.hash so
the behavior difference is explicit: D1 uses Long.MIN_VALUE as a
sentinel that's collision-free against any int-valued hashCode(); D2
has no such sentinel and relies on matches() to resolve null/null vs
hash-0 collisions.
4. Rename Support.MAX_CAPACITY -> MAX_BUCKETS and sizeFor's parameter to
requestedSize. The cap is on the bucket-array length, not entry count;
the new name reflects that. Error messages updated to match.
5. Drop the `abstract` modifier on Hashtable in favor of `final` with a
private constructor. Nothing actually subclasses Hashtable -- the
abstract was a namespace device that read as "intended for extension."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dedupe chain-head splice in D1/D2 via keyHash insertHeadEntry overload
- Add Support.insertHeadEntry(buckets, long keyHash, entry) overload that
derives the bucket index itself. Callers that already have a hash but
not the index (the common case) now avoid the redundant bucketIndex(...)
hop.
- D1.insert, D1.insertOrReplace, D2.insert, D2.insertOrReplace: use the
new overload, drop the (thisBuckets local, bucketIndex compute,
setNext, store) sequence at each call site.
- D2.buckets: drop the `private` modifier to match D1.buckets. Both are
package-private so iterator tests in the same package can drive
Support.bucketIterator against the table's bucket array. Added a short
comment on both fields documenting the rationale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten Entry.next encapsulation; doc hasNext; add D1/D2 getOrCreate
Three follow-ups from the design review:
- Make Hashtable.Entry.next private. All same-package readers
(BucketIterator) already had a next() accessor; the leftover direct
field reads now route through it. Closes the "mixed encapsulation"
gap where some readers used the accessor and same-package ones
reached for the field.
- BucketIterator and MutatingBucketIterator now document that chain-walk
work happens in next() (and the constructor for the first match);
hasNext() is an O(1) field read.
- Add D1.getOrCreate(K, Function) and D2.getOrCreate(K1, K2, BiFunction).
Both reuse the lookup hash for the insert on miss, avoiding the
double-hash that "get; if null then insert" callers would otherwise
pay.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hashtable: add missing braces and detach removed/replaced entries
Addresses PR #11409 review comments:
- #3267164119 / #3267165525: wrap every single-line if/break body in
braces (7 sites across BucketIterator, MutatingBucketIterator, and the
full-table Iterator).
- #3275947761 / #3275948108 (sarahchen6): null out the removed/replaced
entry's next pointer after splicing it out of the chain in
MutatingBucketIterator.remove / .replace. Applied the same fix to the
full-table Iterator.remove for consistency.
Rationale: detaching prevents accidental traversal through a removed
entry via a stale reference and lets the GC reclaim a chain tail that
the removed entry was the last referrer to.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename LongHashingUtils.hashCodeX(Object) to hash(Object) for API consistency
Addresses PR #11409 review comment #3276167001. The method parallels the
primitive hash(boolean) / hash(int) / hash(long) / ... family, so naming
it hash(Object) -- with null collapsing to Long.MIN_VALUE as a sentinel
distinct from any real hashCode -- matches the rest of the public surface.
Test call sites that pass a literal null now disambiguate against
hash(int[]) / hash(Object[]) / hash(Iterable) via an (Object) cast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge remote-tracking branch 'origin/master' into dougqh/optimize-metric-key
Merge branch 'dougqh/util-hashtable' into dougqh/optimize-metric-key
Merge branch 'dougqh/optimize-metric-key' into dougqh/control-tag-cardinality
Merge remote-tracking branch 'origin/master' into dougqh/conflating-metrics-background-work
Introduce slim PeerTagSchema; capture peer-tag values not pairs
Addresses sarahchen6's review comment on ConflatingMetricsAggregator
extractPeerTagPairs: replaces the worst-case-allocation + trim-and-copy
flat-pairs layout with a parallel-array carrier.
- New PeerTagSchema: minimal carrier of String[] names. Two flavors -- a
static INTERNAL singleton (one entry: base.service) for internal-kind
spans, and per-discovery built schemas for client/producer/consumer
spans. Deliberately no cardinality limiters or per-cycle state; that
layers on top in a later PR.
- ConflatingMetricsAggregator: caches the peer-aggregation schema keyed
on reference equality of features.peerTags() -- a single volatile read
+ a long compare on the steady-state producer hot path, no allocation.
The producer now captures only a String[] of values parallel to the
schema's names; the schema reference is carried on SpanSnapshot. The
prior "build worst-case pairs then trim" code is gone.
- SpanSnapshot: replaces String[] peerTagPairs with PeerTagSchema +
String[] peerTagValues. Producer drops the schema reference if no
values fired so the consumer short-circuits on null.
- Aggregator.materializePeerTags: now reads name/value pairs at the same
index from (schema.names, snapshot.peerTagValues). Counts hits once
for exact-size allocation; preserves the singletonList fast path for
the common one-entry case (e.g. internal-kind base.service).
Producer-side cost goes from "allocate String[2n] + walk + maybe trim"
to "single volatile read + walk + lazy String[n] only on first hit".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Merge branch 'dougqh/optimize-metric-key' into dougqh/control-tag-cardinality
Address PR #11381 review (round 2)
- Aggregator.materializePeerTags: fold the firstHit-discovery nested if
into a single guarded post-increment (amarziali, #3279243138). One
body line: `if (values[i] != null && hitCount++ == 0) firstHit = i;`.
- Drop redundant isKind(SpanKindFilter) overrides in both
TraceGenerator.groovy files (amarziali, #3279264553 / #3279382648).
CoreSpan.java:84 already supplies a default implementation that reads
the same span.kind tag.
- Bump TRACER_METRICS_MAX_PENDING default from 2048 -> 131072 to address
the capacity regression amarziali flagged (#3279378375). Without
producer-side conflation, the inbox now holds 1 SpanSnapshot per
metrics-eligible span instead of 1 conflated Batch per ~64 spans;
restoring effective capacity parity (~2048 * ~64 = 131072) prevents a
~64x rise in inbox-full drops at the same span rate. ~100 B per
SpanSnapshot puts the worst-case heap floor at ~13 MB -- bounded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover inbox-full fast-path in ConflatingMetricsAggregator.publish
Addresses PR #11381 review (amarziali, #3279325340 -- "Are the existing
tests covering this case?").
New ConflatingMetricsAggregatorInboxFullTest constructs the aggregator
with a small inbox (queueSize=8), deliberately does NOT call start() so
the consumer thread never drains, then publishes enough spans to
overflow the inbox. Verifies that healthMetrics.onStatsInboxFull() is
called at least once -- the fast-path's `inbox.size() >= inbox.capacity()`
short-circuit triggers when the producer-side queue is at capacity.
Test is Java + JUnit 5 + Mockito per the project convention for new
tests; uses a CoreSpan Mockito mock rather than the SimpleSpan Groovy
fixture so we don't depend on Groovy-then-Java compile order from the
test source set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reconcile PeerTagSchema once per reporting cycle on the aggregator thread
Addresses amarziali's review comment #3279340181 ("It would be more
efficient to trigger from the other side"). The producer-side reference
compare on every publish goes away; the aggregator thread reconciles
the cached schema against feature discovery once per reporting cycle.
- DDAgentFeaturesDiscovery: expose getLastTimeDiscovered() so callers
can detect a discovery refresh without copying the peerTags Set.
- PeerTagSchema: add `long lastTimeDiscovered` (plain, aggregator-only)
and `hasSameTagsAs(Set)`. of(Set, long) takes the timestamp; INTERNAL
uses a -1L sentinel since it's never reconciled.
- ConflatingMetricsAggregator:
* Drop the cachedPeerTagsSource volatile and the per-publish reference
compare.
* Producer fast path is now `cachedPeerTagSchema` volatile read +
null-check; first publish takes the one-time synchronized bootstrap.
* Add reconcilePeerTagSchema() that runs once per cycle on the
aggregator thread: fast-path timestamp compare, slow-path set
compare, bump-in-place when the set is unchanged.
- Aggregator: new `Runnable onReportCycle` constructor parameter, run at
the start of report() (before the flush, so any test awaiting
writer.finishBucket() observes the schema in its post-reconcile state
and so the next publish sees the new schema without a handoff).
- Update "should create bucket for each set of peer tags" to drive two
reporting cycles separated by a report() that triggers reconcile. The
old test relied on per-publish reference detection, which the new
design intentionally doesn't preserve -- the schema is now stable
within a cycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Merge branch 'dougqh/optimize-metric-key' into dougqh/control-tag-cardinality
Add bootstrap + reconcile coverage for PeerTagSchema
Addresses round-3 review nice-to-haves on PR #11381.
- PeerTagSchemaTest: unit coverage for hasSameTagsAs() (the predicate
that drives the reconcile fast/slow path split), the of(Set, long)
factory, and the INTERNAL singleton. The hasSameTagsAs cases include
same-content-different-Set-reference (the case the reconcile fast path
relies on after a discovery refresh) and content-mismatch in either
direction.
- ConflatingMetricsAggregatorBootstrapTest: integration coverage for
the producer-side bootstrap + aggregator-thread reconcile flow.
* bootstrapHappensOnceOnFirstPublish -- three publishes against an
un-started aggregator (no consumer thread, no reconciles); verifies
features.peerTags() and features.getLastTimeDiscovered() are each
called exactly once.
* reconcileSkipsDeepCompareWhenTimestampMatches -- two cycles with
constant features.getLastTimeDiscovered(); each post-report
reconcile short-circuits on the timestamp fast path, so peerTags()
is called only by bootstrap (1 total).
* reconcileSurvivesTimestampBumpWhenTagsUnchanged -- timestamps bump
every reconcile, forcing the slow set-compare path; the tag set
stays identical, so the schema is preserved and continues to flush
buckets correctly across cycles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Use writer.finishBucket() count in bootstrap test for cascade compatibility
The verify(writer).add(MetricKey, AggregateMetric) signature is unique
to #11381; downstream branches use AggregateEntry. Switching to
verify(writer, times(2)).finishBucket() keeps the same behavioral
guarantee (both cycles flushed) across the stack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use writer.finishBucket() count in bootstrap test for cascade compatibility
The verify(writer).add(MetricKey, AggregateMetric) signature is unique
to #11381; downstream branches use AggregateEntry. Switching to
verify(writer, times(2)).finishBucket() keeps the same behavioral
guarantee (both cycles flushed) across the stack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/optimize-metric-key' into dougqh/control-tag-cardinality
Rename bootstrap test to ClientStatsAggregator + adapt PeerTagSchemaTest
#11387's ClientStatsAggregator renames ConflatingMetricsAggregator; the
test file's name and class refs need to match. PeerTagSchemaTest's
PeerTagSchema.of() calls need the (Set, long, HealthMetrics) signature
this branch introduced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'master' into dougqh/conflating-metrics-background-work
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Preserve TRACER_METRICS_MAX_PENDING semantic + drop stale imports
TRACER_METRICS_MAX_PENDING previously counted conflating Batch slots
(~64 spans each). The inbox now holds 1 SpanSnapshot per slot, so
multiply the configured value by LEGACY_BATCH_SIZE (64) to keep
pre-existing customer overrides delivering the same effective
span-throughput capacity. Default stays at 2048 logical -> 131072
snapshot slots, identical to the prior 2048 batches * 64 spans.
Also drops two unused datadog.trace.core.SpanKindFilter imports left
behind in TraceGenerator.groovy after the isKind() override was removed
in favor of the CoreSpan default implementation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Add AdversarialMetricsBenchmark for capacity-bound stress testing
Ports the adversarial JMH benchmark from #11402 down to this branch so
we can compare #11381 vs master on a high-cardinality, high-throughput
workload. Adapted to use ConflatingMetricsAggregator (pre-rename) and
the FixedAgentFeaturesDiscovery / NullSink helpers already in
ConflatingMetricsAggregatorBenchmark.
8 producer threads hammer publish() with unique (service, operation,
resource, peer.hostname) per op so the aggregate cache fills+evicts
continuously and the inbox saturates. tearDown prints the drop
counters (inboxFull vs aggregateDropped) so the test verifies the
subsystem stayed bounded under attack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Trim AdversarialMetricsBenchmark counters and clarify printout
Drop traceComputedCalls / totalSpansCounted: under 8-way contention
the volatile-long ++/+= pattern was losing ~20% of updates (296M
counted vs 245M reported), and the numbers duplicate signal JMH's
ops/s already provides.
Switch inboxFull / aggregateDropped to LongAdder so the printed drop
shape (the order-of-magnitude story the bench is built to tell) is
accurate under contention.
Replace the stale "both forks combined for this run" string with text
that matches the actual @Fork(value=1) config and notes that counters
accumulate across warmup + measurement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Close PeerTagSchema reconcile race + cover the swap branch
buildPeerTagSchema previously read features.peerTags() before
features.getLastTimeDiscovered(). DDAgentFeaturesDiscovery exposes
those as two separate accessors against its volatile State -- a
state-swap interleaving could leave the cached schema tagged with a
NEWER timestamp than its names, after which the next reconcile
short-circuits on the timestamp compare and misses the tag-set update
until the next discovery refresh (~minute later).
Swap the read order so timestamp is captured first. With this
ordering, an interleaving leaves the schema OLDER than its names
instead -- the next reconcile sees a timestamp mismatch, runs the
deep compare, and self-heals on the very next cycle.
Also adds reconcileSwapsSchemaWhenTagSetChanges, which closes the
test gap on the slow-path swap branch
(cachedPeerTagSchema = PeerTagSchema.of(...)). End-to-end check via
the writer's captured MetricKeys: pre-swap snapshot carries only
peer.hostname, post-swap snapshot carries both peer.hostname and
peer.service.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
Adapt reconcileSwapsSchemaWhenTagSetChanges to AggregateEntry shape
#11382 collapses MetricWriter.add(MetricKey, AggregateMetric) into
add(AggregateEntry). Re-target the captor and accessors on this branch
so the test compiles and the same end-to-end peer-tag verification
holds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clarify materializePeerTags hit-counting loop
Splits the `if (values[i] != null && hitCount++ == 0)` conjunction
into nested ifs. Same semantics, no codegen impact after JIT --
just visibly says what the loop is doing rather than relying on
post-increment-inside-conjunction. Closes amarziali's review thread
on this block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge branch 'dougqh/conflating-metrics-background-work' into dougqh/optimize-metric-key
# Conflicts:
# dd-trace-core/src/main/java/datadog/trace/common/metrics/Aggregator.java
Fix MetricsIntegrationTest entry recording call site
AggregateEntry consolidated MetricKey + AggregateMetric so recordDurations
lives directly on AggregateEntry now. The previous entry1.aggregate.
recordDurations(...) form compiles under Groovy's dynamic dispatch but
would throw MissingPropertyException at runtime since there is no
`aggregate` property. Resolves chatgpt-codex-connector's review comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make ConflatingMetricAggregatorTest counter checks actually verify
The `1 * writer.add(value) >> { closure }` pattern treats the closure
as a stubbed return value -- Spock evaluates it but discards the
result, so `e.getHitCount() == X && ...` was a silent no-op across
31 occurrences. Wrapping the expression in `assert` makes Groovy's
power-assert throw on mismatch, which Spock surfaces as a real
failure. Resolves chatgpt-codex-connector's review comment.
All 41 tests still pass, so the previously-unverified assertions
happened to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop dead recordDurations(int, AtomicLongArray) batch API
This method was a vestige of master's Batch design where multiple
producer threads wrote into an AtomicLongArray slot concurrently and
the aggregator drained ~64 durations per Batch in one call. The new
producer/consumer split publishes one SpanSnapshot per span, so
production only ever calls recordOneDuration(long).
Migrate the three remaining callers (AggregateEntryTest,
SerializingMetricWriterTest, MetricsIntegrationTest) to a loop of
recordOneDuration(long) calls, then delete the batched method and its
AtomicLongArray imp…
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changelog: