Add FFE evaluation metrics by leoromanovsky · Pull Request #3911 · DataDog/dd-trace-php

leoromanovsky · 2026-05-22T20:28:30Z

Motivation

PHP FFE evaluation metrics need to follow the native runtime architecture from the shared design doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

#3906 gave PHP a real Remote Config-backed evaluator. This PR adds metric delivery without a PHP OTLP writer, encoder, or transport. PHP records a typed final evaluation result; native/libdatadog sidecar code owns buffering, aggregation, encoding, and OTLP submission.

Local system-test validation also exposed a runtime bug in the evaluator config store: RC could be delivered and ACKed on one native thread while a PHP request evaluated on another thread and still saw provider_not_ready. The UFC config now lives in synchronized process-wide Rust state instead of thread_local!, so RC-loaded config is visible to later evaluations in the same process. This is the right shape for PHP-FPM/ZTS because the stored value is Rust Configuration data, not PHP request-local state.

Changes

This PR targets current master and advances the libdatadog submodule to 091c98df6 from DataDog/libdatadog#2076. The companion libdatadog change lets the sidecar session own the OTLP metrics endpoint and preserves request-replayer test behavior through libdatadog endpoint handling.

Native DDTrace\ffe_evaluate records evaluation metrics by default. The PHP 8 OpenFeature path disables automatic native recording for the raw evaluation call and then records the final OpenFeature-aware outcome once through the typed EvaluationMetricRecorder adapter.

FFE metric buffering and flushing live in tracer/ffe.c / tracer/ffe.h, matching the tracer/common-extension split requested in review. The flush path sends metric batches to the sidecar without passing an endpoint per flush; the sidecar session is configured once with the resolved OTLP metrics endpoint.

Endpoint resolution now goes through ext/otel_config.*. Unix DD_TRACE_AGENT_URL values are normalized through libdatadog's UDS helper so /var/run/datadog/apm.socket becomes a valid unix://<hex-socket-path>/v1/metrics endpoint.

The FFE evaluator config store in components-rs/ffe.rs is now process-wide LazyLock<RwLock<...>> state. This fixes the observed RC ACK vs evaluation-thread split and adds a regression test asserting config loaded on one thread is visible on another.

Decisions

Metric delivery remains separate from exposure logging. This PR intentionally expects OTLP evaluation metrics and no EVP exposure stream; #3910 owns exposure delivery and DataDog/libdatadog#2026 owns the exposure-only sidecar work.

The PHP layer is a typed adapter only. It does not aggregate metrics, encode OTLP, or submit HTTP payloads. Users do not need to call anything extra; metrics are recorded by the Datadog provider/evaluator path.

The evaluator config is intentionally not request-local or PHP module-global. FFE config comes from RC and must persist across requests and be visible across native/PHP threads in the same process. The shared state is protected by Rust RwLock and stores no PHP zval, zend_string, or request-owned memory.

Validation

Code style:

git diff --check

Result: passed locally.

Focused Rust regression tests:

RUSTC_BOOTSTRAP=1 cargo test -p datadog-php ffe::tests -- --nocapture

Result: 2 passed; includes configuration_state_is_process_wide and empty_targeting_key_is_not_dropped.

ZTS build proof:

./tooling/bin/build-debug-artifact gnu-aarch64-8.2-zts <output-dir>

Result: passed locally; produced dd-library-php-1.20.0-aarch64-linux-gnu.tar.gz from the ZTS target.

Local system-tests validation through DataDog/system-tests#7033:

# Build local debug artifact. The local VERSION was temporarily stamped
# as 1.21.0-dev so system-tests#7033 activates the PHP metric tests.
./tooling/bin/build-debug-artifact gnu-aarch64-8.2-nts <system-tests>/binaries

SYSTEM_TEST_BUILD_TIMEOUT=1800 ./build.sh php -w php-fpm-8.2

TEST_LIBRARY=php \
WEBLOG_VARIANT=php-fpm-8.2 \
./run.sh +l php FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py

Before the process-wide FFE config fix, the active run failed 12/17 with emitted metrics tagged as error.type:provider_not_ready even though RC delivery was ACKed. After the fix, result was:

17 passed in 81.31s

FFE dogfooding transport proof:

cd ffe-dogfooding
FLAG_KEY=php-metrics-dogfood-1780520626 \
SUBJECT_ID=php-metrics-validator-1780520626 \
REQUESTS=5 \
WAIT_SECONDS=120 \
EXPECT_OTLP=1 \
EXPECT_EVP=0 \
scripts/validate-local-php-telemetry.sh

Result:

OTLP metrics OK: php7=5 php8-openfeature=5 flag=php-metrics-dogfood-1780520626
EVP exposures not observed: php7=0 php8-openfeature=0 flag=php-metrics-dogfood-1780520626 subject=php-metrics-validator-1780520626

Dogfooding did not prove live non-default RC values in this run because the local agent was rejecting the configured key with API Key invalid (403 response). The system-tests run above is the config-backed validation evidence for this PR.

Related PRs: #3906, #3910, DataDog/libdatadog#2026, DataDog/libdatadog#2076, DataDog/system-tests#7033.

datadog-official · 2026-05-22T20:36:48Z

Tests

✨ Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🚦 7 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.3]

See error
RPC failed: HTTP 503 while fetching from git repository. Error reading section header 'acknowledgments'.

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.5]

See error
Uncaught Exception: wait for replay timeout in /go/src/github.com/DataDog/apm-reliability/dd-trace-php/tmp/build_extension/tests/ext/includes/request_replayer.inc:100

DataDog/apm-reliability/dd-trace-php | min install tests

See error
Check the library config files failed. DD_APPSEC_SCA_ENABLED is set by INI while expected to be set by fleet_config.

View all 7 failed jobs.

❄️ 1 New flaky test detected

tmp/build_extension/tests/ext/ffe/evaluation_metrics_unix_agent_endpoint.phpt (FFE evaluation metrics flush over DD_TRACE_AGENT_URL unix domain socket) from php.tmp.build_extension.tests.ext.ffe (Fix with Cursor)
Termsig=6
New test introduced in this PR is flaky.
View in Flaky Test Management

ℹ️ Info

No other issues found (see more)

🧪 All tests passed

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 54.15% (+0.03%)

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 867c3c5 | Docs | Datadog PR Page | Give us feedback!}

pr-commenter · 2026-05-22T21:47:53Z

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-05 15:00:06

Comparing candidate commit 867c3c5 in PR branch leo.romanovsky/m3-ffe-evaluation-metrics with baseline commit f4ff7b4 in branch master.

Found 3 performance improvements and 1 performance regressions! Performance is the same for 189 metrics, 1 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

🟩 = significantly better candidate vs. baseline
🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:HookBench/benchHookOverheadTraceMethod-opcache

🟩 execution_time [-11.055µs; -5.431µs] or [-6.177%; -3.034%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization

🟥 execution_time [+4.177µs; +5.543µs] or [+4.056%; +5.382%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

🟩 execution_time [-7.054µs; -4.186µs] or [-6.403%; -3.799%]

scenario:TraceAnnotationsBench/benchTraceAnnotationOverhead-opcache

🟩 execution_time [-7.939µs; -4.524µs] or [-4.458%; -2.540%]

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f021234c39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Adds a parallel pathway for PHP feature-flag evaluation metrics mirroring the FfeExposures forwarder. dd-trace-php encodes `feature_flag.evaluations` counters as OTLP/protobuf in PHP (via its existing PHP 7-safe `OtlpMetricEncoder`) and ships the encoded bytes to the sidecar, which POSTs them to the user-configured OTLP HTTP metrics intake. Why a sibling action instead of reusing FfeExposures: - The OTLP collector is not the Datadog Agent. It's user-configurable via OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (default http://localhost:4318/v1/metrics), so the endpoint travels with the payload rather than being derived from the sidecar session's agent base URL. - Content type differs (application/x-protobuf vs application/json). - No EVP subdomain header. - The payload is binary protobuf, not a JSON string. dd-trace-php side (PR DataDog/dd-trace-php#3911) will refactor its existing `OtlpHttpMetricTransport` (which currently does PHP-side HTTP I/O, violating the architectural rule "no I/O outside the sidecar") to call this new FFI. Validation: - `cargo test -p datadog-sidecar ffe` passes 7 tests (3 exposures + 4 metrics). - `cargo check -p datadog-sidecar-ffi` clean.

Per Bob's PR review (2026-05-22), the tracer extension must perform no I/O outside the sidecar. Replaces the raw-socket `AgentExposureTransport` with `SidecarExposureTransport`, which forwards exposure batches to the libdatadog sidecar via a new native PHP function `\DDTrace\send_ffe_exposures` that calls the `ddog_sidecar_send_ffe_exposures` FFI added in DataDog/libdatadog#2026. PHP side: - Delete `Internal/Exposure/AgentExposureTransport.php` (raw socket POST to the Agent EVP proxy). - Add `Internal/Exposure/SidecarExposureTransport.php` that JSON-encodes the batch and calls `\DDTrace\send_ffe_exposures()`. Fire-and-forget; the sidecar handles retries. - Update `ExposureWriter::createDefault()` to instantiate the sidecar transport. - Drop the obsolete `testAgentTransportBuildsAgentEvpRequest` PHPUnit test (HTTP construction now lives in libdatadog, covered by `cargo test -p datadog-sidecar ffe_flusher`). - Add `Internal/DefaultEvaluationCompletedHook` and `Internal/CompositeEvaluationCompletedHook` so production callers go through a composite hook factory. In this PR the composite contains only `ExposureHook`; the metrics PR (#3911) contributes `EvaluationMetricHook` and the file conflict at merge resolves by combining both. Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`. C/Rust bridge: - Declare `ddog_ByteSlice` (and underlying `ddog_Slice_U8`) in `components-rs/common.h` for the metrics path; declare both `ddog_sidecar_send_ffe_exposures` and `ddog_sidecar_send_ffe_metrics` in `components-rs/sidecar.h`. - Add C wrappers `ddtrace_sidecar_send_ffe_exposures(zend_string *)` and `ddtrace_sidecar_send_ffe_metrics(zend_string *endpoint, zend_string *payload_bytes)` in `ext/sidecar.{h,c}` that call the FFI with the current sidecar transport + instance id + queue id. - Declare native PHP functions `\DDTrace\send_ffe_exposures(string): bool` and `\DDTrace\send_ffe_metrics(string, string): bool` in `ext/ddtrace.stub.php`; add corresponding arginfo entries and `ZEND_FUNCTION` registrations in `ext/ddtrace_arginfo.h`; implement `PHP_FUNCTION(DDTrace_send_ffe_exposures)` and `PHP_FUNCTION(DDTrace_send_ffe_metrics)` in `ext/ddtrace.c`. - Bump `libdatadog` submodule to FFE branch tip `29762335c` (which provides both FFIs). The submodule will be bumped to the libdatadog main commit once #2026 merges. Docs: - Add `docs/php-ffe-stack/{stack,system}-pr3910.{mmd,png}` for this PR. Validation: - `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags` → 41 tests, 174 assertions, OK. - libdatadog sidecar tests (`cargo test -p datadog-sidecar ffe_flusher`) → 3 passed, on the pinned submodule commit. - Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`. `make test_featureflags` and `make test_c TESTS=tests/ext/ffe/...` will run in CI; running them locally requires rebuilding the extension which is gated behind libdatadog #2026 merging.

Adds the M3 evaluation-metrics layer on top of the hook PR (#3909) as a sibling of the EVP exposures PR (#3910). Records `feature_flag.evaluations` for both PHP 7 (DD Client hook) and PHP 8 (OpenFeature SDK hook); both paths share `EvaluationMetricHook::sharedWriter()` for unified aggregation. OTLP/protobuf payloads are encoded in PHP via the existing `OtlpMetricEncoder` and delivered to the user-configured OTLP HTTP metrics intake through the libdatadog sidecar (`ddog_sidecar_send_ffe_metrics` FFI added in DataDog/libdatadog#2026). This branch is force-pushed (user-authorized one-time exception to the no-force-push rule, 2026-05-23) to restructure history away from being linearly stacked on the M2 exposures PR (#3910). The PR now stacks directly on the hook PR (#3909) as a sibling of the EVP PR. PHP side: - Add `Internal/Metric/EvaluationMetricWriter` with bounded series aggregation, drop accounting, and shutdown flush. - Add `Internal/Metric/EvaluationMetricHook` (DD Client hook) and `OtlpMetricEncoder` (PHP 7-safe protobuf encoding). - Add `Internal/Metric/SidecarOtlpMetricsTransport` that calls `\DDTrace\send_ffe_metrics()` (FFI declared in #3910). Endpoint resolution: `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, falling back to `OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics`, default `http://localhost:4318/v1/metrics`. - Add `DDTrace\OpenFeature\EvalMetricsHook` implementing `OpenFeature\interfaces\hooks\Hook` (after + error stages), registered on `DataDogProvider` via `setHooks()`. - `DataDogProvider` constructs its internal DD `Client` with `DefaultEvaluationCompletedHook::createWithoutMetric()` so the OpenFeature path records the metric via the OpenFeature hook (PR 3911 scope) and NOT via the DD Client hook — preventing double-counting. PHP 7 path keeps recording via the DD Client hook. - Add `Internal/CompositeEvaluationCompletedHook` and `Internal/DefaultEvaluationCompletedHook` (metric-only composite). This is the merge-conflict point with PR #3910's `[ExposureHook]` composite — second merge resolves by combining both hooks. - Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`. - Drop the obsolete `testOtlpTransportBuildsHttpProtobufRequest` PHPUnit test (HTTP construction now lives in libdatadog, covered by `cargo test -p datadog-sidecar ffe_metrics_flusher`). - Add `_files_openfeature.php` entry for `EvalMetricsHook.php`. C/Rust bridge: the `\DDTrace\send_ffe_metrics()` native function, its C wrapper `ddtrace_sidecar_send_ffe_metrics()`, and the `ddog_sidecar_send_ffe_metrics` FFI declaration in `components-rs/sidecar.h` were already added in #3910. This PR's branch picks up those changes once #3910 merges (or via the same libdatadog submodule pin during review). For development locally the libdatadog submodule is pinned to the FFE branch tip (`29762335c`). Docs: - Add `docs/php-ffe-stack/{stack,system}-pr3911.{mmd,png}` per the 4-PR documentation convention. Validation: - `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags` → 40 tests, 160 assertions, OK. - Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`. `make test_featureflags`, OpenFeature PHPUnit, and ffe-dogfooding end-to-end validation will run in CI / are validated separately by FOLLOW-05 Steps 4–5.

The SidecarOtlpMetricsTransport::resolveEndpoint() default ("http://localhost:4318/v1/metrics") doesn't match the system-tests parametric setup, where the PHP test client receives DD_AGENT_HOST=<test_agent_container> but no OTEL_EXPORTER_OTLP_METRICS_ENDPOINT. The previous OtlpHttpMetricTransport (replaced by this transport) derived the OTLP endpoint from DD_AGENT_HOST + port 4318. Restore that fallback so system-tests test_php_ffe_evaluation_metric finds the test-agent OTLP intake. Resolution order (now matches the old transport): 1. OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (explicit) 2. OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics 3. DD_AGENT_HOST + :4318/v1/metrics 4. localhost:4318/v1/metrics

The C/Rust bridge for the new native PHP functions \DDTrace\send_ffe_exposures() and \DDTrace\send_ffe_metrics() lives on the M2 EVP exposures PR (#3910) because that PR introduced the bridge when refactoring the exposure transport. PR #3911 (this PR) needs the same bridge for its OTLP metrics transport — without it the SidecarOtlpMetricsTransport silently drops batches because function_exists('\\DDTrace\\send_ffe_metrics') is false. Adds the same bridge files here so the M3 branch is independently compilable. At merge time the two PRs will conflict at the file level on these bridge files; resolution is deduplication (the bridge is identical in both PRs by design). Files added/modified: - components-rs/sidecar.h: declares ddog_sidecar_send_ffe_exposures and ddog_sidecar_send_ffe_metrics FFIs. - components-rs/common.h: declares ddog_ByteSlice typedef for the metrics payload. - ext/sidecar.h, ext/sidecar.c: C wrappers ddtrace_sidecar_send_ffe_exposures() and ddtrace_sidecar_send_ffe_metrics(). - ext/ddtrace.stub.php, ext/ddtrace_arginfo.h, ext/ddtrace.c: declares the native PHP functions and the PHP_FUNCTION implementations.

Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE actions before application-entry check"). Without this fix, the `SidecarOtlpMetricsTransport::send()` call from PHP would silently no-op for short-lived processes: the sidecar received the `FfeMetrics` action but dropped it because the `Entry::Occupied` gate on the application metadata had not yet fired. This unblocks the parametric system-test `Test_Feature_Flag_Parametric_Evaluation_Metrics::test_php_ffe_evaluation_metric` which exercises the full PHP -> sidecar -> OTLP-HTTP-intake path end-to-end. Local result: 26/27 FFE-scoped parametric tests pass (remaining failure is the EVP exposure test, which lives on the M2 PR #3910 branch).

Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE actions before application-entry check") so the EVP exposure batch sent via `ddog_sidecar_send_ffe_exposures` is no longer silently dropped when the PHP runtime hasn't yet registered the application against the sidecar's `QueueId`. Same fix is on the sibling PR #3911 (`2a48c4987") for the OTLP metric path. Without this submodule bump, `Test_Feature_Flag_Parametric_Exposures::test_php_ffe_exposure_event` sees zero EVP POSTs at the test-agent because the sidecar's `enqueue_actions` handler discards the `FfeExposures` action under the `Entry::Occupied` gate.

…macOS+colima `build-debug-artifact` produces a `/output` bind mount via `-v ${TMP_OUT}:/output`. On macOS+colima only paths under $HOME are mounted into the Linux VM; the macOS default `/var/folders/...` temp dir is not, so writes from inside the container to `/output` land in the VM and never propagate back to the host. The script exits without error and the user's binaries dir is left with whatever stale artifact was there before. Pin TMP_OUT/TMP_PKG under $OUTPUT_DIR (which the user just passed as an absolute, usable destination) so the bind mount is always on a host-visible path. Inherits cleanup via the existing EXIT trap. The same fix is independently on the M3 branch (#3911) as part of e74b050; bringing it to the M2 branch (#3910) so both branches are buildable on macOS without a `TMPDIR=$HOME/...` override.

…gh rez Three fixes to the per-PR FFE diagrams: 1. **Titles were truncated to "PHP FFE 4-PR stack — current =" with the PR number missing.** Mermaid uses the YAML frontmatter for the title block, and YAML treats unquoted `#` as the start of a comment, so `title: PHP FFE 4-PR stack — current = #3911 (OTLP metrics)` got parsed as just `PHP FFE 4-PR stack — current =`. Quoting the title keeps the `#PR-number` portion intact. 2. **System diagrams switched from `flowchart LR` to `flowchart TD`.** LR forced the PHP-process / host / backend lanes into a single very-wide row that rendered as an unreadable horizontal strip on PR pages. TD stacks them vertically and keeps the per-lane subgraphs readable. 3. **Re-rendered at 2400×2400 with `--scale 3`** (~1800×2000 stack, ~3000×4500 system) instead of the default ~600px width. PR-page thumbnails render legibly and zoomed-in detail stays sharp. README's regeneration recipe updated with all three knobs and a note on why the `title:` quoting matters.

…gh rez Same three fixes as the parallel commit on the M3 (PR #3911) branch: 1. Quote the YAML `title:` so the `#PR-number` is not parsed as a comment (previous render had the title truncated at "current ="). 2. `flowchart LR` → `flowchart TD` on system diagrams so vertical PHP-process → host → backend lanes stack vertically instead of getting squeezed into one wide row. 3. Render with `-w 2400 -H 2400 --scale 3 -b white` (~1900×2000 stack, ~3000×4600 system) instead of ~600px default.

…gh rez Same three fixes as on the M2 (#3910) and M3 (#3911) sibling branches: 1. Quote the YAML `title:` so the `#PR-number` survives parsing (otherwise YAML treats the `#` as a comment and the title renders as "PHP FFE 4-PR stack — current =" with the rest missing). 2. `flowchart LR` → `flowchart TD` on the system diagram so the PHP-process / host-sidecar / backend lanes stack vertically. 3. Render at 2400×2400 `--scale 3` instead of ~600px default.

Brings the PHP FFE diagram convention to the M1 PR. Each subsequent PR in the stack (#3909, #3910, #3911) already carried its own stack + system diagram; #3906 was missing them. Mirrors the format used by the rest of the stack: - `stack-pr3906.mmd` — the 4-PR stack with #3906 badged as current and the downstream layers shown as "future". - `system-pr3906.mmd` — the target end-to-end architecture with M1's scope (UserCode, OpenFeature Client, DataDogProvider, DDTrace FeatureFlags Client, NativeEvaluator, Remote Config client) highlighted, and everything from the Hook layer onward dashed. All conventions match the other branches: quoted YAML titles (to keep `#PR-number` out of the YAML comment parser), `flowchart TD` orientation, rendered with `-w 2400 -H 2400 --scale 3 -b white`.

`AbstractProvider::setHooks(array $hooks)` REPLACES the hook list, so registering our `EvalMetricsHook` via `setHooks([$metricsHook])` in the constructor would silently drop our metric emission as soon as a user configures their own provider-level hooks. Override `getHooks()` to always prepend the Datadog metric hook to the caller-supplied list. The user can register their own provider hooks freely (`$provider->setHooks($theirHooks)`) and we still record `feature_flag.evaluations` on every OpenFeature evaluation. Adds a unit test that constructs a DataDogProvider, calls `$provider->setHooks([$userHook])`, then asserts `$provider->getHooks()` returns `[EvalMetricsHook, $userHook]`.

## Motivation Companion PHP evaluation-metrics work needs FFE metrics to follow the sidecar architecture used by the rest of the tracer: configure endpoints once on the session, then send typed sidecar actions without carrying endpoint strings on every flush. The previous FFE metrics path accepted an OTLP endpoint alongside each metrics batch. That diverged from the normal `Endpoint` flow, made test-session-token propagation easy to miss, and forced PHP to keep fetching endpoint configuration during request flushes. Design context: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0 Companion PHP PR: DataDog/dd-trace-php#3911 ## Changes - Add an optional OTLP metrics `Endpoint` to sidecar session configuration. - Thread that endpoint through `ddog_sidecar_session_set_config` and session state. - Dispatch `SidecarAction::FfeEvaluationMetrics` through the session's configured OTLP metrics endpoint instead of an endpoint passed with each metrics batch. - Preserve request-replayer test-session routing by inheriting the agent endpoint `test_token` when the OTLP metrics endpoint does not already have one. - Update runtime test-token changes so the configured OTLP metrics endpoint is updated with the rest of the session endpoints. - Extend the sidecar FFE metrics dispatch test to assert the outgoing OTLP request carries `x-datadog-test-session-token` through `Endpoint` standard headers. ## Decisions FFE metrics remain caller-driven. This PR does not make shared evaluator calls auto-emit metrics, so SDKs that still log metrics in host language implementations can continue to coexist without double counting. The sidecar owns metrics transport, aggregation, OTLP/protobuf encoding, and endpoint/header handling. PHP remains responsible for constructing the appropriate OTLP metrics endpoint from its existing configuration resolver and supplying it once during sidecar session setup. The test-session-token behavior is intentionally implemented through `Endpoint`, not by manually adding FFE-specific headers. Production endpoints normally have no test token, so production behavior is unchanged. ## Validation Local targeted test: ```sh cargo test -p datadog-sidecar ffe_metric_actions_dispatch_without_registered_application ``` Result: passed locally. PHP integration smoke test: Companion PHP PR: DataDog/dd-trace-php#3911 The local PHP branch backing that PR was updated to this libdatadog PR head (`091c98df6`): PHP passes the OTLP metrics `Endpoint` once during sidecar session setup, and FFE metric flushes send only context plus metric payloads. ```sh docker-compose run --no-deps 8.3-bookworm bash -lc \ 'export PATH=/rust/cargo/bin:$PATH TMPDIR=/home/circleci/app/tmp TMP=/home/circleci/app/tmp TEMP=/home/circleci/app/tmp; make MAX_TEST_PARALLELISM=1 test_c TESTS=tests/ext/ffe/evaluation_metrics_unix_agent_endpoint.phpt' ``` Result: `tests/ext/ffe/evaluation_metrics_unix_agent_endpoint.phpt` passed locally under Docker on PHP 8.3/Linux aarch64: `Tests passed 1 (100.0%)`. That PHPT validates the PHP sidecar integration over `DD_TRACE_AGENT_URL=unix://...`: the request reaches `/v1/metrics`, uses `application/x-protobuf`, and carries the request-replayer test-session token through the configured endpoint path. Negative validation during review: - Removing only the mock header matcher still allows the request through, but stops proving token propagation. - Keeping the matcher and removing `Endpoint.test_token` fails with a missing `x-datadog-test-session-token` header. - Restoring both passes, confirming the request uses `Endpoint` standard headers. CI on commit `091c98d` reports all tests passed and no new flaky tests. ## End-to-End Evidence The companion PHP branch for DataDog/dd-trace-php#3911 was validated locally against this libdatadog PR head (`091c98df6`). That path exercises the full PHP metrics flow: `PHP API -> tracer request buffer -> sidecar session-configured OTLP metrics Endpoint -> sidecar FFE metrics action -> request-replayer UDS intake` Docker command used from the PHP repo: ```sh docker-compose run --no-deps 8.3-bookworm bash -lc \ 'export PATH=/rust/cargo/bin:$PATH TMPDIR=/home/circleci/app/tmp TMP=/home/circleci/app/tmp TEMP=/home/circleci/app/tmp; make MAX_TEST_PARALLELISM=1 test_c TESTS=tests/ext/ffe/evaluation_metrics_unix_agent_endpoint.phpt' ``` Observed result: ```text Number of tests : 1 1 Tests failed : 0 ( 0.0%) ( 0.0%) Tests passed : 1 (100.0%) (100.0%) ``` The passing PHPT confirms PHP can record and flush an FFE evaluation metric through the sidecar over `DD_TRACE_AGENT_URL=unix://...`, and that the captured request reaches `/v1/metrics` with `Content-Type: application/x-protobuf` plus the request-replayer test-session token propagated through the configured `Endpoint`. Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>

…-metrics' into leo.romanovsky/m3-ffe-evaluation-metrics

sameerank

Looked at this mainly checking for parity with the other server SDKs, and I think overall it looks right. Nice work!

sameerank · 2026-06-04T19:03:06Z

    ): ResolutionDetailsInterface {
        $details = $this->evaluate($flagKey, $expectedType, $defaultValue, $this->normalizeContext($context));
+        $this->warnIfNonProductionRuntime($details);
+        $this->recordEvaluationMetric($flagKey, $details);


The other SDKs have been using the finally hook to record evaluations. E.g.

Ruby: https://github.com/DataDog/dd-trace-rb/blob/6cf558f959f5edff37b6a243d25754687032efc9/lib/datadog/open_feature/hooks/flag_eval_hook.rb#L29
Go: https://github.com/DataDog/dd-trace-go/blob/f7dde0ce6bcea5417dc6663cf8b089d8e5555129/openfeature/flageval_metrics.go#L61
Java: https://github.com/DataDog/dd-trace-java/blob/81bb30521d96519d510b0446c5ca3eefb631e0b9/products/feature-flagging/feature-flagging-api/src/main/java/datadog/trace/api/openfeature/FlagEvalHook.java#L36

Explanation for why we use finally in JS: https://github.com/DataDog/dd-trace-js/blob/41789b40877c2072f8022fd3d854749741103d26/packages/dd-trace/src/openfeature/eval-metrics-hook.js#L14-L20

And I see that our reason for not doing that here is that ResolutionDetails aren't included in OpenFeature's implementation https://github.com/open-feature/php-sdk/blob/af1de424cbe9dcaae0c8b17d92754ffcd28d5f2f/src/implementation/hooks/HookExecutor.php#L81-L88 so it's not possible to use the finally hook here.

It took a bit of cross-referencing to understand why this differed from the other SDKs, so worth adding a comment explaining why we do this differently

Otherwise I think this is a good approach given the limitations 👍 and hopefully at some point we can update the ^2.1 version constraint

sameerank · 2026-06-04T19:42:30Z

+    private function allocationKey(EvaluationDetails $details): ?string
+    {
+        $exposure = $details->getExposureData();
+        if (!is_array($exposure) || !isset($exposure['allocationKey']) || !is_string($exposure['allocationKey'])) {


I'm recalling that in the other SDKs we wanted to indicate that this was an internal key in the name DataDog/dd-trace-rb#5599 (comment)

But I guess that's not possible here without a concept of flagMetadata

sameerank · 2026-06-04T19:50:26Z

+    metric->flag_key = zend_string_init(flag_key, flag_key_len, 0);
+    metric->variant = zend_string_init(variant ? variant : "", variant ? variant_len : 0, 0);
+    metric->reason = zend_string_init(reason ? reason : "", reason ? reason_len : 0, 0);
+    metric->error_type = zend_string_init(error_type ? error_type : "", error_type ? error_type_len : 0, 0);
+    metric->allocation_key = zend_string_init(allocation_key ? allocation_key : "", allocation_key ? allocation_key_len : 0, 0);


So on a successful eval, this sends error_type:"" and allocation_key:"", but the sidecar omits them in ffe_metric_from_ffi, which matches what the other SDKs do

Libdatadog integrations are a bit opaque so this looked like an accident and took cross-referencing to realize that it wasn't. A comment would be helpful here too

…fe-evaluation-metrics

Also build the endpoint fully in rust code for simplicity

This reverts commit c5934af.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>

bwoebi

Looks like all non-flaky tests pass now

leoromanovsky mentioned this pull request May 23, 2026

Add FFE exposure emission #3910

Merged

leoromanovsky marked this pull request as ready for review May 23, 2026 16:01

leoromanovsky requested review from a team as code owners May 23, 2026 16:01

leoromanovsky requested review from dd-oleksii and typotter and removed request for a team May 23, 2026 16:01

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/api/FeatureFlags/Internal/Metric/OtlpHttpMetricTransport.php Outdated

leoromanovsky marked this pull request as draft May 23, 2026 16:10

This was referenced May 23, 2026

feat(sidecar): forward FFE exposures to EVP proxy DataDog/libdatadog#2026

Merged

Add FFE evaluation completion hook #3909

Merged

leoromanovsky force-pushed the leo.romanovsky/m3-ffe-evaluation-metrics branch from f021234 to e74b050 Compare May 23, 2026 17:44

leoromanovsky changed the base branch from leo.romanovsky/m2-ffe-exposures to leo.romanovsky/m2-m3-evaluation-completed-base May 23, 2026 17:44

leoromanovsky added 3 commits May 23, 2026 13:53

leoromanovsky mentioned this pull request May 23, 2026

feat(ffe): add runtime-backed PHP feature flag evaluation #3906

Merged

leoromanovsky added 2 commits May 23, 2026 19:52

docs(ffe): drop 'Hook seam' wording, use 'Hook layer'

3a88871

Align OTLP endpoint metadata with registry

8d4f858

leoromanovsky requested review from bwoebi and morrisonlevi June 4, 2026 02:03

fix: start sidecar for FFE evaluation metrics

2b4257c

morrisonlevi and others added 3 commits June 4, 2026 08:23

Update libdatadog submodule now that PR has merged

45dcb82

Merge branch 'master' into leo.romanovsky/m3-ffe-evaluation-metrics

0c5f6a5

Merge remote-tracking branch 'origin/leo.romanovsky/m3-ffe-evaluation…

86957db

…-metrics' into leo.romanovsky/m3-ffe-evaluation-metrics

bwoebi mentioned this pull request Jun 4, 2026

Bump libdatadog latest #3956

Closed

2 tasks

sameerank approved these changes Jun 4, 2026

View reviewed changes

leoromanovsky and others added 5 commits June 4, 2026 16:36

Merge remote-tracking branch 'origin/master' into leo.romanovsky/m3-f…

8edea85

…fe-evaluation-metrics

fix: keep FFE config thread-local

786f6f9

Move ffe_metric_buffer to tracer/

91e603b

Simplify configuration for otel metrics, make it first class

f022c22

Also build the endpoint fully in rust code for simplicity

Avoid zend_string flip-flopping

c4ea52b

leoromanovsky mentioned this pull request Jun 4, 2026

test(ffe): wait for ready evaluation metric response DataDog/system-tests#7090

Draft

leoromanovsky and others added 6 commits June 4, 2026 17:49

fix: support OTEL metrics endpoint config

c5934af

docs: explain PHP FFE metric recording

c6d6913

Revert "fix: support OTEL metrics endpoint config"

5f39d5b

This reverts commit c5934af.

Fix missing OTEL_

2140b3f

Fix compilation

1bc5fc6

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>

Fix otel zai config

f69000f

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>

bwoebi force-pushed the leo.romanovsky/m3-ffe-evaluation-metrics branch from c722e14 to f69000f Compare June 5, 2026 12:58

Fix tests

867c3c5

bwoebi requested a review from a team as a code owner June 5, 2026 13:41

bwoebi approved these changes Jun 5, 2026

View reviewed changes

bwoebi merged commit 87f1683 into master Jun 5, 2026
2111 of 2133 checks passed

bwoebi deleted the leo.romanovsky/m3-ffe-evaluation-metrics branch June 5, 2026 14:31

github-actions Bot added this to the 1.21.0 milestone Jun 5, 2026

sameerank mentioned this pull request Jun 5, 2026

[FFL-2449] Add server-side flag evaluation metrics documentation DataDog/documentation#37257

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add FFE evaluation metrics#3911

Add FFE evaluation metrics#3911
bwoebi merged 54 commits into
masterfrom
leo.romanovsky/m3-ffe-evaluation-metrics

leoromanovsky commented May 22, 2026 •

edited

Loading

Uh oh!

datadog-official Bot commented May 22, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

Uh oh!

pr-commenter Bot commented May 22, 2026 •

edited

Loading

Explanation

More details about the CI and significant changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

sameerank left a comment

Uh oh!

sameerank Jun 4, 2026

Uh oh!

sameerank Jun 4, 2026

Uh oh!

sameerank Jun 4, 2026

Uh oh!

bwoebi left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

leoromanovsky commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Decisions

Validation

Uh oh!

datadog-official Bot commented May 22, 2026 • edited by datadog-datadog-prod-us1-2 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

pr-commenter Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [ tracer ]

Explanation

More details about the CI and significant changes

scenario:HookBench/benchHookOverheadTraceMethod-opcache

scenario:MessagePackSerializationBench/benchMessagePackSerialization

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

scenario:TraceAnnotationsBench/benchTraceAnnotationOverhead-opcache

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

sameerank left a comment

Choose a reason for hiding this comment

Uh oh!

sameerank Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

sameerank Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

sameerank Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

bwoebi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

leoromanovsky commented May 22, 2026 •

edited

Loading

datadog-official Bot commented May 22, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

pr-commenter Bot commented May 22, 2026 •

edited

Loading