fix(test): isolate e2e suites by unique workflow/execution id#5888
Conversation
Automated Reviewer SuggestionsBased on the
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5888 +/- ##
============================================
+ Coverage 54.09% 54.11% +0.02%
+ Complexity 2817 2812 -5
============================================
Files 1103 1103
Lines 42650 42624 -26
Branches 4588 4586 -2
============================================
- Hits 23070 23067 -3
+ Misses 18236 18217 -19
+ Partials 1344 1340 -4
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 361 | 0.22 | 27,844/41,024/41,024 us | 🔴 +10.7% / 🔴 +17.3% |
| ⚪ | bs=100 sw=10 sl=64 | 812 | 0.496 | 121,677/142,043/142,043 us | ⚪ within ±5% / 🔴 -9.0% |
| ⚪ | bs=1000 sw=10 sl=64 | 918 | 0.56 | 1,093,160/1,131,893/1,131,893 us | ⚪ within ±5% / 🔴 +12.4% |
Baseline details
Latest main 439ea72 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 361 tuples/sec | 384 tuples/sec | 410.82 tuples/sec | -6.0% | -12.1% |
| bs=10 sw=10 sl=64 | MB/s | 0.22 MB/s | 0.235 MB/s | 0.251 MB/s | -6.4% | -12.3% |
| bs=10 sw=10 sl=64 | p50 | 27,844 us | 25,144 us | 23,785 us | +10.7% | +17.1% |
| bs=10 sw=10 sl=64 | p95 | 41,024 us | 42,620 us | 34,980 us | -3.7% | +17.3% |
| bs=10 sw=10 sl=64 | p99 | 41,024 us | 42,620 us | 34,980 us | -3.7% | +17.3% |
| bs=100 sw=10 sl=64 | throughput | 812 tuples/sec | 821 tuples/sec | 891.94 tuples/sec | -1.1% | -9.0% |
| bs=100 sw=10 sl=64 | MB/s | 0.496 MB/s | 0.501 MB/s | 0.544 MB/s | -1.0% | -8.9% |
| bs=100 sw=10 sl=64 | p50 | 121,677 us | 118,606 us | 112,277 us | +2.6% | +8.4% |
| bs=100 sw=10 sl=64 | p95 | 142,043 us | 146,972 us | 139,802 us | -3.4% | +1.6% |
| bs=100 sw=10 sl=64 | p99 | 142,043 us | 146,972 us | 139,802 us | -3.4% | +1.6% |
| bs=1000 sw=10 sl=64 | throughput | 918 tuples/sec | 938 tuples/sec | 1,041 tuples/sec | -2.1% | -11.8% |
| bs=1000 sw=10 sl=64 | MB/s | 0.56 MB/s | 0.572 MB/s | 0.635 MB/s | -2.1% | -11.9% |
| bs=1000 sw=10 sl=64 | p50 | 1,093,160 us | 1,069,627 us | 972,714 us | +2.2% | +12.4% |
| bs=1000 sw=10 sl=64 | p95 | 1,131,893 us | 1,107,261 us | 1,023,057 us | +2.2% | +10.6% |
| bs=1000 sw=10 sl=64 | p99 | 1,131,893 us | 1,107,261 us | 1,023,057 us | +2.2% | +10.6% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,554.57,200,128000,361,0.220,27843.59,41024.08,41024.08
1,100,10,64,20,2462.66,2000,1280000,812,0.496,121676.66,142043.34,142043.34
2,1000,10,64,20,21784.47,20000,12800000,918,0.560,1093159.85,1131893.35,1131893.35|
/request-review @aglinxinyuan @Yicong-Huang |
There was a problem hiding this comment.
Pull request overview
This PR addresses flaky Amber end-to-end (e2e) tests caused by multiple suites concurrently sharing the same (workflowId, executionId) and DB fixture primary keys, which leads to collisions in the shared Iceberg result keyspace and Postgres rows. It introduces an explicit per-suite identifier to isolate workflows/executions and related DB fixtures so the suites can run concurrently without interfering.
Changes:
- Added
TestUtils.workflowContext(id, settings)to create aWorkflowContextwith bothworkflowIdandexecutionIdset from a provided suite id. - Parameterized e2e DB fixtures and
setUp/cleanupWorkflowExecutionDatato use the suite id (including deriving a unique email from that id). - Updated the materializing e2e suites to use distinct ids (DataProcessing=1, Pause=2, Reconfiguration=3, ReconfigurationIntegration=4).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| amber/src/test/scala/org/apache/texera/amber/engine/e2e/TestUtils.scala | Adds an id-based WorkflowContext constructor and id-parameterized DB fixtures/setup/cleanup to avoid cross-suite collisions. |
| amber/src/test/scala/org/apache/texera/amber/engine/e2e/DataProcessingSpec.scala | Assigns a stable per-suite id and uses it for workflow contexts and DB setup/cleanup. |
| amber/src/test/scala/org/apache/texera/amber/engine/e2e/PauseSpec.scala | Assigns a stable per-suite id and uses it for workflow context creation and DB setup/cleanup. |
| amber/src/test/scala/org/apache/texera/amber/engine/e2e/ReconfigurationSpec.scala | Assigns a stable per-suite id and uses it for workflow context creation and DB setup/cleanup. |
| amber/src/test/integration/org/apache/texera/amber/engine/e2e/ReconfigurationIntegrationSpec.scala | Assigns a stable per-suite id and uses it for workflow contexts and DB setup/cleanup (including warmup). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Yicong-Huang
left a comment
There was a problem hiding this comment.
LGTM, thanks. Let's mark this PR a fix(test) and backport it to v1.2 I assume this flakiness should also happen for v1.2
|
cc @xuang7 for backport |
|
As there are conflicts, @Ma77Ball please help on manually backport this to release/v1.2. The reason for backport is this flaky test could causes CI to fail from time to time on that branch. |
…e/v1.2 backport] (#5903) ### What changes were proposed in this PR? - Backport of #5888 to `release/v1.2`, applied as `git cherry-pick -x 8803d08`. - Each amber e2e suite now gets a distinct id via a new `TestUtils.workflowContext(id, ...)` helper (DataProcessingSpec=1, PauseSpec=2, ReconfigurationSpec=3, ReconfigurationIntegrationSpec=4), and the DB fixture rows (user/workflow/version/execution, plus a per-id email) are derived from that id, so concurrent suites no longer collide in the shared Iceberg keyspace. - One conflict resolved, in `TestUtils.scala` imports only: kept release/v1.2's `org.apache.texera.amber.config.StorageConfig` path and took the cherry-pick's added `ExecutionIdentity`/`WorkflowIdentity`/`WorkflowSettings` imports needed by the new helper. The four spec files applied cleanly. - The `runWorkflowAndReadTerminalResults`/`readMaterializedResults` helpers that exist on `main` came from a separate change not on `release/v1.2`; #5888 does not touch them, so they are correctly absent here. ### Any related issues, documentation, discussions? Related to #5887 ### How was this PR tested? - Behavior is unchanged from #5888 (test-only isolation), so it relies on the existing e2e suites carried over verbatim. Reviewer can run them in the integration job: `sbt "amber/IntegrationTest/testOnly *ReconfigurationIntegrationSpec"`, and the unit e2e suites via `sbt "amber/testOnly *DataProcessingSpec *PauseSpec *ReconfigurationSpec"`; expect all green with no Iceberg `CommitFailedException` keyspace collisions across concurrently-running suites. - Local `sbt` could not load on `release/v1.2` (its `project/` lacks `AddMetaInfLicenseFiles`, unrelated to this change), so full compile/test is left to CI on this PR. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.8 in compliance with ASF
…#5888) ### What changes were proposed in this PR? - Add `TestUtils.workflowContext(id, settings)` that sets both `workflowId` and `executionId` to `id`, and make the DB fixtures plus `setUp`/`cleanupWorkflowExecutionData` take an `id` (the user email is derived from `id` to avoid the unique-email collision). - Give each materializing e2e suite a distinct id so concurrent suites no longer share an Iceberg result keyspace or DB rows: DataProcessingSpec=1, PauseSpec=2, ReconfigurationSpec=3, ReconfigurationIntegrationSpec=4. - Test-only change, no production code; BatchSizePropagationSpec and CheckpointSpec are untouched because they do not materialize results. ### Any related issues, documentation, discussions? Closes: apache#5887 ### How was this PR tested? - `sbt "WorkflowExecutionService/Test/compile"` compiles clean on Java 17. - Run the previously-flaky suite against the integration services (Postgres test DB + MinIO/S3 + Iceberg catalog, as in the `build / amber` CI job): `sbt "WorkflowExecutionService/testOnly *DataProcessingSpec"`; expect all DataProcessingSpec tests green with no CommitFailedException flake. - The timing-dependent flake could not be reproduced locally (no MinIO/Iceberg env), so final verification is the `build / amber` CI job staying green across re-runs. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.8 in compliance with ASF
Reworks the loop back-edge write address per the apache#5900 review decision: the URI is constant per-execution config, not per-iteration data, so it no longer rides State rows / the StateFrame envelope (the merged 3-column format has no loop_start_state_uri). Instead the scheduler resolves it and ships it to workers at setup: - PhysicalOp gains a compile-time `isLoopStart` marker (set by LoopStartOpDesc, mirroring requiresMaterializedExecution). - WorkflowExecutionCoordinator derives {LoopStart logical op id -> state URI of its single input port} from the final resource-allocated schedule; the two single-port/single-reader guards that lived in the Python worker's _compute_loop_start_id move controller-side. - InitializeExecutorRequest gains `map<string,string> loopStartStateUris` (field 4), sent to every worker each region (re-)run; the Python handler stores it on Context. No static LoopStart<->LoopEnd pairing is needed: a Loop End selects the entry by the loop_start_id carried on the StateFrame it consumes (nested loops work unchanged). - MainLoop: _compute_loop_start_id shrinks to the id parse; _jump_to_loop_start looks the URI up from the injected config and fails loudly (before the jump RPC) if it is missing; the inner-LoopStart nested pass-through gate keys on frame.loop_start_id instead of the removed URI field. - LoopIntegrationSpec adopts the per-suite id isolation from apache#5888 (specId 5), and WorkflowService drops a leftover errorSubject call removed on main by apache#5922. Tests: reworked test_main_loop.py to the 3-arg emit convention and the config-driven jump (plus a new missing-URI fail-loud case), added InitializeExecutorHandler coverage for the new field, pinned isLoopStart true/false in the Loop Start/End descriptor specs.
What changes were proposed in this PR?
TestUtils.workflowContext(id, settings)that sets bothworkflowIdandexecutionIdtoid, and make the DB fixtures plussetUp/cleanupWorkflowExecutionDatatake anid(the user email is derived fromidto avoid the unique-email collision).Any related issues, documentation, discussions?
Closes: #5887
How was this PR tested?
sbt "WorkflowExecutionService/Test/compile"compiles clean on Java 17.build / amberCI job):sbt "WorkflowExecutionService/testOnly *DataProcessingSpec"; expect all DataProcessingSpec tests green with no CommitFailedException flake.build / amberCI job staying green across re-runs.Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF