Skip to content

fix(test): isolate e2e suites by unique workflow/execution id#5888

Merged
Yicong-Huang merged 4 commits into
apache:mainfrom
Ma77Ball:fix-e2e-iceberg-keyspace-collision
Jun 23, 2026
Merged

fix(test): isolate e2e suites by unique workflow/execution id#5888
Yicong-Huang merged 4 commits into
apache:mainfrom
Ma77Ball:fix-e2e-iceberg-keyspace-collision

Conversation

@Ma77Ball

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

  • Add TestUtils.workflowContext(id, settings) that sets both workflowId and executionId to id, and make the DB fixtures plus setUp/cleanupWorkflowExecutionData take an id (the user email is derived from id to avoid the unique-email collision).
  • Give each materializing e2e suite a distinct id so concurrent suites no longer share an Iceberg result keyspace or DB rows: DataProcessingSpec=1, PauseSpec=2, ReconfigurationSpec=3, ReconfigurationIntegrationSpec=4.
  • Test-only change, no production code; BatchSizePropagationSpec and CheckpointSpec are untouched because they do not materialize results.

Any related issues, documentation, discussions?

Closes: #5887

How was this PR tested?

  • sbt "WorkflowExecutionService/Test/compile" compiles clean on Java 17.
  • Run the previously-flaky suite against the integration services (Postgres test DB + MinIO/S3 + Iceberg catalog, as in the build / amber CI job): sbt "WorkflowExecutionService/testOnly *DataProcessingSpec"; expect all DataProcessingSpec tests green with no CommitFailedException flake.
  • The timing-dependent flake could not be reproduced locally (no MinIO/Iceberg env), so final verification is the build / amber CI job staying green across re-runs.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

@github-actions

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @Yicong-Huang, @aglinxinyuan
    You can notify them by mentioning @Yicong-Huang, @aglinxinyuan in a comment.

@codecov-commenter

codecov-commenter commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.11%. Comparing base (439ea72) to head (39f3d0c).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5888      +/-   ##
============================================
+ Coverage     54.09%   54.11%   +0.02%     
+ Complexity     2817     2812       -5     
============================================
  Files          1103     1103              
  Lines         42650    42624      -26     
  Branches       4588     4586       -2     
============================================
- Hits          23070    23067       -3     
+ Misses        18236    18217      -19     
+ Partials       1344     1340       -4     
Flag Coverage Δ *Carryforward flag
access-control-service 70.44% <ø> (ø) Carriedforward from 3ced705
agent-service 34.36% <ø> (ø) Carriedforward from 3ced705
amber 55.63% <ø> (+0.01%) ⬆️
computing-unit-managing-service 1.65% <ø> (ø) Carriedforward from 3ced705
config-service 57.35% <ø> (ø) Carriedforward from 3ced705
file-service 57.36% <ø> (-1.24%) ⬇️ Carriedforward from 3ced705
frontend 48.15% <ø> (+0.06%) ⬆️ Carriedforward from 3ced705
pyamber 90.20% <ø> (ø) Carriedforward from 3ced705
python 90.76% <ø> (ø) Carriedforward from 3ced705
workflow-compiling-service 58.69% <ø> (ø) Carriedforward from 3ced705

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Ma77Ball Ma77Ball marked this pull request as ready for review June 22, 2026 11:07
@github-actions github-actions Bot requested a review from aglinxinyuan June 22, 2026 11:16
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 3 worse · ⚪ 12 noise (<±5%) · 0 without baseline

Compared against main 439ea72 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 361 0.22 27,844/41,024/41,024 us 🔴 +10.7% / 🔴 +17.3%
bs=100 sw=10 sl=64 812 0.496 121,677/142,043/142,043 us ⚪ within ±5% / 🔴 -9.0%
bs=1000 sw=10 sl=64 918 0.56 1,093,160/1,131,893/1,131,893 us ⚪ within ±5% / 🔴 +12.4%
Baseline details

Latest main 439ea72 from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 361 tuples/sec 384 tuples/sec 410.82 tuples/sec -6.0% -12.1%
bs=10 sw=10 sl=64 MB/s 0.22 MB/s 0.235 MB/s 0.251 MB/s -6.4% -12.3%
bs=10 sw=10 sl=64 p50 27,844 us 25,144 us 23,785 us +10.7% +17.1%
bs=10 sw=10 sl=64 p95 41,024 us 42,620 us 34,980 us -3.7% +17.3%
bs=10 sw=10 sl=64 p99 41,024 us 42,620 us 34,980 us -3.7% +17.3%
bs=100 sw=10 sl=64 throughput 812 tuples/sec 821 tuples/sec 891.94 tuples/sec -1.1% -9.0%
bs=100 sw=10 sl=64 MB/s 0.496 MB/s 0.501 MB/s 0.544 MB/s -1.0% -8.9%
bs=100 sw=10 sl=64 p50 121,677 us 118,606 us 112,277 us +2.6% +8.4%
bs=100 sw=10 sl=64 p95 142,043 us 146,972 us 139,802 us -3.4% +1.6%
bs=100 sw=10 sl=64 p99 142,043 us 146,972 us 139,802 us -3.4% +1.6%
bs=1000 sw=10 sl=64 throughput 918 tuples/sec 938 tuples/sec 1,041 tuples/sec -2.1% -11.8%
bs=1000 sw=10 sl=64 MB/s 0.56 MB/s 0.572 MB/s 0.635 MB/s -2.1% -11.9%
bs=1000 sw=10 sl=64 p50 1,093,160 us 1,069,627 us 972,714 us +2.2% +12.4%
bs=1000 sw=10 sl=64 p95 1,131,893 us 1,107,261 us 1,023,057 us +2.2% +10.6%
bs=1000 sw=10 sl=64 p99 1,131,893 us 1,107,261 us 1,023,057 us +2.2% +10.6%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,554.57,200,128000,361,0.220,27843.59,41024.08,41024.08
1,100,10,64,20,2462.66,2000,1280000,812,0.496,121676.66,142043.34,142043.34
2,1000,10,64,20,21784.47,20000,12800000,918,0.560,1093159.85,1131893.35,1131893.35

@Ma77Ball

Copy link
Copy Markdown
Contributor Author

/request-review @aglinxinyuan @Yicong-Huang

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses flaky Amber end-to-end (e2e) tests caused by multiple suites concurrently sharing the same (workflowId, executionId) and DB fixture primary keys, which leads to collisions in the shared Iceberg result keyspace and Postgres rows. It introduces an explicit per-suite identifier to isolate workflows/executions and related DB fixtures so the suites can run concurrently without interfering.

Changes:

  • Added TestUtils.workflowContext(id, settings) to create a WorkflowContext with both workflowId and executionId set from a provided suite id.
  • Parameterized e2e DB fixtures and setUp/cleanupWorkflowExecutionData to use the suite id (including deriving a unique email from that id).
  • Updated the materializing e2e suites to use distinct ids (DataProcessing=1, Pause=2, Reconfiguration=3, ReconfigurationIntegration=4).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
amber/src/test/scala/org/apache/texera/amber/engine/e2e/TestUtils.scala Adds an id-based WorkflowContext constructor and id-parameterized DB fixtures/setup/cleanup to avoid cross-suite collisions.
amber/src/test/scala/org/apache/texera/amber/engine/e2e/DataProcessingSpec.scala Assigns a stable per-suite id and uses it for workflow contexts and DB setup/cleanup.
amber/src/test/scala/org/apache/texera/amber/engine/e2e/PauseSpec.scala Assigns a stable per-suite id and uses it for workflow context creation and DB setup/cleanup.
amber/src/test/scala/org/apache/texera/amber/engine/e2e/ReconfigurationSpec.scala Assigns a stable per-suite id and uses it for workflow context creation and DB setup/cleanup.
amber/src/test/integration/org/apache/texera/amber/engine/e2e/ReconfigurationIntegrationSpec.scala Assigns a stable per-suite id and uses it for workflow contexts and DB setup/cleanup (including warmup).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Yicong-Huang Yicong-Huang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. Let's mark this PR a fix(test) and backport it to v1.2 I assume this flakiness should also happen for v1.2

@Yicong-Huang

Copy link
Copy Markdown
Contributor

cc @xuang7 for backport

@xuang7 xuang7 added the release/v1.2 back porting to release/v1.2 label Jun 22, 2026
@Ma77Ball Ma77Ball changed the title test(amber): isolate e2e suites by unique workflow/execution id fix(test): isolate e2e suites by unique workflow/execution id Jun 22, 2026
@Yicong-Huang Yicong-Huang removed the release/v1.2 back porting to release/v1.2 label Jun 23, 2026
@Yicong-Huang

Copy link
Copy Markdown
Contributor

As there are conflicts, @Ma77Ball please help on manually backport this to release/v1.2. The reason for backport is this flaky test could causes CI to fail from time to time on that branch.

@Yicong-Huang Yicong-Huang enabled auto-merge June 23, 2026 05:27
@Yicong-Huang Yicong-Huang added this pull request to the merge queue Jun 23, 2026
Merged via the queue into apache:main with commit 8803d08 Jun 23, 2026
32 of 36 checks passed
Yicong-Huang pushed a commit that referenced this pull request Jun 23, 2026
…e/v1.2 backport] (#5903)

### What changes were proposed in this PR?
- Backport of #5888 to `release/v1.2`, applied as `git cherry-pick -x
8803d08`.
- Each amber e2e suite now gets a distinct id via a new
`TestUtils.workflowContext(id, ...)` helper (DataProcessingSpec=1,
PauseSpec=2, ReconfigurationSpec=3, ReconfigurationIntegrationSpec=4),
and the DB fixture rows (user/workflow/version/execution, plus a per-id
email) are derived from that id, so concurrent suites no longer collide
in the shared Iceberg keyspace.
- One conflict resolved, in `TestUtils.scala` imports only: kept
release/v1.2's `org.apache.texera.amber.config.StorageConfig` path and
took the cherry-pick's added
`ExecutionIdentity`/`WorkflowIdentity`/`WorkflowSettings` imports needed
by the new helper. The four spec files applied cleanly.
- The `runWorkflowAndReadTerminalResults`/`readMaterializedResults`
helpers that exist on `main` came from a separate change not on
`release/v1.2`; #5888 does not touch them, so they are correctly absent
here.
### Any related issues, documentation, discussions?
Related to #5887 

### How was this PR tested?
- Behavior is unchanged from #5888 (test-only isolation), so it relies
on the existing e2e suites carried over verbatim. Reviewer can run them
in the integration job: `sbt "amber/IntegrationTest/testOnly
*ReconfigurationIntegrationSpec"`, and the unit e2e suites via `sbt
"amber/testOnly *DataProcessingSpec *PauseSpec *ReconfigurationSpec"`;
expect all green with no Iceberg `CommitFailedException` keyspace
collisions across concurrently-running suites.
- Local `sbt` could not load on `release/v1.2` (its `project/` lacks
`AddMetaInfLicenseFiles`, unrelated to this change), so full
compile/test is left to CI on this PR.
### Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF
yangzhang75 pushed a commit to yangzhang75/texera that referenced this pull request Jun 24, 2026
…#5888)

### What changes were proposed in this PR?
- Add `TestUtils.workflowContext(id, settings)` that sets both
`workflowId` and `executionId` to `id`, and make the DB fixtures plus
`setUp`/`cleanupWorkflowExecutionData` take an `id` (the user email is
derived from `id` to avoid the unique-email collision).
- Give each materializing e2e suite a distinct id so concurrent suites
no longer share an Iceberg result keyspace or DB rows:
DataProcessingSpec=1, PauseSpec=2, ReconfigurationSpec=3,
ReconfigurationIntegrationSpec=4.
- Test-only change, no production code; BatchSizePropagationSpec and
CheckpointSpec are untouched because they do not materialize results.
### Any related issues, documentation, discussions?
Closes: apache#5887
### How was this PR tested?
- `sbt "WorkflowExecutionService/Test/compile"` compiles clean on Java
17.
- Run the previously-flaky suite against the integration services
(Postgres test DB + MinIO/S3 + Iceberg catalog, as in the `build /
amber` CI job): `sbt "WorkflowExecutionService/testOnly
*DataProcessingSpec"`; expect all DataProcessingSpec tests green with no
CommitFailedException flake.
- The timing-dependent flake could not be reproduced locally (no
MinIO/Iceberg env), so final verification is the `build / amber` CI job
staying green across re-runs.
### Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF
aglinxinyuan added a commit to aglinxinyuan/texera that referenced this pull request Jul 2, 2026
Reworks the loop back-edge write address per the apache#5900 review decision:
the URI is constant per-execution config, not per-iteration data, so it
no longer rides State rows / the StateFrame envelope (the merged 3-column
format has no loop_start_state_uri). Instead the scheduler resolves it and
ships it to workers at setup:

- PhysicalOp gains a compile-time `isLoopStart` marker (set by
  LoopStartOpDesc, mirroring requiresMaterializedExecution).
- WorkflowExecutionCoordinator derives {LoopStart logical op id -> state
  URI of its single input port} from the final resource-allocated
  schedule; the two single-port/single-reader guards that lived in the
  Python worker's _compute_loop_start_id move controller-side.
- InitializeExecutorRequest gains `map<string,string> loopStartStateUris`
  (field 4), sent to every worker each region (re-)run; the Python
  handler stores it on Context. No static LoopStart<->LoopEnd pairing is
  needed: a Loop End selects the entry by the loop_start_id carried on
  the StateFrame it consumes (nested loops work unchanged).
- MainLoop: _compute_loop_start_id shrinks to the id parse;
  _jump_to_loop_start looks the URI up from the injected config and fails
  loudly (before the jump RPC) if it is missing; the inner-LoopStart
  nested pass-through gate keys on frame.loop_start_id instead of the
  removed URI field.
- LoopIntegrationSpec adopts the per-suite id isolation from apache#5888
  (specId 5), and WorkflowService drops a leftover errorSubject call
  removed on main by apache#5922.

Tests: reworked test_main_loop.py to the 3-arg emit convention and the
config-driven jump (plus a new missing-URI fail-loud case), added
InitializeExecutorHandler coverage for the new field, pinned
isLoopStart true/false in the Loop Start/End descriptor specs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

e2e suites collide on a shared Iceberg result keyspace under parallel tests

5 participants