[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables by Jackeyzhe · Pull Request #3494 · apache/fluss

Jackeyzhe · 2026-06-18T04:19:05Z

Purpose

Linked issue: close #3483

This PR fixes a race where a stale V2 lake snapshot commit can be accepted after a table is dropped and recreated. In that case, the recreated table may get a /laketable znode pointing to an offsets metadata file generated for the old table id, causing lake tiering to repeatedly fail when reading the stale offsets path.

Brief change log

Reject V2 lake snapshot commits when the target table id no longer exists or is queued for deletion.
Validate the V2 lake snapshot offsets metadata before writing it to ZK:
- the offsets file must be readable;
- the file-level tableId must match the commit table id;
- every TableBucket in the offsets file must also belong to the commit table id.
Validate both tiered offsets and readable offsets metadata paths when they differ.
Add regression coverage for:
- non-existent table id commits;
- table queued for deletion;
- recreated table id with stale offsets metadata from the old table id;
- RPC-level V2 commit after table drop.

Tests

mvn -pl fluss-server -DskipITs -Dcheckstyle.skip -Drat.skip -Dspotless.check.skip=true -Dtest=CoordinatorEventProcessorTest#testCommitLakeTableSnapshotV2RejectsMismatchedOffsetsFileTableId+testCommitLakeTableSnapshotV2RejectsNonExistentTable+testCommitLakeTableSnapshotV2RejectsQueuedForDeletionTable test

mvn -pl fluss-server -DskipITs -Dcheckstyle.skip -Drat.skip -Dspotless.check.skip=true -Dtest=CommitLakeTableSnapshotITCase#testCommitLakeTableSnapshotV2RejectedAfterDropTable test

API and Format

no

Documentation

no

Copilot

Pull request overview

This PR addresses a coordinator race where stale V2 lake snapshot commits could be accepted after a table drop/recreate, potentially writing an offsets-metadata pointer for an old tableId and permanently breaking lake tiering for the recreated table.

Changes:

Add coordinator-side rejection for V2 lake snapshot commits targeting a tableId that no longer exists or is queued for deletion.
Add regression tests (unit + integration) covering non-existent/queued-for-deletion tableIds and an RPC-level V2 commit after a drop.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java	Adds a guard to reject V2 snapshot commits for dropped / queued-for-deletion tables.
fluss-server/src/test/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessorTest.java	Adds unit tests verifying V2 commit rejection for non-existent and queued-for-deletion tables.
fluss-server/src/test/java/org/apache/fluss/server/replica/CommitLakeTableSnapshotITCase.java	Adds an IT case verifying V2 snapshot commit is rejected after dropping a table.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

                                    throw new FlussRuntimeException(
                                            "Lake snapshot metadata is null for table " + tableId);
                                }
+                                ensureTableCanAcceptLakeSnapshot(tableId);
                                lakeTableHelper.registerLakeTableSnapshotV2(


luoyuxia

@Jackeyzhe Thanks for the pr. LGTM overall. Left minor comments. PTAL

luoyuxia · 2026-06-24T03:48:03Z

    }

+    @Test
+    void testCommitLakeTableSnapshotV2RejectsNonExistentTable() throws Exception {


nit: I think we can remove the ut since testCommitLakeTableSnapshotV2RejectedAfterDropTable alreay cover it. I want to keep this test class short if possible

luoyuxia · 2026-06-24T03:48:43Z

                });
    }

+    private void ensureTableCanAcceptLakeSnapshot(long tableId) {


nit: rename to ensureTableNotDeleted? I can image maybe other event maywell need this method.

luoyuxia · 2026-07-04T09:11:46Z

@Jackeyzhe Thanks for working on this.

I wonder if we can make this fix simpler and close the race at the ZK write boundary instead. The current approach adds coordinator-side lifecycle checks before submitting the actual ZK write to ioExecutor, but that check can become stale before registerLakeTableSnapshotV2 runs. For example, the commit event may pass the context check while the table still exists, then dropTable can delete /tabletservers/tables/<tableId>, and the later IO task can still recreate /tabletservers/tables/<tableId>/laketable because upsertLakeTable uses creatingParentsIfNeeded().

Would it be enough to change the create branch in ZooKeeperClient.upsertLakeTable to avoid recreating the table parent?

// Do not recreate /tabletservers/tables/<tableId> for a late commit.
zkClient.create().forPath(zkPath, zkData);

Then a late commit after deleteTableAssignment(tableId) would fail with NoNodeException instead of resurrecting the old tableId znode. We can convert that to TableNotExistException for the commit response. The coordinator-side queued/deleted check can still be kept as an early rejection, but the ZK write itself should be what prevents the resurrection race.

Jackeyzhe · 2026-07-04T13:33:58Z

@Jackeyzhe Thanks for working on this.

I wonder if we can make this fix simpler and close the race at the ZK write boundary instead. The current approach adds coordinator-side lifecycle checks before submitting the actual ZK write to ioExecutor, but that check can become stale before registerLakeTableSnapshotV2 runs. For example, the commit event may pass the context check while the table still exists, then dropTable can delete /tabletservers/tables/<tableId>, and the later IO task can still recreate /tabletservers/tables/<tableId>/laketable because upsertLakeTable uses creatingParentsIfNeeded().

Would it be enough to change the create branch in ZooKeeperClient.upsertLakeTable to avoid recreating the table parent?
// Do not recreate /tabletservers/tables/<tableId> for a late commit.
zkClient.create().forPath(zkPath, zkData);
Then a late commit after deleteTableAssignment(tableId) would fail with NoNodeException instead of resurrecting the old tableId znode. We can convert that to TableNotExistException for the commit response. The coordinator-side queued/deleted check can still be kept as an early rejection, but the ZK write itself should be what prevents the resurrection race.

Thanks for pointing this out. I agree that the coordinator-side check is only an early rejection and does not fully close the race before the actual ZK write.

I'll update the fix to make the ZK write boundary authoritative: for the create branch in ZooKeeperClient.upsertLakeTable, avoid creating missing parents and use create().forPath(...) directly. Then a late commit after the table assignment znode has been deleted will fail with NoNodeException instead of recreating /tabletservers/tables/. I'll convert that failure to TableNotExistException for the commit response.

I'll keep the coordinator-side queued/deleted check as an early rejection, but rely on the ZK create semantics to prevent resurrection.

luoyuxia · 2026-07-04T13:58:40Z

@Jackeyzhe Thanks for working on this.
I wonder if we can make this fix simpler and close the race at the ZK write boundary instead. The current approach adds coordinator-side lifecycle checks before submitting the actual ZK write to ioExecutor, but that check can become stale before registerLakeTableSnapshotV2 runs. For example, the commit event may pass the context check while the table still exists, then dropTable can delete /tabletservers/tables/<tableId>, and the later IO task can still recreate /tabletservers/tables/<tableId>/laketable because upsertLakeTable uses creatingParentsIfNeeded().
Would it be enough to change the create branch in ZooKeeperClient.upsertLakeTable to avoid recreating the table parent?
// Do not recreate /tabletservers/tables/<tableId> for a late commit.
zkClient.create().forPath(zkPath, zkData);
Then a late commit after deleteTableAssignment(tableId) would fail with NoNodeException instead of resurrecting the old tableId znode. We can convert that to TableNotExistException for the commit response. The coordinator-side queued/deleted check can still be kept as an early rejection, but the ZK write itself should be what prevents the resurrection race.
Thanks for pointing this out. I agree that the coordinator-side check is only an early rejection and does not fully close the race before the actual ZK write.

I'll update the fix to make the ZK write boundary authoritative: for the create branch in ZooKeeperClient.upsertLakeTable, avoid creating missing parents and use create().forPath(...) directly. Then a late commit after the table assignment znode has been deleted will fail with NoNodeException instead of recreating /tabletservers/tables/. I'll convert that failure to TableNotExistException for the commit response.

I'll keep the coordinator-side queued/deleted check as an early rejection, but rely on the ZK create semantics to prevent resurrection.

Can we remove the coordinator-side queued/deleted check to keep code simple?

Jackeyzhe · 2026-07-04T14:21:38Z

@Jackeyzhe Thanks for working on this.
I wonder if we can make this fix simpler and close the race at the ZK write boundary instead. The current approach adds coordinator-side lifecycle checks before submitting the actual ZK write to ioExecutor, but that check can become stale before registerLakeTableSnapshotV2 runs. For example, the commit event may pass the context check while the table still exists, then dropTable can delete /tabletservers/tables/<tableId>, and the later IO task can still recreate /tabletservers/tables/<tableId>/laketable because upsertLakeTable uses creatingParentsIfNeeded().
Would it be enough to change the create branch in ZooKeeperClient.upsertLakeTable to avoid recreating the table parent?
// Do not recreate /tabletservers/tables/<tableId> for a late commit.
zkClient.create().forPath(zkPath, zkData);
Then a late commit after deleteTableAssignment(tableId) would fail with NoNodeException instead of resurrecting the old tableId znode. We can convert that to TableNotExistException for the commit response. The coordinator-side queued/deleted check can still be kept as an early rejection, but the ZK write itself should be what prevents the resurrection race.
Thanks for pointing this out. I agree that the coordinator-side check is only an early rejection and does not fully close the race before the actual ZK write.
I'll update the fix to make the ZK write boundary authoritative: for the create branch in ZooKeeperClient.upsertLakeTable, avoid creating missing parents and use create().forPath(...) directly. Then a late commit after the table assignment znode has been deleted will fail with NoNodeException instead of recreating /tabletservers/tables/. I'll convert that failure to TableNotExistException for the commit response.
I'll keep the coordinator-side queued/deleted check as an early rejection, but rely on the ZK create semantics to prevent resurrection.
Can we remove the coordinator-side queued/deleted check to keep code simple?

sure, I'll remove it

…ng dropped tables Scope lake snapshot commit fix to dropped table race

…ng dropped tables

Jackeyzhe changed the title ~~[FLUSS-3483][coordinator] Validate lake snapshot offsets before commit~~ [FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables Jun 24, 2026

Jackeyzhe mentioned this pull request Jun 24, 2026

commitLakeTableSnapshot writes stale snapshot path after dropTable, leaving the tiering pipeline permanently stuck #3483

Open

2 tasks

luoyuxia requested a review from Copilot June 24, 2026 02:03

Copilot started reviewing on behalf of luoyuxia June 24, 2026 02:03 View session

Copilot AI reviewed Jun 24, 2026

View reviewed changes

luoyuxia reviewed Jun 24, 2026

View reviewed changes

Jackeyzhe force-pushed the fluss-3483 branch from 919b0d1 to 6b5dfee Compare June 24, 2026 15:11

Jackeyzhe requested a review from luoyuxia June 26, 2026 14:32

Jackeyzhe force-pushed the fluss-3483 branch from 6b5dfee to 3a6d198 Compare June 26, 2026 14:58

Jackeyzhe force-pushed the fluss-3483 branch from c0c1252 to 919b0d1 Compare July 4, 2026 14:45

[FLUSS-3483][server] Prevent late lake snapshot commits from recreati…

45acb33

…ng dropped tables Scope lake snapshot commit fix to dropped table race

Jackeyzhe force-pushed the fluss-3483 branch 2 times, most recently from 17eb7f6 to 45acb33 Compare July 4, 2026 15:18

jackeyzhe added 3 commits July 4, 2026 23:21

[FLUSS-3483][server] Prevent late lake snapshot commits from recreati…

d969a4e

…ng dropped tables

[FLUSS-3483][server] Prevent late lake snapshot commits from recreati…

593878c

…ng dropped tables

[FLUSS-3483][server] Prevent late lake snapshot commits from recreati…

0166adc

…ng dropped tables

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables#3494

[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables#3494
Jackeyzhe wants to merge 4 commits into
apache:mainfrom
Jackeyzhe:fluss-3483

Jackeyzhe commented Jun 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

luoyuxia left a comment

Uh oh!

luoyuxia Jun 24, 2026

Uh oh!

luoyuxia Jun 24, 2026

Uh oh!

luoyuxia commented Jul 4, 2026

Uh oh!

Jackeyzhe commented Jul 4, 2026

Uh oh!

luoyuxia commented Jul 4, 2026

Uh oh!

Jackeyzhe commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Jackeyzhe commented Jun 18, 2026

Purpose

Brief change log

Tests

API and Format

Documentation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

luoyuxia left a comment

Choose a reason for hiding this comment

Uh oh!

luoyuxia Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

luoyuxia Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

luoyuxia commented Jul 4, 2026

Uh oh!

Jackeyzhe commented Jul 4, 2026

Uh oh!

luoyuxia commented Jul 4, 2026

Uh oh!

Jackeyzhe commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants