fix: prevent second batch store reset wiping postage snapshot on --resync#5499
Open
martinconic wants to merge 1 commit into
Open
fix: prevent second batch store reset wiping postage snapshot on --resync#5499martinconic wants to merge 1 commit into
martinconic wants to merge 1 commit into
Conversation
acud
reviewed
Jun 10, 2026
| // postage snapshot has already reset and rebuilt the batch store, so the live | ||
| // sync continues from the snapshot instead of wiping it (see issue #5495). | ||
| func (svc *batchService) SkipResync() { | ||
| svc.resync = false |
Contributor
There was a problem hiding this comment.
to me the fact that there's two places where the batch service could get reset already smells a bit. with this change we have still two places where a reset happens + third path to disable one of them potentially. so the question is asked - why have two places that do this in the first place?
Contributor
Author
There was a problem hiding this comment.
The two reset sites aren't pure duplication. They serve different roles:
- snapshot service → resets + rebuilds from the snapshot
- live service → resets + rebuilds from chain genesis — this is the fallback path, needed when there's no snapshot
So you can't simply delete one site without merging the two services (they use different listeners — snapshot filterer vs live chain backend).
I changed the implementation to not modify at all batchservice.go
c26bb91 to
a398078
Compare
…sync When a node is started with --resync together with the postage snapshot, the batch store was reset twice. The snapshot batch service resets and rebuilds the store from the snapshot, but the live-chain batch service was constructed with the same resync flag and reset the store a second time in its Start(), discarding the freshly loaded snapshot. Live sync then began at the snapshot block height against an empty store, so the first top-up or dilute event for any batch created before the snapshot height failed with "get batch <id>: storage: not found" and the node shut down. This is why disabling the snapshot (skip-postage-snapshot) worked around it: the store was reset only once and sync replayed from contract genesis, where every create precedes its top-up. Construct the live batch service after the snapshot path has run and pass resync only when the snapshot did not rebuild the store (o.Resync && !snapshotLoaded). The store is still reset exactly once in every other case (snapshot skipped, failed, not applicable, or a non-resync start). Closes #5495
a398078 to
69df1aa
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Checklist
Description
Fixes #5495: a node started with
--resyncwhile the postage snapshot is active shuts down during sync with:Root cause
The batch store is reset twice on startup in this configuration:
snapshotBatchSvc.Start()resets the store (resync) and rebuilds it from the snapshot, then setspostageSyncStart = maxBlockHeight.batchSvcwas constructed with the sameo.Resyncflag, so itsStart()reset the store again, discarding the snapshot that was just loaded.Live sync then started from the snapshot block height against an empty store, so the first top-up/dilute event referencing a batch created before the snapshot height failed with
storage: not found, and the node shut down.This matches the report exactly: two
"resync requested, resetting batch store"log lines, and the fact thatskip-postage-snapshotworks around it (only one reset, and sync replays from contract genesis where every create precedes its top-up).Note: this is independent of #5343/#5482. The flow originates in #5094 and is still present on master; the reporter's build (
2.8.0-41d6efc6) predates the #5343 revert, but reverting #5343 does not address this.Fix
The two reset sites are not redundant — one rebuilds from the snapshot, the other is the fallback rebuild for the no-snapshot / snapshot-failed / non-mainnet cases — they were just both armed at once.
Construct the live batch service after the snapshot path has run and pass resync only when the snapshot did not rebuild the store:
The reset logic stays in its single existing place (
batchservice.Start()), exactly one service is armed to reset, and there is no runtime toggle. The store is still reset exactly once in every other case (snapshot skipped, failed, not applicable, or a non-resync start).Testing
TestResyncControlsResetinbatchserviceasserts both directions of the contractnode.gorelies on:resync=true⇒ store reset onceresync=false⇒ store not reset (the post-snapshot live-service case)go test -race ./pkg/postage/batchservice/...passesgo build ./pkg/node/...passes;go vet,gofumpt, andgolangci-lintcleanOpen API Spec Version Changes (if applicable)
None.
Related Issue
Closes #5495
AI Disclosure