poll device readiness before formatting sticky disk #111
Draft
devin-ai-integration[bot] wants to merge 3 commits into
After a Firecracker drive hot-swap (PATCH /drives), the guest kernel learns the new backing device size via an async virtio config-change interrupt. Fast consumers like mkfs.ext4 can observe a zero device size if they run before the interrupt is processed. Poll blockdev --getsize64 until the device reports non-zero (up to 5s, 50ms interval) before attempting blkid or mkfs. Co-Authored-By: Paul Bardea <paul@blacksmith.sh>
Co-Authored-By: Paul Bardea <paul@blacksmith.sh>
The 5s timeout caused the step duration regression test to fail because in CI the placeholder device never gets hot-swapped, so the full timeout was consumed. Reduce to 2s (still well above the ~51ms virtio race) and warn instead of throw on timeout, letting the existing mkfs error handling proceed naturally. Co-Authored-By: Paul Bardea <paul@blacksmith.sh>
Summary
Fixes a race condition where `mkfs.ext4 /dev/vdb` fails with "Device size reported to be zero" after a Firecracker drive hot-swap.

Root cause: Firecracker's `PATCH /drives` is host-synchronous: it returns once the backing file is swapped on the host. The guest kernel learns the new capacity via an async virtio config-change interrupt. When the gRPC response reaches `setup-docker-builder` and it immediately runs `mkfs`, only ~50ms may have elapsed, which is not enough for the guest's virtio-blk driver to update the device geometry.

Fix: Add `waitForDeviceReady(device)` at the top of `maybeFormatBlockDevice` that polls `blockdev --getsize64` until the device reports non-zero size (50ms interval, 5s timeout). This is deterministic: it waits exactly as long as the kernel needs, no more.

Observed in FastActions/fa#4268 CI: 51ms between `PATCH /drives/2` completing on the host and `mkfs.ext4` failing in the guest.

cc @brucemakallan
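
For reviewers who want the shape of the polling loop without opening the diff, here is a minimal sketch. It is not the code in this PR. Only the function name `waitForDeviceReady`, the `blockdev --getsize64` call, and the 50ms interval come from the description. The options object, the injectable `readSize` hook, and the boolean return value are assumptions made so the loop can be exercised without a real block device. The defaults follow the later commit (2s timeout, warn instead of throw) rather than the 5s quoted in the summary.

```typescript
import { execFile } from "child_process";
import { promisify } from "util";

const execFileAsync = promisify(execFile);

// Hypothetical hook: returns the device size in bytes.
type SizeReader = (device: string) => Promise<number>;

// Hypothetical options; the actual change may hard-code these values.
interface WaitOptions {
  timeoutMs?: number;
  intervalMs?: number;
  readSize?: SizeReader;
}

// Asks the kernel for the device size via `blockdev --getsize64`.
async function readSizeWithBlockdev(device: string): Promise<number> {
  const { stdout } = await execFileAsync("blockdev", ["--getsize64", device]);
  return Number.parseInt(stdout.trim(), 10);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the device reports a non-zero size or the timeout expires.
// Returns true when ready; on timeout it warns and returns false so the
// caller's existing mkfs error handling can take over.
async function waitForDeviceReady(
  device: string,
  options: WaitOptions = {},
): Promise<boolean> {
  const {
    timeoutMs = 2000,
    intervalMs = 50,
    readSize = readSizeWithBlockdev,
  } = options;
  const deadline = Date.now() + timeoutMs;

  for (;;) {
    try {
      const size = await readSize(device);
      if (size > 0) return true;
    } catch {
      // A failed read is treated the same as "not ready yet".
    }
    if (Date.now() >= deadline) break;
    await sleep(intervalMs);
  }

  console.warn(`${device} still reports zero size after ${timeoutMs}ms`);
  return false;
}
```

Because the first read happens before any sleep, a device that is already sized adds no delay. The wait only costs time when the virtio config-change interrupt has not yet been processed.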
Link to Devin session: https://app.devin.ai/sessions/45a3be2102b746d393226ba6fec5b390