feat(storage): full object checksum: implement rolling checksum and verification in reads resumption strategy #17262
Merged
Code Review
This pull request introduces support for full-object rolling checksum verification during resumed reads, allowing verification once the stream completes. It also adds a flag to conditionally disable checksum validation. The review feedback suggests optimizing performance by bypassing rolling checksum updates when checksums are disabled, and recommends adding a unit test to verify that full-object checksum mismatches are ignored when checksum validation is disabled.
…yncio fix from main
…est-asyncio issues
kalragauri approved these changes May 29, 2026
sofisl added a commit that referenced this pull request Jun 11, 2026
PR created by the Librarian CLI to initialize a release. Merging this PR will auto trigger a release. Librarian Version: v0.19.0 Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:234b9d1f2ddb057ed7ac6a38db0bf8163d839c65c6cf88ade52530cddebce59e <details><summary>gapic-generator: v1.35.0</summary> ## [v1.35.0](gapic-generator-v1.34.1...gapic-generator-v1.35.0) (2026-06-11) ### Features * setup.py matches prerelease versions (#17370) ([25b857e](25b857e1)) ### Bug Fixes * require protobuf 6.33.5 to address CVE-2026-0994 (#17349) ([6642263](66422636)) </details> <details><summary>google-auth: v2.54.0</summary> ## [v2.54.0](google-auth-v2.53.0...google-auth-v2.54.0) (2026-06-11) ### Features * implement regional access boundary support for standalone JWT and async service accounts (#17025) ([35af616](35af6168)) ### Bug Fixes * configure mTLS for impersonated credentials (#17404) ([57269d5](57269d56)) * fail-fast on missing ECP config file to avoid 30s hang (#17377) ([e096127](e0961270)) * Rename the 'seed' argument for setting an initial regional access boundary for clarity (#17186) ([e5c8cf9](e5c8cf92)) * update incorrect urls in setup.py to point at monorepo vs splitrepo (#17237) ([eaed04b](eaed04ba)) </details> <details><summary>google-cloud-alloydb: v0.11.0</summary> ## [v0.11.0](google-cloud-alloydb-v0.10.0...google-cloud-alloydb-v0.11.0) (2026-06-11) ### Features * update API sources and regenerate (#17413) ([59fe7cf](59fe7cf8)) </details> <details><summary>google-cloud-biglake: v0.5.0</summary> ## [v0.5.0](google-cloud-biglake-v0.4.0...google-cloud-biglake-v0.5.0) (2026-06-11) ### Features * update API sources and regenerate (#17431) ([2e75c78](2e75c78c)) </details> <details><summary>google-cloud-ces: v0.7.0</summary> ## [v0.7.0](google-cloud-ces-v0.6.0...google-cloud-ces-v0.7.0) (2026-06-11) ### Features * update API sources and regenerate (#17413) ([59fe7cf](59fe7cf8)) </details> 
<details><summary>google-cloud-confidentialcomputing: v0.11.0</summary> ## [v0.11.0](google-cloud-confidentialcomputing-v0.10.0...google-cloud-confidentialcomputing-v0.11.0) (2026-06-11) ### Features * update API sources and regenerate (#17413) ([59fe7cf](59fe7cf8)) </details> <details><summary>google-cloud-modelarmor: v0.7.0</summary> ## [v0.7.0](google-cloud-modelarmor-v0.6.0...google-cloud-modelarmor-v0.7.0) (2026-06-11) ### Features * update API sources and regenerate (#17413) ([59fe7cf](59fe7cf8)) </details> <details><summary>google-cloud-network-services: v0.10.0</summary> ## [v0.10.0](google-cloud-network-services-v0.9.0...google-cloud-network-services-v0.10.0) (2026-06-11) ### Features * update API sources and regenerate (#17431) ([2e75c78](2e75c78c)) </details> <details><summary>google-cloud-oracledatabase: v0.6.0</summary> ## [v0.6.0](google-cloud-oracledatabase-v0.5.0...google-cloud-oracledatabase-v0.6.0) (2026-06-11) ### Features * update API sources and regenerate (#17413) ([59fe7cf](59fe7cf8)) </details> <details><summary>google-cloud-spanner: v3.68.0</summary> ## [v3.68.0](google-cloud-spanner-v3.67.0...google-cloud-spanner-v3.68.0) (2026-06-11) ### Features * add asynchronous code snippets and minor cleanup changes (#17337) ([d6aaf61](d6aaf610)) ### Performance Improvements * optimize query result decoding (#17375) ([3f70b2f](3f70b2ff)) </details> <details><summary>google-cloud-storage: v3.12.0</summary> ## [v3.12.0](google-cloud-storage-v3.11.0...google-cloud-storage-v3.12.0) (2026-06-11) ### Features * full object checksum: implement rolling checksum and verification in reads resumption strategy (#17262) ([2361ba6](2361ba6e)) * Enable full object checksum PR 1/3 : parse finalize_time and server crc32c in async object stream (#17261) ([72c7a27](72c7a272)) * full object checksum: integrate full-object checksum in AsyncMultiRangeDownloader (#17263) ([b6a85e4](b6a85e49)) </details> <details><summary>google-developer-knowledge: v0.1.0</summary> ## 
[v0.1.0](google-developer-knowledge-v0.0.0...google-developer-knowledge-v0.1.0) (2026-06-11) ### Features * add google-developer-knowledge (#17417) ([ca02afc](ca02afce)) </details>
1. Overview of the Solution
This solution implements end-to-end full-object checksum validation in `AsyncMultiRangeDownloader` for the asynchronous Google Cloud Storage Python client library. As asynchronous multiplexed downloads of non-contiguous ranges are performed concurrently over a single bidirectional gRPC connection, this feature automatically and incrementally calculates a rolling checksum as bytes arrive and validates it against the server's authoritative object checksum once the download completes.
The technical approach consists of three coordinated layers:
* `_AsyncReadObjectStream` (Stream Ingestion): Safely extracts the authoritative server checksum (`full_obj_server_crc32c`) and finalization status (`is_finalized`) from the object metadata received in the first data payload response of the stream.
* `_ReadResumptionStrategy` & `_DownloadState` (Verification Logic): Computes an isolated, persistent rolling checksum in the individual `_DownloadState` object to ensure calculations do not bleed across concurrent multiplexed ranges. Crucially, the rolling hash updates only after buffer writes succeed, which prevents state corruption during retry reconnects. A `DataCorruption` exception is raised on completion if a mismatch occurs.
* `AsyncMultiRangeDownloader` (Orchestration & Cleanup): Detects candidate full-object ranges (e.g., `(0, 0)` or `(0, persisted_size)`), propagates checksum settings to the resumption strategy, and guarantees robust cleanup (closing the stream immediately and unregistering IDs) if data corruption or write errors occur.
2. What This PR Specifically Does
This PR implements Step 2: Full-Object Rolling Checksum & Resumption Verification Logic of the solution:
* `_DownloadState`: tracks `is_full_object_read` and initializes an isolated `google_crc32c.Checksum()` rolling instance.
* `_ReadResumptionStrategy.update_state_from_response()`: runs buffer writes before updating the rolling checksum, ensuring transactional safety during connection failures and retry reconnects.
* Checksum validation is bypassed when `enable_checksum` is `False`.
* At `range_end`, the rolling checksum is verified against the server's authoritative checksum, raising a `DataCorruption` exception if a mismatch is found.
* Unit tests in `test_reads_resumption_strategy.py` verify successful validation, failure exceptions, and bypassed checks when validation is disabled.
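The write-before-update ordering and the end-of-range verification described above can be illustrated with a minimal sketch. It is not the PR's code: `DownloadState`, `apply_chunk`, `is_full_object_range`, and the local `DataCorruption` class are hypothetical simplifications, and `RollingCrc32c` is a pure-Python CRC32C used in place of `google_crc32c.Checksum()` only so the example has no dependencies. The sketch also assumes ranges are `(offset, length)` pairs in which a length of `0` means "read to the end".

```python
import io


class DataCorruption(Exception):
    """Hypothetical stand-in for the library's DataCorruption exception."""


def _build_crc32c_table():
    # Table for the reflected Castagnoli polynomial used by CRC32C.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_crc32c_table()


class RollingCrc32c:
    """Pure-Python CRC32C; stands in for google_crc32c.Checksum() here."""

    def __init__(self):
        self._crc = 0xFFFFFFFF

    def update(self, chunk):
        crc = self._crc
        for byte in chunk:
            crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
        self._crc = crc

    def value(self):
        return self._crc ^ 0xFFFFFFFF


class DownloadState:
    """Hypothetical, simplified per-range state (not the real _DownloadState)."""

    def __init__(self, buffer, is_full_object_read, enable_checksum=True):
        self.buffer = buffer
        self.is_full_object_read = is_full_object_read
        self.enable_checksum = enable_checksum
        self.bytes_written = 0
        # One checksum per range, so concurrent ranges never share state.
        self.rolling_checksum = RollingCrc32c()


def is_full_object_range(offset, length, persisted_size):
    # Assumption: (offset, length) pairs, with length 0 meaning "to the end".
    return offset == 0 and length in (0, persisted_size)


def apply_chunk(state, chunk, range_end, server_crc32c):
    # Write first: if this raises, the rolling checksum is left untouched,
    # so a retry can deliver the same bytes again without double counting.
    state.buffer.write(chunk)
    state.bytes_written += len(chunk)

    if not (state.enable_checksum and state.is_full_object_read):
        return  # skip the hashing work entirely when validation is off

    state.rolling_checksum.update(chunk)
    if range_end and server_crc32c is not None:
        computed = state.rolling_checksum.value()
        if computed != server_crc32c:
            raise DataCorruption(
                f"crc32c mismatch: computed {computed:#010x}, "
                f"server {server_crc32c:#010x}"
            )


# Usage: 0xE3069283 is the standard CRC32C check value for b"123456789".
state = DownloadState(io.BytesIO(), is_full_object_range(0, 0, 9))
apply_chunk(state, b"1234", range_end=False, server_crc32c=0xE3069283)
apply_chunk(state, b"56789", range_end=True, server_crc32c=0xE3069283)
```

Returning early when `enable_checksum` is `False` mirrors the review feedback about bypassing rolling checksum updates: in this sketch a disabled check costs nothing beyond the buffer write, and a mismatching server value is never compared.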