Security and robustness fixes for the dataset/download path plus a Python 3.12 install fix #436
Open
Garthigan wants to merge 3 commits into
Security and robustness fixes for the dataset/download path plus a
Python 3.12 install fix:
* Add `l2l.data.utils.safe_extract`, a shared helper that validates
tar/zip members before extraction to prevent path traversal
(Zip-Slip). Route all 12 `extractall` call sites (vgg_flowers,
fgvc_fungi, describable_textures, cu_birds200, tiered_imagenet,
cifarfs, fc100, and text/news_classification) through it. This
generalizes the existing `safe_extract` already used in
fgvc_aircraft.py.
* Load downloaded pretrained backbones with `weights_only=True`
(vision/models) to avoid arbitrary code execution from tampered
checkpoints.
* Add request timeouts to all downloads (download_file,
download_file_from_google_drive, vgg_flowers labels,
news_classification).
* Fix file-descriptor leaks by closing `ZipFile`/`TarFile` handles with
context managers (fc100, tiered_imagenet).
* Switch vgg_flowers image/label URLs from http:// to https://.
* Remove unused `distutils.core` import from setup.py so install
works on Python 3.12+ (distutils was removed in Python 3.12, and the
imported name was shadowed and never used).
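The timeout and context-manager bullets above combine naturally in a download routine. The following is a stdlib-only sketch, not the PR's `download_file`: the name `download_with_timeout`, the 30-second default, and the use of `urllib.request` (the project's own helpers may use `requests` instead) are all assumptions made for illustration.

```python
import shutil
import urllib.request

# Hypothetical default; the PR does not state which timeout value it uses.
DEFAULT_TIMEOUT_SECONDS = 30.0


def download_with_timeout(url: str, destination: str,
                          timeout: float = DEFAULT_TIMEOUT_SECONDS) -> str:
    """Stream ``url`` into ``destination``; raise instead of hanging on a stalled server.

    Hypothetical helper for illustration; not the library's download_file.
    """
    # urlopen applies ``timeout`` to the socket's blocking operations
    # (connect and each read), so a dead connection raises an error
    # rather than blocking forever.
    # Both objects are context managers, so their file descriptors are
    # released even if the copy fails midway.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Note that `timeout` bounds each blocking socket operation rather than the whole transfer, so a slow but steady server can still take longer than the timeout overall.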
Covers benign tar/zip extraction, path-traversal (Zip-Slip) rejection for both tar and zip, and the unsupported-archive TypeError.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
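The tests described above exercise a helper along these lines. This is a minimal sketch assuming the helper receives an already-opened `tarfile.TarFile` or `zipfile.ZipFile` plus a destination directory; the exact signature of `l2l.data.utils.safe_extract` and the `ValueError` raised for escaping members are assumptions, while the `TypeError` for unsupported archives mirrors the test description.

```python
import os
import tarfile
import zipfile


def _is_within(directory: str, target: str) -> bool:
    """Return True if ``target`` resolves to ``directory`` or a path beneath it."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    return os.path.commonpath([directory, target]) == directory


def safe_extract(archive, path: str = ".") -> None:
    """Extract a TarFile or ZipFile into ``path``, refusing members that escape it.

    Hypothetical signature for illustration; the PR's helper may differ.
    """
    if isinstance(archive, tarfile.TarFile):
        names = archive.getnames()
    elif isinstance(archive, zipfile.ZipFile):
        names = archive.namelist()
    else:
        raise TypeError(f"Unsupported archive type: {type(archive).__name__}")

    # Validate every member before writing anything, so a malicious
    # archive cannot leave partial output behind.
    for name in names:
        if not _is_within(path, os.path.join(path, name)):
            # ValueError is an assumed choice of exception for this sketch.
            raise ValueError(f"Blocked path traversal in archive member: {name!r}")

    archive.extractall(path)
```

Resolving both paths with `os.path.realpath` before comparing them catches `../` sequences and absolute member names. Checking names alone does not stop a tar symlink whose target points outside the destination; on Python versions that support extraction filters (3.12+, plus security backports to some earlier releases), `extractall(path, filter="data")` also rejects such links.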
Description
Fixes #413, #417, and #433.
This PR improves the security and robustness of the dataset download and extraction pipeline while also fixing Python 3.12 installation compatibility.
Security & Robustness
Path Traversal (Zip-Slip) Protection
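* Added a shared `l2l.data.utils.safe_extract()` helper that validates archive members before extraction.
* Replaced `extractall()` calls with `safe_extract()` across:
  * `vgg_flowers`
  * `fgvc_fungi`
  * `describable_textures`
  * `cu_birds200`
  * `tiered_imagenet`
  * `cifarfs`
  * `fc100`
  * `text/news_classification`
* Generalized the existing `safe_extract` already used in `fgvc_aircraft.py`.
Safer Model Deserialization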
* Passed `weights_only=True` to `torch.load(...)` when loading downloaded model checkpoints (`vision/models`) to prevent arbitrary code execution from tampered checkpoint files.
Download Timeouts
* Added request timeouts to all downloads:
  * `download_file()`
  * `download_file_from_google_drive()`
  * `vgg_flowers` label downloads
  * `news_classification`
Resource Management
* Ensured `ZipFile` and `TarFile` objects are properly closed (via context managers) in `fc100` and `tiered_imagenet` to prevent file-descriptor leaks.
Transport Security
* Switched `vgg_flowers` image and label download URLs from `http://` to `https://`.
Python 3.12 Compatibility
* Removed the unused `from distutils.core import setup` import from `setup.py`. `distutils` was removed from the Python standard library in Python 3.12 (PEP 632), so the import broke installation; it was also unnecessary because `setuptools.setup` was already being used.
Tests
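Added `tests/unit/data/safe_extract_test.py` covering:
* benign tar/zip extraction
* path-traversal (Zip-Slip) rejection for both tar and zip
* the `TypeError` raised for unsupported archive types
Contribution Checklist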
CHANGELOG.md.