Security and robustness fixes for the dataset/download path plus a#436

Open
Garthigan wants to merge 3 commits into learnables:master from Garthigan:security-and-py312-fixes

@Garthigan

Description

Fixes #413, #417, and #433.

This PR improves the security and robustness of the dataset download and extraction pipeline while also fixing Python 3.12 installation compatibility.

Security & Robustness

  • Path Traversal (Zip-Slip) Protection

    • Introduces a shared l2l.data.utils.safe_extract() helper that validates archive members before extraction.
    • Replaces all direct extractall() calls with safe_extract() across:
      • vgg_flowers
      • fgvc_fungi
      • describable_textures
      • cu_birds200
      • tiered_imagenet
      • cifarfs
      • fc100
      • text/news_classification
    • Generalizes the archive validation logic that previously existed only in fgvc_aircraft.py.
  • Safer Model Deserialization

    • Uses torch.load(..., weights_only=True) when loading downloaded model checkpoints to prevent arbitrary code execution from tampered checkpoint files.
  • Download Timeouts

    • Adds request timeouts to avoid indefinitely hanging downloads in:
      • download_file()
      • download_file_from_google_drive()
      • vgg_flowers label downloads
      • news_classification
  • Resource Management

    • Ensures ZipFile and TarFile objects are properly closed in fc100 and tiered_imagenet to prevent resource leaks.
  • Transport Security

    • Updates the vgg_flowers image and label download URLs from http:// to https://.
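
For reviewers, here is a minimal sketch of the kind of member validation a helper like safe_extract() could perform. It is not the code from this PR: the signature, the `_is_within` helper, and the `ValueError` raised on traversal are assumptions made for illustration (only the `TypeError` for unsupported archive types is mentioned in the commit message). The sketch checks member names only and does not inspect symlink or hardlink targets inside tar archives.

```python
import os
import tarfile
import zipfile


def _is_within(base_dir, target):
    """Return True if target resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def safe_extract(archive, dest):
    """Validate every member path, then extract the archive into dest.

    Hypothetical signature: takes an already opened ZipFile or TarFile.
    """
    if isinstance(archive, zipfile.ZipFile):
        names = archive.namelist()
    elif isinstance(archive, tarfile.TarFile):
        names = archive.getnames()
    else:
        raise TypeError("Unsupported archive type: %r" % type(archive))

    # Reject the whole archive before writing anything to disk.
    for name in names:
        if not _is_within(dest, os.path.join(dest, name)):
            # ValueError is an assumption; the PR may raise something else.
            raise ValueError("Blocked path traversal in archive: %s" % name)

    archive.extractall(dest)
```

Opening the archive in a `with` block, for example `with zipfile.ZipFile(path) as zf: safe_extract(zf, root)`, also closes the file handle when extraction finishes or fails, which is the pattern the Resource Management item describes.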

Python 3.12 Compatibility

  • Removes the unused import of setup from distutils.core in setup.py.
  • distutils was removed from the Python standard library in Python 3.12 (PEP 632). The import prevented installation and was unnecessary because setuptools.setup was already being used.

Tests

Added tests/unit/data/safe_extract_test.py covering:

  • Safe extraction of valid ZIP and TAR archives.
  • Rejection of path traversal (Zip-Slip) attacks for both ZIP and TAR archives.
  • Validation of unsupported archive types.
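
As a rough illustration of how such tests can build their fixtures, the sketch below creates small ZIP and TAR archives in memory with an arbitrary member name, including a traversal name such as `../escape.txt`. The helper names `make_zip` and `make_tar` are hypothetical and are not taken from safe_extract_test.py.

```python
import io
import tarfile
import zipfile


def make_zip(member_name, payload=b"x"):
    """Return the bytes of a ZIP archive holding a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member_name, payload)
    return buf.getvalue()


def make_tar(member_name, payload=b"x"):
    """Return the bytes of an uncompressed TAR archive holding a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)  # size must match the payload length
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# A traversal fixture keeps the "../" prefix in its member name.
evil_zip = make_zip("../escape.txt")
evil_tar = make_tar("../escape.txt")
```

A test would then write these bytes to a temporary directory, open them with zipfile or tarfile, pass the archive to the extraction helper, and assert that it is rejected and that nothing was written outside the destination.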

Contribution Checklist

  • My contribution is listed in CHANGELOG.md.
  • My contribution modifies code in the main library.
  • My modifications are tested.
  • My modifications are documented.

Garthigan and others added 3 commits July 4, 2026 12:28
   Python 3.12 install fix:

   * Add `l2l.data.utils.safe_extract`, a shared helper that validates
     tar/zip members before extraction to prevent path traversal
     (Zip-Slip). Route all 12 `extractall` call sites (vgg_flowers,
     fgvc_fungi, describable_textures, cu_birds200, tiered_imagenet,
     cifarfs, fc100, and text/news_classification) through it. This
     generalizes the existing `safe_extract` already used in
     fgvc_aircraft.py.
   * Load downloaded pretrained backbones with `weights_only=True`
     (vision/models) to avoid arbitrary code execution from tampered
     checkpoints.
   * Add request timeouts to all downloads (download_file,
     download_file_from_google_drive, vgg_flowers labels,
     news_classification).
   * Fix file-descriptor leaks by closing archives with
     context managers (fc100, tiered_imagenet).
   * Switch vgg_flowers image/label URLs from http:// to https://.
   * Remove unused `distutils.core` import from setup.py so install
     works on Python 3.12+ (distutils removed per PEP 632); the imported
     name was shadowed and never used.
Covers benign tar/zip extraction, path-traversal (Zip-Slip) rejection
for both tar and zip, and the unsupported-archive TypeError.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Failed building wheel on Python 3.11 - "longintrepr.h”: No such file or directory
