Skip to content

gh-152718: Reject oversized table counts in the profiling binary reader#152719

Open
tonghuaroot wants to merge 3 commits into
python:mainfrom
tonghuaroot:fix-profiling-reader-alloc
Open

gh-152718: Reject oversized table counts in the profiling binary reader#152719
tonghuaroot wants to merge 3 commits into
python:mainfrom
tonghuaroot:fix-profiling-reader-alloc

Conversation

@tonghuaroot

@tonghuaroot tonghuaroot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

This bounds the eager string-table and frame-table allocations in the
_remote_debugging binary profile reader against the file size before
allocating, so a .pyb file whose footer declares an oversized count is
rejected with a ValueError instead of triggering a multi-gigabyte allocation.

Each table entry occupies at least one byte on disk (a string is a >=1-byte
length varint, a frame is six >=1-byte varints plus the opcode byte), so a count
larger than (file_size - table_offset) / MIN_ENTRY_SIZE cannot be backed by
real data. This mirrors the existing RLE-count bound in binary_reader_replay.
Legitimate files are unaffected.

Regression tests in test_binary_format patch the footer counts and assert the
reader rejects both an oversized and a one-past-capacity count, while a count
exactly at the cap still opens.

@pablogsal pablogsal left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

}
#endif

size_t max_frames =

@maurycy maurycy Jul 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about a helper. With these two and this one:

/* Validate RLE count to prevent DoS from malicious files.
* Each RLE sample needs at least 2 bytes (1 byte min varint + 1 status byte).
* Also reject absurdly large counts that would exhaust memory. */
size_t remaining_data = reader->sample_data_size - offset;
size_t max_possible_samples = remaining_data / 2;
if (count > max_possible_samples) {
PyErr_Format(PyExc_ValueError,
"Invalid RLE count %u exceeds maximum possible %zu for remaining data",
count, max_possible_samples);
return -1;
}

it'd be three. Something along the lines of reader_validate_count("RLE", count, reader->sample_data_size - offset, 2) etc.

Generally, wondering about centralizing all these validations in a single place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea — done in 572947c: the string, frame and RLE checks now share a reader_validate_count(what, count, available, min_entry_size) helper. One heads-up: the RLE call sits in the region #152722 also touches, so whichever lands first the other will need a small rebase there. Thanks!

…count

Addresses review feedback: centralize the string/frame/RLE oversized-count
validations behind one helper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants