Skip to content

Guard XCom/DagRun pickle-removal migrations (0049, 0055) against NaN/Infinity and U+0000 in bytea->JSONB conversion#69064

Merged
uranusjr merged 5 commits into
apache:mainfrom
seanmuth:fix/migration-strip-nul-bytes
Jul 2, 2026
Merged

Guard XCom/DagRun pickle-removal migrations (0049, 0055) against NaN/Infinity and U+0000 in bytea->JSONB conversion#69064
uranusjr merged 5 commits into
apache:mainfrom
seanmuth:fix/migration-strip-nul-bytes

Conversation

@seanmuth

@seanmuth seanmuth commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Why

Migrations 0049_3_0_0_remove_pickled_data_from_xcom_table (xcom.value) and 0055_3_0_0_remove_pickled_data_from_dagrun_table (dag_run.conf) both convert legacy pickled bytea columns to JSON/JSONB during the 2.x -> 3.x upgrade. Both can be fed values that are legal in the pickled representation but illegal in strict JSON/JSONB:

  1. Non-finite floats NaN / Infinity / -Infinity (json.dumps default allow_nan=True emits the bare tokens).
  2. The U+0000 (NUL) escape that json.dumps emits for embedded null bytes (common with data from fixed-width exports, C buffers, Mongo, or bad UTF-16->UTF-8 conversions).

Before this PR each migration only handled one of the two classes:

non-finite floats U+0000 (NUL)
0049 (xcom, SQL regexp_replace) handled (#57893) unhandled -> bulk CAST(... AS JSONB) aborts the whole upgrade
0055 (dag_run, Python json.dumps) unhandled -> per-row except silently drops the conf unhandled

The PostgreSQL error for the NUL case:

sqlalchemy.exc.DataError: (psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence
DETAIL: <NUL> cannot be converted to text.
CONTEXT: JSON data, line 1: ..."col": "F<NUL>...

Fixes #69063. (The NaN-in-xcom half was #57893; this also closes the symmetric NaN-in-conf silent-data-loss gap in 0055.)

What

Make both bytea->JSONB migrations guard against both value classes:

  • 0049 (xcom) - keep the existing NaN/Infinity quoting; add a U+0000 strip to all three dialect branches:
    • PostgreSQL: replace(convert_from(value,'UTF8'), '<NUL-escape>', '') wrapping the regexp_replace.
    • MySQL: REPLACE(CONVERT(value USING utf8mb4), '<NUL-escape>', '') (same file-vs-SQL backslash escaping as the existing NaN regex in that branch).
    • SQLite: an extra inner REPLACE(..., '<NUL-escape>', '') (no REGEXP_REPLACE, consistent with the existing approach).
  • 0055 (dag_run.conf) - add a recursive _json_safe() that quotes non-finite floats to "NaN"/"Infinity"/"-Infinity" (mirroring 0049's SQL output) before json.dumps, and strip the U+0000 escape from the result. 0055 is dialect-agnostic (Python serialization), so this one helper covers all backends. The conf is now preserved instead of dropped.

NaN/Infinity are quoted (kept as strings, matching 0049); U+0000 cannot be quoted/kept, so it is stripped. json.dumps (default ensure_ascii=True) only ever emits a NUL as the 6-char escape, never a raw 0x00, so stripping the escape covers values produced by normal serialization. Migration files are amended in place per the project convention for migration bugfixes.

Minimal repro (NUL)

-- '\x7b226b223a2022465c75303030306f6f227d' decodes to: {"k": "F<NUL>oo"}
SELECT CAST(CONVERT_FROM('\x7b226b223a2022465c75303030306f6f227d'::bytea, 'UTF8') AS JSONB);
-- ERROR: unsupported Unicode escape sequence  /  DETAIL: <NUL> cannot be converted to text.

To keep the SQL testable, 0049's per-dialect sanitize statements are extracted into _xcom_{pg,mysql,sqlite}_sanitize_sql() helpers (same pattern as migration 0117); the production calls default to table="xcom", so the emitted SQL is byte-for-byte unchanged.

Notes

  • Migration unit tests added: NaN/Infinity/-Infinity and a NUL-bearing value, for xcom (all dialects) and dag_run.conf.
  • Happy to add a newsfragment if maintainers want one for an in-place migration bugfix — let me know the preferred category.

Related: #57893 (the NaN/Infinity sanitization for the xcom conversion).


Authored with Claude Code (Claude Opus 4.8).

…before JSON(B) conversion

Migrations 0049 (xcom.value) and 0055 (dag_run.conf) convert the legacy
pickled bytea columns to JSON/JSONB but do not handle the U+0000 (NUL)
escape that json.dumps emits for embedded null bytes. PostgreSQL JSON/JSONB
cannot represent it ("unsupported Unicode escape sequence ... cannot be
converted to text"), so:

  * 0049 (xcom): the bulk ALTER ... USING CAST(... AS JSONB) aborts the
    entire upgrade.
  * 0055 (dag_run): the per-row try/except drops the row, silently losing
    the conf.

This is the same class of bug as the already-handled NaN/Infinity tokens
(apache#57893): values legal in pickle but illegal in strict
JSON/JSONB. Unlike the non-finite floats, a NUL cannot be quoted/kept, so
it is stripped. Covers all three dialects in 0049 (Postgres/MySQL/SQLite)
and the Python serialization path in 0055.

Closes apache#69063

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@boring-cyborg boring-cyborg Bot added the area:db-migrations PRs with DB migration label Jun 26, 2026
0055 converts pickled dag_run.conf to JSON/JSONB Python-side via json.dumps
and a per-row try/except. A float('nan')/inf/-inf in conf serializes to the
bare tokens NaN/Infinity/-Infinity (json.dumps default allow_nan=True), which
PostgreSQL JSON/JSONB rejects -> the row was caught by the except and silently
dropped, losing the conf.

Add a recursive _json_safe() that quotes non-finite floats to "NaN"/"Infinity"/
"-Infinity" (mirroring the SQL sanitization already in 0049 for xcom) before
json.dumps, so the value is preserved. Both bytea->JSONB migrations now guard
against both illegal-in-JSON value classes (non-finite floats and the U+0000
escape).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@seanmuth seanmuth changed the title Strip U+0000 (NUL) escape in XCom/DagRun pickle-removal migrations (0049, 0055) Guard XCom/DagRun pickle-removal migrations (0049, 0055) against NaN/Infinity and U+0000 in bytea->JSONB conversion Jun 26, 2026
Extract 0049's per-dialect value-sanitization SQL into module-level helpers
(_xcom_pg/mysql/sqlite_sanitize_sql), mirroring the testability pattern in
migration 0117. Production calls default to table="xcom" so the emitted SQL is
byte-for-byte identical to the previous inline statements.

Tests:
  * test_0049: runs the real sanitize SQL against an isolated table. SQLite case
    is backend-independent; Postgres (asserts the JSONB cast fails on the NUL
    escape pre-sanitize and succeeds after) and MySQL cases are backend-marked.
  * test_0055: pure-Python tests for _json_safe (non-finite float quoting,
    recursion into dict/list/tuple) and the full json.dumps(...).replace(NUL,'')
    pipeline, asserting strict-valid JSON via json.loads(parse_constant=...).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@seanmuth seanmuth marked this pull request as ready for review June 26, 2026 22:07
@seanmuth seanmuth requested a review from ephraimbuddy as a code owner June 26, 2026 22:07
seanmuth and others added 2 commits June 29, 2026 08:57
…oken

Review feedback from @uranusjr:

* 0055 _json_safe: a blind string .replace on the json.dumps output corrupts a
  genuinely escaped backslash sequence (an embedded literal backslash + u0000).
  Handle NUL on the object instead (strip chr(0) from str values before dumps),
  drop the post-dump .replace, and recurse via collections.abc Mapping/Sequence
  (excluding str/bytes) so dict/list subclasses and keys are covered too.

* 0049: the SQL replace(..., U+0000, '') had the same false-match. Use a
  protect-strip-restore sentinel swap in all three dialects (escaped backslashes
  -> chr(1), strip the active NUL escape, restore); chr(1) is safe because
  json.dumps escapes control bytes so it never appears in the JSON text.

* 0049 helpers: substitute the table via an explicit __TABLE__ token instead of
  matching "UPDATE xcom" (str.format / Template don't fit -- the SQL contains
  literal braces from the regex and $ from anchors / the MySQL replacement).

Tests extended: assert a literal backslash + u0000 in the data survives
sanitization (xcom all dialects, dag_run.conf), plus NUL-in-string stripping,
Mapping/Sequence recursion, and key sanitization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI Static checks failed on the new files:
* PT006: pytest.mark.parametrize first arg must be a tuple -> ("value", "expected").
* D-rule: _json_safe docstring summary moved onto its own line (ruff --fix).
* ruff-format: collapse multi-line conn.execute calls that fit on one line in
  test_0049.

No logic changes; the migration SQL and _json_safe behavior are unchanged
(re-verified across all dialects).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@phanikumv phanikumv left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the newsfragment question: skip it. Prior in-place fixes to these same migrations (#62686, #57866, #53812) landed without one, and since both files target the already-released 3.0.0, a note would just show up in a later changelog for a
migration which users have already run or never hit.

@uranusjr uranusjr merged commit 54f4300 into apache:main Jul 2, 2026
77 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:db-migrations PRs with DB migration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AF 2 to 3 migration 0049 (remove_pickled_data_from_xcom_table) aborts on U+0000 NUL escape in XCom values during bytea to JSONB cast

3 participants