[Python][Parquet] FIXED_LEN_BYTE_ARRAY fails to cast to UUID on Python 3.14 / Nightly builds #50312

@GiTaDi-CrEaTe

Describe the bug, including details regarding any error messages, version, and platform.

Description of the bug

When writing uuid.UUID objects to Parquet using PyArrow, the data is correctly stored as a FIXED_LEN_BYTE_ARRAY. However, when reading this data back on Python 3.14 / Nightly builds, PyArrow fails to cast the 16 bytes back into Python uuid.UUID objects, instead returning raw bytes.

This works as expected on Python 3.13 and below; the regression appears only on Python 3.14 / Nightly builds.

To Reproduce:

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import uuid

# Create a simple table with a UUID
original_uuid = uuid.uuid4()
df = pd.DataFrame({"id": [original_uuid]})
table = pa.Table.from_pandas(df)

# Write to parquet and read back
pq.write_table(table, "test_uuid.parquet")
read_table = pq.read_table("test_uuid.parquet")
result_df = read_table.to_pandas()

# On Python 3.13, this is a uuid.UUID object. 
# On Python 3.14 / Nightly, this is raw bytes.
print(type(result_df.loc[0, "id"]))
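
Until the read path is fixed, downstream code could normalize the values itself. The following is only a minimal sketch of such a workaround, assuming the column comes back as the raw 16-byte representation described above; `to_uuid` is a hypothetical helper name, not part of PyArrow or pandas.

```python
import uuid


def to_uuid(value):
    """Return a uuid.UUID for a uuid.UUID or its raw 16-byte form.

    Hypothetical helper for illustration only; not a PyArrow API.
    """
    if isinstance(value, uuid.UUID):
        # Already converted (the behaviour reported on Python 3.13).
        return value
    if isinstance(value, (bytes, bytearray, memoryview)):
        raw = bytes(value)
        if len(raw) != 16:
            raise ValueError(f"expected 16 bytes, got {len(raw)}")
        # uuid.UUID(bytes=...) interprets the 16 bytes in big-endian order.
        return uuid.UUID(bytes=raw)
    raise TypeError(f"cannot convert {type(value).__name__} to uuid.UUID")
```

With the reproduction above, something like `result_df["id"].map(to_uuid)` should then yield `uuid.UUID` objects on both interpreter versions, under the assumption that the affected builds return plain `bytes`.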

---
This was discovered while adding upstream UUID Parquet tests to the pandas test suite (pandas-dev/pandas#65647).

### Component(s)

Python, Parquet
