gh-148675: Add Zd/Zf formats to array, ctypes, memoryview, struct#148676

Merged
vstinner merged 23 commits into python:main from vstinner:complex_formats
May 4, 2026

@vstinner
Member

@vstinner vstinner commented Apr 17, 2026

  • Add Zd/Zf format support to array, memoryview and struct.
  • ctypes: Replace F/D/G complex format with Zf/Zd/Zg.
  • Modify array, ctypes and struct modules to support format strings longer than 1 character (such as "Zd").
  • Change array.typecodes type from str to tuple.

📚 Documentation preview 📚: https://cpython-previews--148676.org.readthedocs.build/

@vstinner
Member Author

array.typecodes is a string with all available type codes. I added Zf and Zd to this string, but "Z" in array.typecodes is now true which is a bug.

Maybe array.typecodes string should be converted to a tuple?

>>> import array
>>> array.typecodes
'bBuwhHiIlLqQefdFDZfZd'
>>> "Z" in array.typecodes
True
>>> list(array.typecodes)
['b', 'B', 'u', 'w', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q', 'e', 'f', 'd', 'F', 'D', 'Z', 'f', 'Z', 'd']
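
The membership problem above can be reproduced without the patch. This is a minimal sketch that copies the type codes from the transcript; `TYPECODES_STR`, `TYPECODES_TUPLE` and the two helper functions are hypothetical names used only for illustration, not part of the array module:

```python
# Hypothetical stand-ins for the two candidate representations of
# array.typecodes; the values are copied from the transcript above.
TYPECODES_STR = "bBuwhHiIlLqQefdFDZfZd"
TYPECODES_TUPLE = (
    "b", "B", "u", "w", "h", "H", "i", "I", "l", "L",
    "q", "Q", "e", "f", "d", "F", "D", "Zf", "Zd",
)


def is_typecode_str(code):
    """Substring test: also matches fragments such as "Z" or "fd"."""
    return code in TYPECODES_STR


def is_typecode_tuple(code):
    """Element test: matches only complete type codes."""
    return code in TYPECODES_TUPLE
```

With a tuple, `"Z"` is no longer reported as a valid type code while `"Zd"` still is, at the cost of changing the documented type of the attribute.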

Comment thread Modules/arraymodule.c
IEEE_754_DOUBLE_COMPLEX_BE : IEEE_754_DOUBLE_COMPLEX_LE;

case 'Z': {
switch (typecode[1]) {
Member

What if only "Z" is given with nothing following? Is that error handled correctly?

Member

I think it's handled by the default clause. In this case typecode is a null-terminated string like "Z" or something else.

Member Author

The string "Z" goes to default: return UNKNOWN_FORMAT; case. I added an explicit test to test_array.
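
The two-character dispatch discussed in this thread can be sketched in Python. This is not the C code from Modules/arraymodule.c: the constants and the function name are hypothetical stand-ins, and rejecting extra trailing characters is an assumption of the sketch rather than something taken from the patch.

```python
# Hypothetical stand-ins for the C enum values; the real constants differ.
UNKNOWN_FORMAT = -1
FLOAT_COMPLEX = 1
DOUBLE_COMPLEX = 2


def classify_z_typecode(typecode):
    """Classify a type code that starts with "Z"."""
    if not typecode.startswith("Z"):
        return UNKNOWN_FORMAT
    # Emulate reading typecode[1] from a NUL-terminated C string:
    # for a lone "Z" the second character is the terminator.
    second = typecode[1] if len(typecode) > 1 else "\0"
    if len(typecode) > 2:
        # Assumption of this sketch: trailing characters are an error.
        return UNKNOWN_FORMAT
    if second == "f":
        return FLOAT_COMPLEX
    if second == "d":
        return DOUBLE_COMPLEX
    # Equivalent of the C "default:" clause; a lone "Z" ends up here.
    return UNKNOWN_FORMAT
```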

@skirpichev
Member

> Maybe array.typecodes string should be converted to a tuple?

Documentation says it's a string. Probably, it's a way to go, but it's not a backward-compatible change.

I more worry that you essentially introduce a second set of format types, a little illustration:

Python 3.15.0a8 (heads/test-vstinner-patch:2aabdf41600, May  1 2026, 08:50:37) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import array
>>> a1 = array.array('D', [1, 2])
>>> np.array(a1)
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    np.array(a1)
    ~~~~~~~~^^^^
ValueError: 'D' is not a valid PEP 3118 buffer format string
>>> a2 = array.array('Zd', [1, 2])
>>> np.array(a2)
array([1.+0.j, 2.+0.j])

One is able to interoperate with NumPy & Co, other - not. IMO, this is something, that should be explained in docs. Or this will confuse users.

BTW, the ctypes change is not documented.

@vstinner vstinner marked this pull request as ready for review May 2, 2026 10:52
@vstinner vstinner requested a review from AA-Turner as a code owner May 2, 2026 10:52
@vstinner
Member Author

vstinner commented May 2, 2026

The Steering Council is fine with this change, so I mark it as ready for review:

> In the meantime, we generally agree that following numpy's lead here is the right thing to do. That could mean completing @vstinner's draft PR in time for 3.15b1 (which generally LGTU) or something smaller, and then if necessary try to get a more comprehensive change into 3.15b2. @hugovk is on board with that approach, with this falling under the "betas are for fixing things" principle.

@encukou @skirpichev @serhiy-storchaka: It would be nice if you could review this change before next Tuesday (Python 3.15 beta1, feature freeze).

I updated the PR to change array.typecodes type from str to tuple. I also fixed a few more bugs and added more tests (to test_array).

@vstinner
Member Author

vstinner commented May 2, 2026

> BTW, the ctypes change is not documented.

c_float_complex, c_double_complex and c_longdouble_complex are not modified by this PR, they still use the formats: "F", "D", and "G".

How would you like to document the ctypes change adding "Zf", "Zd" and "Zg" formats?

> I more worry that you essentially introduce a second set of format types, a little illustration: (...)
> One is able to interoperate with NumPy & Co, other - not. IMO, this is something, that should be explained in docs. Or this will confuse users.

This PR adds Zf/Zd formats for compatibility with numpy. It keeps F/D formats for backward compatibility.

There were discussions on removing or deprecating F/D formats. I would prefer to defer that to Python 3.16. Some people even asked for a PEP for such change. The timing until Python 3.15 beta1 (next Tuesday!) is too short for such large change.

The array and struct documentation can be enhanced after Python 3.15 beta1 to clarify which formats are preferred.

Comment thread Modules/_ctypes/cfield.c
Comment on lines +1649 to +1651
TABLE_ENTRY(Zd, &ffi_type_complex_double);
TABLE_ENTRY(Zf, &ffi_type_complex_float);
TABLE_ENTRY(Zg, &ffi_type_complex_longdouble);
Member

These type codes are available in ctypes alongside the old ones. Shouldn't this be documented?

Member Author

I documented the change at 3 places: Changelog entry, What's New entry, and I documented the formats change in the ctypes documentation with a versionchanged markup. It should be enough, no?

@read-the-docs-community

read-the-docs-community Bot commented May 2, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32526607 | 📁 Comparing 35a4691 against main (2a07ff9)

92 files changed · + 1 added · ± 91 modified

@vstinner
Member Author

vstinner commented May 2, 2026

@skirpichev: I'm not sure if it makes sense to add Zf, Zd and Zg formats to ctypes, since they are not used. c_float_complex, c_double_complex and c_longdouble_complex still use the formats: F, D, and G. In ctypes, the format is not used directly by users. Maybe it would make sense to replace F/D/G with Zf/Zd/Zg if we decide to deprecate F and D formats (in Python 3.16?).

Should I just revert ctypes changes?

@vstinner
Member Author

vstinner commented May 2, 2026

@hpkfft: Do we really have to change ctypes right now? It seems like numpy is just fine with ctypes using D format:

$ python
Python 3.14.4 (main, Apr  8 2026, 00:00:00) [GCC 15.2.1 20260123 (Red Hat 15.2.1-7)] on linux
>>> import ctypes
>>> ctypes.c_double_complex._type_
'D'

>>> import numpy._core._dtype_ctypes
>>> numpy._core._dtype_ctypes.dtype_from_ctypes_type(ctypes.c_double_complex)
dtype('complex128')
>>> numpy._core._dtype_ctypes.dtype_from_ctypes_type(ctypes.c_double_complex * 3)
dtype(('<c16', (3,)))

>>> class MyStruct(ctypes.Structure):
...     _fields_ = [('x', ctypes.c_double_complex)]
...     
>>> ctypes.sizeof(MyStruct)
16

>>> numpy._core._dtype_ctypes.dtype_from_ctypes_type(MyStruct)
dtype([('x', '<c16')], align=True)

@skirpichev
Member

> How would you like to document the ctypes change adding "Zf", "Zd" and "Zg" formats?

I suspect like struct & co. Duplicated types.

> Should I just revert ctypes changes?

Ok, probably I should take a closer look. No, not if your change provides correct type codes for the buffer protocol in ctypes.

> This PR adds Zf/Zd formats for compatibility with numpy. It keeps F/D formats for backward compatibility.

I'm not sure that this improves numpy compatibility. NumPy uses short codes for numpy.array, numpy.dtype and so on. Short codes were used in the CPython for compatibility with NumPy.

For me, it seems NumPy people do care about format types, used for the buffer protocol, but have no clear vision on what to do for other interfaces, like the struct module.

I'll try to provide a patch that uses PEP 3118 type codes only for the buffer protocol. But we are out of time.

> The array and struct documentation can be enhanced after Python 3.15 beta1 to clarify which formats are preferred.

I think the documentation at least should prevent bugs like "why you are using 'Zd' in the struct module - NumPy uses 'D'!".

> There were discussions on removing or deprecating F/D formats.

The brain split with duplicated types should be temporary. If we have no clear proposal on this, let's not merge such a change.

Will this improve NumPy compatibility? I don't think so. NumPy people don't want to get rid of inconsistencies in the type codes. Fine. But let's at least wait for some suggestions on how we should handle them in CPython! Meanwhile, reverting the disturbing changes in 3.15 seems to be the best option to me.

BTW, the test_buffer.py has many tests for the buffer protocol. I think you should add tests for new formats here too.

@ngoldbaum
Contributor

Ping @seberg since you brought this up originally from NumPy.

@vstinner
Member Author

vstinner commented May 2, 2026

Summary of supported complex formats.

Python 3.14:

  • struct (2): F/D
  • array (0): none
  • memoryview (0): none
  • ctypes (3): F/D/G

Python 3.15 with this PR:

  • struct (4): F/D and Zf/Zd
  • array (4): F/D and Zf/Zd
  • memoryview (4): F/D and Zf/Zd
  • ctypes (3): Zf/Zd/Zg

@skirpichev skirpichev self-requested a review May 2, 2026 17:03
@hpkfft

hpkfft commented May 2, 2026

This looks great! I really appreciate everyone's diligence and commitment!

> @hpkfft: Do we really have to change ctypes right now?

I don't think you need my opinion on this any more, which is good because I don't use ctypes and cannot give an informed answer. It certainly looks good to me that the Z prefix is consistently available in the 3.15 summary as posted above.

@skirpichev
Member

> It does improve numpy compatibility. Example:
> [...] a = np.array([1, 2, 3], dtype='D')

Sure. But how about something like array.array('Zd', [1, 2])? Or struct.pack('Zd', 1j)? Wouldn't people expect here same convention as for dtype kwarg of the numpy.array? It's certainly better documented than the buffer API.

On another hand, if we are going to use two-letters codes, why not drop 'F' and 'D' format codes for the memoryview and the array module? No compatibility break. And also deprecate such format codes for the struct module.

Comment thread Doc/library/struct.rst
Comment on lines +267 to +270
| ``Zf`` | :c:expr:`float complex` | complex | 8 | \(10) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Zd`` | :c:expr:`double complex` | complex | 16 | \(10) |
+--------+--------------------------+--------------------+----------------+------------+
Member

I suggest leaving one entry per data type, with either a one-letter or a two-letter code, and mentioning the alternative in a note, perhaps with a deprecation. Technically, there are no new types here, just aliases.

Member Author

I would prefer this decision after Python 3.15 beta1, and don't put any preference for now in the beta1.
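
The sizes in the table excerpt above (8 bytes for float complex, 16 for double complex) match a pair of floats or doubles. The following sketch emulates that layout with the long-standing `f` and `d` struct codes, so it does not depend on the new formats. It assumes the real part is stored first, followed by the imaginary part, and fixes little-endian byte order; the function names are hypothetical and this is not how the struct module implements the new codes.

```python
import struct


def pack_complex_double(z):
    """Pack a complex number as two little-endian doubles (16 bytes)."""
    z = complex(z)
    return struct.pack("<dd", z.real, z.imag)


def unpack_complex_double(data):
    """Inverse of pack_complex_double()."""
    real, imag = struct.unpack("<dd", data)
    return complex(real, imag)


def pack_complex_float(z):
    """Pack a complex number as two little-endian floats (8 bytes)."""
    z = complex(z)
    return struct.pack("<ff", z.real, z.imag)
```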

@hpkfft

hpkfft commented May 2, 2026

> On another hand, if we are going to use two-letters codes, why not drop 'F' and 'D' format codes for the memoryview and the array module? No compatibility break. And also deprecate such format codes for the struct module.

@seberg, @rgommers, @mattip, would NumPy change to allow 'Zd' in the following:

>>> np.array([1, 2, 3], dtype='D')
array([1.+0.j, 2.+0.j, 3.+0.j])
>>> np.array([1, 2, 3], dtype='Zd')
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    np.array([1, 2, 3], dtype='Zd')
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
TypeError: data type 'Zd' not understood

If so, dropping 'F' and 'D' format codes for Python's memoryview and array modules sounds good.
Both NumPy and the struct module would probably want to keep 'F' and 'D' forever, but could soft deprecate them (or not).

Member

@gpshead gpshead left a comment

One actual bug (trailing junk chars silently accepted), otherwise just a bunch of minor nits.

Comment thread Modules/arraymodule.c Outdated
Py_XDECREF(it);
PyErr_SetString(PyExc_ValueError,
"bad typecode (must be b, B, u, w, h, H, i, I, l, L, q, Q, f or d)");
"bad typecode (must be b, B, u, w, h, H, i, I, l, L, q, Q, f, d, F, D, Zd or Zf)");
Member

Add the already missing e to this list... Could we unit-test that the list matches descriptors[], or just programmatically generate the error message? (It's an uncommon error, so building the message at raise time feels fine.)

Member Author

I added e to the list.
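
The suggestion to build the message at raise time could look roughly like the sketch below. It is written in Python for brevity, whereas the real code is C; `bad_typecode_message` is a hypothetical helper, and the list of codes would come from the descriptor table rather than being passed in by hand.

```python
def bad_typecode_message(typecodes):
    """Build the "bad typecode" message from an iterable of type codes."""
    codes = list(typecodes)
    if not codes:
        raise ValueError("at least one type code is required")
    if len(codes) == 1:
        listing = codes[0]
    else:
        # "a, b, c or d": commas between all codes except the last pair.
        listing = ", ".join(codes[:-1]) + " or " + codes[-1]
    return f"bad typecode (must be {listing})"
```

Generating the text from a single source of truth means a newly added code such as `e` cannot be forgotten in the message.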

Comment thread Objects/memoryobject.c
return (x[0] == y[0]) && (x[1] == y[1]);
}
case 'Z': {
switch (fmt[1]) {
Member

defensive coding, add a default: to this switch. return MV_COMPARE_NOT_IMPL?

Comment thread Lib/test/test_buffer.py
# Format codes supported by array.array
ARRAY = NATIVE.copy()
for k in NATIVE:
if not k in "bBhHiIlLfd":
Member

Here is a bug in tests. Maybe you can fix this in the pr? 'e' format type is missing too.

@rgommers
Contributor

rgommers commented May 3, 2026

> @seberg, @rgommers, @mattip, would NumPy change to allow 'Zd' in the following:

After a first look at this: probably not. That seems to conflate NumPy typecodes (which are all single-character) with buffer protocol format codes. The 'D'/'Zd' mismatch was deliberately introduced in PEP 3118 (I believe since 'Z' prefix is clearer and composes better) and this difference has existed for about two decades now (see, e.g., https://github.com/numpy/numpy/blame/8d78a99c0bb441ebea2487aa5ac4f6a19cb9a34b/numpy/_core/_internal.py#L527-L530).

Trying to unify all this seems like a recipe for more pain, that's just going to keep rippling outward, and we definitely cannot deprecate F/D/G in NumPy. I'd rather just say that these are two sets of character/string codes for data types that are distinct, and that need conversion (like FORMAT_TO_DTYPE in this PR does).

Using these typecodes for dtype specifiers is poor practice anyway: if you have any code like np.array(..., dtype='D') or a mapping like FORMAT_TO_DTYPE, best to replace 'D' with np.complex128.
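
The idea of treating the two sets of codes as distinct and converting between them can be sketched with a pair of dictionaries. This is not the FORMAT_TO_DTYPE mapping from the PR, whose contents are not shown here; the names below are hypothetical, only the three complex codes are covered, and byte-order prefixes are not handled.

```python
# Buffer-protocol (PEP 3118) complex format codes and the corresponding
# single-character NumPy typecodes.
BUFFER_TO_NUMPY = {"Zf": "F", "Zd": "D", "Zg": "G"}
NUMPY_TO_BUFFER = {code: fmt for fmt, code in BUFFER_TO_NUMPY.items()}


def to_numpy_typecode(fmt):
    """Translate a buffer format code; other codes pass through unchanged."""
    return BUFFER_TO_NUMPY.get(fmt, fmt)


def to_buffer_format(code):
    """Translate a NumPy typecode; other codes pass through unchanged."""
    return NUMPY_TO_BUFFER.get(code, code)
```

Passing unknown codes through unchanged is a simplification of this sketch; a real converter would need an explicit table for every supported type.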

@skirpichev
Member

@rgommers, thanks. I've added link to your post in the SC issue.

Am I correct that adding dtype='Zd' as an alias for dtype='D' is not an option for NumPy?

If so, I don't think that proposed solution is a good idea for the Python. We will have to keep duplicated "types" forever, this will confuse users.

Better to revert the changes added in 3.15 and deprecate complex types in the struct module. Maybe an upcoming PEP could address this issue better.

> The 'D'/'Zd' mismatch was deliberately introduced in PEP 3118

Just in case, maybe you can remember any discussions around this? Given that the 'D' type code was already used in NumPy - I don't see good reasons for introducing such mismatch.

@seberg
Contributor

seberg commented May 4, 2026

I agree with Ralf, these single character codes aren't really the preferred user API from our perspective.
They are, admittedly, convenient when defining structs (structured dtypes) but even that is maybe more an implementation detail.

FWIW, I think I said it before: If numpy and struct should have identical API support, I think NumPy will need a np.dtype.from_struct() or so anyway. It already isn't identical, just similar. (I don't care if someone wants NumPy to silently accept Zd as well if Python decides array.array("Zd", ...) is the only API in Python, but I don't really see much reason.)

As mentioned before, from our perspective both are very much distinct: users don't see the buffer protocol.
If the buffer protocol did suggest D at the time, it would have forced wide-spread adoption of something that was, probably even then, seen as a limiting choice and more of an implementation detail of NumPy.

Contributor

@seberg seberg left a comment

Thanks a lot for working on this! I have really only one comment: Right now it seems to me that:

arr = array.array("D", [1j])
memoryview(arr).format == "<D"  # or similar

I am not sure at what point you want to normalize this (i.e. arr.typecode already? But normalizing in any direction seems a choice you can make freely from my PoV).
But it would be good to normalize it to Zd on the buffer protocol side (just a new entry, usually identical, so that the buffer code uses descr->bufcode rather than descr->typecode).

(As per discussion, I am not sure we'll add Zd as a type-code short-hand in NumPy without new API, but I don't think you need to worry about that.)
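
The normalization suggested in this review can be sketched as a descriptor that carries two codes: the one the user passed and the one exported through the buffer protocol. The sketch is in Python with hypothetical names (`Descr`, `DESCRIPTORS`, `buffer_format`); the field name `bufcode` follows the comment above, and the merged change does not necessarily work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descr:
    typecode: str  # code accepted by the constructor, e.g. "D" or "Zd"
    bufcode: str   # code reported through the buffer protocol


# Hypothetical descriptor table: bufcode usually equals typecode,
# except for the legacy single-letter complex codes.
DESCRIPTORS = {
    "d": Descr("d", "d"),
    "F": Descr("F", "Zf"),
    "D": Descr("D", "Zd"),
    "Zf": Descr("Zf", "Zf"),
    "Zd": Descr("Zd", "Zd"),
}


def buffer_format(typecode):
    """Return the format string a buffer consumer would see."""
    return DESCRIPTORS[typecode].bufcode
```

Under this scheme an array created with `"D"` and one created with `"Zd"` would both export `Zd`, so buffer consumers only ever see the PEP 3118 spelling.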

Comment thread Modules/_ctypes/_ctypes.c
stginfo->flags |= TYPEFLAG_ISPOINTER;
case 'Z':
if (proto_str[1] == '\0') {
/* "Z": c_wchar_p */
Contributor

I didn't realize Z was already used here. I guess it'll work in practice, since this is always a Z on its own, but it won't transfer to struct.struct...

@rgommers
Contributor

rgommers commented May 4, 2026

> Just in case, maybe you can remember any discussions around this? Given that the 'D' type code was already used in NumPy - I don't see good reasons for introducing such mismatch.

The PEP links this mailing list thread, which has 'Z' in multiple places. There isn't explicit discussion, but it's clear from the start of that thread that there are previous threads, not linked from the PEP anymore. I don't have time to do more archeology, and I don't think it's super relevant anymore what that history was. But if I ask Travis, I'm fairly sure the answer will be "people (or I) wanted a more general and extensible design here and preferred that over the Numeric/NumPy legacy".

@skirpichev
Member

> The PEP links this mailing list thread, which has 'Z' in multiple places.

"Multiple places" is the PEP text itself.

> I don't have time to do more archeology

Sure, I understand. I did. And, I think - several people from the discussion thread. Without success, unfortunately.

> "people (or I) wanted a more general and extensible design here and preferred that over the Numeric/NumPy legacy"

I don't see how preserving compatibility with existing type codes contradicts this goal. (Including e.g. multiple-letter type codes.) Ok, perhaps it's my bad.

Thanks for explanation.

Comment thread Doc/library/ctypes.rst
Comment on lines +383 to +385
.. versionchanged:: next
The :py:attr:`~_SimpleCData._type_` types ``F``, ``D`` and ``G`` have been
replaced with ``Zf``, ``Zd`` and ``Zg``.
Member

These _type_ codes were added in 3.14; wouldn't changing them break backwards compatibility?

Maybe in 3.16 we can deprecate the _type_ attribute itself and replace it with something better.

Member Author

I made this change on purpose and it's documented in What's New in Python 3.15 and in the ctypes documentation.

If you use c_double_complex, c_float_complex and c_longdouble_complex of the ctypes module, you shouldn't notice the _type_ change (it just works). If you create your own types by inheriting from _SimpleCData, using the F, D or G type no longer works with this change: you have to update your code to the Zf, Zd and Zg formats.

vstinner added 2 commits May 4, 2026 15:08
* array: add missing 'e' format in an error message
* memoryview: get_native_fmtstr() checks that fmt[1] is not NUL
* ctypes: update error message
* Fix typo
@vstinner
Member Author

vstinner commented May 4, 2026

Tests / CIFuzz / cpython3 (address) (pull_request): Failing after 10m
Tests / CIFuzz / cpython3 (memory) (pull_request): Failing after 6m

There are network issues. Example: E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/t/tk8.6/tk8.6_8.6.10-1_amd64.deb Connection failed [IP: 185.125.190.81 80].

@vstinner
Member Author

vstinner commented May 4, 2026

> arr = array.array("D", [1j])
> memoryview(arr).format == "<D" # or similar
> But it would be good to normalize it to Zd on the buffer protocol side

I tried to write a minimum non-controversial change for Python 3.15 beta1. It doesn't put any preference between D and Zd format for example. That can be done later once we reach an agreement.

@vstinner vstinner merged commit 6e6f905 into python:main May 4, 2026
52 of 54 checks passed
@vstinner vstinner deleted the complex_formats branch May 4, 2026 14:14
@vstinner
Member Author

vstinner commented May 4, 2026

I merged this change in the main branch to make sure that it will be part of Python 3.15 beta1. We can fix remaining bugs and update the documentation between beta1 and Python 3.15 final. Thanks for your very useful reviews!

@vstinner
Member Author

vstinner commented May 4, 2026

I created a follow-up PR: PR gh-149368 removes F and D formats from array and memoryview.
