gh-148675: Add Zd/Zf formats to array, ctypes, memoryview, struct#148676
vstinner merged 23 commits into python:main
Maybe
>>> import array
>>> array.typecodes
'bBuwhHiIlLqQefdFDZfZd'
>>> "Z" in array.typecodes
True
>>> list(array.typecodes)
['b', 'B', 'u', 'w', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q', 'e', 'f', 'd', 'F', 'D', 'Z', 'f', 'Z', 'd']
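To make the problem in the session above concrete, here is a small sketch comparing membership tests on a concatenated string with membership tests on a tuple of whole codes. The string is copied from the session; the tuple layout and the `is_listed` helper are assumptions made for illustration, not what the PR necessarily ships.

```python
# Typecodes as one concatenated string (copied from the session above).
TYPECODES_STR = "bBuwhHiIlLqQefdFDZfZd"

# The same codes as a tuple of whole codes (hypothetical layout).
TYPECODES_TUPLE = (
    "b", "B", "u", "w", "h", "H", "i", "I", "l", "L",
    "q", "Q", "e", "f", "d", "F", "D", "Zf", "Zd",
)


def is_listed(code, typecodes):
    """Return True if `code` passes a plain `in` test against `typecodes`."""
    return code in typecodes


# With a str, `in` is a substring test, so the bare prefix "Z" matches.
print(is_listed("Z", TYPECODES_STR))    # True
# With a tuple, `in` compares whole items, so "Z" is rejected.
print(is_listed("Z", TYPECODES_TUPLE))  # False
```

With a tuple, iterating also yields `"Zf"` and `"Zd"` as single items instead of splitting them into `"Z"`, `"f"`, `"Z"`, `"d"`.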
    IEEE_754_DOUBLE_COMPLEX_BE : IEEE_754_DOUBLE_COMPLEX_LE;

case 'Z': {
    switch (typecode[1]) {
What if only "Z" is given with nothing following? Is that error handled correctly?
I think it's handled by the default clause. In this case typecode is a null-terminated string like "Z" or something else.
The string "Z" goes to default: return UNKNOWN_FORMAT; case. I added an explicit test to test_array.
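For reference, a minimal sketch of the kind of check such a test could make. The `is_valid_typecode` helper is hypothetical and is not the test that was added to test_array; it only assumes that `array.array()` raises `ValueError` or `TypeError` for a typecode it does not accept.

```python
import array


def is_valid_typecode(code):
    """Return True if array.array() accepts `code` as a typecode."""
    try:
        array.array(code)
    except (ValueError, TypeError):
        # Unknown or malformed typecodes are rejected with one of these.
        return False
    return True


# A bare "Z" has no second character, so it must not be accepted.
print(is_valid_typecode("Z"))
# "d" (double) is a long-standing typecode and is accepted.
print(is_valid_typecode("d"))
```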
The documentation says it's a string. It's probably the way to go, but it's not a backward-compatible change.
I worry more that you are essentially introducing a second set of format types. A little illustration:
Python 3.15.0a8 (heads/test-vstinner-patch:2aabdf41600, May 1 2026, 08:50:37) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import array
>>> a1 = array.array('D', [1, 2])
>>> np.array(a1)
Traceback (most recent call last):
File "<python-input-3>", line 1, in <module>
np.array(a1)
~~~~~~~~^^^^
ValueError: 'D' is not a valid PEP 3118 buffer format string
>>> a2 = array.array('Zd', [1, 2])
>>> np.array(a2)
array([1.+0.j, 2.+0.j])
One is able to interoperate with NumPy & Co., the other is not. IMO, this should be explained in the docs, or it will confuse users. BTW, the ctypes change is not documented.
Test also that "Z" typecode is invalid.
The Steering Council is fine with this change, so I'm marking it as ready for review:
@encukou @skirpichev @serhiy-storchaka: It would be nice if you could review this change before next Tuesday (Python 3.15 beta1, feature freeze). I updated the PR to change
c_float_complex, c_double_complex and c_longdouble_complex are not modified by this PR, they still use the formats: "F", "D", and "G". How would you like to document the
This PR adds Zf/Zd formats for compatibility with NumPy. It keeps the F/D formats for backward compatibility. There were discussions about removing or deprecating the F/D formats. I would prefer to defer that to Python 3.16. Some people even asked for a PEP for such a change. The time left until Python 3.15 beta1 (next Tuesday!) is too short for such a large change. The array and struct documentation can be enhanced after Python 3.15 beta1 to clarify which formats are preferred.
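Since both spellings coexist, code that has to run on several Python versions may want to probe which complex formats the running interpreter accepts. The sketch below is an illustration only: `CANDIDATE_FORMATS` and `supported_complex_formats` are hypothetical names, and the only assumption is that `struct.calcsize()` raises `struct.error` for a format it does not know.

```python
import struct

# Complex format codes discussed in this PR. Which of them are accepted
# depends on the Python version running this code.
CANDIDATE_FORMATS = ("F", "D", "Zf", "Zd")


def supported_complex_formats(candidates=CANDIDATE_FORMATS):
    """Map each format accepted by the struct module to its size in bytes."""
    supported = {}
    for fmt in candidates:
        try:
            supported[fmt] = struct.calcsize(fmt)
        except struct.error:
            # This interpreter does not know the format; skip it.
            pass
    return supported


print(supported_complex_formats())
```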
TABLE_ENTRY(Zd, &ffi_type_complex_double);
TABLE_ENTRY(Zf, &ffi_type_complex_float);
TABLE_ENTRY(Zg, &ffi_type_complex_longdouble);
These type codes are available in ctypes alongside the old ones. Shouldn't this be documented?
I documented the change in 3 places: the Changelog entry, the What's New entry, and the ctypes documentation (the formats change, with a versionchanged markup). It should be enough, no?
@skirpichev: I'm not sure if it makes sense to add Should I just revert
@hpkfft: Do we really have to change
$ python
Python 3.14.4 (main, Apr 8 2026, 00:00:00) [GCC 15.2.1 20260123 (Red Hat 15.2.1-7)] on linux
>>> import ctypes
>>> ctypes.c_double_complex._type_
'D'
>>> import numpy._core._dtype_ctypes
>>> numpy._core._dtype_ctypes.dtype_from_ctypes_type(ctypes.c_double_complex)
dtype('complex128')
>>> numpy._core._dtype_ctypes.dtype_from_ctypes_type(ctypes.c_double_complex * 3)
dtype(('<c16', (3,)))
>>> class MyStruct(ctypes.Structure):
... _fields_ = [('x', ctypes.c_double_complex)]
...
>>> ctypes.sizeof(MyStruct)
16
>>> numpy._core._dtype_ctypes.dtype_from_ctypes_type(MyStruct)
dtype([('x', '<c16')], align=True)
I suspect like struct & co. Duplicated types.
OK, I should probably take a closer look. No, if your change provides correct type codes for the buffer protocol in ctypes.
I'm not sure that this improves NumPy compatibility. NumPy uses short codes for numpy.array, numpy.dtype and so on. Short codes were used in CPython for compatibility with NumPy. To me, it seems NumPy people do care about the format types used for the buffer protocol, but have no clear vision of what to do for other interfaces, like the struct module. I'll try to provide a patch that uses PEP 3118 type codes only for the buffer protocol. But we are out of time.
I think the documentation at least should prevent bugs like "why you are using 'Zd' in the struct module - NumPy uses 'D'!".
The split-brain situation with duplicated types should be temporary. If we have no clear proposal on this, let's not merge such a change. Will this improve NumPy compatibility? I don't think so. NumPy people don't want to get rid of the inconsistencies in the type codes. Fine. But let's at least wait for some suggestions on how we should handle them in CPython! Meanwhile, reverting the disturbing changes in 3.15 seems to be the best option to me. BTW, test_buffer.py has many tests for the buffer protocol. I think you should add tests for the new formats there too.
Ping @seberg since you brought this up originally from NumPy.
Summary of supported complex formats. Python 3.14:
Python 3.15 with this PR:
This looks great! I really appreciate everyone's diligence and commitment!
I don't think you need my opinion on this any more, which is good because I don't use ctypes and cannot give an informed answer. It certainly looks good to me that the
Sure. But how about something like On the other hand, if we are going to use two-letter codes, why not drop
| ``Zf`` | :c:expr:`float complex`  | complex            | 8              | \(10)      |
+--------+--------------------------+--------------------+----------------+------------+
| ``Zd`` | :c:expr:`double complex` | complex            | 16             | \(10)      |
+--------+--------------------------+--------------------+----------------+------------+
I suggest leaving one entry per data type, with either the one-letter or the two-letter code, and mentioning the alternative in a note, perhaps with a deprecation. Technically, there are no new types here, just aliases.
I would prefer to make this decision after Python 3.15 beta1, and not state any preference in beta1 for now.
@seberg, @rgommers, @mattip, would NumPy change to allow
>>> np.array([1, 2, 3], dtype='D')
array([1.+0.j, 2.+0.j, 3.+0.j])
>>> np.array([1, 2, 3], dtype='Zd')
Traceback (most recent call last):
File "<python-input-3>", line 1, in <module>
np.array([1, 2, 3], dtype='Zd')
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
TypeError: data type 'Zd' not understood
If so, dropping
gpshead left a comment
One actual bug (trailing junk chars silently accepted), otherwise just a bunch of minor nits.
     Py_XDECREF(it);
     PyErr_SetString(PyExc_ValueError,
-        "bad typecode (must be b, B, u, w, h, H, i, I, l, L, q, Q, f or d)");
+        "bad typecode (must be b, B, u, w, h, H, i, I, l, L, q, Q, f, d, F, D, Zd or Zf)");
Add the already missing e to this list... Could we unit test that the list matches descriptors[], or just programmatically generate the error message? (It's an uncommon error, so building the message at raise time feels fine.)
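A rough Python sketch of the "generate the message programmatically" idea follows. The real code lives in C, so this only illustrates the string-building logic; `format_bad_typecode_message` is a hypothetical helper, and the list of codes passed to it stands in for whatever the descriptors table contains.

```python
def format_bad_typecode_message(typecodes):
    """Build the 'bad typecode' message from an iterable of typecodes."""
    codes = list(typecodes)
    if not codes:
        return "bad typecode"
    if len(codes) == 1:
        listing = codes[0]
    else:
        # "a, b, c or d": commas between all but the last two entries.
        listing = ", ".join(codes[:-1]) + " or " + codes[-1]
    return f"bad typecode (must be {listing})"


print(format_bad_typecode_message(["b", "B", "e", "Zd", "Zf"]))
# bad typecode (must be b, B, e, Zd or Zf)
```

Building the message from the same table that drives validation means a newly added code (such as the missing `e`) cannot be forgotten in the message.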
    return (x[0] == y[0]) && (x[1] == y[1]);
}
case 'Z': {
    switch (fmt[1]) {
defensive coding, add a default: to this switch. return MV_COMPARE_NOT_IMPL?
# Format codes supported by array.array
ARRAY = NATIVE.copy()
for k in NATIVE:
    if not k in "bBhHiIlLfd":
There is a bug in the tests here. Maybe you can fix it in this PR? The 'e' format type is missing too.
After a first look at this: probably not. That seems to conflate NumPy typecodes (which are all single-character) with buffer protocol format codes. The Trying to unify all this seems like a recipe for more pain, that's just going to keep rippling outward, and we definitely cannot deprecate F/D/G in NumPy. I'd rather just say that these are two sets of character/string codes for data types that are distinct, and that need conversion (like Using these typecodes for dtype specifiers is poor practice anyway: if you have any code like |
@rgommers, thanks. I've added a link to your post in the SC issue. Am I correct that adding dtype='Zd' as an alias for dtype='D' is not an option for NumPy? If so, I don't think the proposed solution is a good idea for Python. We will have to keep duplicated "types" forever, and this will confuse users. It would be better to revert the changes added in 3.15 and deprecate complex types in the struct module. Maybe an upcoming PEP could address this issue better.
Just in case, maybe you can remember any discussions around this? Given that the 'D' type code was already used in NumPy, I don't see good reasons for introducing such a mismatch.
I agree with Ralf, these single character codes aren't really the preferred user API from our perspective. FWIW, I think I said it before: If As mentioned before, from our perspective both are very much distinct: users don't see the buffer protocol. |
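If the two code sets are treated as distinct and needing conversion, the conversion itself is small. The sketch below is only an illustration: the F/D/G to Zf/Zd/Zg pairs come from this PR, while the helper names and the handling of a single leading byte-order character are assumptions. It handles one item code at a time, not full struct format strings.

```python
# Single-character complex codes and their PEP 3118 spellings (from this PR).
SHORT_TO_PEP3118 = {"F": "Zf", "D": "Zd", "G": "Zg"}
PEP3118_TO_SHORT = {v: k for k, v in SHORT_TO_PEP3118.items()}

# Byte-order/alignment prefixes used by struct-style format strings.
BYTE_ORDER_PREFIXES = "@=<>!"


def _split_prefix(fmt):
    """Split an optional leading byte-order character from the item code."""
    if fmt and fmt[0] in BYTE_ORDER_PREFIXES:
        return fmt[0], fmt[1:]
    return "", fmt


def short_to_pep3118(fmt):
    """Convert e.g. 'D' or '<D' to 'Zd' or '<Zd'; leave other codes alone."""
    prefix, code = _split_prefix(fmt)
    return prefix + SHORT_TO_PEP3118.get(code, code)


def pep3118_to_short(fmt):
    """Convert e.g. 'Zd' or '<Zd' to 'D' or '<D'; leave other codes alone."""
    prefix, code = _split_prefix(fmt)
    return prefix + PEP3118_TO_SHORT.get(code, code)


print(short_to_pep3118("<D"))   # <Zd
print(pep3118_to_short("Zf"))   # F
```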
seberg left a comment
Thanks a lot for working on this! I really have only one comment. Right now it seems to me that:
arr = array.array("D", [1j])
memoryview(arr).format == "<D" # or similar
I am not sure at what point you want to normalize this (i.e. arr.typecode already? But normalizing in any direction seems a choice you can make freely from my PoV).
But it would be good to normalize it to Zd on the buffer protocol side (just a new entry, usually identical, so as to use descr->bufcode rather than descr->typecode).
(As per discussion, I am not sure we'll add Zd as a type-code short-hand in NumPy without new API, but I don't think you need to worry about that.)
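To check what a given interpreter actually exports, something like the following sketch could be used. `exported_format` is a hypothetical helper; it returns `None` when the running Python does not accept the typecode at all, so it makes no claim about which format string any particular version reports for complex arrays.

```python
import array


def exported_format(typecode, values):
    """Return the buffer-protocol format exported for `typecode`, or None."""
    try:
        arr = array.array(typecode, values)
    except (ValueError, TypeError):
        # This interpreter does not support the typecode.
        return None
    with memoryview(arr) as view:
        return view.format


# Long-standing typecode: the exported format matches the typecode.
print(exported_format("d", [1.0]))
# Complex typecodes: the result depends on the Python version.
print(exported_format("D", [1j]))
print(exported_format("Zd", [1j]))
```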
    stginfo->flags |= TYPEFLAG_ISPOINTER;
case 'Z':
    if (proto_str[1] == '\0') {
        /* "Z": c_wchar_p */
I didn't realize Z was already used here. I guess it'll work in practice, since this is always a Z on its own, but it won't transfer to struct.struct...
The PEP links this mailing list thread, which has 'Z' in multiple places. There isn't explicit discussion, but it's clear from the start of that thread that there are previous threads, not linked from the PEP anymore. I don't have time to do more archeology, and I don't think it's super relevant anymore what that history was. But if I ask Travis, I'm fairly sure the answer will be "people (or I) wanted a more general and extensible design here and preferred that over the Numeric/NumPy legacy".
"Multiple places" is the PEP text itself.
Sure, I understand. I did. And, I think, so did several people from the discussion thread. Without success, unfortunately.
I don't see how preserving compatibility with existing type codes contradicts this goal. (Including e.g. multiple-letter type codes.) OK, perhaps it's my bad. Thanks for the explanation.
.. versionchanged:: next
   The :py:attr:`~_SimpleCData._type_` types ``F``, ``D`` and ``G`` have been
   replaced with ``Zf``, ``Zd`` and ``Zg``.
These _type_ codes were added in 3.14; wouldn't changing them break backwards compatibility?
Maybe in 3.16 we can deprecate the _type_ attribute itself and replace it with something better.
I made this change on purpose and it's documented in What's New in Python 3.15 and in the ctypes documentation.
If you use c_double_complex, c_float_complex and c_longdouble_complex of the ctypes module, you shouldn't notice the _type_ change (it just works). If you create your own types by inheriting from _SimpleCData, using the F, D or G type no longer works with this change: you have to update your code to the Zf, Zd and Zg formats.
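For code that subclasses _SimpleCData and has to run on both sides of this change, one option is to try the new code first and fall back to the old one. This is only a sketch: `make_double_complex_type` and `MyDoubleComplex` are hypothetical names, and the set of exceptions caught is an assumption about how ctypes rejects an unsupported _type_.

```python
import ctypes


def make_double_complex_type():
    """Create a _SimpleCData subclass for double complex, or return None."""
    # Try the new two-character code first, then the older one-letter code.
    for code in ("Zd", "D"):
        try:
            class MyDoubleComplex(ctypes._SimpleCData):
                _type_ = code
        except (AttributeError, TypeError, ValueError):
            # This interpreter (or platform) rejects the code; try the next.
            continue
        return MyDoubleComplex
    # No complex support at all (older Python or no C complex support).
    return None


complex_type = make_double_complex_type()
print(None if complex_type is None else complex_type._type_)
```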
* array: add missing 'e' format in an error message
* memoryview: get_native_fmtstr() checks that fmt[1] is not NUL
* ctypes: update error message
* Fix typo
There are network issues. Example:
I tried to write a minimum non-controversial change for Python 3.15 beta1. It doesn't put any preference between D and Zd format for example. That can be done later once we reach an agreement.
I merged this change in the main branch to make sure that it will be part of Python 3.15 beta1. We can fix remaining bugs and update the documentation between beta1 and Python 3.15 final. Thanks for your very useful reviews!
I created a follow-up PR: PR gh-149368 removes
F/D/G complex format with Zf/Zd/Zg.
array, ctypes and struct modules to support format strings longer than 1 character (such as "Zd").
array.typecodes type from str to tuple.
Zf and Zd #148675
📚 Documentation preview 📚: https://cpython-previews--148676.org.readthedocs.build/