Skip to content

refactor: merge 4 C types into single XXHASHObject type#155

Open
ifduyue wants to merge 10 commits into
masterfrom
one-type-multiple-algo
Open

refactor: merge 4 C types into single XXHASHObject type#155
ifduyue wants to merge 10 commits into
masterfrom
one-type-multiple-algo

Conversation

@ifduyue

@ifduyue ifduyue commented May 30, 2026

Copy link
Copy Markdown
Owner

Following CPython _hashlib's pattern (one HASH type for all algorithms), merge the four per-algorithm C types (PYXXH32Object, PYXXH64Object, PYXXH3_64Object, PYXXH3_128Object) into a single XXHASHObject type.

Key changes:

  • Single struct with XXH_Algo enum + union of state pointers
  • Single set of tp_* methods with algo-based dispatch (switch)
  • Single methods/getseters/slots/spec (was 4 copies of each)
  • 4 module-level constructor functions (xxh32, xxh64, xxh3_64, xxh3_128) sharing a single _xxhash_new helper
  • One-shot functions consolidated via DEFINE_ONESHOT macro
  • PyCFunction constructors replace 4 separate type objects
  • Updated .pyi stubs: constructors are functions, not @Final classes

Results:

  • C source reduced by 37% (2271 -> 1421 lines, -845 lines)
  • Zero warnings on all supported Python versions (3.9-3.15, 3.14t)
  • All 124 tests pass across all versions

Following CPython _hashlib's pattern (one HASH type for all algorithms),
merge the four per-algorithm C types (PYXXH32Object, PYXXH64Object,
PYXXH3_64Object, PYXXH3_128Object) into a single XXHASHObject type.

Key changes:
- Single struct with XXH_Algo enum + union of state pointers
- Single set of tp_* methods with algo-based dispatch (switch)
- Single methods/getseters/slots/spec (was 4 copies of each)
- 4 module-level constructor functions (xxh32, xxh64, xxh3_64, xxh3_128)
  sharing a single _xxhash_new helper
- One-shot functions consolidated via DEFINE_ONESHOT macro
- PyCFunction constructors replace 4 separate type objects
- Updated .pyi stubs: constructors are functions, not @Final classes

Results:
- C source reduced by 37% (2271 -> 1421 lines, -845 lines)
- Zero warnings on all supported Python versions (3.9-3.15, 3.14t)
- All 124 tests pass across all versions

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the type stubs in xxhash/__init__.pyi to correctly define the xxh* constructors as functions returning a _Hasher instance rather than distinct classes, and updates the subinterpreter isolation tests accordingly. A review comment correctly identifies an incorrect alias assignment where xxh64 is assigned to xxh3_64 despite already being defined as a distinct function.

Comment thread xxhash/__init__.pyi Outdated
Comment on lines +73 to +74
# Aliases
xxh64 = xxh3_64 # type: ignore[assignment]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The assignment xxh64 = xxh3_64 is incorrect. xxh64 is a distinct algorithm and has already been correctly defined as a function on line 69. It is not an alias of xxh3_64 (unlike xxh128 which is an alias of xxh3_128). This assignment should be removed to prevent type-checking errors and confusion.

Suggested change
# Aliases
xxh64 = xxh3_64 # type: ignore[assignment]
# Aliases

@codspeed-hq

codspeed-hq Bot commented May 30, 2026

Copy link
Copy Markdown
Contributor

Merging this PR will not alter performance

✅ 148 untouched benchmarks
⏩ 66 skipped benchmarks1


Comparing one-type-multiple-algo (7694b15) with master (945da72)

Open in CodSpeed

Footnotes

  1. 66 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

ifduyue added 2 commits May 30, 2026 16:27
Replace the generic _xxhash_oneshot() dispatcher (switch(algo) x switch(return_type))
with fully specialized per-algorithm/per-return-type functions generated by
DEFINE_ONESHOT_INT and DEFINE_ONESHOT_DH macros. Each function directly calls
the right xxhash C function and formats the result, avoiding the double-dispatch
overhead.

Also extract _xxhash_hexdigest_from_bytes() and _xxhash_xxh128_intdigest_result()
helpers, and remove the unused XXH_OneshotDispatch struct.

Benchmark result (xxh32_hexdigest, 1KB data):
  Before: 421.2 ns/call
  After:  353.3 ns/call  (~16% faster)
…l methods

Define XXH_AlgoDispatch (create_state, free_state, reset, update,
digest, intdigest, copy_state) and a const XXH_DISPATCH[] array
indexed by XXH_Algo, eliminating all 10 switch(self->algo) statements
in the stateful path (_xxhash_init_state, _xxhash_free_state,
_xxhash_reset_state, _xxhash_do_update, _xxhash_new, XXHASH_init,
XXHASH_digest, XXHASH_hexdigest, XXHASH_intdigest, XXHASH_copy).

Also remove the unused _free_module callback (PyType_FromModuleAndSpec
ties heap type lifetime to the module automatically).

Benchmarks show performance at parity or slightly improved
(constructor+hexdigest ~6-20% faster).
ifduyue added 4 commits June 5, 2026 16:30
The comment explains why the extra state copy is needed: xxHash v0.7.3 and
earlier have a bug where states reset with a seed will have a wild pointer
to the original state when copied, causing a use-after-free if the original
is freed.
The update() method is now on a unified XXHASHObject type supporting all
four algorithms (XXH32, XXH64, XXH3_64, XXH3_128), not just xxh32.
… repr

- XXHASH_new (tp_new) now raises TypeError instead of creating instances,
  matching _hashlib.HASH behavior. Only factory functions can create objects.
- Add XXHASH_repr (tp_repr) showing algorithm name: <XXH32 _xxhash.HASH object>
  matching CPython's <md5 _hashlib.HASH object> pattern.
- Rename type from 'xxhash.xxhash' to '_xxhash.HASH' to align with hashlib.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant