Skip to content

whir: avoid serial 256 MiB copy of combined-statement weights#250

Merged
TomWambsgans merged 1 commit into
leanEthereum:devnet4from
lambdaclass:fix/whir-combine-statement-arena-copy
Jun 9, 2026
Merged

whir: avoid serial 256 MiB copy of combined-statement weights#250
TomWambsgans merged 1 commit into
leanEthereum:devnet4from
lambdaclass:fix/whir-combine-statement-arena-copy

Conversation

@MegaRedHand

Copy link
Copy Markdown

Problem

The merge of main into devnet4 (8eec56c) changed MleOwned to hold an ArenaVec, but combine_statement in crates/whir/src/open.rs still returned a glibc-heap Vec<EFPacking<EF>>, which was then bridged with ArenaVec::from_slice(&weights). At n_vars=24 that is a single-threaded memcpy of ~256 MiB per proof while all worker threads sit idle, plus the same data traversing the memory hierarchy twice.

This was found while investigating a machine-specific perf regression: the copy is a fixed serial tax competing against the merge's parallel-kernel gains. On a Zen4 box the gains outweigh the tax; on a Zen5 box (Ryzen 9700X, smaller remaining gains) it showed up as a net −5.3% in XMSS aggregation throughput. --tracing pinpointed it: run_initial_sumcheck_rounds self time went from 0.01% to 7.94% (~25 ms) with identical children, and perf showed ArenaVec::from_slice only in the regressed binary.

Fix

Build the weights directly in an ArenaVec so it is moved, never copied. All writers (compute_eval_eq_packed*, split_at_mut_many) take &mut [T] and work unchanged via deref. The ArenaVec is created inside the same proving phase where it was previously copied into one, so arena-phase semantics are unchanged. This matches how main already does it; the bug is devnet4-only.

Results (Zen5, Ryzen 7 9700X, interleaved runs, mean of --repeat 10)

binary time (s) XMSS/s vs pre-merge c39f0ff
pre-merge c39f0ff 0.2964 215.9
a44bca4 unpatched 0.3129 204.6 −5.3%
a44bca4 + fix 0.2900 220.7 +2.2%

Patched trace shows run_initial_sumcheck_rounds self time back to 0.00%. Proof size unchanged (265 KiB).

Testing

  • cargo test -p mt-whir -p sub_protocols --release passes (incl. end-to-end prove/verify test_run_whir)
  • cargo fmt --check and cargo clippy -p mt-whir clean

The merge of main into devnet4 (8eec56c) changed MleOwned to hold an
ArenaVec, but combine_statement still returned a heap Vec, which was
then bridged with ArenaVec::from_slice. At n_vars=24 that is a
single-threaded memcpy of ~256 MiB of extension elements per proof
while all worker threads sit idle, and the data crosses the memory
hierarchy twice.

Build the weights directly in an ArenaVec so it is moved, never
copied. All writers (compute_eval_eq_packed*, split_at_mut_many) take
&mut [T] and work unchanged via deref. The ArenaVec is created inside
the same proving phase where it was previously copied into one, so
arena-phase semantics are unchanged.

On a Zen5 box (Ryzen 9700X) this turns a -5.3% XMSS-aggregation
regression vs pre-merge into a +2.2% improvement (215.9 -> 220.7
XMSS/s); run_initial_sumcheck_rounds self time drops from 7.94% back
to 0.00%. Proof size is unchanged.
@TomWambsgans TomWambsgans merged commit 2498896 into leanEthereum:devnet4 Jun 9, 2026
@MegaRedHand MegaRedHand deleted the fix/whir-combine-statement-arena-copy branch June 9, 2026 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants