
Replace BTreeMap in PortConnectivity with a sorted Vec; semi-naive iteration replacing in-place updates#802

Merged: frankmcsherry merged 14 commits into master from port_connectivity_vec on Jun 12, 2026


frankmcsherry (Member) commented Jun 12, 2026

PortConnectivity now stores a sorted Vec<(usize, Antichain<TS>)> with lazy consolidation (appends set a dirty bit; reads expect consolidated data, with freeze points at Builder::add_node and PerOperatorState::new). The summarize_outputs fixed point is restructured into semi-naive rounds over log-structured sorted runs, removing its HashMap accumulator; the changed-bit insert methods it required are removed as dead.

This excises all BTreeMap (and the one HashMap) instantiations from the progress subsystem: generated IR for the bfs example drops 9.1% (254,560 -> 231,310 llvm-lines; btree machinery 8,723 -> 0). Construction remains O(n log n) under any insertion order; shape_scaling timings at 10k/100k operators are unchanged or slightly improved.

(ed: some data structures are now O(n log^2 n), to avoid a one-off LSM implementation)
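To make the lazy-consolidation idea concrete, here is a minimal sketch, not the PR's code. `LazyPorts` is a hypothetical stand-in for `PortConnectivity`, and summaries are plain `u64` values: under a total order an antichain holds at most one element, so the minimum plays that role. Appends only set the dirty bit; `consolidate` restores sorted, one-entry-per-port form; reads assume it has run.

```rust
/// Hypothetical stand-in for the PR's type: port -> minimal summary.
/// With totally ordered `u64` summaries an antichain has at most one
/// element, so a plain `u64` (the minimum) plays that role here.
#[derive(Default)]
pub struct LazyPorts {
    entries: Vec<(usize, u64)>,
    dirty: bool,
}

impl LazyPorts {
    /// Append without sorting; just record that order may be broken.
    pub fn add_port(&mut self, port: usize, summary: u64) {
        self.entries.push((port, summary));
        self.dirty = true;
    }

    /// Sort by (port, summary), then keep the least summary per port.
    pub fn consolidate(&mut self) {
        if self.dirty {
            self.entries.sort_unstable();
            self.entries.dedup_by_key(|(port, _)| *port);
            self.dirty = false;
        }
    }

    /// The invariant stated as a predicate.
    pub fn is_consolidated(&self) -> bool {
        !self.dirty
    }

    /// Reads expect consolidated data and binary search by port.
    pub fn get(&self, port: usize) -> Option<u64> {
        debug_assert!(self.is_consolidated());
        self.entries
            .binary_search_by_key(&port, |(p, _)| *p)
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```

A caller would append freely during construction and call `consolidate` once at a freeze point before any `get`.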

Measurements

All on master @ 0b23c249 vs this PR @ 1024c32d, release builds,
Apple Silicon (darwin 25.1.0). Scaling tests timed over 3 runs each;
RSS sampled via ps after construction completes, 2 runs each.

Generated code (bfs example):

| Metric | master | PR | Δ |
|---|---|---|---|
| cargo llvm-lines total | 254,560 lines / 5,092 copies | 236,657 / 5,034 | −7.0% |
| post-opt IR, function bodies | 252,020 lines | 230,458 | −8.6% |
| BTreeMap machinery | 8,723 lines / 10 fns | 0 / 0 | −100% |

Construction scaling (tests/shape_scaling.rs, with temporary 1M variants):

| Test | master | PR | Δ |
|---|---|---|---|
| operator_scaling 1M | 3.87–4.17s | 2.63–2.99s | −32% |
| subgraph_scaling 1M | 4.67–4.69s | 3.07s | −34% |
| operator_scaling 100k | 0.34s | 0.25–0.26s | −24% |
| subgraph_scaling 100k | 0.38–0.39s | 0.28–0.29s | −26% |

Both sides scale ≈11–12× from 100k → 1M (log factors, no quadratic
behavior on either side).

event_driven 1000 × 1000:

| Metric | master | PR | Δ |
|---|---|---|---|
| peak RSS after build | 5,376 MB | 4,321 MB | −1,055 MB (−19.6%) |
| dataflow build time | 1.53–1.56s | 1.50–1.53s | parity / slightly better |

frankmcsherry and others added 14 commits June 12, 2026 07:31
…mmarize_outputs

PortConnectivity now stores a sorted Vec<(usize, Antichain<TS>)> with lazy
consolidation (appends set a dirty bit; reads expect consolidated data,
with freeze points at Builder::add_node and PerOperatorState::new). The
summarize_outputs fixed point is restructured into semi-naive rounds over
log-structured sorted runs, removing its HashMap accumulator; the
changed-bit insert methods it required are removed as dead.

This excises all BTreeMap (and the one HashMap) instantiations from the
progress subsystem: generated IR for the bfs example drops 9.1%
(254,560 -> 231,310 llvm-lines; btree machinery 8,723 -> 0). Construction
remains O(n log n) under any insertion order; shape_scaling timings at
10k/100k operators are unchanged or slightly improved.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
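For readers unfamiliar with the term, the following sketch illustrates semi-naive iteration on a deliberately simpler problem than `summarize_outputs`: plain graph reachability. The function name and the edge-list representation are assumptions made for illustration only. The point is that each round proposes extensions only from the facts that were new in the previous round, and only genuinely novel facts are carried forward.

```rust
/// Nodes reachable from `source`, computed in semi-naive rounds:
/// each round extends only the facts discovered in the previous round.
pub fn reachable(edges: &[(usize, usize)], source: usize) -> Vec<usize> {
    let mut known = vec![source]; // sorted, deduplicated
    let mut delta = vec![source]; // facts new as of the last round (sorted)
    while !delta.is_empty() {
        // Propose successors of the delta only, never of all of `known`.
        let mut proposals: Vec<usize> = edges
            .iter()
            .filter(|(from, _)| delta.binary_search(from).is_ok())
            .map(|&(_, to)| to)
            .collect();
        proposals.sort_unstable();
        proposals.dedup();
        // Keep only genuinely novel facts; they form the next delta.
        proposals.retain(|node| known.binary_search(node).is_err());
        known.extend_from_slice(&proposals);
        known.sort_unstable();
        delta = proposals;
    }
    known
}
```

The sketch re-sorts `known` every round for brevity; the PR instead keeps its accumulated results in log-structured sorted runs.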
The cycle check already observed every location, so the map was dense in
content; index in-degree counts by per-node location offsets instead.
This removes the last std collection map from the progress subsystem.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
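As an illustration of indexing by per-node location offsets, here is a small sketch. It assumes a Kahn-style topological check, which this commit message does not confirm, and all names are hypothetical. Because every (node, port) location gets a slot, a flat `Vec` of counts can replace a map keyed by location.

```rust
/// Returns true if the graph over (node, port) locations is acyclic.
/// `ports[n]` is the number of ports at node `n`; edges join locations.
pub fn is_acyclic(
    ports: &[usize],
    edges: &[((usize, usize), (usize, usize))],
) -> bool {
    // offsets[n] is the flat index of node n's first location.
    let mut offsets = Vec::with_capacity(ports.len());
    let mut total = 0;
    for &count in ports {
        offsets.push(total);
        total += count;
    }
    let index = |(node, port): (usize, usize)| offsets[node] + port;

    // Dense in-degree counts, one slot per location, no map required.
    let mut in_degree = vec![0usize; total];
    for &(_, target) in edges {
        in_degree[index(target)] += 1;
    }

    // Kahn-style check: repeatedly retire locations with no incoming edges.
    let mut ready: Vec<usize> = (0..total).filter(|&i| in_degree[i] == 0).collect();
    let mut retired = 0;
    while let Some(location) = ready.pop() {
        retired += 1;
        // A linear scan over edges keeps the sketch short.
        for &(source, target) in edges {
            if index(source) == location {
                let slot = index(target);
                in_degree[slot] -= 1;
                if in_degree[slot] == 0 {
                    ready.push(slot);
                }
            }
        }
    }
    // Every location retires exactly when there is no cycle.
    retired == total
}
```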
add_port now merges summaries at a port by antichain insertion, matching
insert's semantics; it is the builder's responsibility to introduce
multiple summaries for a port only when multiple paths exist.

Operate::initialize is now documented to return connectivity in canonical
(consolidated) form, and the in-tree implementations (builder_raw,
Subgraph) establish that form before returning. The consolidation at the
consumer in PerOperatorState::new remains as defense in depth against
foreign Operate implementations.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
PortConnectivity gains is_consolidated(), and a Consolidate extension
trait gives the Connectivity alias consolidate()/is_consolidated() verbs,
so boundaries state the invariant as a predicate rather than prose.
Operate::initialize's contract now reads 'must satisfy is_consolidated';
builder_raw consolidates its summary in place before cloning (leaving the
retained copy canonical as well); the defense-in-depth site in
PerOperatorState::new debug_asserts the contract before repairing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
insert now literally delegates to add_port (the in-place fast path was an
unproven optimization); type and method docs state requirements without
explaining the implementation; builder_raw's initialize takes mut self
rather than rebinding.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Split PortConnectivity into an append-only PortConnectivityBuilder
(insert/add_port/FromIterator, freeze) and a frozen, always-canonical
PortConnectivity (get/iter_ports only). Deletes the dirty bit, the
read-side debug_asserts, the Consolidate extension trait, the
defense-in-depth consolidation in PerOperatorState::new and
Builder::add_node, and the documented temporal contract on
Operate::initialize, whose return type now carries the invariant.

builder_rc's shared Rc<RefCell<PortConnectivity>> (also held by input
handles and capabilities) stays frozen; new_output_connection re-freezes
it in place via mem::take + into_builder + freeze. handles.rs and
capability.rs are unchanged.

Sizing notes in typed-split-sizing.md. cargo test -p timely passes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
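The typed split can be sketched as follows. This is an assumption-laden miniature, not the PR's definitions: `PortsBuilder` and `Ports` are hypothetical stand-ins for `PortConnectivityBuilder` and `PortConnectivity`, and summaries are plain `u64` minima rather than antichains. The builder offers writes only, the frozen type offers reads only, and `freeze` consumes the builder, so no dirty bit or read-side assertion is needed.

```rust
use std::iter::FromIterator;

/// Append-only accumulation; no reads are offered here.
#[derive(Default)]
pub struct PortsBuilder {
    entries: Vec<(usize, u64)>,
}

/// Frozen, always-canonical form: sorted by port, one entry per port.
pub struct Ports {
    entries: Vec<(usize, u64)>,
}

impl PortsBuilder {
    pub fn add_port(&mut self, port: usize, summary: u64) {
        self.entries.push((port, summary));
    }

    /// One-way conversion: consuming `self` rules out later appends.
    pub fn freeze(mut self) -> Ports {
        self.entries.sort_unstable();
        // After sorting, the first entry per port has the least summary.
        self.entries.dedup_by_key(|(port, _)| *port);
        Ports { entries: self.entries }
    }
}

impl FromIterator<(usize, u64)> for PortsBuilder {
    fn from_iter<I: IntoIterator<Item = (usize, u64)>>(iter: I) -> Self {
        PortsBuilder { entries: iter.into_iter().collect() }
    }
}

impl Ports {
    /// Binary search is valid because construction guarantees order.
    pub fn get(&self, port: usize) -> Option<u64> {
        self.entries
            .binary_search_by_key(&port, |(p, _)| *p)
            .ok()
            .map(|i| self.entries[i].1)
    }

    pub fn iter_ports(&self) -> impl Iterator<Item = (usize, u64)> + '_ {
        self.entries.iter().copied()
    }
}
```

Since `Ports` has no public constructor other than `freeze`, any function that returns it carries the canonical-form invariant in its return type.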
The summaries shared with input handles are created at new_input, mutated
only during construction, and read only at runtime through capabilities.
The builder now accumulates them in PortConnectivityBuilders and freezes
each into a shared OnceCell when the operator is built, so the lifecycle
(write once at build, immutable thereafter) is expressed in the type and
the re-freeze-in-place dance disappears along with RefCell borrows on the
capability read path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
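The write-once lifecycle described above might look roughly like this. `OperatorBuilder`, `Reader`, and `Frozen` are hypothetical names for illustration; only `std::cell::OnceCell` and `std::rc::Rc` are real APIs. Readers share the cell from the start, see nothing until the operator is built, and afterwards read without any `RefCell` borrow.

```rust
use std::cell::OnceCell;
use std::rc::Rc;

/// Hypothetical frozen summary; stands in for the PR's PortConnectivity.
pub type Frozen = Vec<(usize, u64)>;

/// Held by handles and capabilities: reads only, no RefCell borrow.
pub struct Reader {
    shared: Rc<OnceCell<Frozen>>,
}

impl Reader {
    /// `None` until the operator has been built.
    pub fn summary(&self) -> Option<&Frozen> {
        self.shared.get()
    }
}

pub struct OperatorBuilder {
    pending: Vec<(usize, u64)>,
    shared: Rc<OnceCell<Frozen>>,
}

impl OperatorBuilder {
    pub fn new() -> Self {
        OperatorBuilder { pending: Vec::new(), shared: Rc::new(OnceCell::new()) }
    }

    /// Hand out a reader before the summary exists.
    pub fn reader(&self) -> Reader {
        Reader { shared: Rc::clone(&self.shared) }
    }

    pub fn add_port(&mut self, port: usize, summary: u64) {
        self.pending.push((port, summary));
    }

    /// Canonicalize once and publish; the cell is immutable afterwards.
    pub fn build(mut self) {
        self.pending.sort_unstable();
        self.pending.dedup_by_key(|(port, _)| *port);
        self.shared.set(self.pending).expect("summary frozen twice");
    }
}
```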
…sizing doc

into_builder's only caller was the re-freeze dance that the OnceCell
sharing removed; freeze is one-way. The summaries field comment now
matches the tuple order, set() failure uses expect, and the exploratory
sizing document leaves the repository.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
merge_disjoint uses the while-let-both-Some then extend pattern, and
summarize_outputs's per-round worklist avoids the name frontier, which
means something else throughout this crate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replaces the Vec<Vec<_>> levels in summarize_outputs with BinaryRuns: a
single vector whose length's binary representation reveals its sorted
runs, largest first. Batches of novel keys merge trailing runs as in
binary addition (the smallest run is always the lowest set bit of the
current length), keeping introduction amortized logarithmic and lookups
to at most logarithmically many runs, in one allocation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
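A sketch of how the length's binary representation can reveal the runs, under the assumption that runs are laid out largest first with one run per set bit. The function names and the `(u64, V)` element type are illustrative, not taken from the PR. Lookup binary searches each run, so it touches at most as many runs as the length has set bits.

```rust
use std::ops::Range;

/// Ranges of the sorted runs in a vector of length `len`, largest first:
/// one run per set bit of `len`, each of that bit's power-of-two size.
pub fn runs(len: usize) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for bit in (0..usize::BITS).rev() {
        let size = 1usize << bit;
        if len & size != 0 {
            ranges.push(start..start + size);
            start += size;
        }
    }
    ranges
}

/// Look a key up by binary searching each run in turn.
/// Assumes every run is sorted by key and keys are distinct overall.
pub fn lookup<V: Copy>(data: &[(u64, V)], key: u64) -> Option<V> {
    runs(data.len()).into_iter().find_map(|run| {
        let run = &data[run];
        run.binary_search_by_key(&key, |&(k, _)| k)
            .ok()
            .map(|i| run[i].1)
    })
}
```

For example, a vector of length 6 (binary 110) decomposes into a run of 4 followed by a run of 2.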
Introducing a batch now extends the vector and sort_unstables the suffix
past the runs shared between the old and new lengths' decompositions.
This removes merge_disjoint and the carry cascade entirely, costs a log
factor more in comparisons (n log^2 n total), and runs in place, once
per dataflow construction, on modest n.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The lengths' xor locates the highest bit on which they disagree; runs at
the agreeing bits above it stay put, replacing the bit-scanning loop.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
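The xor observation can be written down in a few lines. This is a sketch under the assumption that batches contain only novel, distinct keys; `shared_prefix` and `introduce` are hypothetical names. Bits above the highest disagreeing bit are common to both lengths, so the runs they describe keep their positions, and everything after them is sorted as one suffix. A fully sorted suffix is trivially sorted within each of the new runs it is divided into.

```rust
/// Start of the suffix that must be re-sorted when a run-structured
/// vector grows from `old` to `new` elements (requires `old < new`).
pub fn shared_prefix(old: usize, new: usize) -> usize {
    debug_assert!(old < new);
    // All bits at or below the highest bit where the lengths disagree.
    let mask = usize::MAX >> (old ^ new).leading_zeros();
    // Runs at the agreeing bits above it stay where they are.
    old & !mask
}

/// Append a batch of novel keys and restore the run structure in place.
pub fn introduce(data: &mut Vec<u64>, batch: impl IntoIterator<Item = u64>) {
    let old = data.len();
    data.extend(batch);
    let new = data.len();
    if new > old {
        data[shared_prefix(old, new)..].sort_unstable();
    }
}
```

Growing from 5 to 6 elements (101 to 110) leaves the run of 4 alone and sorts positions 4..6; growing from 3 to 4 (011 to 100) shares nothing and sorts the whole vector into a single run.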
Same-key proposals within a round first collapse into an antichain, so
only elements that survive both the round's own batch and insertion into
the accumulated antichain are shipped as next-round work; previously
every improving step shipped, including ones dominated within the same
round. Also inlines the final into_sorted call.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
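To illustrate the filtering, here is a toy version with summaries modeled as `(u64, u64)` pairs under the product order. `MiniAntichain` and `absorb` are hypothetical and only loosely modeled on timely's `Antichain`. Proposals for one key are first collapsed among themselves, and only the survivors that also improve the accumulated antichain are returned as next-round work.

```rust
/// Product partial order on pairs; a stand-in for path summaries.
fn less_equal(a: (u64, u64), b: (u64, u64)) -> bool {
    a.0 <= b.0 && a.1 <= b.1
}

/// Minimal antichain; hypothetical, loosely modeled on timely's Antichain.
#[derive(Default)]
pub struct MiniAntichain {
    elements: Vec<(u64, u64)>,
}

impl MiniAntichain {
    /// Inserts `x` unless dominated; evicts anything `x` dominates.
    /// Returns true exactly when `x` was added.
    pub fn insert(&mut self, x: (u64, u64)) -> bool {
        if self.elements.iter().any(|&e| less_equal(e, x)) {
            return false;
        }
        self.elements.retain(|&e| !less_equal(x, e));
        self.elements.push(x);
        true
    }

    pub fn elements(&self) -> &[(u64, u64)] {
        &self.elements
    }
}

/// One key's proposals for a round: collapse them first, then test the
/// survivors against the accumulated antichain. Returns the work to ship.
pub fn absorb(
    accumulated: &mut MiniAntichain,
    proposals: &[(u64, u64)],
) -> Vec<(u64, u64)> {
    let mut batch = MiniAntichain::default();
    for &p in proposals {
        batch.insert(p);
    }
    batch
        .elements
        .into_iter()
        .filter(|&p| accumulated.insert(p))
        .collect()
}
```

With `(2, 2)` accumulated and proposals `(5, 0)`, `(4, 0)`, `(3, 3)`, only `(4, 0)` is shipped: `(5, 0)` is dominated within the round and `(3, 3)` is dominated by the accumulated element.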
Antichain::drain yields owned elements while leaving the allocation for
reuse, mirroring MutableAntichain::update_iter's smallvec::Drain. The
proposal-collapsing batch in summarize_outputs is hoisted out of the key
loop: existing keys drain it (no clones, allocation retained), and fresh
keys surrender it via mem::take, which fresh must own anyway.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
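The two ownership moves can be shown with a plain `Vec` standing in for the antichain; this is an analogy, not the new `Antichain::drain` itself, and `absorb_batch` is a hypothetical name. Draining moves elements out while the buffer keeps its allocation for the next key, and `mem::take` hands the whole buffer to a key that has no storage yet.

```rust
use std::mem;

/// Per-key accumulated values; `None` marks a key not seen before.
pub fn absorb_batch(batch: &mut Vec<u64>, slot: &mut Option<Vec<u64>>) {
    match slot {
        // Existing key: move elements out (no clones); `batch` keeps
        // its allocation and is left empty for reuse.
        Some(existing) => existing.extend(batch.drain(..)),
        // Fresh key: it must own storage anyway, so surrender the buffer.
        None => *slot = Some(mem::take(batch)),
    }
}
```

After the fresh-key path, `batch` is an empty `Vec` with no allocation, so the next round of pushes allocates anew.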
@frankmcsherry frankmcsherry merged commit 0823096 into master Jun 12, 2026
9 checks passed