Fix exponential blow-up in skip_stages on chains of let-bound selects#9147
Conversation
In SkipStages::visit(Select), the two branches' per-Func .used / .loaded predicates were combined as `(t_used && cond) || (f_used && !cond)`. When both branches contributed the same Expr -- which is exactly what happens when both branches read the same let-stashed FuncInfo from an outer let -- make_or could not recognise the And nodes as equivalent (they aren't same_as even when their operands are), so the predicate roughly doubled in size at every nested Select. A long chain of CSE'd lets where each let value contains a Select then drove the predicate size to 2^N, well past the point where allocating the IR is feasible. Combine the two branches with `select(cond, t, f)` instead, and add a make_select helper that collapses `select(c, X, X) -> X` and the constant-cond cases. When both branches contributed the same Expr, make_select drops the condition immediately and the chain stays linear. The new correctness test (many_inlined_selects.cpp) constructs a 500- element CSE'd let chain whose values each carry a Param<bool>-gated Select, then feeds the chain into a final Select. With the bug present this test would not terminate -- skip_stages would crash allocating ~2^500 IR nodes long before any reasonable timeout fired. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an id is only touched on one branch of the Select, the previous code passed an undefined Expr to a `combine` helper that then turned `undefined` into const_false and built a `select(cond, X, false)` -- which is just `X && cond` dressed up as a select. Call make_and directly in those cases and keep make_select for the both-branches case, where the `select(c, X, X) -> X` collapse is the whole point. Also factor the "merge into old" body into a small helper to remove the duplication. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lowercase the Func name, drop the unnecessary top-level select and output schedule, and make each chain entry depend on chain.back() so nothing gets eliminated as dead. The test still reproduces the pre-fix exponential blow-up (verified by reverting the fix: it times out at 30s on a 500-element chain). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9147 +/- ##
=======================================
Coverage ? 69.77%
=======================================
Files ? 255
Lines ? 77556
Branches ? 18541
=======================================
Hits ? 54118
Misses ? 17962
Partials ? 5476 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| Expr make_select(const Expr &c, const Expr &t, const Expr &f) { | ||
| if (is_const_one(c)) { | ||
| return t; | ||
| } | ||
| if (is_const_zero(c)) { | ||
| return f; | ||
| } | ||
| if (t.same_as(f)) { | ||
| return t; | ||
| } | ||
| return select(c, t, f); | ||
| } |
There was a problem hiding this comment.
This (and the helpers above) seems like the kind of thing that is likely to be useful elsewhere in the compiler, maybe even duplicated already. I suppose we didn't want to call simplify(Select::make(...)) here for some reason.
There was a problem hiding this comment.
True enough, but in this context eager simplification is absolutely required for this to not blow up. If I just made this the default behavior of operator&&/operator||/select I wouldn't have that clear guarantee at this point in the code.
SkipStages::visit(Select) was combining each branch's `loaded` predicate with the select condition (via make_select / make_and). That's wrong: Halide's Select evaluates both branches and only picks one of the results, so any load inside either branch fires unconditionally. The `loaded` predicate must be the OR of both branches, ungated by the condition. The bug caused allocation bounds inference to size affected buffers down to zero whenever the runtime condition was false, while the generated code still emitted a vectorized load from them -- a heap OOB read that showed up intermittently as a valgrind use-after-free on the truncated_pyramid test. Keep `used` gated by the condition as before (with the same select/and collapse that fixed the exponential blow-up in #9147). Add a regression test in skip_stages.cpp that records the minimum producer allocation size through a custom malloc handler and verifies it's non-zero when the producer's only consumer is inside a runtime-false select branch. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
In SkipStages::visit(Select), the two branches' per-Func .used / .loaded predicates were combined as
(t_used && cond) || (f_used && !cond). When both branches contributed the same Expr -- which is exactly what happens when both branches read the same let-stashed FuncInfo from an outer let -- make_or could not recognise the And nodes as equivalent (they aren't same_as even when their operands are), so the predicate roughly doubled in size at every nested Select. A long chain of CSE'd lets where each let value contains a Select then drove the predicate size to 2^N, well past the point where allocating the IR is feasible.Combine the two branches with
select(cond, t, f)instead, and add a make_select helper that collapsesselect(c, X, X) -> Xand the constant-cond cases. When both branches contributed the same Expr, make_select drops the condition immediately and the chain stays linear.The new correctness test (many_inlined_selects.cpp) constructs a 500- element CSE'd let chain whose values each carry a Param-gated Select, then feeds the chain into a final Select. With the bug present this test would not terminate -- skip_stages would crash allocating ~2^500 IR nodes long before any reasonable timeout fired.