Fix pushed cast nullability conversion#8498
Conversation
Use DataFusion's physical expression nullability when converting pushed casts into Vortex expressions. This keeps target-field metadata from turning nullable physical casts into non-null Vortex runtime assertions. Signed-off-by: Nicholas Gates <nick@nickgates.com>
| pub(crate) fn make_vortex_predicate( | ||
| expr_convertor: &dyn ExpressionConvertor, | ||
| predicate: &[Arc<dyn PhysicalExpr>], | ||
| input_schema: &Schema, |
There was a problem hiding this comment.
should we replace the current convert?
Merging this PR will improve performance by 13.89%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
223.8 µs | 258.7 µs | -13.49% |
| ❌ | Simulation | chunked_varbinview_opt_canonical_into[(100, 100)] |
304.7 µs | 339.7 µs | -10.3% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.01)] |
139 µs | 108.4 µs | +28.22% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.1)] |
139.3 µs | 108.7 µs | +28.17% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.0)] |
138.9 µs | 108.4 µs | +28.14% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
206.7 µs | 170.3 µs | +21.33% |
| ⚡ | Simulation | take_10k_first_chunk_only |
271.7 µs | 226.9 µs | +19.72% |
| ⚡ | Simulation | take_10k_dispersed |
285.5 µs | 240.8 µs | +18.53% |
| ⚡ | Simulation | decompress_rd[f32, (100000, 0.0)] |
583.6 µs | 495.6 µs | +17.76% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.1)] |
91 µs | 77.7 µs | +17.14% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.01)] |
90.8 µs | 77.7 µs | +16.81% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.0)] |
91 µs | 78.2 µs | +16.41% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | patched_take_10k_adversarial |
260.3 µs | 230.2 µs | +13.05% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] |
307.3 µs | 272 µs | +12.98% |
| ⚡ | Simulation | patched_take_10k_first_chunk_only |
303.4 µs | 273.2 µs | +11.06% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
304.7 ns | 275.6 ns | +10.58% |
| ⚡ | Simulation | patched_take_10k_dispersed |
317.5 µs | 287.3 µs | +10.49% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/fix-cast-nullability (1f97053) with develop (02e085f)
| let child = self.convert(cast_expr.expr().as_ref())?; | ||
| let mut cast_dtype = DType::from_arrow(cast_expr.target_field().as_ref()); | ||
| if let Some(input_schema) = input_schema { | ||
| cast_dtype = cast_dtype.with_nullability(cast_expr.nullable(input_schema)?.into()); |
There was a problem hiding this comment.
Inside CastExpr::nullable there's this comment:
A cast is nullable if either the child is nullable or the
target field allows nulls. This conservative rule prevents
optimizers from assuming a non-null result when a null input could
still propagate.return_field()continues to expose the exact
target metadata separately.
So I think this is correct? Did you run into it with some real world example or just a random AI thing?
Use DataFusion's physical expression nullability when converting pushed casts into Vortex expressions. This keeps target-field metadata from turning nullable physical casts into non-null Vortex runtime assertions.