feat: adds array_add function#22459
Conversation
|
Heads up — the One-command fix: ./dev/update_function_docs.sh
git add docs/source/user-guide/sql/scalar_functions.mdThen amend or new-commit + push. |
| Ok(arg_types[0].clone()) | ||
| } | ||
|
|
||
| fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { |
There was a problem hiding this comment.
The logic in here seems to differ from what we usually have, e.g.
datafusion/datafusion/functions-nested/src/cosine_distance.rs
Lines 100 to 139 in ba240b2
Is there a reason for this?
There was a problem hiding this comment.
@Jefffrey I have added more checks like
- List of List is handled upfront. List<List> throws error as non numeric
- Decimal128 and Decimal256 types throws error as downcasting it would loose precision.
- throws error if element type is non-numeric to avoid automatic typecasting of boolean or string type to numeric.
Let me know if I should get rid of these checks.
There was a problem hiding this comment.
cc @crm26
We should try to ensure these array math functions are consistent with each other on the signature they accept
There was a problem hiding this comment.
@Jefffrey can we make this a util function?
There was a problem hiding this comment.
That could be useful to keep them aligned, until we have better support in the signature API for these types of behaviour
There was a problem hiding this comment.
@Jefffrey Create utility function coerce_array_math_arg_types with same implementation as other existing functions. We can add more checks to common function if needed.
## Which issue does this PR close? Partial of apache#21536 — `array_scale` (the list+scalar arithmetic function in the vector math series). ## Rationale for this change Continues the per-function split requested by @alamb on apache#21536. Three sibling PRs already merged: `cosine_distance` (apache#21542), `inner_product` (apache#21861), `array_normalize` (apache#22013). `array_add` is in flight as apache#22459 by @SubhamSinghal. Adds element-wise scalar multiplication for numeric arrays, returning a list of the same shape. Aliased as `list_scale` to match the `array_X` / `list_X` precedent in this crate. ## What changes are included in this PR? - New scalar UDF `array_scale(array, scalar)` in `datafusion/functions-nested/src/array_scale.rs` - Module wire-up + registration in `datafusion/functions-nested/src/lib.rs` - SLT tests at `datafusion/sqllogictest/test_files/array_scale.slt` - Auto-generated function docs entry in `docs/source/user-guide/sql/scalar_functions.md` **Signature:** first arg `List/LargeList/FixedSizeList<numeric>`, second arg numeric scalar. Both coerce to `Float64`. Same list-widening rules as the binary-op siblings. **NULL semantics:** - NULL row in array → NULL row out - NULL scalar → NULL row out (whole-row, because the scalar applies uniformly) - NULL element at position \`i\` → NULL element at \`i\` out (per-element propagation) - Empty array → empty array **Builders:** uses \`OffsetBufferBuilder\` + \`NullBufferBuilder\` per the pattern adopted in the round-1 review of apache#22013. ## Are these changes tested? Yes. \`array_scale.slt\` covers: - Happy paths (positive, negative, zero, fractional, single-element) - NULL propagation at all three levels (NULL row, NULL scalar, NULL element) - All list type variants (\`List\`, \`LargeList\`, \`FixedSizeList\`) - Numeric inner type coercion (Float32, Int64, integer literals) - Multi-row queries with both constant-scalar broadcast and per-row column scalar - Error paths (non-numeric scalar, non-list first arg, wrong arity) - Empty array - \`list_scale\` alias ## Are there any user-facing changes? Yes — new SQL scalar function \`array_scale(array, scalar)\` and its alias \`list_scale\`. Documented in \`docs/source/user-guide/sql/scalar_functions.md\`.
|
@Jefffrey Resolved merge conflict. |
## Which issue does this PR close? Partial of apache#21536 — `array_sum` (first of the array aggregates in the series). ## Rationale for this change Continues the per-function split sequence requested by @alamb on apache#21536. Four sibling PRs already merged: `cosine_distance` (apache#21542), `inner_product` (apache#21861), `array_normalize` (apache#22013), `array_scale` (apache#22466). `array_add` is in flight as apache#22459 by @SubhamSinghal. `array_sum` is the first of the three array-aggregate functions (sum, product, avg). Its semantics set the pattern for the other two aggregates. ## What changes are included in this PR? - New scalar UDF `array_sum(array)` in `datafusion/functions-nested/src/array_sum.rs` - Module wire-up + registration in `datafusion/functions-nested/src/lib.rs` - SLT tests at `datafusion/sqllogictest/test_files/array_sum.slt` - Auto-generated docs entry in `docs/source/user-guide/sql/scalar_functions.md` **Signature:** \`List/LargeList/FixedSizeList<numeric>\` in, \`Float64\` out (one scalar per row). Numeric inner types coerced to \`Float64\`. **NULL semantics — SQL aggregate convention (deliberate divergence from binary-op siblings):** - NULL row → NULL row out - NULL elements are **skipped**, matching PostgreSQL \`array_sum\`, DuckDB \`list_sum\`, Spark \`aggregate\`. Binary-op siblings (\`inner_product\`, \`array_normalize\`) null-row on NULL element because their per-element operation is undefined on NULL; aggregates conventionally skip NULLs in SQL. - All-NULL row → NULL out (matches \`SUM(...)\` over an all-NULL column) - **Empty array → NULL** (matches sibling `array_product` apache#22703, PostgreSQL, DuckDB `list_sum`, SQL Standard SUM-of-empty-set) **Alias:** \`list_sum\` (matches the precedent of \`array_normalize\`→\`list_normalize\`, \`array_scale\`→\`list_scale\`). ## Are these changes tested? Yes. SLT covers happy paths, empty arrays, NULL row, NULL elements (mix + all-NULL), all list variants (List/LargeList/FixedSizeList), numeric coercion (Float32/Int64/integer literals), multi-row composition, error paths, return type, and the \`list_sum\` alias. ## Are there any user-facing changes? Yes — new SQL scalar function \`array_sum(array)\` and its alias \`list_sum\`. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
## Which issue does this PR close? Partial of apache#21536 — `array_sum` (first of the array aggregates in the series). ## Rationale for this change Continues the per-function split sequence requested by @alamb on apache#21536. Four sibling PRs already merged: `cosine_distance` (apache#21542), `inner_product` (apache#21861), `array_normalize` (apache#22013), `array_scale` (apache#22466). `array_add` is in flight as apache#22459 by @SubhamSinghal. `array_sum` is the first of the three array-aggregate functions (sum, product, avg). Its semantics set the pattern for the other two aggregates. ## What changes are included in this PR? - New scalar UDF `array_sum(array)` in `datafusion/functions-nested/src/array_sum.rs` - Module wire-up + registration in `datafusion/functions-nested/src/lib.rs` - SLT tests at `datafusion/sqllogictest/test_files/array_sum.slt` - Auto-generated docs entry in `docs/source/user-guide/sql/scalar_functions.md` **Signature:** \`List/LargeList/FixedSizeList<numeric>\` in, \`Float64\` out (one scalar per row). Numeric inner types coerced to \`Float64\`. **NULL semantics — SQL aggregate convention (deliberate divergence from binary-op siblings):** - NULL row → NULL row out - NULL elements are **skipped**, matching PostgreSQL \`array_sum\`, DuckDB \`list_sum\`, Spark \`aggregate\`. Binary-op siblings (\`inner_product\`, \`array_normalize\`) null-row on NULL element because their per-element operation is undefined on NULL; aggregates conventionally skip NULLs in SQL. - All-NULL row → NULL out (matches \`SUM(...)\` over an all-NULL column) - **Empty array → NULL** (matches sibling `array_product` apache#22703, PostgreSQL, DuckDB `list_sum`, SQL Standard SUM-of-empty-set) **Alias:** \`list_sum\` (matches the precedent of \`array_normalize\`→\`list_normalize\`, \`array_scale\`→\`list_scale\`). ## Are these changes tested? Yes. SLT covers happy paths, empty arrays, NULL row, NULL elements (mix + all-NULL), all list variants (List/LargeList/FixedSizeList), numeric coercion (Float32/Int64/integer literals), multi-row composition, error paths, return type, and the \`list_sum\` alias. ## Are there any user-facing changes? Yes — new SQL scalar function \`array_sum(array)\` and its alias \`list_sum\`. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Yes, via SLT only (
array_add.slt). Coverage:and different positions.
LargeList×LargeList, mixedList+LargeList,FixedSizeList→Listcoercion,Float32leaf,Int64leaf.Decimal128/Decimal256rejected at planning; explicitcast to DOUBLEopt-in works.leaf (plan), nested list (plan), wrong arg count.
list_addsingle-row + multi-row.array_add(array_add(...), ...)chained — single-row, with element NULLs propagating across bothlayers, and multi-row with row-level NULL.
Are there any user-facing changes?
Yes — two new functions:
array_add(array1, array2) → List<Float64>/LargeList<Float64>list_add(...)aliasBoth exposed via
expr_fnand registered inall_default_nested_functions(). Documented inline via#[user_doc](description, syntax, SQL example, argument descriptions).
No breaking API changes.