Skip to content

feat: add array_sum scalar function#22542

Merged
alamb merged 4 commits into
apache:mainfrom
crm26:feat/array-sum
Jun 8, 2026
Merged

feat: add array_sum scalar function#22542
alamb merged 4 commits into
apache:mainfrom
crm26:feat/array-sum

Conversation

@crm26

@crm26 crm26 commented May 26, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Partial of #21536array_sum (first of the array aggregates in the series).

Rationale for this change

Continues the per-function split sequence requested by @alamb on #21536. Four sibling PRs already merged: cosine_distance (#21542), inner_product (#21861), array_normalize (#22013), array_scale (#22466). array_add is in flight as #22459 by @SubhamSinghal.

array_sum is the first of the three array-aggregate functions (sum, product, avg). Its semantics set the pattern for the other two aggregates.

What changes are included in this PR?

  • New scalar UDF array_sum(array) in datafusion/functions-nested/src/array_sum.rs
  • Module wire-up + registration in datafusion/functions-nested/src/lib.rs
  • SLT tests at datafusion/sqllogictest/test_files/array_sum.slt
  • Auto-generated docs entry in docs/source/user-guide/sql/scalar_functions.md

Signature: `List/LargeList/FixedSizeList` in, `Float64` out (one scalar per row). Numeric inner types coerced to `Float64`.

NULL semantics — SQL aggregate convention (deliberate divergence from binary-op siblings):

  • NULL row → NULL row out

  • NULL elements are skipped, matching PostgreSQL `array_sum`, DuckDB `list_sum`, Spark `aggregate`. Binary-op siblings (`inner_product`, `array_normalize`) null-row on NULL element because their per-element operation is undefined on NULL; aggregates conventionally skip NULLs in SQL.

  • All-NULL row → NULL out (matches `SUM(...)` over an all-NULL column)

  • Empty array → NULL (matches sibling array_product Add array_product UDF #22703, PostgreSQL, DuckDB list_sum, SQL Standard SUM-of-empty-set)

Alias: `list_sum` (matches the precedent of `array_normalize`→`list_normalize`, `array_scale`→`list_scale`).

Are these changes tested?

Yes. SLT covers happy paths, empty arrays, NULL row, NULL elements (mix + all-NULL), all list variants (List/LargeList/FixedSizeList), numeric coercion (Float32/Int64/integer literals), multi-row composition, error paths, return type, and the `list_sum` alias.

Are there any user-facing changes?

Yes — new SQL scalar function `array_sum(array)` and its alias `list_sum`.

@github-actions github-actions Bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 26, 2026
}

// `slice` resets the logical offset to 0, so `i` below is 0-based within the slice.
let slice = values.slice(start, len);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably get away without need a slice here, just manually iterate over the range start..end

let mut sum = 0.0_f64;
let mut any_valid = false;
for i in 0..len {
if !slice.is_null(i) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !slice.is_null(i) {
if slice.is_valid(i) {

crm26 added a commit to crm26/datafusion that referenced this pull request Jun 8, 2026
…emantics

Address Jefffrey's review on apache#22542:
- Drop the per-row Float64Array::slice allocation; iterate start..end
  directly on values
- Use values.is_valid(i) instead of !slice.is_null(i)

Also pre-empt the empty-array question Jefffrey raised on the merged
sibling apache#22703 (array_product): empty arrays now return NULL via the
existing any_valid=false path, matching PostgreSQL, DuckDB list_sum,
and SQL Standard SUM-of-empty-set. Updated SLT empty-array
expectations and PR body accordingly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@crm26

crm26 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review.

Addressed both asks: dropped the intermediate slice (iterate start..end directly on values) and switched to is_valid per your suggestion.

Also pre-emptively matched the empty-array semantics to the merged array_product sibling (#22703) — your "what's this behaviour based on?" question there applies here too. Empty → NULL now, matching PostgreSQL, DuckDB list_sum, and SQL Standard SUM-of-empty-set. SLT and PR body updated accordingly.

crm26 and others added 3 commits June 7, 2026 20:13
Adds `array_sum(array)` returning the sum of elements in a numeric array.
Aliased as `list_sum`. Part of the per-function split sequence on
tracking issue apache#21536, following the pattern of the already-merged PRs
in this series (cosine_distance apache#21542, inner_product apache#21861,
array_normalize apache#22013, array_scale apache#22466).

Semantics:
- NULL row in array -> NULL row out
- NULL elements are skipped (SQL aggregate convention; matches
  PostgreSQL array_sum, DuckDB list_sum, Spark aggregate). A row whose
  every element is NULL yields NULL.
- Empty array -> 0.0 (additive identity, matches SQL SUM over no rows
  conceptually, and DuckDB list_sum([]) = 0)

Input is List/LargeList/FixedSizeList of any numeric type; elements
are coerced to Float64. Output is Float64.
…emantics

Address Jefffrey's review on apache#22542:
- Drop the per-row Float64Array::slice allocation; iterate start..end
  directly on values
- Use values.is_valid(i) instead of !slice.is_null(i)

Also pre-empt the empty-array question Jefffrey raised on the merged
sibling apache#22703 (array_product): empty arrays now return NULL via the
existing any_valid=false path, matching PostgreSQL, DuckDB list_sum,
and SQL Standard SUM-of-empty-set. Updated SLT empty-array
expectations and PR body accordingly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After rebasing onto apache/main, the scalar_functions.md merge conflict
was resolved by taking main's version. Re-ran dev/update_function_docs.sh
to reinsert the array_sum entry (with NULL empty-array semantics).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@Jefffrey Jefffrey left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be good once conflicts resolved

The round-2 refactor changed the empty-array behavior from 0.0 to NULL
in the Rust code and SLT, but missed the #[user_doc] description macro
which still claimed "Returns 0.0 for an empty array." That stale text
propagated through dev/update_function_docs.sh into scalar_functions.md.

Updated the docstring to match the actual behavior; re-ran the doc
regen to refresh scalar_functions.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@crm26 crm26 force-pushed the feat/array-sum branch from 14c2c8b to 512992e Compare June 8, 2026 12:23
@crm26

crm26 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the approval. Just rebased onto main and pushed — conflicts in lib.rs (function registration list) and scalar_functions.md resolved. Targeted validation green; full CI running.

@alamb alamb added this pull request to the merge queue Jun 8, 2026
@alamb

alamb commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Thanks @crm26 and @Jefffrey -- I added it to the merge queue

Merged via the queue into apache:main with commit 1d740ed Jun 8, 2026
36 checks passed
AdamGS pushed a commit to AdamGS/arrow-datafusion that referenced this pull request Jun 11, 2026
## Which issue does this PR close?

Partial of apache#21536 — `array_sum` (first of the array aggregates in the
series).

## Rationale for this change

Continues the per-function split sequence requested by @alamb on apache#21536.
Four sibling PRs already merged: `cosine_distance` (apache#21542),
`inner_product` (apache#21861), `array_normalize` (apache#22013), `array_scale`
(apache#22466). `array_add` is in flight as apache#22459 by @SubhamSinghal.

`array_sum` is the first of the three array-aggregate functions (sum,
product, avg). Its semantics set the pattern for the other two
aggregates.

## What changes are included in this PR?

- New scalar UDF `array_sum(array)` in
`datafusion/functions-nested/src/array_sum.rs`
- Module wire-up + registration in
`datafusion/functions-nested/src/lib.rs`
- SLT tests at `datafusion/sqllogictest/test_files/array_sum.slt`
- Auto-generated docs entry in
`docs/source/user-guide/sql/scalar_functions.md`

**Signature:** \`List/LargeList/FixedSizeList<numeric>\` in, \`Float64\`
out (one scalar per row). Numeric inner types coerced to \`Float64\`.

**NULL semantics — SQL aggregate convention (deliberate divergence from
binary-op siblings):**
- NULL row → NULL row out
- NULL elements are **skipped**, matching PostgreSQL \`array_sum\`,
DuckDB \`list_sum\`, Spark \`aggregate\`. Binary-op siblings
(\`inner_product\`, \`array_normalize\`) null-row on NULL element
because their per-element operation is undefined on NULL; aggregates
conventionally skip NULLs in SQL.
- All-NULL row → NULL out (matches \`SUM(...)\` over an all-NULL column)

- **Empty array → NULL** (matches sibling `array_product` apache#22703,
PostgreSQL, DuckDB `list_sum`, SQL Standard SUM-of-empty-set)

**Alias:** \`list_sum\` (matches the precedent of
\`array_normalize\`→\`list_normalize\`, \`array_scale\`→\`list_scale\`).

## Are these changes tested?

Yes. SLT covers happy paths, empty arrays, NULL row, NULL elements (mix
+ all-NULL), all list variants (List/LargeList/FixedSizeList), numeric
coercion (Float32/Int64/integer literals), multi-row composition, error
paths, return type, and the \`list_sum\` alias.

## Are there any user-facing changes?

Yes — new SQL scalar function \`array_sum(array)\` and its alias
\`list_sum\`.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants