Add list_length scalar function#8495

Open
mhk197 wants to merge 2 commits into develop from mk/list-length
@mhk197 mhk197 commented Jun 18, 2026

Adds a list_length scalar function returning the number of elements in each list of a List array.

  • Computed purely from the list's offsets/sizes; it never reads element values. List and ListView arrays take separate code paths.
  • Returns a U64 array; a null list yields a null length.
  • Registered as a built-in (vortex.list.length) alongside list_contains, and exposed via the list_length(expr) expression constructor.
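
The offsets/sizes computation described above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the PR's implementation: the function names `lengths_from_offsets` and `lengths_from_sizes`, the `Option<&[bool]>` validity mask, and the `Vec<Option<u64>>` return type are all hypothetical stand-ins for the real Vortex array types.

```rust
/// Hypothetical helper: lengths for an offsets-based list array, where
/// list `i` spans `offsets[i]..offsets[i + 1]`. Element values are never read.
/// `validity[i] == false` marks a null list, which yields a null (`None`) length.
fn lengths_from_offsets(offsets: &[u64], validity: Option<&[bool]>) -> Vec<Option<u64>> {
    offsets
        .windows(2)
        .enumerate()
        .map(|(i, w)| {
            let is_valid = validity.map_or(true, |v| v[i]);
            is_valid.then(|| w[1] - w[0])
        })
        .collect()
}

/// Hypothetical helper: lengths for a view-style list array, where each list
/// stores its own size, so the length is simply the size entry.
fn lengths_from_sizes(sizes: &[u64], validity: Option<&[bool]>) -> Vec<Option<u64>> {
    sizes
        .iter()
        .enumerate()
        .map(|(i, &size)| {
            let is_valid = validity.map_or(true, |v| v[i]);
            is_valid.then_some(size)
        })
        .collect()
}
```

With offsets `[0, 2, 2, 5]` and no validity mask, the first helper yields lengths 2, 0 and 3; marking the middle list as null turns its length into `None`.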

@mhk197 mhk197 requested a review from a team June 18, 2026 16:11
@mhk197 mhk197 marked this pull request as draft June 18, 2026 16:19
codspeed-hq Bot commented Jun 18, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 10 improved benchmarks
❌ 9 regressed benchmarks
✅ 1562 untouched benchmarks
🆕 6 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | decompress_rd[f64, (10000, 0.1)] | 109.2 µs | 139.4 µs | -21.67% |
| Simulation | decompress_rd[f64, (10000, 0.01)] | 108.9 µs | 139 µs | -21.67% |
| Simulation | decompress_rd[f64, (10000, 0.0)] | 108.9 µs | 139 µs | -21.63% |
| Simulation | decompress_rd[f32, (100000, 0.0)] | 496.1 µs | 583.7 µs | -15% |
| Simulation | decompress_rd[f32, (10000, 0.1)] | 78.2 µs | 91 µs | -14.1% |
| Simulation | decompress_rd[f32, (10000, 0.01)] | 78.2 µs | 90.7 µs | -13.83% |
| Simulation | decompress_rd[f32, (10000, 0.0)] | 78.7 µs | 91 µs | -13.56% |
| Simulation | patched_take_10k_contiguous_patches | 259.6 µs | 289.9 µs | -10.47% |
| Simulation | patched_take_10k_random | 272 µs | 302.3 µs | -10.04% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 186.1 ns | +31.34% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 304.7 ns | 246.4 ns | +23.68% |
| Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 206.5 µs | 170.5 µs | +21.15% |
| Simulation | take_10k_first_chunk_only | 251.3 µs | 208.9 µs | +20.29% |
| Simulation | take_10k_dispersed | 264.8 µs | 222.5 µs | +19.02% |
| Simulation | bitwise_not_vortex_buffer_mut[2048] | 427.8 ns | 369.4 ns | +15.79% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 307.3 µs | 273.6 µs | +12.34% |
| Simulation | patched_take_10k_adversarial | 259.7 µs | 231.3 µs | +12.28% |
| Simulation | patched_take_10k_first_chunk_only | 282.9 µs | 255.1 µs | +10.88% |
| Simulation | patched_take_10k_dispersed | 297 µs | 269.3 µs | +10.29% |
| Simulation | 🆕 list_large | N/A | 9.9 ms | N/A |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing mk/list-length (12c7eea) with develop (9814173)

@mhk197 mhk197 force-pushed the mk/list-length branch 2 times, most recently from 0a2f1f1 to 1ed27e1 Compare June 18, 2026 17:19
@mhk197 mhk197 added the changelog/feature A new feature label Jun 18, 2026
Computes the number of elements in each list from the offsets/sizes only (never reading element values), returning a U64 array; a null list yields a null length. Registered as a built-in scalar function (vortex.list.length) alongside list_contains.

Signed-off-by: Matt Katz <mhkatz97@gmail.com>
@mhk197 mhk197 marked this pull request as ready for review June 18, 2026 20:46
@mhk197 mhk197 changed the title Add list_length scalar function Add list_length scalar function Jun 18, 2026
@mhk197 mhk197 requested review from AdamGS and gatesn June 18, 2026 20:47
Comment thread vortex-array/benches/list_length.rs
fn return_dtype(&self, _options: &Self::Options, arg_dtypes: &[DType]) -> VortexResult<DType> {
    match &arg_dtypes[0] {
        DType::List(_, nullable) => Ok(DType::Primitive(PType::U64, *nullable)),
        other => vortex_bail!("list_length() requires List, got {other}"),

Contributor

May as well support FixedList too, then implement reduce to collapse it into a constant.

Contributor Author

Implemented reduce for non-nullable FSL; delegated nullable FSL to execute, since we can't easily get validity in reduce (discussed offline).
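
The split described in this reply can be sketched as follows. It is a toy model under stated assumptions, not the code in the PR: `ListDType`, `Reduced`, and `reduce_list_length` are hypothetical names, and the real reduce hook in Vortex has a different signature. The point is only that a non-nullable fixed-size list has the same length everywhere, so it can collapse to a constant from the dtype alone, while a nullable one still needs validity.

```rust
/// Toy stand-in for a list dtype; this is NOT the Vortex `DType` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ListDType {
    /// Variable-length list; lengths live in the array's offsets or sizes.
    List { nullable: bool },
    /// Fixed-size list; every non-null list has exactly `size` elements.
    FixedSizeList { size: u32, nullable: bool },
}

/// Hypothetical result of trying to simplify `list_length` from the dtype alone.
#[derive(Debug, PartialEq)]
enum Reduced {
    /// The whole output is one repeated value.
    Constant(u64),
    /// Cannot be decided without the array itself (for example its validity).
    NeedsExecute,
}

/// Collapse to a constant only when no list can be null; otherwise defer.
fn reduce_list_length(dtype: ListDType) -> Reduced {
    match dtype {
        ListDType::FixedSizeList { size, nullable: false } => Reduced::Constant(u64::from(size)),
        _ => Reduced::NeedsExecute,
    }
}
```

In this model a nullable fixed-size list falls through to `NeedsExecute`, because some positions must produce a null length and that information is not in the dtype.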

Comment thread vortex-array/src/scalar_fn/fns/list_length.rs Outdated
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
@mhk197 mhk197 requested a review from gatesn June 19, 2026 18:52
struct AnyList;

impl Matcher for AnyList {
    type Match<'a> = ();

Contributor

You should define an enum AnyListView { List(...), FixedList(...) }, then you can just match on it above in the execute_until.

Contributor Author

Do we want to execute FixedList? We can just get the size from the dtype.

Contributor

You're not executing the FixedList itself; you're basically saying: run execution one step at a time until the array matches one of these encodings.

So if some scalar function happens to return a FixedList, execution will terminate there and you will have access to it.
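
The idea in this thread (step execution until a matcher recognizes the encoding, then dispatch on the match) can be sketched with a toy model. Everything here is an assumption for illustration: `Array`, `Deferred`, `match_any_list`, and `execute_until_list` are hypothetical, and the real `Matcher` trait and `execute_until` in Vortex look different. Only the `AnyListView` enum shape follows the reviewer's suggestion.

```rust
/// Toy array encodings; `Deferred` stands in for any not-yet-executed
/// expression (for example a scalar function whose output is a list).
#[derive(Debug)]
enum Array {
    List { offsets: Vec<u64> },
    FixedList { size: u64, len: usize },
    Deferred(Box<Array>),
}

/// The encodings the caller knows how to handle directly.
#[derive(Debug, PartialEq)]
enum AnyListView<'a> {
    List(&'a [u64]),
    FixedList { size: u64, len: usize },
}

/// Hypothetical matcher: returns a typed view when the encoding is recognized.
fn match_any_list(array: &Array) -> Option<AnyListView<'_>> {
    match array {
        Array::List { offsets } => Some(AnyListView::List(offsets)),
        Array::FixedList { size, len } => Some(AnyListView::FixedList { size: *size, len: *len }),
        Array::Deferred(_) => None,
    }
}

/// Peel one layer of deferred work per step until the matcher fires.
fn execute_until_list(mut array: Array) -> Array {
    while match_any_list(&array).is_none() {
        array = match array {
            Array::Deferred(inner) => *inner,
            other => other,
        };
    }
    array
}

/// Dispatch on the matched encoding; a FixedList is never "executed" further,
/// its length comes straight from the matched view.
fn list_lengths(array: Array) -> Vec<u64> {
    let array = execute_until_list(array);
    match match_any_list(&array).expect("loop stops only on a match") {
        AnyListView::List(offsets) => offsets.windows(2).map(|w| w[1] - w[0]).collect(),
        AnyListView::FixedList { size, len } => vec![size; len],
    }
}
```

In this sketch a `Deferred` node wrapping a `FixedList` stops at the first step that exposes the `FixedList`, which mirrors the point that execution terminates as soon as one of the listed encodings appears.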

4 participants