GH-40024: [C++][Gandiva] Selectively register external C functions based on expression usage #49900
Reranko05 wants to merge 5 commits into apache:main
Hi @dmitry-chirkov-dremio, this extracts only the selective C-function mapping portion discussed around #40031 as a smaller scoped first step. I intentionally kept this PR limited to the C-function filtering path (without the broader bitcode-loading changes from the earlier PR) to make review narrower and isolate this part of the optimization. I also added microbenchmarks from that earlier work to help evaluate the effect. If this scoped approach looks reasonable, I will follow up separately with the bitcode-side optimization as a next step.
Quick question regarding test expectations: With this change, With selective mapping, the expected behavior would be that only used functions (plus required internal helpers) are registered. Would you prefer updating these tests to validate only the functions explicitly passed via
Following up on this — after looking into it more, I think we should preserve the existing test expectations. The current tests validate the default Engine initialization behavior, which registers all functions, and that should remain unchanged. The intended behavior of this PR is:
So selective mapping should only apply to the LLVMGenerator path, and should not affect existing tests. I'll ensure the implementation keeps this separation so that tests continue to pass without modification.
Rationale for this change
This PR extracts a reduced-scope improvement from the earlier work discussed in #40031, focusing specifically on selective external C function mapping during Gandiva engine initialization.
The initial exploration and broader optimization direction were introduced in #40031, and this PR builds on that work by isolating and implementing the C-function mapping portion as a smaller, reviewable step.
Currently, Gandiva registers all external C functions during engine initialization, even when an expression only uses a small subset of functions. This results in unnecessary mappings and does not reflect actual usage.
This change delays Engine initialization until expression decomposition has collected the functions used by the expression set, and then registers only those functions (along with required internal helpers).
This aligns initialization with actual usage and removes unnecessary work, while also providing a foundation for future optimizations.
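As a rough illustration of the idea (not the code in this PR), the sketch below contrasts the two initialization paths using a toy registry. Every name in it (ToyEngine, ExternalCFunction, CollectUsedFunctions, the toy_* function names) is hypothetical and chosen for the example; none of it is Gandiva's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an external C function entry (not a Gandiva type).
struct ExternalCFunction {
  std::string name;
  bool is_internal_helper;  // helpers are mapped even if no expression names them
};

// Hypothetical expression: reduced to the list of function names it calls.
using ToyExpression = std::vector<std::string>;

// Toy engine that only records which symbols would have been mapped.
class ToyEngine {
 public:
  explicit ToyEngine(std::vector<ExternalCFunction> available)
      : available_(std::move(available)) {}

  // Default path: map every available function (behaviour left unchanged).
  void Init() {
    assert(mapped_.empty() && "Init must be called once");
    for (const auto& fn : available_) mapped_.insert(fn.name);
  }

  // Selective path: map only the functions in `used`, plus internal helpers.
  void Init(const std::set<std::string>& used) {
    assert(mapped_.empty() && "Init must be called once");
    for (const auto& fn : available_) {
      if (fn.is_internal_helper || used.count(fn.name) > 0) {
        mapped_.insert(fn.name);
      }
    }
  }

  bool IsMapped(const std::string& name) const { return mapped_.count(name) > 0; }
  std::size_t MappedCount() const { return mapped_.size(); }

 private:
  std::vector<ExternalCFunction> available_;
  std::set<std::string> mapped_;
};

// Stand-in for expression decomposition: gather every function name used
// across the expression set before the engine is initialized.
std::set<std::string> CollectUsedFunctions(const std::vector<ToyExpression>& exprs) {
  std::set<std::string> used;
  for (const auto& expr : exprs) used.insert(expr.begin(), expr.end());
  return used;
}

// Hypothetical catalogue: three ordinary functions and one internal helper.
std::vector<ExternalCFunction> MakeToyFunctions() {
  return {{"toy_upper", false},
          {"toy_lower", false},
          {"toy_regex", false},
          {"toy_arena_malloc", true}};
}
```

In this toy model, calling Init() with no arguments maps all four entries, while calling Init(CollectUsedFunctions(exprs)) for expressions that use only toy_upper and toy_lower maps those two plus the toy_arena_malloc helper and skips toy_regex.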
The existing Engine::Init() path remains unchanged to preserve current behavior for tests and other call sites. Selective mapping is only applied when initializing the engine with explicitly provided used functions.
What changes are included in this PR?
Are these changes tested?
Yes.
Benchmark results
Microbenchmarks were run to validate behavior. Due to the variability inherent in LLVM JIT compilation workloads, results fall within measurement noise and do not show a consistent regression.
This change is therefore positioned as an architectural improvement rather than a guaranteed performance optimization.
Are there any user-facing changes?
No
Future work
This change enables follow-up improvements such as:
GitHub Issue
Related: #40024