Skip to content

GH-40024: [C++][Gandiva] Selectively register external C functions based on expression usage#49900

Open
Reranko05 wants to merge 5 commits intoapache:mainfrom
Reranko05:gh-40024-investigation
Open

GH-40024: [C++][Gandiva] Selectively register external C functions based on expression usage#49900
Reranko05 wants to merge 5 commits intoapache:mainfrom
Reranko05:gh-40024-investigation

Conversation

@Reranko05
Copy link
Copy Markdown
Contributor

@Reranko05 Reranko05 commented Apr 29, 2026

Rationale for this change

This PR extracts a reduced-scope improvement from the earlier work discussed in #40031, focusing specifically on selective external C function mapping during Gandiva engine initialization.

The initial exploration and broader optimization direction were introduced in #40031, and this PR builds on that work by isolating and implementing the C-function mapping portion as a smaller, reviewable step.

Currently, Gandiva registers all external C functions during engine initialization, even when an expression only uses a small subset of functions. This results in unnecessary mappings and does not reflect actual usage.

This change delays Engine initialization until expression decomposition has collected the functions used by the expression set, and then registers only those functions (along with required internal helpers).

This aligns initialization with actual usage and removes unnecessary work, while also providing a foundation for future optimizations.

The existing Engine::Init() path remains unchanged to preserve current behavior for tests and other call sites. Selective mapping is only applied when initializing the engine with explicitly provided used functions.

What changes are included in this PR?

  • Implement selective external C-function mapping
  • Track functions used during expression decomposition
  • Propagate used function set through the LLVMGenerator pipeline
  • Delay Engine initialization until used functions are known
  • Register only required functions (plus internal helpers)
  • Add compilation microbenchmarks for evaluation

Are these changes tested?

Yes.

  • Existing tests pass without modification
  • Added unit tests to validate function tracking during expression decomposition
  • Microbenchmarks were used to validate behavior

Benchmark results

Microbenchmarks were run to validate behavior. Due to the variability inherent in LLVM JIT compilation workloads, results fall within measurement noise and do not show a consistent regression.

This change is therefore positioned as an architectural improvement rather than a guaranteed performance optimization.

Are there any user-facing changes?

No

Future work

This change enables follow-up improvements such as:

  • Direct lookup of required functions instead of scanning all registered functions
  • Reduced LLVM module size and faster optimization passes
  • Further selective loading of bitcode functions

GitHub Issue

Related: #40024

@Reranko05
Copy link
Copy Markdown
Contributor Author

Hi @dmitry-chirkov-dremio, this extracts only the selective C-function mapping portion discussed around #40031 as a smaller scoped first step.

I intentionally kept this PR limited to the C-function filtering path (without the broader bitcode-loading changes from the earlier PR) to make review narrower and isolate this part of the optimization. I also added microbenchmarks from that earlier work to help evaluate the effect.

If this scoped approach looks reasonable, I will follow up separately with the bitcode-side optimization as a next step.

@Reranko05
Copy link
Copy Markdown
Contributor Author

Quick question regarding test expectations:

With this change, Engine::AddGlobalMappingForFunc now skips registering functions that are not in used_functions_ (except internal ones). This causes TestLLVMGenerator.VerifyPCFunctions and VerifyExtendedCFunctions to fail, as they currently expect all functions from the registry to be present in the module.

With selective mapping, the expected behavior would be that only used functions (plus required internal helpers) are registered.

Would you prefer updating these tests to validate only the functions explicitly passed via used_functions, rather than the entire registry?

@Reranko05 Reranko05 force-pushed the gh-40024-investigation branch from d859e2c to d2236c1 Compare May 1, 2026 16:04
@Reranko05 Reranko05 changed the title GH-40024: [C++][Gandiva] Skip unused C function mappings during engine initialization GH-40024: [C++][Gandiva] Selectively register external C functions based on expression usage May 1, 2026
@Reranko05 Reranko05 force-pushed the gh-40024-investigation branch from d2236c1 to 420cacf Compare May 1, 2026 16:14
@Reranko05
Copy link
Copy Markdown
Contributor Author

Following up on this — after looking into it more, I think we should preserve the existing test expectations.

The current tests validate the default Engine initialization behavior, which registers all functions, and that should remain unchanged.

The intended behavior of this PR is:

  • Engine::Init() → unchanged (full mapping, used by tests and existing paths)
  • Engine::Init(used_functions) → selective mapping (used after expression decomposition)

So selective mapping should only apply to the LLVMGenerator path, and should not affect existing tests.

I'll ensure the implementation keeps this separation so that tests continue to pass without modification.

@Reranko05 Reranko05 force-pushed the gh-40024-investigation branch from 420cacf to 37b2321 Compare May 1, 2026 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant