
[WIP] Implement an intrinsic for delegate lambdas #125901

Draft: MichalPetryka wants to merge 58 commits into dotnet:main from MichalPetryka:lambda-prototype


@MichalPetryka (Contributor) commented Mar 22, 2026

Implements a basic intrinsic for creating delegate singletons, to be used by Roslyn for lambdas and method group conversions.

Creates delegates closed over null instances to save memory; this makes it reject instance methods on generic types, since those need an instance.

Uses a field for caching non-frozen delegates, since otherwise we'd have a noticeable perf regression on every access for cases that can't be expanded in the JIT (shared generics, unloadable assemblies). This also significantly simplifies the implementation.
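A minimal C# model of the unexpanded path that this field caching describes, written as a sketch under stated assumptions rather than the PR's implementation: `GetOrCreate` is a hypothetical helper, a `Func<Delegate>` factory stands in for the method pointer the real intrinsic receives, and a local variable stands in for the compiler-emitted static field. After the first call, every access is a single field read, and `Interlocked.CompareExchange` makes racing threads agree on one cached instance.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a compiler-emitted cache field. It is a local here only so
// the sketch stays a single top-level program; in real IL it would be a static field.
Delegate? storage = null;
int creations = 0;

// Two "call sites" share the same storage, so only the first one allocates a delegate.
var first = (Func<int>)GetOrCreate(ref storage, () => { creations++; return new Func<int>(() => 42); });
var second = (Func<int>)GetOrCreate(ref storage, () => { creations++; return new Func<int>(() => 42); });

Console.WriteLine(ReferenceEquals(first, second)); // True
Console.WriteLine(creations);                      // 1
Console.WriteLine(second());                       // 42

// Hypothetical helper modelling the unexpanded path: a cheap field read on every
// access, plus a one-time slow path that creates and publishes the delegate.
static Delegate GetOrCreate(ref Delegate? location, Func<Delegate> create)
{
    Delegate? cached = Volatile.Read(ref location);
    if (cached is not null)
        return cached;

    Delegate created = create();
    // If another thread published first, CompareExchange returns its instance; use that one.
    return Interlocked.CompareExchange(ref location, created, null) ?? created;
}
```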

TODO:

  • Decide on final name and signature
  • Decide if instance methods on generic types need to be supported
  • Avoid performance regressions for unexpanded case
  • Handle unloading properly
  • Implement Mono support
  • Clean up NAOT compilation handling
  • Implement support in NAOT .cctor interpreter (optional)

cc @jkotas @MichalStrehovsky @EgorBo

Depends on #99200 (without it this is a GC hole)

Blocked by #126284

Closes #85014

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 22, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Mar 22, 2026
@jkotas (Member) commented Mar 23, 2026

Uses a field for caching non-frozen delegates since otherwise we'd have a noticeable perf regression on every access for cases

The idea behind the original proposal was that the codegen is going to take care of the caching behind the scenes to minimize the binary size (and startup) overheads. If the IL is required to have a field, it dilutes the benefit of the special intrinsic. It may be better to give up a bit more and just go with the alternative in the proposal. This needs numbers to decide.

this makes it reject instance methods on generic types, since those need an instance.

What is Roslyn expected to generate for lambdas in generic types with this design?

Review comment threads on src/coreclr/tools/aot/ILCompiler.RyuJit/JitInterface/CorInfoImpl.RyuJit.cs (outdated)
@pentp (Contributor) commented Mar 23, 2026

Creates delegates closed over null instances to save memory; this makes it reject instance methods on generic types, since those need an instance.

Why not use a default instance (no .ctor call, just allocated) for shared generics? It would be the most efficient option for generic types.

Uses a field for caching non-frozen delegates, since otherwise we'd have a noticeable perf regression on every access for cases that can't be expanded in the JIT (shared generics, unloadable assemblies).

A field would be required for only shared generics and unloadable assemblies, right?

  • Implement support in NAOT .cctor interpreter (optional)

If delegates could be made frozen, then NAOT wouldn't need this?

@MichalPetryka (Contributor Author)

The idea behind the original proposal was that the codegen is going to take care of the caching behind the scenes to minimize the binary size (and startup) overheads. If the IL is required to have a field, it dilutes the benefit of the special intrinsic. It may be better to give up a bit more and just go with the alternative in the proposal.

The field caching idea is not a fundamental requirement for this implementation; I'm just not aware of any way to avoid overhead on every access in cases where we can't expand otherwise.
I assumed that the runtime cost of that would be a bigger issue than paying roughly 30 bytes more per delegate.
I'd still say the intrinsic makes sense here, since it removes the need for cctors and tiering.

This needs numbers to decide.

Do you have any specific way of benchmarking in mind? I'm not sure what the best way to compare would be: file size checks aren't easy without Roslyn support, since we need a bigger assembly for the difference to be meaningful, and comparing access performance for the unexpanded case is also non-trivial because it needs correct dictionary keys.

What is Roslyn expected to generate for lambdas in generic types with this design?

The idea would be to generate a single non-generic class for all lambda methods and non-generic fields, and put generic methods in there (fields for them would need separate classes).
I'm not sure in which cases instantiation stubs are needed, so this might be a no-go due to execution performance.

@MichalPetryka (Contributor Author)

Why not use a default instance (no .ctor call, just allocated) for shared generics? It would be the most efficient option for generic types.

That would be how I'd implement this; it would just add a bit of code to the implementation (since we'd ideally cache the instances for all delegates and such), and I wanted to hold off on that until we're sure it will be needed.

A field would be required for only shared generics and unloadable assemblies, right?

AFAIR yes, except when the GC fails to allocate frozen instances (unless we complicate this even further, as string literals do, by allocating on the POH or using pinned handles in that case while still hardcoding the instance in the generated code).

If delegates could be made frozen, then NAOT wouldn't need this?

This already allocates delegates as frozen; the question is rather whether Roslyn would use the intrinsic in cctor bodies. If so, we don't want the intrinsic to block interpreting them.

@MichalPetryka (Contributor Author)

@jkotas @MichalStrehovsky After converting my tests from reflection to IL (for NAOT to be able to track them properly), I've noticed that ldftn on abstract/interface non-DIM methods causes the JIT to throw BadImageFormatException, while methodInfo.MethodHandle.GetFunctionPointer() on them worked just fine.
I'd expect both to behave the same here; can you explain what is intended in each case? ECMA doesn't document ldftn as illegal, and the GetFunctionPointer docs don't mention this.

@jkotas (Member) commented Mar 24, 2026

while methodInfo.MethodHandle.GetFunctionPointer() on them worked just fine.

I assume that you will get an exception if you try to call the function pointer returned by GetFunctionPointer(). Is that right? Then the difference is just in how eager the error handling is. One path throws the exception eagerly and the other path throws the exception lazily.

@MichalStrehovsky (Member)

@jkotas @MichalStrehovsky After converting my tests from reflection to IL (for NAOT to be able to track them properly), I've noticed that ldftn on abstract/interface non-DIM methods causes the JIT to throw BadImageFormatException, while methodInfo.MethodHandle.GetFunctionPointer() on them worked just fine. I'd expect both to behave the same here; can you explain what is intended in each case? ECMA doesn't document ldftn as illegal, and the GetFunctionPointer docs don't mention this.

ECMA-335 spec covers this in "II.15.2 Static, instance, and virtual methods":

Abstract virtual methods (which shall only be defined in abstract classes or interfaces) shall be called
only with a callvirt instruction. Similarly, the address of an abstract virtual method shall be computed
with the ldvirtftn instruction, and the ldftn instruction shall not be used.

RuntimeMethodHandle.GetFunctionPointer docs say: For instance method handles, the value is not easily usable from user code and is meant exclusively for usage within the runtime.

So this checks out.
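The quoted rule matches what C# already emits for method group conversions. As an illustration using only BCL types (an assumption chosen for brevity, not code from the PR): `Comparer<T>.Compare` is declared abstract, so converting `comparer.Compare` to a delegate makes Roslyn load the instance and use `ldvirtftn` rather than `ldftn`, and the resulting delegate calls the concrete override.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

Comparer<int> comparer = Comparer<int>.Default;

// Compare is abstract on Comparer<T>. Roslyn compiles this method group conversion
// to `dup; ldvirtftn`, resolving the slot against the runtime instance.
Func<int, int, int> compare = comparer.Compare;

// The declared method really is abstract, so taking its address with `ldftn`
// would be invalid per ECMA-335 II.15.2.
MethodInfo declared = typeof(Comparer<int>).GetMethod(nameof(Comparer<int>.Compare))!;

Console.WriteLine(declared.IsAbstract); // True
Console.WriteLine(compare(1, 2) < 0);   // True: the delegate runs the concrete override
```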

@MichalPetryka (Contributor Author)

I assume that you will get an exception if you try to call the function pointer returned by GetFunctionPointer(). Is that right? Then the difference is just in how eager the error handling is. One path throws the exception eagerly and the other path throws the exception lazily.

I did not test calling it, only using it to create a delegate, which worked fine.
Should I make the tests for those use reflection again or should I remove them?

@jkotas (Member) commented Mar 25, 2026

Do you have any specific way of benchmarking in mind?

Measure the cost of an (unexecuted) lambda that just returns a unique integer: IL binary size, memory footprint with the JIT, and NativeAOT binary size, before and after. The easiest way to do that is to create a test with something like a million lambdas.
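One possible shape for such a generated test, sketched under assumptions: the class name `Lambdas`, the methods `M0`, `M1`, and so on, and the one-method-per-lambda layout are illustrative choices that keep any single method body small. Calling `M{i}` creates the delegate without running the lambda body. For the NativeAOT measurement, the methods would also need to be rooted, since the trimmer removes code that is never referenced.

```csharp
using System;
using System.Text;

// Hypothetical generator: emits one tiny method per lambda. Each lambda just returns
// its unique integer i; calling M{i} only creates the delegate, it never invokes it.
static string GenerateLambdaSource(int count)
{
    var sb = new StringBuilder();
    sb.AppendLine("using System;");
    sb.AppendLine();
    sb.AppendLine("public static class Lambdas");
    sb.AppendLine("{");
    for (int i = 0; i < count; i++)
    {
        sb.AppendLine($"    public static Func<int> M{i}() => () => {i};");
    }
    sb.AppendLine("}");
    return sb.ToString();
}

// Small preview. The real test would write something like GenerateLambdaSource(1_000_000)
// to a file and build it once with today's lambda codegen and once with the intrinsic.
Console.Write(GenerateLambdaSource(3));
```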

@MichalPetryka (Contributor Author)

@jkotas While implementing instance method support for generic classes, I've realised that since they don't use instantiation stubs, the NonVirtualEntry2MethodDesc lookup returns a shared MethodDesc, so we can't get the instance type that way.

Do we need to make the signature RuntimeHelpers.GetDelegate<TDelegate, TCapture>(nint, ref TDelegate) then, or is there any other way to get it in CoreCLR?

@jkotas (Member) commented Jun 19, 2026

The metadata has the exact type in ldftn. If you always expand the intrinsic in the JIT, I think it should be possible to get it from ldftn.

@MichalPetryka (Contributor Author)

The metadata has the exact type in ldftn. If you always expand the intrinsic in the JIT, I think it should be possible to get it from ldftn.

Yeah, I already did that in NativeAOT, but I assumed that for CoreCLR we want to handle the unexpanded case too.

Would the additional generic have any noticeable overhead here though considering that we'd always expand it away in the JIT?

@jkotas (Member) commented Jun 20, 2026

Yes, it is extra overhead along the way - extra bytes in IL binary, extra generic instantiations at runtime. More importantly, it does not feel like a good design to duplicate the information between two IL instructions that are next to each other.

@MichalPetryka (Contributor Author)

Yes, it is extra overhead along the way - extra bytes in IL binary, extra generic instantiations at runtime. More importantly, it does not feel like a good design to duplicate the information between two IL instructions that are next to each other.

Another issue, probably the bigger one here, is that it makes the implementation much worse for Mono, since we'd need to add an intrinsic there too.

As such, this would probably need to wait for Mono to be removed in .NET 12, which would rule out any chance of this getting into .NET 11.

@jkotas (Member) commented Jun 20, 2026

We want both the runtime and Roslyn parts to ship in the same version to ensure that the feature works end-to-end. There is not enough time for that in .NET 11.

@MichalPetryka (Contributor Author)

We want both the runtime and Roslyn parts to ship in the same version to ensure that the feature works end-to-end. There is not enough time for that in .NET 11.

I'll remove the Mono implementation here, then, and mark the tests as ignored.

@MichalPetryka (Contributor Author) commented Jun 21, 2026

Yes, it is extra overhead along the way - extra bytes in IL binary, extra generic instantiations at runtime. More importantly, it does not feel like a good design to duplicate the information between two IL instructions that are next to each other.

Would switching to ldtoken work here instead, btw?

EDIT: it doesn't help; apparently method handles are shared too.

@MichalPetryka (Contributor Author)

@jkotas As you requested, I've changed the intrinsic to rely on the JIT.

The signature is now:

public static Delegate GetDelegate(nint method, ref Delegate? storage);

It seems possible to implement it like this, with no generics.

Currently, it seems there are only two things left for this to be complete:

  1. I need to implement a way to generate lookups for the shared generic storage field.
  2. Shared generics need expandRawHandleIntrinsic implemented in CoreCLR and Crossgen2.

For item 2, I'd like to ask somebody from the VM team to help out.

Otherwise, I think we can now send this to API review as is.

@MichalPetryka (Contributor Author)

I've realised that I forgot about the interpreter here. @jkotas, do you know how feasible it will be to get the necessary method tables there?



Development: successfully merging this pull request may close [API Proposal]: Introduce an intrinsic for more efficient lambda generation