-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit shorter opcodes in ILGenerator.Emit(OpCode, int) #35427
Conversation
The Ldloc, Stloc, Ldloca, Ldc_I4, Ldarg, Ldarga, and Starg opcodes all have shorter variants that take up less space in the IL instruction stream. ILGenerator.Emit(OpCode, LocalBuilder) already special cases Ldloc, Stloc, and Ldloca to automatically translate those into their shorter forms where applicable, but similar logic doesn't exist in Emit(OpCode, int) for Ldc_I4, Ldarg, Ldarga, and Starg. Instead, various other libraries higher in the stack that use reflection emit either end up doing all the special-casing with their own helper routines to do the shrinking, or they just forego it and end up with larger IL than is necessary. This PR just moves the logic down into Emit(OpCode, int) such that all uses can benefit, and removes the special-casing duplication from the other libraries.
src/coreclr/src/System.Private.CoreLib/src/System/Reflection/Emit/ILGenerator.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization, how did this not happen already over the years?
Could one imagine an analogous abstraction that saves me having to pick from |
Maybe for backward branches. Not for forward, at least not without a lot more complexity / post-processing of the IL stream. |
@stephentoub if you’re looking at ref emit can you also nuke the allocations 😁? |
Which allocations here are impactful, and how? |
Using an array poll in ILGenerator might be a good start. |
I understand techniques could be used to avoid allocation. I'm asking what does it help, other than lowering an allocation number? Is ref emit being used in ASP.NET somewhere on a hot path? |
Yes, DependencyInjection uses it to emit the service resolvers: https://github.com/dotnet/runtime/blob/e3ffd343ad5bd3a999cb9515f59e6e7a777b2c34/src/libraries/Microsoft.Extensions.DependencyInjection/src/ServiceLookup/ILEmit/ILEmitResolverBuilder.cs BTW, it has the same optimizations as this PR addressed Lines 431 to 466 in e3ffd34
|
On a hot path? That's not a one-time / infrequent action? And you had evidence that reducing managed allocation here moved a throughout / startup measurement, rather than just lowering a bytes allocated counter? |
That code is for ldloc/stloc. Specialization of those already existed, even on .NET Framework: |
Most of the service resolver building happens in parallel with app startup so it's not a recurring hot path but improvements in it might improve the startup time.
Adding that optimization did improve resolver service building performance at the time of writing I don't think I did app startup time measurements at the time.
Huh. I wrote some useless code then. |
Pushing it all the way to 0. I'm going to turn the GC into a heap scanner 😄 |
Pooling has costs. Extra code to avoid allocation needs to be loaded and JIT'd and has costs (beyond complexity and maintenance costs). Allocation counts as a sole metric can easily tune for the wrong thing, and an occasional allocation can yield better all-up performance than the extra code to avoid it.
Thanks. I'm trying to gauge the benefits to actually investing more here. Those wins you saw, that was from reducing allocation alone, or were there other aspects of the change that were the most impactful? Link to a PR? Thanks. |
I assume that you measured it by microbenchmarks. The microbenchmarks do not account for static footprint and startup time that matter quite bit for real apps too.
This suggest that you may have just shifted the cost from one place to a different place, while increasing complexity.
Well, we may be getting to the point where the focus on reducing allocations is starting to hurt more than help. I am finding myself telling people to allocate more quite often recently to avoid writing hard to maintain overengineered code (e.g. https://github.com/dotnet/runtime/pull/35383/files#r414296004). |
It was done as part of working on the larger PR:
What other place? The optimization is precalculating the approximate size of IL body to avoid extensive resizing/allocations. It happens inline with operation being benchmarked. |
I was obviously kidding as I don’t want to make things worse but it’s gonna be hard to convince m to not allocate if it’s easy to do so and there’s no negative performance impact. If the code is much harder to maintain and there’s no benefit, or if the pooling overhead defeats the purpose then sure. Otherwise don’t allocate |
If this happens for almost free, it is fine. It was not obvious to me. It looked like there is a special pass for pre-calculating from brief look. |
The Ldloc, Stloc, Ldloca, Ldc_I4, Ldarg, Ldarga, and Starg opcodes all have shorter variants that take up less space in the IL instruction stream. ILGenerator.Emit(OpCode, LocalBuilder) already special cases Ldloc, Stloc, and Ldloca to automatically translate those into their shorter forms where applicable, but similar logic doesn't exist in Emit(OpCode, int) for Ldc_I4, Ldarg, Ldarga, and Starg. As a result, various other libraries higher in the stack that use reflection emit end up doing all the special-casing with their own helper routines to do the shrinking (or they just forego it and end up with larger IL than is necessary).
This PR just consolidates the special-casing logic down into Emit(OpCode, int).