Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit shorter opcodes in ILGenerator.Emit(OpCode, int) #35427

Merged
merged 1 commit into from
Apr 25, 2020

Conversation

stephentoub
Copy link
Member

The Ldloc, Stloc, Ldloca, Ldc_I4, Ldarg, Ldarga, and Starg opcodes all have shorter variants that take up less space in the IL instruction stream. ILGenerator.Emit(OpCode, LocalBuilder) already special cases Ldloc, Stloc, and Ldloca to automatically translate those into their shorter forms where applicable, but similar logic doesn't exist in Emit(OpCode, int) for Ldc_I4, Ldarg, Ldarga, and Starg. As a result, various other libraries higher in the stack that use reflection emit end up doing all the special-casing with their own helper routines to do the shrinking (or they just forego it and end up with larger IL than is necessary).

This PR just consolidates the special-casing logic down into Emit(OpCode, int).

The Ldloc, Stloc, Ldloca, Ldc_I4, Ldarg, Ldarga, and Starg opcodes all have shorter variants that take up less space in the IL instruction stream.  ILGenerator.Emit(OpCode, LocalBuilder) already special cases Ldloc, Stloc, and Ldloca to automatically translate those into their shorter forms where applicable, but similar logic doesn't exist in Emit(OpCode, int) for Ldc_I4, Ldarg, Ldarga, and Starg.  Instead, various other libraries higher in the stack that use reflection emit either end up doing all the special-casing with their own helper routines to do the shrinking, or they just forego it and end up with larger IL than is necessary.

This PR just moves the logic down into Emit(OpCode, int) such that all uses can benefit, and removes the special-casing duplication from the other libraries.
@stephentoub stephentoub added this to the 5.0 milestone Apr 24, 2020
Copy link
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization, how did this not happen already over the years?

@danmoseley
Copy link
Member

Could one imagine an analogous abstraction that saves me having to pick from Br vs Br_S etc? That is one of those things that makes it harder to maintain code that emits IL - as you add or remove instructions, you have to potentially adjust the size of nearby jumps

@stephentoub
Copy link
Member Author

Could one imagine an analogous abstraction that saves me having to pick from Br vs Br_S etc?

Maybe for backward branches. Not for forward, at least not without a lot more complexity / post-processing of the IL stream.

@jkotas
Copy link
Member

jkotas commented Apr 25, 2020

Test failures are known issues #35438 #35451

@jkotas jkotas merged commit cfaaf77 into dotnet:master Apr 25, 2020
@stephentoub stephentoub deleted the specializeilgen branch April 25, 2020 13:57
@davidfowl
Copy link
Member

@stephentoub if you’re looking at ref emit can you also nuke the allocations 😁?

@stephentoub
Copy link
Member Author

can you also nuke the allocations

Which allocations here are impactful, and how?

@davidfowl
Copy link
Member

// This class walks the service call site tree and tries to calculate approximate
// code size to avoid array resizings during IL generation

This was an optimization @pakrym did to reduce allocations while building the callsite for object creation during DI.

@pakrym
Copy link
Contributor

pakrym commented Apr 25, 2020

Using an array poll in ILGenerator might be a good start.

@stephentoub
Copy link
Member Author

I understand techniques could be used to avoid allocation. I'm asking what does it help, other than lowering an allocation number? Is ref emit being used in ASP.NET somewhere on a hot path?

@pakrym
Copy link
Contributor

pakrym commented Apr 25, 2020

Yes, DependencyInjection uses it to emit the service resolvers: https://github.com/dotnet/runtime/blob/e3ffd343ad5bd3a999cb9515f59e6e7a777b2c34/src/libraries/Microsoft.Extensions.DependencyInjection/src/ServiceLookup/ILEmit/ILEmitResolverBuilder.cs

BTW, it has the same optimizations as this PR addressed

private void Ldloc(ILGenerator generator, int index)
{
switch (index)
{
case 0: generator.Emit(OpCodes.Ldloc_0);
return;
case 1: generator.Emit(OpCodes.Ldloc_1);
return;
case 2: generator.Emit(OpCodes.Ldloc_2);
return;
case 3: generator.Emit(OpCodes.Ldloc_3);
return;
}
if (index < byte.MaxValue)
{
generator.Emit(OpCodes.Ldloc_S, (byte)index);
return;
}
generator.Emit(OpCodes.Ldloc, index);
}
private void Stloc(ILGenerator generator, int index)
{
switch (index)
{
case 0: generator.Emit(OpCodes.Stloc_0);
return;
case 1: generator.Emit(OpCodes.Stloc_1);
return;
case 2: generator.Emit(OpCodes.Stloc_2);
return;
case 3: generator.Emit(OpCodes.Stloc_3);
return;
}

@stephentoub
Copy link
Member Author

Yes, DependencyInjection uses it to emit the service resolvers:

On a hot path? That's not a one-time / infrequent action? And you had evidence that reducing managed allocation here moved a throughout / startup measurement, rather than just lowering a bytes allocated counter?

@stephentoub
Copy link
Member Author

BTW, it has the same optimizations as this PR addressed

That code is for ldloc/stloc. Specialization of those already existed, even on .NET Framework:
https://referencesource.microsoft.com/#mscorlib/system/reflection/emit/ilgenerator.cs,876

@pakrym
Copy link
Contributor

pakrym commented Apr 25, 2020

On a hot path? That's not a one-time / infrequent action?

Most of the service resolver building happens in parallel with app startup so it's not a recurring hot path but improvements in it might improve the startup time.

And you had evidence that reducing managed allocation here moved a throughout / startup measurement, rather than just lowering a bytes allocated counter?

Adding that optimization did improve resolver service building performance at the time of writing I don't think I did app startup time measurements at the time.

That code is for ldloc/stloc. Specialization of those already existed, even on .NET Framework:

Huh. I wrote some useless code then.

@davidfowl
Copy link
Member

rather than just lowering a bytes allocated counter

Pushing it all the way to 0. I'm going to turn the GC into a heap scanner 😄

@stephentoub
Copy link
Member Author

stephentoub commented Apr 25, 2020

Pushing it all the way to 0.

Pooling has costs. Extra code to avoid allocation needs to be loaded and JIT'd and has costs (beyond complexity and maintenance costs). Allocation counts as a sole metric can easily tune for the wrong thing, and an occasional allocation can yield better all-up performance than the extra code to avoid it.

Adding that optimization did improve resolver service building performance at the time of writing

Thanks. I'm trying to gauge the benefits to actually investing more here. Those wins you saw, that was from reducing allocation alone, or were there other aspects of the change that were the most impactful? Link to a PR? Thanks.

@jkotas
Copy link
Member

jkotas commented Apr 25, 2020

Adding that optimization did improve resolver service building performance at the time of writing

I assume that you measured it by microbenchmarks. The microbenchmarks do not account for static footprint and startup time that matter quite bit for real apps too.

I don't think I did app startup time measurements at the time.

This suggest that you may have just shifted the cost from one place to a different place, while increasing complexity.

Pushing it all the way to 0

Well, we may be getting to the point where the focus on reducing allocations is starting to hurt more than help. I am finding myself telling people to allocate more quite often recently to avoid writing hard to maintain overengineered code (e.g. https://github.com/dotnet/runtime/pull/35383/files#r414296004).

@pakrym
Copy link
Contributor

pakrym commented Apr 25, 2020

Thanks. I'm trying to gauge the benefits to actually investing more here. Those wins you saw, that was from reducing allocation alone, or were there other aspects of the change that were the most impactful? Link to a PR? Thanks.

It was done as part of working on the larger PR:
aspnet/DependencyInjection#630 so I'm afraid the precise data on specific changes is lost.

This suggest that you may have just shifted the cost from one place to a different place, while increasing complexity.

What other place? The optimization is precalculating the approximate size of IL body to avoid extensive resizing/allocations. It happens inline with operation being benchmarked.

@davidfowl
Copy link
Member

I was obviously kidding as I don’t want to make things worse but it’s gonna be hard to convince m to not allocate if it’s easy to do so and there’s no negative performance impact. If the code is much harder to maintain and there’s no benefit, or if the pooling overhead defeats the purpose then sure. Otherwise don’t allocate

@jkotas
Copy link
Member

jkotas commented Apr 25, 2020

The optimization is precalculating the approximate size of IL body to avoid extensive resizing/allocations.

If this happens for almost free, it is fine. It was not obvious to me. It looked like there is a special pass for pre-calculating from brief look.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants