-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TieredCompilation] Cold methods with hot loops may run slower with tiering #11006
[TieredCompilation] Cold methods with hot loops may run slower with tiering #11006
Comments
OSR looks like the best solution in general if the engineering cost is acceptable 😄 |
Related to https://github.com/dotnet/corefx/issues/32235 Workaround (and probably a long-term fix) for https://github.com/dotnet/coreclr/issues/19751 - For a method flagged with AggressiveOptimization, tiering would use a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting
Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - For a method flagged with AggressiveOptimization, tiering would use a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting
Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
…20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
…#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization Signed-off-by: dotnet-bot <[email protected]>
…#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization Signed-off-by: dotnet-bot <[email protected]>
…#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization Signed-off-by: dotnet-bot <[email protected]>
…otnet#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
…otnet#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
…#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization Signed-off-by: dotnet-bot <[email protected]>
…otnet#20009) Add MethodImplOptions.AggressiveOptimization and use it for tiering Part of fix for https://github.com/dotnet/corefx/issues/32235 Workaround for https://github.com/dotnet/coreclr/issues/19751 - Added and set CORJIT_FLAG_AGGRESSIVE_OPT to indicate that a method is flagged with AggressiveOptimization - For a method flagged with AggressiveOptimization, tiering uses a foreground tier 1 JIT on first call to the method, skipping the tier 0 JIT and call counting - When tiering is disabled, a method flagged with AggressiveOptimization does not use r2r-pregenerated code - R2r crossgen does not generate code for a method flagged with AggressiveOptimization
First I woudl like to highlight that the mitigation mention above of using an attribute has been implemented. In particular if you add the AggresiveInlining flags to the 'RunIteration' method like so
The original repro will be fixed. Thus any particular instance of the issue can be mitigated in a straightfoward way. You just have to realize that you need to do it. Second I should mention that it is relatively rare that this situation (where a perf-critical method is only run once) is pretty rare in real code. Much more typically perf-critical methods are called repeatedly. The sad exception to this is microbenchmarks (they are typically designed to be a hot loop, and if the important code is in the loop (rather that what was called from the loop), you will get this issue. Having benchmarks add the AggressiveInlininng switch. It has been suggested above that On Stack Replacement (OSR) is the solution to this bug (since it would allow methods that have not returned to updated. This of course would solve the problem but figuring how how to map local state is non-trivial and likely error prone, which is why the cost-benefit of actually fixing this bug that way is low. However I would like to suggest two other possible mitigations. They are not as comprehensive as OSR, but they are much easier, and likely to get use most of the way.
My main point here is that we have a way that users can fix things for sure, so the only issue is that they have to discover that, and that may not happen. We DON'T have to do a perfect job trying to detect the rest, and it is OK to be very heuristic and it is OK if the heuristics look 'ugly' because they are a 'best effort' kind of a thing. I personally think we should do (1) for V3.0 |
Updated workaround info above to include the attribute. I think you meant RE (1):
|
RE (3), OSR-like strategies would also be useful for profile-based speculative optimizations along with solving this issue. So it could be considered that fixing this issue with such a strategy is just a side-effect. |
…ames - Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit - Disabled quick JIT by default, the current plan is to do that for preview 4 - Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important - The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases - This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced. - Renamed tier 0 / tier 1 to StartupTier, OptimizedTier - Added config option ForceQuickJit, which uses quick JIT instead of the normal JIT. Off by default. Disables tiering. - Added config option QuickJitForLoops, which determines whether quick JIT, when enabled, may be used for methods that contain loops. Off by default, so StartupTierQuickJit=1 or ForceQuickJit=1 would still not use quick JIT for methods that contain loops by default. Fixes https://github.com/dotnet/coreclr/issues/22998 Fixes https://github.com/dotnet/coreclr/issues/19751
…ames - Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit - Disabled quick JIT by default, the current plan is to do that for preview 4 - Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important - The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases - This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced. - Renamed tier 0 / tier 1 to StartupTier, OptimizedTier - Added config option ForceQuickJit, which uses quick JIT instead of the normal JIT. Off by default. Disables tiering. - Added config option QuickJitForLoops, which determines whether quick JIT, when enabled, may be used for methods that contain loops. Off by default, so StartupTierQuickJit=1 or ForceQuickJit=1 would still not use quick JIT for methods that contain loops by default. Fixes https://github.com/dotnet/coreclr/issues/22998 Fixes https://github.com/dotnet/coreclr/issues/19751
- Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit - Disabled quick JIT by default, the current plan is to do that for preview 4 - Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important - The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases - This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced. - Added config option ForceQuickJit, which uses quick JIT instead of the normal JIT. Off by default. Disables tiering. Fixes https://github.com/dotnet/coreclr/issues/22998 Fixes https://github.com/dotnet/coreclr/issues/19751
- Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit - Disabled quick JIT by default, the current plan is to do that for preview 4 - Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important - The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases - This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced. - Added config option ForceQuickJit, which uses quick JIT instead of the normal JIT. Off by default. Disables tiering. Fixes https://github.com/dotnet/coreclr/issues/22998 Fixes https://github.com/dotnet/coreclr/issues/19751
…option (#23599) Disable tier 0 JIT (quick JIT) by default, rename config option - Tier 0 JIT is being called quick JIT in config options, renamed DisableTier0Jit to StartupTierQuickJit - Disabled quick JIT by default, the current plan is to do that for preview 4 - Concerns were that code produced by quick JIT may be slow, may allocate more, may use more stack space, and may be much larger than optimized code, and there there may be many cases where these things lead to regressions when the span of time between startup and steady-state is important - The thought was that with quick JIT disabled, tiering overhead from call counting and backgorund jitting with optimizations would be less, and perf during any point in time would be closer to 2.x releases - This mostly loses the startup perf gains from tiering. It may also be slightly slower compared with tiering off due to some overhead. When quick JIT is disabled for the startup tier, made a change to disable tiered compilation for methods in modules that are not R2R'ed since they will not be tiered currently anyway. The overhead and regression in R2R'ed modules will be looked into separately to see if it can be reduced. Fixes https://github.com/dotnet/coreclr/issues/22998 Fixes https://github.com/dotnet/coreclr/issues/19751
…y default Fixes https://github.com/dotnet/coreclr/issues/19751 by default when QuickJit is enabled - Added config variable TC_QuickJitForLoops. When disabled (the default), the JIT identifies loops and explicit tail calls and switches to tier 1 JIT. - This would prevent the possibility of spending too long in QuickJit code, but may decrease startup time a bit when QuickJit is enabled - Removed TC_StartupTier_OptimizeCode, as now that there is TC_QuickJit, I didn't see a good use for it - Removed references to "StartupTier" in config variables because we had previously decided not to call it that. - When QuickJit is disabled, avoid creating native code slots for methods in non-R2R'ed modules, as tiering would be disabled for those anyway
…y default (#24252) When QuickJit is enabled, disable it for methods that contain loops by default Fixes https://github.com/dotnet/coreclr/issues/19751 by default when QuickJit is enabled - Added config variable TC_QuickJitForLoops. When disabled (the default), the JIT identifies loops and explicit tail calls and switches to tier 1 JIT. - This would prevent the possibility of spending too long in QuickJit code, but may decrease startup time a bit when QuickJit is enabled - Removed TC_StartupTier_OptimizeCode, as now that there is TC_QuickJit, I didn't see a good use for it - Removed references to "StartupTier" in config variables because we had previously decided not to call it that. - When QuickJit is disabled, avoid creating native code slots for methods in non-R2R'ed modules, as tiering would be disabled for those anyway - Marked TC_QuickJit config var as external
Average iterations per ms with tiering disabled:
2775.05
Tiering enabled:
2045.84
A comparison of PerfView profiles shows that some inlining is not happening:
The JITStats summary shows that the only JIT trigger for
Main
isFG
(foreground), which when tiering is enabled, is tier 0 (minopts), which does not do inlining. There is noTC
trigger to indicate tier 1 forMain
.A workaround is to move the iteration code into a separate method:
Average iterations per ms with tiering disabled:
2775.55
Tiering enabled:
2728.70
The PerfView profile now shows most of the time spent is exclusively in
RunIteration
as expected:List.Add
is still showing up, and that must be whenRunIteration
was at tier 0, as the JITStats summary shows:The last sample in
List.Add
in the profile was at1,792.351
. The tier 1 JIT forRunIteration
was initiated at1,791.821
and would have completed at around1,792.421
.Other workarounds:
COMPlus_TieredCompilation_DisableTier0Jit=1
or in project file<DisableTier0Jit>true</DisableTier0Jit>
). In this mode, methods that don't have pregenerated code would be optimized initially. It may be useful as a global workaround for a suite of benchmarks where there may be several instances of cold methods with hot loops. For apps, it would avoid the worst-case situations where a cold method jitted at tier 0 contains a hot loop that runs for a long time. It would still be possible to be running a long-running hot loop in a cold method that has not yet been jitted at tier 1, but it would be running optimized pregenerated code, so the perf may be reasonable and the issue may not be as severe.MethodImplOptions.AggressiveOptimization
. In the first example above, that would be:COMPlus_TieredCompilation=0
or in project file<TieredCompilation>false</TieredCompilation>
) for such types of benchmarksConsiderations:
The text was updated successfully, but these errors were encountered: