Perform reachability analysis before codegen #66967
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details: Perform reachability analysis one last time before doing code gen to eliminate unreachable blocks.
@dotnet/jit-contrib
Are you actually removing the blocks anywhere?
fgComputeReachability is not particularly efficient (it blindly unions all preds instead of just the ones with updates).
It is overkill for what I think you are looking for here -- you only want to know if a block A is reachable from method entry, not whether block A is reachable from block B.
fgComputeReachability also updates the gc safe block flags, which, while likely harmless, is probably not something we want to be doing.
So I think you can get this information more cheaply and with less collateral impact.
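The cheaper query suggested above (is a block reachable from method entry?) can be sketched as a single worklist walk. This is only an illustration under stated assumptions: `FlowGraph` and `reachableFromEntry` are hypothetical names, the graph is a plain successor list rather than the JIT's own block structures, and a real version would likely seed the worklist with every enter block, not just one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow-graph representation: succs[b] lists the successor
// block indices of block b. This is NOT the JIT's BasicBlock structure.
using FlowGraph = std::vector<std::vector<std::size_t>>;

// Marks every block reachable from `entry` with an iterative depth-first walk.
// Each block is pushed at most once, so the cost is O(blocks + edges),
// and no per-block reachability sets are built.
std::vector<bool> reachableFromEntry(const FlowGraph& succs, std::size_t entry)
{
    assert(entry < succs.size());

    std::vector<bool>        visited(succs.size(), false);
    std::vector<std::size_t> worklist{entry};
    visited[entry] = true;

    while (!worklist.empty())
    {
        const std::size_t block = worklist.back();
        worklist.pop_back();

        for (std::size_t succ : succs[block])
        {
            if (!visited[succ])
            {
                visited[succ] = true;
                worklist.push_back(succ);
            }
        }
    }
    return visited;
}
```

Any block left unmarked after the walk is a candidate for removal. Unlike a full block-to-block reachability computation, this touches no other per-block state.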
src/coreclr/jit/compiler.cpp
Outdated
EnsureBasicBlockEpoch();
fgComputeReachability(/* computeDoms */ false, /* doRenumbering */ false);
};
DoPhase(this, PHASE_COMPUTE_REACHABILITY, computeReachability);
I would use a different local name and a different phase name (reusing a phase name likely messes up some stats computations).
src/coreclr/jit/compiler.cpp
Outdated
@@ -4769,7 +4769,8 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl

// Compute reachability sets and dominators.
//
DoPhase(this, PHASE_COMPUTE_REACHABILITY, &Compiler::fgComputeReachability);
Why does this need to change?
src/coreclr/jit/emit.cpp
Outdated
@@ -6556,7 +6556,8 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
#ifdef DEBUG
if (emitComp->opts.disAsm || emitComp->verbose)
{
printf("\t\t\t\t\t\t;; bbWeight=%s PerfScore %.2f", refCntWtd2str(ig->igWeight), ig->igPerfScore);
printf("\t\t\t\t\t\t;; size=%d bbWeight=%s PerfScore %.2f", ig->igSize, refCntWtd2str(ig->igWeight),
nit: why so much whitespace?
src/coreclr/jit/fgopt.cpp
Outdated
// Arguments:
// renumberingDone -- `true` if block renumbering was done.
//
void Compiler::fgComputeEnterBlocksSet(DEBUG_ARG1(const bool renumberingDone))
This is kind of annoying (adding an arg just for this). Why not just remove the offending assert instead? Doesn't seem like it has much value anyway.
src/coreclr/jit/fgopt.cpp
Outdated
@@ -543,10 +546,14 @@ bool Compiler::fgRemoveUnreachableBlocks()
// Also, compute the list of return blocks `fgReturnBlocks` and set of enter blocks `fgEnterBlks`.
// Delete unreachable blocks.
//
// Arguments:
// computeDoms - Whether to compute doms or not.
It would be useful to understand (and add to this comment block) when renumbering is required, and why. And when it is not. Same for creating doms.
Yes, after it tracks the runtime/src/coreclr/jit/fgopt.cpp Lines 455 to 464 in 8b42dff
@AndyAyersMS - Just to answer your original question: yes, it does remove blocks in that method, as seen in the asmdiffs.
I have verified the diffs between the new implementation and the previous (exhaustive) implementation, and there are none.
Ping.
I think the approach is good.
I have one question and also wondered if you had new TP measurements.
I guess there are superpmi replay failures for linux-arm that I need to look into.
Latest PIN numbers on superpmi libraries_pmi collection:
Failure related to #59542
Improvements in System.Text.Json.Tests.Perf_Booleans dotnet/perf-autofiling-issues#4364 on x64
Perform reachability analysis one last time before doing code gen to eliminate unreachable blocks.
While investigating #66578, I noticed that a lot of dead blocks were kept around, which keeps variables alive longer and creates unnecessary refpositions. Having extra refpositions is not problematic in itself, but the liveness of variables was not tracked correctly because of the unreachable blocks. The liveness of a variable decides whether an interval should be spilled or not. As seen in my analysis in #66578 (comment), there were 3 refpositions (def, use, use) created for the problematic variable, but because of the missing liveness, both uses were marked as lastUse, which led to the interval being made inactive instead of spilled.
In this PR, I have added the computeReachability() phase after all the optimizations are done so it can remove any unreachable blocks. It performs the block renumbering, but since LSRA is sensitive to block renumbering (see #66994), I didn't want to see its impact as part of this change. Hence, I have added an option to skip the block renumbering in the final phase.
PIN numbers on windows/x64 SPMI
Also emit the block size after every block in order to spot the code size diff at block level.
Fixes: #66578
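
The option to skip renumbering described above can be pictured with a small sketch. Everything here is hypothetical (the `Block` struct, `removeUnreachable`, and its `doRenumbering` flag are illustration-only names and do not mirror the JIT's data structures). It only shows that deleting unreachable blocks leaves gaps in the block numbers unless a renumbering pass follows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified block record: a number plus a reachability mark.
struct Block
{
    unsigned num;
    bool     reachable;
};

// Deletes blocks marked unreachable and returns how many were removed.
// When doRenumbering is false the survivors keep their original numbers,
// so the numbering may contain gaps; when true they become 1..N.
std::size_t removeUnreachable(std::vector<Block>& blocks, bool doRenumbering)
{
    const std::size_t before = blocks.size();

    blocks.erase(std::remove_if(blocks.begin(), blocks.end(),
                                [](const Block& b) { return !b.reachable; }),
                 blocks.end());
    assert(blocks.size() <= before);

    if (doRenumbering)
    {
        unsigned next = 1;
        for (Block& b : blocks)
        {
            b.num = next++;
        }
    }
    return before - blocks.size();
}
```

With renumbering skipped, any later phase that is sensitive to block numbers sees the same numbers as before the removal, just with some missing.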