-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Support some assignment decomposition in physical promotion #85105
JIT: Support some assignment decomposition in physical promotion #85105
Conversation
Add support for directly initializing and copying into replacements instead of doing a struct local and read back. Physically promoted struct locals used as sources are still handled conservatively (by first writing them back to stack, then doing the copy). For example, for a case like ``` void Foo() { S s = _field; s.A = s.B + 3; Consume(s); } struct S { public int A, B; } ``` We see the following: ``` STMT00000 ( 0x000[E-] ... 0x006 ) [000003] -A-XG------ ▌ ASG struct (copy) [000002] D------N--- ├──▌ LCL_VAR struct<Program+S, 8> V01 loc0 [000001] ---XG------ └──▌ FIELD struct Program:_field [000000] ----------- └──▌ LCL_VAR ref V00 this (last use) Processing block operation [000003] that involves replacements New statement: STMT00000 ( 0x000[E-] ... 0x006 ) [000029] -A-XG------ ▌ COMMA int [000021] -A-XG------ ├──▌ ASG int [000015] D------N--- │ ├──▌ LCL_VAR int V03 tmp1 [000020] ---XG------ │ └──▌ IND int [000018] ----------- │ └──▌ ADD ref [000016] ----------- │ ├──▌ LCL_VAR ref V00 this [000017] ----------- │ └──▌ CNS_INT long 8 [000028] -A-XG------ └──▌ ASG int [000022] D------N--- ├──▌ LCL_VAR int V04 tmp2 [000027] ---XG------ └──▌ IND int [000025] ----------- └──▌ ADD ref [000023] ----------- ├──▌ LCL_VAR ref V00 this [000024] ----------- └──▌ CNS_INT long 12 ``` The logic is currently quite rudimentary when it comes to holes/uncovered parts of the struct. For example, in the above case if we add another unused field at the end of S then the result is: ``` STMT00000 ( 0x000[E-] ... 0x006 ) [000003] -A-XG------ ▌ ASG struct (copy) [000002] D------N--- ├──▌ LCL_VAR struct<Program+S, 12> V01 loc0 [000001] ---XG------ └──▌ FIELD struct Program:_field [000000] ----------- └──▌ LCL_VAR ref V00 this (last use) Processing block operation [000003] that involves replacements Struct operation is not fully covered by replaced fields. Keeping struct operation. New statement: STMT00000 ( 0x000[E-] ... 0x006 ) [000030] -A-XG------ ▌ COMMA struct [000021] -A-XG------ ├──▌ ASG int [000015] D------N--- │ ├──▌ LCL_VAR int V03 tmp1 [000020] ---XG------ │ └──▌ IND int [000018] ----------- │ └──▌ ADD ref [000016] ----------- │ ├──▌ LCL_VAR ref V00 this [000017] ----------- │ └──▌ CNS_INT long 8 [000029] -A-XG------ └──▌ COMMA struct [000028] -A-XG------ ├──▌ ASG int [000022] D------N--- │ ├──▌ LCL_VAR int V04 tmp2 [000027] ---XG------ │ └──▌ IND int [000025] ----------- │ └──▌ ADD ref [000023] ----------- │ ├──▌ LCL_VAR ref V00 this [000024] ----------- │ └──▌ CNS_INT long 12 [000003] -A-XG------ └──▌ ASG struct (copy) [000002] D------N--- ├──▌ LCL_VAR struct<Program+S, 12> V01 loc0 [000001] ---XG------ └──▌ FIELD struct Program:_field [000000] ----------- └──▌ LCL_VAR ref V00 this ``` In this case it would be significantly more efficient to copy only the remainder, which is just a small part of the struct. However, in the general case it is not easy to predict the most efficient way to do this, and in some cases we cannot even represent the hole in JIT IR (if it involves GC pointers), so I have left this for a future change for now. Liveness should also be beneficial for that as there are many cases where we would expect the remainder to be dead.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsAdd support for directly initializing and copying into replacements For example, for a case like
We see the following:
The logic is currently quite rudimentary when it comes to
In this case it would be significantly more efficient to copy only the
|
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
…ion-decompose-assignments
cc @dotnet/jit-contrib PTAL @AndyAyersMS This passed jit-experimental. Diffs with physical promotion. Diffs without old promotion. Looks like there's a ton of generated regex code that has the same kind of regression in x86 that ends up disproportionally affecting the benchmarks diffs. Appears to be a case where the promoted fields end up without registers, and the moved def of the field appears to no longer be a candidate for having a single spill def location. I will investigate further tomorrow, but this PR should be good to review now (also need to address some formatting issues). |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I am not clear on how the apparently flow-sensitive values of Replacement.NeedsReadBack
and 'Replacement.NeedsWriteBack` are handled. Say a struct with promotions is assigned to differently in then/else and then some promoted piece is accessed below the join point, how does that access know if it needs to be read back?
I suppose the answer is that immediately after each assignment if a read back is needed then you just do it there instead of trying to defer, instead of trying to find "optimal" placement downstream somewhere...?
// Parameters: | ||
// lclNum - the local | ||
// | ||
void IncrementRefCount(unsigned lclNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Share this with local address visitor?
Also the local address visitor's UpdateEarlyRefCount
does things with weighted ref counts which would be difficult to duplicate here -- is there any adverse interaction caused by not updating them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Share this with local address visitor?
Will do.
Also the local address visitor's UpdateEarlyRefCount does things with weighted ref counts which would be difficult to duplicate here -- is there any adverse interaction caused by not updating them?
The ref counts we are updating here are only for "existing" locals that appear under assignments, so I don't think the weighted ref counts are problematic.
The remainder of the pass is not handling the ref counts properly for the locals it introduces, I will submit a separate change to get that sorted out. There won't be an adverse effect of not updating the weighted ref count since we only use that to undo "old" promotion (the other use of them, for omitting some implicit byref copies, was replaced by liveness if you recall).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced an LclVarDsc::incLvRefCntSaturating
and used it from both. Didn't seem too easy to factor the rest of the promotion-related common logic due to the weighted ref-count logic you mentioned.
{ | ||
JITDUMP("Copy [%06u] is between two physically promoted locals with replacements\n", | ||
Compiler::dspTreeID(asg)); | ||
JITDUMP("*** Conservative: Phys<->phys copies not yet supported; inserting conservative write-back\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case uncommon or something you want to work on later?
Would it be interesting to check for self assignment or do we just not see those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to handle sources in an upcoming change (I have most of this code written). Will keep in mind to add a check for self assignments then.
if (!covered) | ||
{ | ||
JITDUMP("Struct operation is not fully covered by replaced fields. Keeping struct operation.\n"); | ||
result.AddStatement(asg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am confused here -- it seems like result
already accumulated IR for the handled cases and now we're appending a full copy, but some of the copied bits may be stale if we didn't write them back to the source -- but we know not to read from the corresponding bits in the destination?
If so maybe expand on the comment below to say "For handled replacements, we must not read back..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copied bits won't be stale here since this PR always (conservatively) writes the source replacements back to its struct local to begin with, but yes, for the replacements that were handled we won't read them back from the struct local due to the loop below, so it wouldn't matter even if they were.
Once we get to handling sources this "covering" copy will need to happen before the copying of the source replacements since otherwise we would overwrite the fresh bits (from the source's replacements) with stale bits (from the source's struct local).
Currently the invariant is that at the entrance to all BBs the replacement locals always contain the freshest value. So there is "resolution" done at the end of each BB to make that so if runtime/src/coreclr/jit/promotion.cpp Lines 1146 to 1170 in 2c9e483
There's definitely more clever stuff we can do to optimize the insertion of the read backs/write backs. |
/azp run runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 1 pipeline(s). |
#85251 is an attempt at fixing the problem and resolves the regression for the collection: [15:31:32] 10,193 contexts with diffs (9,045 improvements, 522 regressions, 626 same size)
[15:31:32] -46,981/+7,885 bytes |
…ion-decompose-assignments
New superpmi-diffs runs with #85251. |
Add support for directly initializing and copying into replacements
instead of doing a struct local and read back. Physically promoted
struct locals used as sources are still handled conservatively (by first
writing them back to stack, then doing the copy).
For example, for a case like
We see the following:
The logic is currently quite rudimentary when it comes to
holes/uncovered parts of the struct. For example, in the above case if
we add another unused field at the end of S then the result is:
In this case it would be significantly more efficient to copy only the
remainder, which is just a small part of the struct. However, in the
general case it is not easy to predict the most efficient way to do
this, and in some cases we cannot even represent the hole in JIT IR (if
it involves GC pointers), so I have left this for a future change for
now. Liveness should also be beneficial for that as there are many cases
where we would expect the remainder to be dead.