-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: enable tail calls and copy omission for implicit byref structs #33004
JIT: enable tail calls and copy omission for implicit byref structs #33004
Conversation
Handle cases where a caller passes an implicit byref argument struct to a callee in tail position. There is no need to create a local copy or to block tail calling. This pattern comes up increasingly often as we write more layered code operating on spans and similar structs.
cc @dotnet/jit-contrib Addresses issues seen in #13824 (see comment) and #32843 (see comment). This modifies the promotion heuristic for implicit byrefs to not consider uses at (some) call sites as significant for promotion, and modifies the legality checks for tail calls to allow implicit byref params to be passed as args in tail calls. To make this work we then must avoid copying the param to a local struct at the call. Jit diffs shows a lot of hits in the json parser and other places where spans are passed through various layers.
Lots of interesting diffs, here's simple one:
More is possible here:
|
Needs an extra safety check to make sure we're not introducing aliasing -- eg same byref passed to callee in two param slots, and callee writes to one.... hopefully doesn't happen often. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm minus a nit
src/coreclr/src/jit/morph.cpp
Outdated
// | ||
// Otherwise, see how many appearances there are. We keep two early ref counts: total | ||
// number of references to the struct or some field, and how many of these are | ||
// arguments to calls. We undo promotion unless we see enougn non-call uses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: enougn
-> enough
Raised issue for "Linux arm checked" #33012 |
As with anything aliasing related, checking is costly... still working on it. |
It’d be nice to test & implement similar for mono. /cc @lambdageek |
Note for coreclr, this is windows-x64 and windows-arm64 only. More register-intensive ABIs like SysV should (ideally) already be able to optimize some of these cases -- that is, if we keep passed-via-registers structs in registers, they can just flow through to callees. But if we materialize the struct on the frame (as has been our habit) then we will miss these cases on SysV too. The fix for that is quite different than the fix in this PR. |
A first cut at an alias analysis is in, not sure if its complexity matches its value, though. Belive there is a bug in morph that I'm working around for now; will chase this down separately, and if/when fixed we can get rid of the check for Have changed the "do not return" case to require that there be just one reference to the implicit byref in the method. We're only doing the alias analysis for tail calls, so those are the only cases where we can safely deal with multiple references. This version has basically the same diffs as before (we lose 9 cases); we don't expect to see true aliases very often. Magic arg limit of 6 chosen so we still get the copy elimination and tail calls in SPC's runtime/src/libraries/System.Private.CoreLib/src/System/Number.Formatting.cs Lines 747 to 757 in 7c7e8ed
;;; BEFORE
; Assembly listing for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T01] ( 6, 4 ) int -> rsi
; V01 arg1 [V01,T03] ( 3, 2.50) int -> rdx
; V02 arg2 [V02,T00] ( 4, 7 ) byref -> r8 ld-addr-op
; V03 arg3 [V03,T02] ( 5, 3.50) ref -> r9 class-hnd
; V04 arg4 [V04,T04] ( 2, 4 ) byref -> rcx
; V05 arg5 [V05,T16] ( 3, 1.50) byref -> rdi
; V06 OutArgs [V06 ] ( 1, 1 ) lclBlk (48) [rsp+0x00] "OutgoingArgSpace"
; V07 tmp1 [V07,T17] ( 3, 1.50) ref -> rax class-hnd "Inline return value spill temp"
;* V08 tmp2 [V08 ] ( 0, 0 ) byref -> zero-ref V12._pointer(offs=0x00) P-INDEP "field V02._pointer (fldOffset=0x0)"
;* V09 tmp3 [V09 ] ( 0, 0 ) int -> zero-ref V12._length(offs=0x08) P-INDEP "field V02._length (fldOffset=0x8)"
; V10 tmp4 [V10,T08] ( 4, 2.50) byref -> rbx V13._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
; V11 tmp5 [V11,T09] ( 4, 2.50) int -> rbp V13._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V12 tmp6 [V12 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
;* V13 tmp7 [V13 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
; V14 tmp8 [V14 ] ( 6, 6 ) struct (16) [rsp+0x48] do-not-enreg[XSB] must-init addr-exposed "by-value struct argument"
; V15 tmp9 [V15,T05] ( 3, 3 ) byref -> r9 stack-byref "BlockOp address local"
; V16 tmp10 [V16,T12] ( 2, 2 ) int -> rcx "argument with side effect"
; V17 tmp11 [V17,T10] ( 2, 2 ) ref -> r8 "argument with side effect"
; V18 tmp12 [V18,T06] ( 3, 3 ) byref -> r8 stack-byref "BlockOp address local"
; V19 tmp13 [V19,T13] ( 2, 2 ) int -> rcx "argument with side effect"
; V20 tmp14 [V20 ] ( 2, 2 ) struct (16) [rsp+0x38] do-not-enreg[XSB] must-init addr-exposed "by-value struct argument"
; V21 tmp15 [V21,T07] ( 3, 3 ) byref -> r8 stack-byref "BlockOp address local"
; V22 tmp16 [V22,T14] ( 2, 2 ) int -> rcx "argument with side effect"
; V23 tmp17 [V23,T15] ( 2, 2 ) int -> rdx "argument with side effect"
; V24 tmp18 [V24,T11] ( 2, 2 ) ref -> r9 "argument with side effect"
;
; Lcl frame size = 88
G_M23990_IG01:
push rdi
push rsi
push rbp
push rbx
sub rsp, 88
vzeroupper
xor rax, rax
mov qword ptr [rsp+48H], rax
mov qword ptr [rsp+38H], rax
mov esi, ecx
mov rcx, bword ptr [rsp+A0H]
mov rdi, bword ptr [rsp+A8H]
;; bbWeight=1 PerfScore 9.75
G_M23990_IG02:
mov rbx, bword ptr [rcx]
mov ebp, dword ptr [rcx+8]
;; bbWeight=1 PerfScore 4.00
G_M23990_IG03:
cmp dword ptr [r8+8], 0
jne SHORT G_M23990_IG10
;; bbWeight=1 PerfScore 3.00
G_M23990_IG04:
test esi, esi
jge SHORT G_M23990_IG08
test r9, r9
je SHORT G_M23990_IG05
mov rcx, r9
call NumberFormatInfo:<GetInstance>g__GetProviderNonNull|42_0(IFormatProvider):NumberFormatInfo
jmp SHORT G_M23990_IG06
;; bbWeight=0.50 PerfScore 2.88
G_M23990_IG05:
call NumberFormatInfo:get_CurrentInfo():NumberFormatInfo
;; bbWeight=0.50 PerfScore 0.50
G_M23990_IG06:
mov ecx, esi
mov r8, gword ptr [rax+40]
lea r9, bword ptr [rsp+48H]
mov bword ptr [r9], rbx
mov dword ptr [r9+8], ebp
mov bword ptr [rsp+20H], rdi
lea r9, bword ptr [rsp+48H]
mov edx, -1
call Number:TryNegativeInt32ToDecStr(int,int,String,Span`1,byref):bool
nop
;; bbWeight=0.50 PerfScore 3.88
G_M23990_IG07:
add rsp, 88
pop rbx
pop rbp
pop rsi
pop rdi
ret
;; bbWeight=0.50 PerfScore 1.63
G_M23990_IG08:
mov ecx, esi
lea r8, bword ptr [rsp+48H]
mov bword ptr [r8], rbx
mov dword ptr [r8+8], ebp
lea r8, bword ptr [rsp+48H]
mov r9, rdi
mov edx, -1
call Number:TryUInt32ToDecStr(int,int,Span`1,byref):bool
nop
;; bbWeight=0.50 PerfScore 2.50
G_M23990_IG09:
add rsp, 88
pop rbx
pop rbp
pop rsi
pop rdi
ret
;; bbWeight=0.50 PerfScore 1.63
G_M23990_IG10:
mov ecx, esi
;; bbWeight=0.50 PerfScore 0.13
G_M23990_IG11:
vmovdqu xmm0, xmmword ptr [r8]
vmovdqu xmmword ptr [rsp+38H], xmm0
;; bbWeight=0.50 PerfScore 1.50
G_M23990_IG12:
lea r8, bword ptr [rsp+48H]
mov bword ptr [r8], rbx
mov dword ptr [r8+8], ebp
mov bword ptr [rsp+28H], rdi
lea r8, bword ptr [rsp+38H]
lea rax, bword ptr [rsp+48H]
mov bword ptr [rsp+20H], rax
call Number:<TryFormatInt32>g__TryFormatInt32Slow|33_0(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
nop
;; bbWeight=0.50 PerfScore 3.38
G_M23990_IG13:
add rsp, 88
pop rbx
pop rbp
pop rsi
pop rdi
ret
;; bbWeight=0.50 PerfScore 1.63
; Total bytes of code 228, prolog size 23, PerfScore 59.28, (MethodHash=5b40a24c) for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
;;; AFTER
; Assembly listing for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T01] ( 6, 4 ) int -> rsi
; V01 arg1 [V01,T03] ( 3, 2.50) int -> rdx
; V02 arg2 [V02,T00] ( 4, 7 ) byref -> r8 ld-addr-op
; V03 arg3 [V03,T02] ( 5, 3.50) ref -> r9 class-hnd
; V04 arg4 [V04,T04] ( 3, 3 ) byref -> rdi
; V05 arg5 [V05,T06] ( 3, 1.50) byref -> rbx
; V06 OutArgs [V06 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
; V07 tmp1 [V07,T07] ( 3, 1.50) ref -> rax class-hnd "Inline return value spill temp"
;* V08 tmp2 [V08 ] ( 0, 0 ) byref -> zero-ref V12._pointer(offs=0x00) P-INDEP "field V02._pointer (fldOffset=0x0)"
;* V09 tmp3 [V09 ] ( 0, 0 ) int -> zero-ref V12._length(offs=0x08) P-INDEP "field V02._length (fldOffset=0x8)"
;* V10 tmp4 [V10 ] ( 0, 0 ) byref -> zero-ref V13._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
;* V11 tmp5 [V11 ] ( 0, 0 ) int -> zero-ref V13._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V12 tmp6 [V12 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
;* V13 tmp7 [V13 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
; V14 rat0 [V14,T05] ( 2, 2 ) byref -> r9 "Fast tail call lowering is creating a new local variable"
;
; Lcl frame size = 32
G_M23990_IG01:
push rdi
push rsi
push rbx
sub rsp, 32
mov esi, ecx
mov rdi, bword ptr [rsp+60H]
mov rbx, bword ptr [rsp+68H]
;; bbWeight=1 PerfScore 5.50
G_M23990_IG02:
cmp dword ptr [r8+8], 0
jne SHORT G_M23990_IG09
;; bbWeight=1 PerfScore 3.00
G_M23990_IG03:
test esi, esi
jge SHORT G_M23990_IG07
test r9, r9
je SHORT G_M23990_IG04
mov rcx, r9
call NumberFormatInfo:<GetInstance>g__GetProviderNonNull|42_0(IFormatProvider):NumberFormatInfo
jmp SHORT G_M23990_IG05
;; bbWeight=0.50 PerfScore 2.88
G_M23990_IG04:
call NumberFormatInfo:get_CurrentInfo():NumberFormatInfo
;; bbWeight=0.50 PerfScore 0.50
G_M23990_IG05:
mov r9, rdi
mov bword ptr [rsp+60H], rbx
mov r8, gword ptr [rax+40]
mov ecx, esi
mov edx, -1
;; bbWeight=0.50 PerfScore 1.88
G_M23990_IG06:
add rsp, 32
pop rbx
pop rsi
pop rdi
jmp Number:TryNegativeInt32ToDecStr(int,int,String,Span`1,byref):bool
;; bbWeight=0.50 PerfScore 1.88
G_M23990_IG07:
mov ecx, esi
mov r9, rbx
mov r8, rdi
mov edx, -1
;; bbWeight=0.50 PerfScore 0.50
G_M23990_IG08:
add rsp, 32
pop rbx
pop rsi
pop rdi
jmp Number:TryUInt32ToDecStr(int,int,Span`1,byref):bool
;; bbWeight=0.50 PerfScore 1.88
G_M23990_IG09:
mov bword ptr [rsp+60H], rdi
mov bword ptr [rsp+68H], rbx
mov ecx, esi
;; bbWeight=0.50 PerfScore 1.13
G_M23990_IG10:
add rsp, 32
pop rbx
pop rsi
pop rdi
jmp Number:<TryFormatInt32>g__TryFormatInt32Slow|33_0(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool
;; bbWeight=0.50 PerfScore 1.88
; Total bytes of code 130, prolog size 19, PerfScore 34.00, (MethodHash=5b40a24c) for method Number:TryFormatInt32(int,int,ReadOnlySpan`1,IFormatProvider,Span`1,byref):bool Note we could tighten up the latter a bit more, there's no need to copy the span in to RDI/RBX and back. |
@AndyAyersMS I rebased #32843 on top of this change (and removed the in weirdness that was contentious in that PR) and I'm seeing significantly improved codegen. 😄
The only remaining nit is the "jump-to-jump" pattern that appears in the below codegen, but I don't know if that's related to this PR or if it's a separate opportunity for further cleanup. (Edit: maybe it's for predict-not-taken scenarios?) 00007ffe`449db320 4585c9 test r9d,r9d
00007ffe`449db323 7505 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca905a (00007ffe`449db32a)
00007ffe`449db325 e9dedaffff jmp CLRStub[MethodDescPrestub]@7ffe449d8e08 (00007ffe`449d8e08)
00007ffe`449db32a 4183c920 or r9d,20h
00007ffe`449db32e 4183f967 cmp r9d,67h
00007ffe`449db332 7f0e jg System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9072 (00007ffe`449db342)
00007ffe`449db334 4183f964 cmp r9d,64h
00007ffe`449db338 74eb je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9055 (00007ffe`449db325)
00007ffe`449db33a 4183f967 cmp r9d,67h
00007ffe`449db33e 74e5 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9055 (00007ffe`449db325)
00007ffe`449db340 eb18 jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca908a (00007ffe`449db35a)
00007ffe`449db342 4183f96e cmp r9d,6Eh
00007ffe`449db346 7408 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9080 (00007ffe`449db350)
00007ffe`449db348 4183f978 cmp r9d,78h
00007ffe`449db34c 7407 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca9085 (00007ffe`449db355)
00007ffe`449db34e eb0a jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Int32)+0xffffffff`a0ca908a (00007ffe`449db35a)
00007ffe`449db350 e9d3daffff jmp CLRStub[MethodDescPrestub]@7ffe449d8e28 (00007ffe`449d8e28)
00007ffe`449db355 e94edbffff jmp CLRStub[MethodDescPrestub]@7ffe449d8ea8 (00007ffe`449d8ea8)
00007ffe`449db35a e991f3ffff jmp CLRStub[MethodDescPrestub]@7ffe449da6f0 (00007ffe`449da6f0) For reference, here's the C# that was used for this codegen: |
@GrabYourPitchforks thanks. For the jump intensive G_M5679_IG02:
4585C9 test r9d, r9d
7505 jne SHORT G_M5679_IG04
G_M5679_IG03:
E9D613A5FF jmp TryParseInt32D(System.ReadOnlySpan`1[Byte],...) and emit something like: G_M5679_IG02:
4585C9 test r9d, r9d
xxxxxxxx je TryParseInt32D(System.ReadOnlySpan`1[Byte],...)
G_M5679_IG04: and similarly elsewhere, but this sort of construct would be a new concept for the jit. And I'm not sure how often we see tail calls with no need for argument setup. Will keep it on my list of things to watch for. SysV codegen looks like it needs some TLC here too. On SysV spans are passed in registers, and here the jit can't resist spilling them to the stack. G_M51636_IG01:
push rbp
sub rsp, 16
lea rbp, [rsp+10H]
mov bword ptr [rbp-10H], rdi
mov qword ptr [rbp-08H], rsi
G_M51636_IG02:
test r8d, r8d
jne SHORT G_M51636_IG05
G_M51636_IG03:
mov rdi, bword ptr [rbp-10H]
mov rsi, qword ptr [rbp-08H]
G_M51636_IG04:
lea rsp, [rbp]
pop rbp
jmp TryParseInt32D(ReadOnlySpan`1,byref,byref):bool |
@dotnet/jit-contrib any other comments? I figured some of this might need re-expressing... |
I'll review this tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any holes in the algorithm but left some questions and suggestions.
src/coreclr/src/jit/morph.cpp
Outdated
{ | ||
JITDUMP("Arg [%06u] is unpromoted implicit byref V%02u, seeing if we can still tail call\n", | ||
dspTreeID(arg->GetNode()), lcl->GetLclNum()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this JITDUMP
to under the next else
.
return result; | ||
} | ||
|
||
// Will return different answers if x and y refer to same struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y
is not a struct here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'y' is a ref so it parsed ok to me, but will revise to make it clearer.
src/coreclr/tests/src/JIT/opt/Tailcall/ImplicitByrefTailCallsAliasing.cs
Outdated
Show resolved
Hide resolved
lcl = addr->AsLclVarCommon(); | ||
} | ||
else if (addr->OperIs(GT_ADDR)) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a semantic difference between OBJ(lcl)
and OBJ(ADDR(lcl))
where lcl
is an implicit byref parameter? Do they both represent a copy of the struct? When do we end up with one vs. the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this means changes over time, and we're straddling that change.
See above?
// Here the call to ZZZ would need to be a slow tail call, | ||
// so we won't tail call at all. But the only | ||
// reference to x is at the call, and it's not in | ||
// a loop, so we can still can avoid making a copy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have a test case when the call is in a loop and a test case for NoReturnLastUse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will add these.
hasByrefParameter = true; | ||
break; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this var to something like copyByRefStructParameter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few questions.
src/coreclr/src/jit/gentree.cpp
Outdated
|
||
if (addr->OperIs(GT_LCL_VAR)) | ||
{ | ||
lcl = addr->AsLclVarCommon(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this case, LCL_VAR
is used as an address here (in other cases it is used as a value), so it has to be 'byref' or 'ref' type. Then how can it be lvaIsImplicitByRefLocal
if it always should return false for TYP_IMPL
size lclVars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is potentially confusing.
The short answer is that before morph, arguments passed implicitly by-reference are represented by-value in the IR.
This gets changed in fgMorphImplicitByRefArgs
where we make the by-reference aspect explicit. But the above helper gets called both before and after, and so needs to recognize cases where the local either represents a struct or a pointer to a struct.
Note the first two "match" cases here are pre-existing logic; all I did was move this to a helper method and add the last case.
src/coreclr/src/jit/gentree.cpp
Outdated
// Return Value: | ||
// GenTreeLclVarCommon node for the local, or nullptr. | ||
|
||
GenTreeLclVarCommon* GenTree::IsImplicitByrefParameterValue(Compiler* compiler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change GenTreeLclVarCommon
to GenTreeLclVar
, it is more precise for this case.
src/coreclr/src/jit/morph.cpp
Outdated
if (!call->IsTailCallViaHelper() && (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop()) | ||
{ | ||
assert(!call->IsTailCall()); | ||
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for AsLclVarCommon
.
assert(!call->IsTailCall()); | ||
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum(); | ||
LclVarDsc* varDsc = lvaGetDesc(varNum); | ||
const unsigned short totalAppearances = varDsc->lvRefCnt(RCS_EARLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: how do you decide what is marked as const and what is not? Like why varNum
is not in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to mark all locals that won't be redefined as const, but am more likely to do this for lines I explicitly edit.
{ | ||
// First, see if this arg is a byref param. | ||
GenTreeLclVarCommon* const lcl = arg->GetNode()->IsImplicitByrefParameterValue(this); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: create a temp for lcl->GetLclNum()
?
src/coreclr/src/jit/morph.cpp
Outdated
// a local struct assembled from the promoted fields. | ||
if (lcl != nullptr) | ||
{ | ||
JITDUMP("Arg [%06u] is unpromoted implicit byref V%02u, seeing if we can still tail call\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sentence is confusing: "Arg 000001 is unpromoted implicit byref V002, seeing if we can still tail call ... no, V002 was promoted. "
Can we check varDsc->lvPromoted
before we print that it was not promoted?
JITDUMP("... yes, arg is the only appearance of V%02u\n", lcl->GetLclNum()); | ||
hasByrefParameter = false; | ||
} | ||
// If no other call arg refers to this byref, and no other arg is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find it in https://github.com/dotnet/runtime/blob/master/docs/coding-guidelines/clr-jit-coding-conventions.md#73-commenting-code-blocks, but my personal opinion is that we should not allow comments after }
and before 'else {`.
// Do we pass 'lcl' more than once to the callee? | ||
if (arg2->isStruct && arg2->passedByRef) | ||
{ | ||
GenTreeLclVarCommon* const lcl2 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, in the previous use of IsImplicitByrefParameterValue
was for optimizations, so if it was not able to find the lcl
it was correct, it was just rejecting the optimization. Here it looks different: if it doesn't find the local, then it is optimizing. There are cases where locals are buried under IND(ADDR(IND(ADDR()))
sequences, I think IsImplicitByrefParameterValue
won't give us the right result for such cases and it could lead to an incorrect optimization.
Note that if IND long(ADDR(IND long(ADDR(LCL_VAR long)))
doesn't widen, then lclmorph
doesn't mark LCL_VAR
as lvaSetVarAddrExposed
, so the next check won't catch it also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it looks different: if it doesn't find the local, then it is optimizing.
No, if we don't find the local, we are being conservative and checking arg
for exposure if arg2
is a TYP_BYREF
or TYP_I_IMPL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if IND long(ADDR(IND long(ADDR(LCL_VAR long))) doesn't widen, then lclmorph doesn't mark LCL_VAR as lvaSetVarAddrExposed, so the next check won't catch it also.
I think that's fine, there is no real address exposure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean a case like call(LCL_VAR v01, IND(ADDR(IND(ADDR(LCL_VAR v01))) )
, when we are checking arg2
for interference with arg1
, how will we catch it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yes, that may be a problem. Also, can we have call(LCL_VAR V01, ADD(LCL_VAR V01, INT x))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, won't we have lvHasLdAddrOp
set in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't set lvHasLdAddrOp
if the next instruction consumes it:
runtime/src/coreclr/src/jit/flowgraph.cpp
Lines 4797 to 4809 in 73268c7
if (notStruct && notLastInstr && notDebugCode && | |
impILConsumesAddr(codeAddr + sz, impTokenLookupContextHandle, info.compScopeHnd)) | |
{ | |
// We can skip the addrtaken, as next IL instruction consumes | |
// the address. | |
} | |
else | |
{ | |
lvaTable[varNum].lvHasLdAddrOp = 1; | |
if (!info.compIsStatic && (varNum == 0)) | |
{ | |
// Addr taken on "this" pointer is significant, | |
// go ahead to mark it as permanently addr-exposed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't recognize an implicit byref arg via IsImplicitByrefParameterValue
we won't tail call. So for instance, passing a sub-struct of an implicit byref param currently blocks tail calling. I thought about unblocking this but left it as a future enhancement.
If there's an explicit byref arg1 that could alias some other implicit byref arg2 (which is also an implicit byref param), then the only way for arg1 to alias arg2 is for the param local to be address exposed or address taken. This must be an "unconsumed" case per above.
Co-Authored-By: Eugene Rozenfeld <[email protected]>
Use no-opt to ensure tail called methods don't promote their struct args and so undo the aliasing checks. Note that one of the tests that could tail call doesn't because of the way morph reuses struct temps.
OSX builds failing with
|
Fixed this morning with #33481 |
|
||
GenTreeLclVarCommon* GenTree::IsImplicitByrefParameterValue(Compiler* compiler) | ||
{ | ||
#if defined(TARGET_AMD64) || defined(TARGET_ARM64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be limited to windows amd64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows, yes.
We seem to think arm64 can have these too. I don't know the ABI well enough to say one way or another.
runtime/src/coreclr/src/jit/compiler.h
Lines 444 to 446 in 5ea5b57
#if defined(TARGET_AMD64) || defined(TARGET_ARM64) | |
unsigned char lvIsImplicitByRef : 1; // Set if the argument is an implicit byref. | |
#endif // defined(TARGET_AMD64) || defined(TARGET_ARM64) |
Perhaps we need a define just for this and a cleanup of all references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like windows arm64 uses the standard ARM64 EABI, so there are no implicit byrefs there. I'll put up a separate PR to clean this up here and throughout the rest of the jit codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that arm64 has implicit byrefs, but defining it for unix amd64 seems incorrect for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> 16 byte structs on arm64 windows or linux I believe are passed by reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see it now.. (from the EABI)
If the argument type is a Composite Type larger than 16 bytes, then the argument is copied to memory allocated by the caller, and the argument is replaced by a pointer to the copy.
So the right condition is windows x64, and all arm64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for making the changes.
Handle cases where a caller passes an implicit byref argument struct to
a callee in tail position. There is no need to create a local copy or
to block tail calling.
This pattern comes up increasingly often as we write more layered code
operating on spans and similar structs.