-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT] Introduce pointer-based CompareExchange intrinsic and use operating with syncblock bits. #106703
Conversation
cc @dotnet/jit-contrib This adds pointer-based overload for CompareExchange. Could you please check that the JIT is going to handle it correctly? |
@dotnet/jit-contrib Even though no JIT parts were touched, the change introduces a new JIT intrinsic for CompareExchange. It just that the the expansion is exactly the same as the existing one. I could not find any places that need to be updated in JIT code for this change, but want to be sure. |
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Interlocked.cs
Outdated
Show resolved
Hide resolved
For CompareExchange intrinsic everything seems to be driven by the return type, which is the same As a result, once the pointer-taking CompareExchange overload is introduced, JIT emits the same expansion for it as for byref-based one, modulo GC tracking of the ref. That is without any changes to the JIT. Interlocked:CompareExchange(byref,int,int) ; Assembly listing for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 1, 1 ) byref -> [rbp+0x10] do-not-enreg[]
; V01 arg1 [V01 ] ( 1, 1 ) int -> [rbp+0x18] do-not-enreg[]
; V02 arg2 [V02 ] ( 1, 1 ) int -> [rbp+0x20] do-not-enreg[]
; V03 loc0 [V03 ] ( 1, 1 ) int -> [rbp-0x0C] do-not-enreg[]
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V05 tmp1 [V05 ] ( 1, 1 ) int -> [rbp-0x10] do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 8
G_M59963_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rbp
push rdi
push rax
lea rbp, [rsp+0x10]
mov bword ptr [rbp+0x10], rcx
mov dword ptr [rbp+0x18], edx
mov dword ptr [rbp+0x20], r8d
;; size=19 bbWeight=1 PerfScore 6.50
G_M59963_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
nop
mov rcx, bword ptr [rbp+0x10]
; byrRegs +[rcx] <--- tracks a ByRef here. (we do not want this when operating with syncblocks)
mov edx, dword ptr [rbp+0x18]
mov eax, dword ptr [rbp+0x20]
lock
cmpxchg dword ptr [rcx], edx
mov dword ptr [rbp-0x10], eax
mov eax, dword ptr [rbp-0x10]
mov dword ptr [rbp-0x0C], eax
nop
mov eax, dword ptr [rbp-0x0C]
;; size=28 bbWeight=1 PerfScore 25.50
G_M59963_IG03: ; bbWeight=1, epilog, nogc, extend
add rsp, 8
pop rdi
pop rbp
ret
;; size=7 bbWeight=1 PerfScore 2.25
; Total bytes of code 54, prolog size 19, PerfScore 34.25, instruction count 22, allocated bytes for code 54 (MethodHash=599f15c4) for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; ============================================================ CompareExchange(int*,int,int) ; Assembly listing for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 1, 1 ) long -> [rbp+0x10] do-not-enreg[]
; V01 arg1 [V01 ] ( 1, 1 ) int -> [rbp+0x18] do-not-enreg[]
; V02 arg2 [V02 ] ( 1, 1 ) int -> [rbp+0x20] do-not-enreg[]
; V03 loc0 [V03 ] ( 1, 1 ) int -> [rbp-0x0C] do-not-enreg[]
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V05 tmp1 [V05 ] ( 1, 1 ) int -> [rbp-0x10] do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 8
G_M23566_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push rbp
push rdi
push rax
lea rbp, [rsp+0x10]
mov qword ptr [rbp+0x10], rcx
mov dword ptr [rbp+0x18], edx
mov dword ptr [rbp+0x20], r8d
;; size=19 bbWeight=1 PerfScore 6.50
G_M23566_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
nop
mov rcx, qword ptr [rbp+0x10] <--- no ByRef. Just like we wanted.
mov edx, dword ptr [rbp+0x18]
mov eax, dword ptr [rbp+0x20]
lock
cmpxchg dword ptr [rcx], edx
mov dword ptr [rbp-0x10], eax
mov eax, dword ptr [rbp-0x10]
mov dword ptr [rbp-0x0C], eax
nop
mov eax, dword ptr [rbp-0x0C]
;; size=28 bbWeight=1 PerfScore 25.50
G_M23566_IG03: ; bbWeight=1, epilog, nogc, extend
add rsp, 8
pop rdi
pop rbp
ret
;; size=7 bbWeight=1 PerfScore 2.25
; Total bytes of code 54, prolog size 19, PerfScore 34.25, instruction count 22, allocated bytes for code 54 (MethodHash=7cdba3f1) for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; ============================================================ |
// The important part is avoiding `ref *location` in the unexpanded scenario, like | ||
// in a case when compiling the Debug flavor of the app. | ||
[Intrinsic] | ||
internal static unsafe int CompareExchange(int* location1, int value, int comparand) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder why some of CompareExchange overloads are AggressiveInlining and some are not? Is that intentional or just does not matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is needed for the ones that contain non-trivial amount of code. For the trivial ones, it is probably just copy&paste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I will keep the new one without AgressiveInlining and existing ones unchanged.
It seems like in perf-interesting scenarios this will not matter either way.
Co-authored-by: Jan Kotas <[email protected]>
NOTE: since the failure that we are trying to fix happens while building tests with an ILC that was built using the toolset ILC, just merging this will not make the bug go away. The change needs to make it into toolset. I've verified locally that it works for the repro scenario that I have. (takes quite a few steps of building one ILC, patching toolset, deleting cached artifacts, tweaking ILC sources for more stress, building another ILC, building tests, rebuilding tests in a loop,.. ). |
I am seeing this for runtime/src/coreclr/jit/codegenarm64.cpp Lines 4100 to 4106 in 09d8171
|
oops, wrong link, fixed ^ |
I think if we hardcoded tracking a byref here, we'd need to adjust for the pointer-based case, but since it takes the arg type into account, I think it will do the right thing. It does the right thing on x64, but I did not check on arm64. |
Ah, you right, |
This is on arm64 (no LSE). I think we see here what we want to see: Interlocked:CompareExchange(byref,int,int) ; Assembly listing for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; fp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 1, 1 ) byref -> [fp+0x28] do-not-enreg[]
; V01 arg1 [V01 ] ( 1, 1 ) int -> [fp+0x24] do-not-enreg[]
; V02 arg2 [V02 ] ( 1, 1 ) int -> [fp+0x20] do-not-enreg[]
; V03 loc0 [V03 ] ( 1, 1 ) int -> [fp+0x1C] do-not-enreg[]
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V05 tmp1 [V05 ] ( 1, 1 ) int -> [fp+0x18] do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 32
G_M59963_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
stp fp, lr, [sp, #-0x30]!
mov fp, sp
str x0, [fp, #0x28] // [V00 arg0]
str w1, [fp, #0x24] // [V01 arg1]
str w2, [fp, #0x20] // [V02 arg2]
;; size=20 bbWeight=1 PerfScore 4.50
G_M59963_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
nop
ldr x0, [fp, #0x28] // [V00 arg0]
; byrRegs +[x0] <--- tracks a ByRef here.
ldr w1, [fp, #0x24] // [V01 arg1]
ldr w2, [fp, #0x20] // [V02 arg2]
;; size=16 bbWeight=1 PerfScore 6.50
G_M59963_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref, isz
ldaxr w4, [x0]
cmp w4, w2
bne G_M59963_IG04
stlxr w3, w1, [x0]
cbnz w3, G_M59963_IG03
;; size=20 bbWeight=1 PerfScore 8.50
G_M59963_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
dmb ish
str w4, [fp, #0x18] // [V05 tmp1]
ldr w0, [fp, #0x18] // [V05 tmp1]
; byrRegs -[x0]
str w0, [fp, #0x1C] // [V03 loc0]
nop
ldr w0, [fp, #0x1C] // [V03 loc0]
;; size=24 bbWeight=1 PerfScore 16.50
G_M59963_IG05: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x30
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 88, prolog size 20, PerfScore 38.00, instruction count 22, allocated bytes for code 88 (MethodHash=599f15c4) for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; ============================================================ CompareExchange(int*,int,int) ; Assembly listing for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; fp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 1, 1 ) long -> [fp+0x28] do-not-enreg[]
; V01 arg1 [V01 ] ( 1, 1 ) int -> [fp+0x24] do-not-enreg[]
; V02 arg2 [V02 ] ( 1, 1 ) int -> [fp+0x20] do-not-enreg[]
; V03 loc0 [V03 ] ( 1, 1 ) int -> [fp+0x1C] do-not-enreg[]
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V05 tmp1 [V05 ] ( 1, 1 ) int -> [fp+0x18] do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 32
G_M23566_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
stp fp, lr, [sp, #-0x30]!
mov fp, sp
str x0, [fp, #0x28] // [V00 arg0]
str w1, [fp, #0x24] // [V01 arg1]
str w2, [fp, #0x20] // [V02 arg2]
;; size=20 bbWeight=1 PerfScore 4.50
G_M23566_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
nop
ldr x0, [fp, #0x28] // [V00 arg0] <-- no ref tracking. Just as we want
ldr w1, [fp, #0x24] // [V01 arg1]
ldr w2, [fp, #0x20] // [V02 arg2]
;; size=16 bbWeight=1 PerfScore 6.50
G_M23566_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
ldaxr w4, [x0]
cmp w4, w2
bne G_M23566_IG04
stlxr w3, w1, [x0]
cbnz w3, G_M23566_IG03
;; size=20 bbWeight=1 PerfScore 8.50
G_M23566_IG04: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
dmb ish
str w4, [fp, #0x18] // [V05 tmp1]
ldr w0, [fp, #0x18] // [V05 tmp1]
str w0, [fp, #0x1C] // [V03 loc0]
nop
ldr w0, [fp, #0x1C] // [V03 loc0]
;; size=24 bbWeight=1 PerfScore 16.50
G_M23566_IG05: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x30
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 88, prolog size 20, PerfScore 38.00, instruction count 22, allocated bytes for code 88 (MethodHash=7cdba3f1) for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; ============================================================ |
Thanks!!! |
/backport to release/9.0 |
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10479543523 |
Fixes: #104340
Historically the object's header is not a part of an object, for the purpose of object marking and updating byrefs, and its location belongs to the previous object in the heap, if such exists. As a result, while it is ok to have a pointer to a syncblock of a pinned object, a byref pointing to the same location is unsafe (even if the object is pinned).
NativeAOT runtime uses pointers to operate with syncblocks, but the
Interlocked.CompareExchange
takes a byref. When the intrinsic is not expanded there is a possibility for the JIT to introduce transient temps holding corresponding byref and even make it reportable to GC, which is unsafe.To make sure the byref never comes to play, regardless if intrinsic is expanded or not, we introduce a pointer-based intrinsic, which NativeAOT can use when operating with object header bits.