Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Introduce pointer-based CompareExchange intrinsic and use operating with syncblock bits. #106703

Merged
merged 3 commits into from
Aug 20, 2024

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Aug 20, 2024

Fixes: #104340

Historically the object's header is not a part of an object, for the purpose of object marking and updating byrefs, and its location belongs to the previous object in the heap, if such exists. As a result, while it is ok to have a pointer to a syncblock of a pinned object, a byref pointing to the same location is unsafe (even if the object is pinned).

NativeAOT runtime uses pointers to operate with syncblocks, but the Interlocked.CompareExchange takes a byref. When the intrinsic is not expanded there is a possibility for the JIT to introduce transient temps holding corresponding byref and even make it reportable to GC, which is unsafe.

To make sure the byref never comes to play, regardless if intrinsic is expanded or not, we introduce a pointer-based intrinsic, which NativeAOT can use when operating with object header bits.

@jkotas
Copy link
Member

jkotas commented Aug 20, 2024

cc @dotnet/jit-contrib This adds pointer-based overload for CompareExchange. Could you please check that the JIT is going to handle it correctly?

@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

@dotnet/jit-contrib Even though no JIT parts were touched, the change introduces a new JIT intrinsic for CompareExchange. It just that the the expansion is exactly the same as the existing one. I could not find any places that need to be updated in JIT code for this change, but want to be sure.

@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

For CompareExchange intrinsic everything seems to be driven by the return type, which is the same int. And GC-liveness of the location argument depends on the type of the argument.

As a result, once the pointer-taking CompareExchange overload is introduced, JIT emits the same expansion for it as for byref-based one, modulo GC tracking of the ref. That is without any changes to the JIT.
And that is basically what we want.

Interlocked:CompareExchange(byref,int,int)

; Assembly listing for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )   byref  ->  [rbp+0x10]  do-not-enreg[]
;  V01 arg1         [V01    ] (  1,  1   )     int  ->  [rbp+0x18]  do-not-enreg[]
;  V02 arg2         [V02    ] (  1,  1   )     int  ->  [rbp+0x20]  do-not-enreg[]
;  V03 loc0         [V03    ] (  1,  1   )     int  ->  [rbp-0x0C]  do-not-enreg[]
;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V05 tmp1         [V05    ] (  1,  1   )     int  ->  [rbp-0x10]  do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 8

G_M59963_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
       push     rbp
       push     rdi
       push     rax
       lea      rbp, [rsp+0x10]
       mov      bword ptr [rbp+0x10], rcx
       mov      dword ptr [rbp+0x18], edx
       mov      dword ptr [rbp+0x20], r8d
                                                ;; size=19 bbWeight=1 PerfScore 6.50
G_M59963_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
       nop
       mov      rcx, bword ptr [rbp+0x10]
                            ; byrRegs +[rcx]  <---  tracks a ByRef here. (we do not want this when operating with syncblocks)
       mov      edx, dword ptr [rbp+0x18]
       mov      eax, dword ptr [rbp+0x20]
       lock
       cmpxchg  dword ptr [rcx], edx
       mov      dword ptr [rbp-0x10], eax
       mov      eax, dword ptr [rbp-0x10]
       mov      dword ptr [rbp-0x0C], eax
       nop
       mov      eax, dword ptr [rbp-0x0C]
                                                ;; size=28 bbWeight=1 PerfScore 25.50
G_M59963_IG03:        ; bbWeight=1, epilog, nogc, extend
       add      rsp, 8
       pop      rdi
       pop      rbp
       ret
                                                ;; size=7 bbWeight=1 PerfScore 2.25

; Total bytes of code 54, prolog size 19, PerfScore 34.25, instruction count 22, allocated bytes for code 54 (MethodHash=599f15c4) for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; ============================================================

CompareExchange(int*,int,int)

; Assembly listing for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )    long  ->  [rbp+0x10]  do-not-enreg[]
;  V01 arg1         [V01    ] (  1,  1   )     int  ->  [rbp+0x18]  do-not-enreg[]
;  V02 arg2         [V02    ] (  1,  1   )     int  ->  [rbp+0x20]  do-not-enreg[]
;  V03 loc0         [V03    ] (  1,  1   )     int  ->  [rbp-0x0C]  do-not-enreg[]
;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V05 tmp1         [V05    ] (  1,  1   )     int  ->  [rbp-0x10]  do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 8

G_M23566_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
       push     rbp
       push     rdi
       push     rax
       lea      rbp, [rsp+0x10]
       mov      qword ptr [rbp+0x10], rcx
       mov      dword ptr [rbp+0x18], edx
       mov      dword ptr [rbp+0x20], r8d
                                                ;; size=19 bbWeight=1 PerfScore 6.50
G_M23566_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
       nop
       mov      rcx, qword ptr [rbp+0x10]           <---  no ByRef. Just like we wanted.
       mov      edx, dword ptr [rbp+0x18]
       mov      eax, dword ptr [rbp+0x20]
       lock
       cmpxchg  dword ptr [rcx], edx
       mov      dword ptr [rbp-0x10], eax
       mov      eax, dword ptr [rbp-0x10]
       mov      dword ptr [rbp-0x0C], eax
       nop
       mov      eax, dword ptr [rbp-0x0C]
                                                ;; size=28 bbWeight=1 PerfScore 25.50
G_M23566_IG03:        ; bbWeight=1, epilog, nogc, extend
       add      rsp, 8
       pop      rdi
       pop      rbp
       ret
                                                ;; size=7 bbWeight=1 PerfScore 2.25

; Total bytes of code 54, prolog size 19, PerfScore 34.25, instruction count 22, allocated bytes for code 54 (MethodHash=7cdba3f1) for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; ============================================================

// The important part is avoiding `ref *location` in the unexpanded scenario, like
// in a case when compiling the Debug flavor of the app.
[Intrinsic]
internal static unsafe int CompareExchange(int* location1, int value, int comparand)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder why some of CompareExchange overloads are AggressiveInlining and some are not? Is that intentional or just does not matter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is needed for the ones that contain non-trivial amount of code. For the trivial ones, it is probably just copy&paste.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I will keep the new one without AgressiveInlining and existing ones unchanged.
It seems like in perf-interesting scenarios this will not matter either way.

@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

NOTE: since the failure that we are trying to fix happens while building tests with an ILC that was built using the toolset ILC, just merging this will not make the bug go away. The change needs to make it into toolset.

I've verified locally that it works for the repro scenario that I have.

(takes quite a few steps of building one ILC, patching toolset, deleting cached artifacts, tweaking ILC sources for more stress, building another ILC, building tests, rebuilding tests in a loop,.. ).

@EgorBo
Copy link
Member

EgorBo commented Aug 20, 2024

I am seeing this for CompareExchange on arm64 (without LSE) in JIT:

// NOTE: `genConsumeAddress` marks the consumed register as not a GC pointer, as it assumes that the input
// registers
// die at the first instruction generated by the node. This is not the case for these atomics as the input
// registers are multiply-used. As such, we need to mark the addr register as containing a GC pointer until
// we are finished generating the code for this node.
gcInfo.gcMarkRegPtrVal(addrReg, addr->TypeGet());
is this just a redundant gc info in this case or correctness issue?

@EgorBo
Copy link
Member

EgorBo commented Aug 20, 2024

oops, wrong link, fixed ^

@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

is this just a redundant gc info in this case or correctness issue?

I think addr->TypeGet() will be a pointer here.
Basically gcInfo.gcMarkRegPtrVal(addrReg, addr->TypeGet()); will be a no-op.

if we hardcoded tracking a byref here, we'd need to adjust for the pointer-based case, but since it takes the arg type into account, I think it will do the right thing.

It does the right thing on x64, but I did not check on arm64.

@EgorBo
Copy link
Member

EgorBo commented Aug 20, 2024

I think addr->TypeGet() will be a pointer here.

Ah, you right, gcMarkRegPtrVal handles that.

@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

This is on arm64 (no LSE). I think we see here what we want to see:

Interlocked:CompareExchange(byref,int,int)

; Assembly listing for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; fp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )   byref  ->  [fp+0x28]  do-not-enreg[]
;  V01 arg1         [V01    ] (  1,  1   )     int  ->  [fp+0x24]  do-not-enreg[]
;  V02 arg2         [V02    ] (  1,  1   )     int  ->  [fp+0x20]  do-not-enreg[]
;  V03 loc0         [V03    ] (  1,  1   )     int  ->  [fp+0x1C]  do-not-enreg[]
;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V05 tmp1         [V05    ] (  1,  1   )     int  ->  [fp+0x18]  do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 32

G_M59963_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
            stp     fp, lr, [sp, #-0x30]!
            mov     fp, sp
            str     x0, [fp, #0x28]     // [V00 arg0]
            str     w1, [fp, #0x24]     // [V01 arg1]
            str     w2, [fp, #0x20]     // [V02 arg2]
                                                ;; size=20 bbWeight=1 PerfScore 4.50
G_M59963_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
            nop
            ldr     x0, [fp, #0x28]     // [V00 arg0]
                             ; byrRegs +[x0]                               <---  tracks a ByRef here.
            ldr     w1, [fp, #0x24]     // [V01 arg1]
            ldr     w2, [fp, #0x20]     // [V02 arg2]
                                                ;; size=16 bbWeight=1 PerfScore 6.50
G_M59963_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref, isz
            ldaxr   w4, [x0]
            cmp     w4, w2
            bne     G_M59963_IG04
            stlxr   w3, w1, [x0]
            cbnz    w3, G_M59963_IG03
                                                ;; size=20 bbWeight=1 PerfScore 8.50
G_M59963_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
            dmb     ish
            str     w4, [fp, #0x18]     // [V05 tmp1]
            ldr     w0, [fp, #0x18]     // [V05 tmp1]
                             ; byrRegs -[x0]
            str     w0, [fp, #0x1C]     // [V03 loc0]
            nop
            ldr     w0, [fp, #0x1C]     // [V03 loc0]
                                                ;; size=24 bbWeight=1 PerfScore 16.50
G_M59963_IG05:        ; bbWeight=1, epilog, nogc, extend
            ldp     fp, lr, [sp], #0x30
            ret     lr
                                                ;; size=8 bbWeight=1 PerfScore 2.00

; Total bytes of code 88, prolog size 20, PerfScore 38.00, instruction count 22, allocated bytes for code 88 (MethodHash=599f15c4) for method System.Threading.Interlocked:CompareExchange(byref,int,int):int (MinOpts)
; ============================================================

CompareExchange(int*,int,int)

; Assembly listing for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; MinOpts code
; NativeAOT compilation
; debuggable code
; fp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )    long  ->  [fp+0x28]  do-not-enreg[]
;  V01 arg1         [V01    ] (  1,  1   )     int  ->  [fp+0x24]  do-not-enreg[]
;  V02 arg2         [V02    ] (  1,  1   )     int  ->  [fp+0x20]  do-not-enreg[]
;  V03 loc0         [V03    ] (  1,  1   )     int  ->  [fp+0x1C]  do-not-enreg[]
;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V05 tmp1         [V05    ] (  1,  1   )     int  ->  [fp+0x18]  do-not-enreg[] "impSpillStackEnsure"
;
; Lcl frame size = 32

G_M23566_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
            stp     fp, lr, [sp, #-0x30]!
            mov     fp, sp
            str     x0, [fp, #0x28]     // [V00 arg0]
            str     w1, [fp, #0x24]     // [V01 arg1]
            str     w2, [fp, #0x20]     // [V02 arg2]
                                                ;; size=20 bbWeight=1 PerfScore 4.50
G_M23566_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
            nop
            ldr     x0, [fp, #0x28]     // [V00 arg0]     <-- no ref tracking. Just as we want
            ldr     w1, [fp, #0x24]     // [V01 arg1]
            ldr     w2, [fp, #0x20]     // [V02 arg2]
                                                ;; size=16 bbWeight=1 PerfScore 6.50
G_M23566_IG03:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
            ldaxr   w4, [x0]
            cmp     w4, w2
            bne     G_M23566_IG04
            stlxr   w3, w1, [x0]
            cbnz    w3, G_M23566_IG03
                                                ;; size=20 bbWeight=1 PerfScore 8.50
G_M23566_IG04:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
            dmb     ish
            str     w4, [fp, #0x18]     // [V05 tmp1]
            ldr     w0, [fp, #0x18]     // [V05 tmp1]
            str     w0, [fp, #0x1C]     // [V03 loc0]
            nop
            ldr     w0, [fp, #0x1C]     // [V03 loc0]
                                                ;; size=24 bbWeight=1 PerfScore 16.50
G_M23566_IG05:        ; bbWeight=1, epilog, nogc, extend
            ldp     fp, lr, [sp], #0x30
            ret     lr
                                                ;; size=8 bbWeight=1 PerfScore 2.00

; Total bytes of code 88, prolog size 20, PerfScore 38.00, instruction count 22, allocated bytes for code 88 (MethodHash=7cdba3f1) for method System.Threading.Interlocked:CompareExchange(ulong,int,int):int (MinOpts)
; ============================================================

@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

Thanks!!!

@VSadov VSadov merged commit ed68a64 into dotnet:main Aug 20, 2024
88 checks passed
@VSadov VSadov deleted the unlockAsrt2 branch August 20, 2024 21:25
@VSadov
Copy link
Member Author

VSadov commented Aug 20, 2024

/backport to release/9.0

Copy link
Contributor

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10479543523

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NativeAOT] Object synchronization method was called from an unsynchronized block of code.
3 participants