Error in Int16 variables when running in M1 macs #66720

Closed
agallero opened this issue Mar 16, 2022 · 15 comments · Fixed by #68108
Labels: arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), bug, os-mac-os-x (macOS aka OSX)

Comments

@agallero

Version Used:
.NET 6.0.1 - Apple M1 Pro
.NET SDK 6.0.201

Steps to Reproduce:

  1. Open the attached project uint-bug.zip on an Apple Silicon machine. I used a 16" MacBook Pro with an M1 Pro chip.
  2. dotnet run --configuration Debug

Expected Behavior:
The program should run without errors. We are only assigning an Int16 to an Int16, so an overflow is impossible.

Actual Behavior:
Unhandled exception. System.AggregateException: One or more errors occurred. (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.)....

Description
This bug happens on Apple M1 chips but not on Intel machines. I have not checked iPhone chips or other ARM targets.
It happens in a large application that does not use threads, but it was difficult to reduce to a simple program that could be analyzed. To reproduce it reliably, I had to add a parallel loop to make the effect evident, although the large app hits it without any threads at all. In this example, if you replace the parallel loop with a single loop, I can still reproduce it while debugging but not from the command line.

It looks like the compiler passes the full Int32 value of an Int16 variable, with garbage in the upper bits, and that garbage causes the overflow.

If you change the "checked" line to "unchecked", the program prints lines like these:

0x2672 = 0x80F62672
0x37 = 0x80F60037
0x396 = 0x80F60396

These suggest that the Int16 value is just the original number with garbage in the upper 16 bits.
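To see why garbage in the upper bits surfaces only under "checked", the narrowing can be reproduced in isolation. The sketch below is written as a .NET 6+ top-level program with hypothetical helper names (they are illustrative, not from the project): it takes one of the reported values, shows that an unchecked cast to short keeps only the low 16 bits, and that a checked cast of the same int throws OverflowException.

```csharp
using System;

// Keeps only the low 16 bits of the int; never throws.
static short UncheckedNarrow(int value) => unchecked((short)value);

// Returns true when a checked narrowing to short throws, i.e. when
// value lies outside [short.MinValue, short.MaxValue].
static bool CheckedNarrowOverflows(int value)
{
    try
    {
        _ = checked((short)value);
        return false;
    }
    catch (OverflowException)
    {
        return true;
    }
}

// One of the values printed by the "unchecked" variant of the repro.
int observed = unchecked((int)0x80F62672);
Console.WriteLine($"0x{UncheckedNarrow(observed):X} = 0x{observed:X8}"); // 0x2672 = 0x80F62672
Console.WriteLine(CheckedNarrowOverflows(observed));                    // True
```

The same holds for 0x80F60037 and 0x80F60396: the unchecked cast recovers 0x37 and 0x396, while the checked cast overflows because the upper bits put the int far outside the Int16 range.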

When you debug the code, you see:

[screenshot]

So the value it is trying to set is 0x81080EEB.
If you move up the stack trace to the Clone method, we pass 0xEEB:

[screenshot]

But in the class constructor, aVar4 has changed from 0xEEB to 0x8180EEB:
[screenshot]

I can reproduce it 100% of the time on this machine, but I don't know whether it happens on others, even other ARM Macs. I hope this information is enough to figure out what is going on; if you need anything else, please ask.

uint-bug.zip

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged New issue has not been triaged by the area owner labels Mar 16, 2022
@jcouv
Member

jcouv commented Mar 16, 2022

This is unlikely to be a compiler issue, since the compiler emits IL (which is independent from the specific platform). I will move this to the runtime repo. If it turns out the IL emitted is wrong, then it would be a compiler issue.

@jcouv jcouv transferred this issue from dotnet/roslyn Mar 16, 2022
@agallero
Author

Thanks for letting me know, and sorry if it is posted in the wrong place. I guess it is indeed likely from the ARM JIT, but I thought (incorrectly, it seems) that this group covered both IL generation and the conversion from IL to native code. If that is not the case, can you tell me where I should report it?

@danmoseley danmoseley added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed Area-Compilers labels Mar 16, 2022
@danmoseley
Member

@agallero No problem at all; we have a lot of repos and it's easy to transfer. This is now in the right place and labeled for that team to find it. PS: welcome to our repo.

@ghost

ghost commented Mar 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


@am11
Member

am11 commented Mar 17, 2022

Reproducible with .NET 7 as well (both Debug and Release configurations).

Reduced repro
using System;
using System.Threading.Tasks;

internal class BaseClass
{
    internal Int16 X;
    internal BaseClass(int x)
    {
        checked
        {
            X = (Int16) x;
        }
    }
}

internal class ChildClass : BaseClass
{
    internal ChildClass(int a, int b, int c, int d, bool e, bool f, int i, int last)
        : base(last) {}
    
    internal ChildClass Clone() => new ChildClass(0, 0, 0, 0, false, false, 0, X);
}

internal static class Program
{
    static void Main()
    {
        Parallel.For(0, 100, _ =>
        {
            var tk = new ChildClass(0, 0, 0, 0, false, false, 0, 0);
            tk = tk.Clone();
        });

        Console.WriteLine("finished");
    }
}
dotnet7 build && lldb dotnet7 bin/Debug/net7.0/uint-bug.dll
Added Microsoft public symbol server
Added symbol directory path: /usr/local/share/dotnet/shared/Microsoft.NETCore.App/6.0.2
Added symbol directory path: /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Host.osx-arm64/6.0.2/runtimes/osx-arm64/native
(lldb) target create "/Users/am11/.dotnet7/dotnet"
Current executable set to '/Users/am11/.dotnet7/dotnet' (arm64).
(lldb) settings set -- target.run-args  "bin/Debug/net7.0/uint-bug.dll"
(lldb) r
Process 50462 launched: '/Users/am11/.dotnet7/dotnet' (arm64)
Unhandled exception. System.AggregateException: One or more errors occurred. (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.) (Arithmetic operation resulted in an overflow.)
 ---> System.OverflowException: Arithmetic operation resulted in an overflow.
   at BaseClass..ctor(Int32 first) in /Users/am11/Downloads/uint-bug/Program.cs:line 11
   at ChildClass..ctor(Int32 a, Int32 b, Int32 c, Int32 d, Boolean e, Boolean f, Int32 i, Int32 last) in /Users/am11/Downloads/uint-bug/Program.cs:line 19
   at ChildClass.Clone() in /Users/am11/Downloads/uint-bug/Program.cs:line 21
   at Program.<>c.<Main>b__0_0(Int32 _) in /Users/am11/Downloads/uint-bug/Program.cs:line 31
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
   at System.Threading.Tasks.TaskReplicator.Replica.Execute()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.TaskReplicator.Run[TState](ReplicatableUserAction`1 action, ParallelOptions options, Boolean stopOnFirstFailure)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, Action`1 body)
   at Program.Main() in /Users/am11/Downloads/uint-bug/Program.cs:line 28
 ---> (Inner Exception #1) System.OverflowException: Arithmetic operation resulted in an overflow.
   at BaseClass..ctor(Int32 first) in /Users/am11/Downloads/uint-bug/Program.cs:line 11
   at ChildClass..ctor(Int32 a, Int32 b, Int32 c, Int32 d, Boolean e, Boolean f, Int32 i, Int32 last) in /Users/am11/Downloads/uint-bug/Program.cs:line 19
   at ChildClass.Clone() in /Users/am11/Downloads/uint-bug/Program.cs:line 21
   at Program.<>c.<Main>b__0_0(Int32 _) in /Users/am11/Downloads/uint-bug/Program.cs:line 31
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
   at System.Threading.Tasks.TaskReplicator.Replica.Execute()<---

 ---> (Inner Exceptions #2 through #10 repeat the stack trace of Inner Exception #1.)

Process 50462 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00000001a96f99b8 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x1a96f99b8 <+8>:  b.lo   0x1a96f99d8               ; <+40>
    0x1a96f99bc <+12>: pacibsp 
    0x1a96f99c0 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1a96f99c4 <+20>: mov    x29, sp
Target 0: (dotnet) stopped.
(lldb) clrthreads
ThreadCount:      14
UnstartedThread:  0
BackgroundThread: 13
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                            Lock  
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   1    1    80741 000000010380AA00    20020 Preemptive  0000000110075990:0000000110077070 0000000101810400 -00001 Ukn System.AggregateException 0000000110030d50 (nested exceptions)
   7    2    80758 0000000103813400    21220 Preemptive  0000000000000000:0000000000000000 0000000101810400 -00001 Ukn (Finalizer) 
   8    3    80759 0000000103816200    21220 Preemptive  0000000000000000:0000000000000000 0000000101810400 -00001 Ukn 
   9    4    8075a 000000010182B200  1021220 Preemptive  0000000110017520:0000000110018940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  10    5    8075b 000000010182C200  3021220 Preemptive  00000001100191C0:000000011001A940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  11    6    8075c 0000000107008C00  1021220 Preemptive  000000011001B3A8:000000011001C940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  12    7    8075d 0000000101836600  1021220 Preemptive  000000011001D770:000000011001E940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  13    8    8075e 0000000101009E00  1021220 Preemptive  0000000110033B40:0000000110034940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  14    9    8075f 000000010780B800  1021220 Preemptive  000000011002F520:0000000110030940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  15   10    80760 000000010780D200  1021220 Preemptive  0000000110023360:0000000110024940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  16   11    80761 0000000101826400  1021220 Preemptive  0000000110025368:0000000110026940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  17   12    80762 0000000130808200  1021220 Preemptive  00000001100274F8:0000000110028940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  18   13    80763 000000010780EC00  1021220 Preemptive  0000000110029378:000000011002A940 0000000101810400 -00001 Ukn (Threadpool Worker) 
  19   14    80764 000000010700DC00  1021220 Preemptive  000000011002B2E0:000000011002C940 0000000101810400 -00001 Ukn (Threadpool Worker) 
(lldb) clrstack -f
OS Thread Id: 0x80741 (1)
        Child SP               IP Call Site
000000016FDFC6F0 00000001A96F99B8 libsystem_kernel.dylib!__pthread_kill + 8
000000016FDFC6F0 00000001A972CEB0 libsystem_pthread.dylib!pthread_kill + 288
000000016FDFC720 00000001A966A314 libsystem_c.dylib!abort + 164
000000016FDFC760 0000000102696B3C libcoreclr.dylib!PAL_SetShutdownCallback
000000016FDFC780 0000000102696A4C libcoreclr.dylib!TerminateProcess
000000016FDFC7B0 00000001028B7944 libcoreclr.dylib!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 968
000000016FDFCCF0 00000001028B798C libcoreclr.dylib!DispatchManagedException(PAL_SEHException&, bool) + 68
000000016FDFD0B0 000000010281BA50 libcoreclr.dylib!IL_Throw(Object*) + 528
000000016FDFD118                  [HelperMethodFrame: 000000016fdfd118] 
000000016FDFD260 0000000280DBE4C0 System.Private.CoreLib.dll!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 56 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/ExceptionServices/ExceptionDispatchInfo.cs @ 49]
000000016FDFD280 000000028042842C System.Private.CoreLib.dll!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(System.Exception) + 44 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/ExceptionServices/ExceptionDispatchInfo.cs @ 56]
000000016FDFD290 0000000280E6BAB8 System.Threading.Tasks.Parallel.dll!System.Threading.Tasks.Parallel.ForWorker[[System.__Canon, System.Private.CoreLib]](Int32, Int32, System.Threading.Tasks.ParallelOptions, System.Action`1<Int32>, System.Action`2<Int32,System.Threading.Tasks.ParallelLoopState>, System.Func`4<Int32,System.Threading.Tasks.ParallelLoopState,System.__Canon,System.__Canon>, System.Func`1<System.__Canon>, System.Action`1<System.__Canon>) + 1464 [/_/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs @ 1099]
000000016FDFE418                  [HelperMethodFrame: 000000016fdfe418] 
000000016FDFE560 0000000280E6B010 System.Threading.Tasks.Parallel.dll!System.Threading.Tasks.TaskReplicator.Run[[System.Threading.Tasks.RangeWorker, System.Threading.Tasks.Parallel]](ReplicatableUserAction`1<System.Threading.Tasks.RangeWorker>, System.Threading.Tasks.ParallelOptions, Boolean) + 624 [/_/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/TaskReplicator.cs @ 155]
000000016FDFE5D0 0000000280E6B7D0 System.Threading.Tasks.Parallel.dll!System.Threading.Tasks.Parallel.ForWorker[[System.__Canon, System.Private.CoreLib]](Int32, Int32, System.Threading.Tasks.ParallelOptions, System.Action`1<Int32>, System.Action`2<Int32,System.Threading.Tasks.ParallelLoopState>, System.Func`4<Int32,System.Threading.Tasks.ParallelLoopState,System.__Canon,System.__Canon>, System.Func`1<System.__Canon>, System.Action`1<System.__Canon>) + 720 [/_/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs @ 953]
000000016FDFE6A0 0000000280E6459C System.Threading.Tasks.Parallel.dll!System.Threading.Tasks.Parallel.For(Int32, Int32, System.Action`1<Int32>) + 124 [/_/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs @ 382]
000000016FDFE6E0 0000000280D91AA8 uint-bug.dll!Program.Main() + 272 [/Users/am11/Downloads/uint-bug/Program.cs @ 28]
FFFFFFFFFFFFFFFF 000000028042842C 
FFFFFFFFFFFFFFFF 0000000280E6BAB8 
FFFFFFFFFFFFFFFF 0000000280E6459C 
FFFFFFFFFFFFFFFF 0000000280D91AA8 
FFFFFFFFFFFFFFFF 0000000102940C70 libcoreclr.dylib!CallDescrWorkerInternal + 132
000000016FDFE750 00000001027B5C20 libcoreclr.dylib!MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 852
000000016FDFE9C0 00000001026B32C0 libcoreclr.dylib!RunMain(MethodDesc*, short, int*, PtrArray**) + 640
000000016FDFEBD0 00000001026B35C4 libcoreclr.dylib!Assembly::ExecuteMainMethod(PtrArray**, int) + 380
000000016FDFEE60 00000001026E0818 libcoreclr.dylib!CorHost2::ExecuteAssembly(unsigned int, char16_t const*, int, char16_t const**, unsigned int*) + 472
000000016FDFEF40 00000001026A06F4 libcoreclr.dylib!coreclr_execute_assembly + 192
000000016FDFEFA0 0000000100511F10 libhostpolicy.dylib!run_app_for_context(hostpolicy_context_t const&, int, char const**) + 1072
000000016FDFF0A0 0000000100512C74 libhostpolicy.dylib!corehost_main + 240
000000016FDFF1F0 000000010029DD74 libhostfxr.dylib!fx_muxer_t::handle_exec_host_command(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, host_startup_info_t const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unordered_map<known_options, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, known_options_hash, std::__1::equal_to<known_options>, std::__1::allocator<std::__1::pair<known_options const, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 1328
000000016FDFF350 000000010029CE50 libhostfxr.dylib!fx_muxer_t::execute(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 860
000000016FDFF460 0000000100299AB4 libhostfxr.dylib!hostfxr_main_startupinfo + 152
000000016FDFF510 0000000100009654 dotnet!exe_start(int, char const**) + 1176
000000016FDFF640 0000000100009804 dotnet!main + 160
000000016FDFF6A0 00000001000250F4 dyld!start + 520

The exception disappears with any of these changes:

  • delete any argument from either the base or the child constructor signature (and the call sites)
  • replace Parallel.For with a regular loop (it happens only under high concurrency)
  • in Clone(), pass X as any earlier int argument, and in the ChildClass constructor pass that argument to base (instead of last)

cc @jkotas, @janvorli

@agallero
Author

Thanks for looking into it. In case it helps: the original bug in my codebase happens without Parallel.ForEach, in a normal loop. But that code is far more complex than this test case, and when I start removing code to build a minimal reproducible example, the bug disappears.

That's why I put it into a Parallel.ForEach: the problem seems to be a register or memory location that isn't cleared, and high concurrency helps make sure the memory and registers are not zero.

But with more complex code, this is reproducible on a single thread.

@am11
Member

am11 commented Mar 17, 2022

Thanks for the additional input! It is indeed an interesting edge case where (apparently) this particular set and order of arguments seems to have side effects (while narrowing or widening the number). I was trying to pinpoint whether it is related to threading, the runtime, or a case of bad codegen. Note that this issue does not reproduce on linux-arm64.

@am11
Member

am11 commented Mar 26, 2022

This also repros with the daily build on osx-arm64 / M1 (tested with 7.0.100-preview.4.22174.1; runtime commit: c5d40c9).
It does not repro on win/linux x64 and linux-arm64.

@am11
Copy link
Member

am11 commented Mar 26, 2022

Repro without Parallel.For and the Action delegate:

// osx-arm64 | .NET 7 & 6 | Debug & Release

using System;
using System.Runtime.CompilerServices;
using System.Threading;

internal sealed class C
{
    private readonly short X;

    internal C(int a, int b, int c, int d, byte e, byte f, int g, int last)
    {
        if (last > 0)
            throw new ApplicationException($"Unexpected value: {last}");

        X = (short) last;
    }

    [MethodImpl(MethodImplOptions.NoInlining)] // comment this line and ApplicationException will
                                               // disappear from release (but not debug).
    internal void M() => new C(0, 0, 0, 0, 0, 0, 0, X);
}

internal static class Program
{
    static void Main()
    {
        var tk = new C(0, 0, 0, 0, 0, 0, 0, 0);
        Thread.Sleep(1); // comment this line and ApplicationException will not be thrown
        tk.M();

        Console.WriteLine("finished");
    }
}

@AndyAyersMS AndyAyersMS self-assigned this Mar 28, 2022
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 28, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Mar 28, 2022
@AndyAyersMS
Member

I think this is a different issue from what we see in #67188 / #67152: here we do a narrow store and a wide load, whereas there we do wide stores.

In this case it looks like the caller is at fault: we are passing an int, so the caller should be doing a 32-bit store.

;; caller M(.... X)

        79C01000          ldrsh   w0, [x0,#8]
        790003E0          strh    w0, [sp]      // [V01 OutArgs]

;; callee C..ctor(....)

        B94083A0          ldr     w0, [fp,#128]
        7100001F          cmp     w0, #0
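To make the failure mode concrete, here is a minimal C++ sketch (not JIT code) that simulates one 4-byte outgoing stack slot on a little-endian target such as arm64 macOS. The caller's `strh` writes only the low 2 bytes, while the callee's `ldr w0` reads all 4, so whatever was left in the upper half leaks into `last`. The `stale` value is an arbitrary assumption standing in for leftovers from an earlier call (for example `Thread.Sleep`).

```cpp
#include <cstdint>
#include <cstring>

// Caller does a 16-bit store (like strh) into a 4-byte slot that already
// holds 'stale'; callee then does a 32-bit load (like ldr w0).
// Assumes a little-endian target, as arm64 macOS is.
int32_t narrow_store_wide_load(int16_t value, uint32_t stale) {
    unsigned char slot[4];
    std::memcpy(slot, &stale, sizeof stale);    // leftover bytes in the slot
    std::memcpy(slot, &value, sizeof value);    // writes only the low 2 bytes
    int32_t loaded;
    std::memcpy(&loaded, slot, sizeof loaded);  // reads all 4 bytes
    return loaded;
}

// Same slot, but the caller first widens the short to int (sign-extending)
// and does a full 32-bit store, so no stale bytes survive.
int32_t wide_store_wide_load(int16_t value, uint32_t stale) {
    unsigned char slot[4];
    std::memcpy(slot, &stale, sizeof stale);
    int32_t widened = value;
    std::memcpy(slot, &widened, sizeof widened);
    int32_t loaded;
    std::memcpy(&loaded, slot, sizeof loaded);
    return loaded;
}
```

On a little-endian machine, `narrow_store_wide_load(0, 0x00010000u)` returns 65536, so `last > 0` holds and the constructor throws, while `wide_store_wide_load(0, 0x00010000u)` returns 0.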

@AndyAyersMS
Member

M starts out on the right track, but morph seemingly gets confused:

fgMorphTree BB01, STMT00001 (before)
               [000014] --CXG-------              *  CALL      void   C..ctor
               [000013] ------------ this in x0   +--*  LCL_VAR   ref    V02 tmp1         
               [000000] ------------ arg1         +--*  CNS_INT   int    0
               [000001] ------------ arg2         +--*  CNS_INT   int    0
               [000002] ------------ arg3         +--*  CNS_INT   int    0
               [000003] ------------ arg4         +--*  CNS_INT   int    0
               [000015] ------------ arg5         +--*  PUTARG_TYPE ubyte 
               [000004] ------------              |  \--*  CNS_INT   int    0
               [000016] ------------ arg6         +--*  PUTARG_TYPE ubyte 
               [000005] ------------              |  \--*  CNS_INT   int    0
               [000006] ------------ arg7         +--*  CNS_INT   int    0
               [000017] ---XG------- arg8         \--*  PUTARG_TYPE int   
               [000008] ---XG-------                 \--*  FIELD     short  X
               [000007] ------------                    \--*  LCL_VAR   ref    V00 this         
Initializing arg info for 14.CALL:
ArgTable for 14.CALL after fgInitArgInfo:
fgArgTabEntry[arg 0 13.LCL_VAR ref (By ref), 1 reg: x0, byteAlignment=8]
fgArgTabEntry[arg 1 0.CNS_INT int (By ref), 1 reg: x1, byteAlignment=4]
fgArgTabEntry[arg 2 1.CNS_INT int (By ref), 1 reg: x2, byteAlignment=4]
fgArgTabEntry[arg 3 2.CNS_INT int (By ref), 1 reg: x3, byteAlignment=4]
fgArgTabEntry[arg 4 3.CNS_INT int (By ref), 1 reg: x4, byteAlignment=4]
fgArgTabEntry[arg 5 15.PUTARG_TYPE int (By ref), 1 reg: x5, byteAlignment=1]
fgArgTabEntry[arg 6 16.PUTARG_TYPE int (By ref), 1 reg: x6, byteAlignment=1]
fgArgTabEntry[arg 7 6.CNS_INT int (By ref), 1 reg: x7, byteAlignment=4]
fgArgTabEntry[arg 8 17.PUTARG_TYPE short (By ref), numSlots=1, slotNum=0, byteSize=4, byteOffset=0, byteAlignment=4]

Morphing args for 14.CALL:

Final value of Compiler::fgMorphField after calling fgMorphSmpOp:
               [000008] ---XG-------              *  IND       short 
               [000022] -----+------              \--*  ADD       byref 
               [000007] -----+------                 +--*  LCL_VAR   ref    V00 this         
               [000021] -----+------                 \--*  CNS_INT   long   8 field offset Fseq[X]

@AndyAyersMS
Member

AndyAyersMS commented Mar 29, 2022

PUTARG_TYPE represents the type the callee expects, so the caller has to produce a value of that size. So, if there's a widening then morph needs to introduce a cast.
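The widening cast also has to respect the signedness of the source. A minimal C++ illustration, assuming two's-complement 16- and 32-bit integers as on arm64: a signed 16-bit value must be sign-extended to 32 bits (what `ldrsh` or `sxth` do), while an unsigned one must be zero-extended (what `ldrh` or `uxth` do).

```cpp
#include <cstdint>

// C# 'short' -> 'int': sign-extend, so -2 stays -2.
int32_t widen_signed(int16_t v) { return static_cast<int32_t>(v); }

// C# 'ushort' -> 'int': zero-extend, so 0xFFFE becomes 65534.
int32_t widen_unsigned(uint16_t v) { return static_cast<int32_t>(v); }
```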

@jakobbotsch
Member

> PUTARG_TYPE represents the type the callee expects, so the caller has to produce a value of that size. So, if there's a widening then morph needs to introduce a cast.

FWIW, we have code that has some (non-macOS) handling for this during import, in the same place we insert PUTARG_TYPE nodes:

// insert implied casts (from float to double or double to float)
if ((jitSigType == TYP_DOUBLE) && (arg->GetNode()->TypeGet() == TYP_FLOAT))
{
    arg->SetNode(gtNewCastNode(TYP_DOUBLE, arg->GetNode(), false, TYP_DOUBLE));
}
else if ((jitSigType == TYP_FLOAT) && (arg->GetNode()->TypeGet() == TYP_DOUBLE))
{
    arg->SetNode(gtNewCastNode(TYP_FLOAT, arg->GetNode(), false, TYP_FLOAT));
}
// insert any widening or narrowing casts for backwards compatibility
arg->SetNode(impImplicitIorI4Cast(arg->GetNode(), jitSigType));

@AndyAyersMS
Member

> FWIW, we have code that has some (non-macOS) handling for this during import,

I think this is "too early", since we only need this for memory args and don't know yet which args those are?

@sandreenko any suggestions on where to try and fix this? My current thinking is to add a cast in morph if we have a stack PUTARG_TYPE: if we see that the putarg type is wider than the underlying node, we need to extend the underlying node to the wider size.
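The proposed morph step can be sketched as a toy tree rewrite. Everything here (`VarType`, `Node`, `type_size`, `widen_stack_arg`) is a hypothetical simplification for illustration, not RyuJIT's actual `GenTree` or `var_types` API: if the callee-expected type recorded by a stack-passed PUTARG_TYPE is wider than the value, the value gets wrapped in a widening cast.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for JIT IR types; NOT the real var_types/GenTree.
enum class VarType { UByte, Short, Int };

int type_size(VarType t) {
    switch (t) {
        case VarType::UByte: return 1;
        case VarType::Short: return 2;
        case VarType::Int:   return 4;
    }
    return 0;
}

struct Node {
    enum class Kind { Ind, Cast } kind;
    VarType type;
    std::unique_ptr<Node> op;  // operand (used by Cast)
};

// For a stack-passed arg: if the callee-expected type is wider than the
// value's type, wrap the value in a widening cast to the expected type.
std::unique_ptr<Node> widen_stack_arg(VarType putargType, std::unique_ptr<Node> value) {
    if (type_size(putargType) > type_size(value->type)) {
        auto cast = std::make_unique<Node>();
        cast->kind = Node::Kind::Cast;
        cast->type = putargType;
        cast->op = std::move(value);
        return cast;
    }
    return value;  // already wide enough; leave unchanged
}
```

For arg8 in the dump above, the `IND short` would end up under a cast to `int`. As the following reply points out, a real JIT may treat such a cast over a small `IND` as redundant, because the load already yields a normalized value.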

@jakobbotsch
Copy link
Member

jakobbotsch commented Apr 16, 2022

I'm not sure that's the right fix: a small GT_IND node should guarantee a normalized result, so the cast might just be discarded as an optimization if you try to do that.

I think the problem is here:

if (compMacOsArm64Abi())
{
    switch (treeNode->GetStackByteSize())
    {
        case 1:
            targetType = TYP_BYTE;
            break;
        case 2:
            targetType = TYP_SHORT;
            break;
        default:
            assert(treeNode->GetStackByteSize() >= 4);
            break;
    }
}

This needs to handle the 4-byte case too.
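One reading of that suggestion, sketched in C++ with a hypothetical `VarType` enum in place of the JIT's `var_types`: pick the store type from the stack slot size, and add a 4-byte case so that a small integer source (such as the `short` field passed as arg8) gets a full 32-bit store. Limiting the new case to small integer types, so that a 4-byte `float` keeps its type, is an assumption of this sketch rather than something stated in the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's var_types; not the real enum.
enum class VarType { Byte, UByte, Short, UShort, Int, Float, Long, Double };

bool is_small_int(VarType t) {
    return t == VarType::Byte || t == VarType::UByte ||
           t == VarType::Short || t == VarType::UShort;
}

// Store type for a stack-passed argument, chosen from the slot size.
VarType stack_store_type(unsigned stackByteSize, VarType targetType) {
    switch (stackByteSize) {
        case 1: return VarType::Byte;
        case 2: return VarType::Short;
        case 4:
            // New case: a small-int source in a 4-byte slot is stored as int.
            return is_small_int(targetType) ? VarType::Int : targetType;
        default:
            assert(stackByteSize > 4);
            return targetType;
    }
}
```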

Alternatively we can fix it here:

if (!compMacOsArm64Abi())
{
    targetType = genActualType(source->TypeGet());
}
else
{
    targetType = source->TypeGet();
}

For small types it is not right to use the source type; we need the actual type of the argument.
In fact, maybe we should just stop special casing macOS arm64 here and the above switch will take care of the small types for us without any change.
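The alternative can be sketched the same way, again with a hypothetical `VarType` enum standing in for `var_types`: drop the macOS special case, always start from the actual (widened) type of the source, modeled on how `genActualType` maps small integer types to `int`, and let the existing 1- and 2-byte cases narrow the store only when the stack slot really is that small.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's var_types; not the real enum.
enum class VarType { Byte, UByte, Short, UShort, Int, Float, Long, Double };

// Modeled on genActualType: small integer types widen to int.
VarType actual_type(VarType t) {
    switch (t) {
        case VarType::Byte:
        case VarType::UByte:
        case VarType::Short:
        case VarType::UShort:
            return VarType::Int;
        default:
            return t;
    }
}

// No macOS arm64 special case for the source type; the original
// 1- and 2-byte switch still narrows genuinely small stack slots.
VarType stack_store_type(VarType sourceType, unsigned stackByteSize) {
    VarType targetType = actual_type(sourceType);
    switch (stackByteSize) {
        case 1: return VarType::Byte;
        case 2: return VarType::Short;
        default:
            assert(stackByteSize >= 4);
            return targetType;
    }
}
```

With this shape, the `short` field passed in a 4-byte slot (arg8 in the repro) is stored as `int`, while a genuinely 1- or 2-byte slot still gets a narrow store.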

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Apr 16, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 16, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 16, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 16, 2022