SPMI asmdiff errors on Arm64 Linux #91257
What happens when you run the command it is failing on manually without parallelism? I.e.
/home/alahay01/dotnet/runtime_base/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/superpmi -a -jitoption force JitAlignLoops=0 -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 /home/alahay01/dotnet/runtime_base/artifacts/spmi/basejit/31234a863efe1a4dc1c6f4f1520f8515d5a90640.linux.arm64.Checked/libclrjit.so /home/alahay01/dotnet/runtime_base/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so /home/alahay01/dotnet/runtime_base/artifacts/spmi/mch/4bceb905-d550-4a5d-b1eb-276fff68d183.linux.arm64/libraries_tests.pmi.linux.arm64.checked.mch
|
I eventually get a segfault....
|
Do you have a backtrace? Generally SPMI should handle exceptions occurring within the JIT, so this might be a segfault in SPMI itself. |
Not sure why it wasn't saving a coredump (I had ulimit -c unlimited set).
Running in gdb:
CompileMethod is 0 when inside
Curiously, the caller, neardiffer, is inside an Arm64-only block at the time:
|
I'm not sure exactly what the code is doing (
But that (as probably expected) gives the following error from the
Then it errors a bit further along:
|
Does the following patch fix the problem?
diff --git a/src/coreclr/tools/superpmi/superpmi/neardiffer.cpp b/src/coreclr/tools/superpmi/superpmi/neardiffer.cpp
index 4fa300725cb..c80970f538c 100644
--- a/src/coreclr/tools/superpmi/superpmi/neardiffer.cpp
+++ b/src/coreclr/tools/superpmi/superpmi/neardiffer.cpp
@@ -1247,28 +1247,29 @@ bool NearDiffer::compare(MethodContext* mc, CompileResult* cr1, CompileResult* c
     // is a sum of their sizes. The following is to adjust their sizes and the roDataBlock_{1,2} pointers.
     if (GetSpmiTargetArchitecture() == SPMI_TARGET_ARCHITECTURE_ARM64)
     {
-        BYTE* nativeEntry_1;
-        ULONG nativeSizeOfCode_1;
-        CorJitResult jitResult_1;
+        if (hotCodeSize_1 > 0)
+        {
+            BYTE* nativeEntry_1;
+            ULONG nativeSizeOfCode_1;
+            CorJitResult jitResult_1;
+            cr1->repCompileMethod(&nativeEntry_1, &nativeSizeOfCode_1, &jitResult_1);
+            roDataSize_1 = hotCodeSize_1 - nativeSizeOfCode_1;
+            roDataBlock_1 = hotCodeBlock_1 + nativeSizeOfCode_1;
+            orig_roDataBlock_1 = (void*)((size_t)orig_hotCodeBlock_1 + nativeSizeOfCode_1);
+            hotCodeSize_1 = nativeSizeOfCode_1;
+        }

-        BYTE* nativeEntry_2;
-        ULONG nativeSizeOfCode_2;
-        CorJitResult jitResult_2;
-
-        cr1->repCompileMethod(&nativeEntry_1, &nativeSizeOfCode_1, &jitResult_1);
-        cr2->repCompileMethod(&nativeEntry_2, &nativeSizeOfCode_2, &jitResult_2);
-
-        roDataSize_1 = hotCodeSize_1 - nativeSizeOfCode_1;
-        roDataSize_2 = hotCodeSize_2 - nativeSizeOfCode_2;
-
-        roDataBlock_1 = hotCodeBlock_1 + nativeSizeOfCode_1;
-        roDataBlock_2 = hotCodeBlock_2 + nativeSizeOfCode_2;
-
-        orig_roDataBlock_1 = (void*)((size_t)orig_hotCodeBlock_1 + nativeSizeOfCode_1);
-        orig_roDataBlock_2 = (void*)((size_t)orig_hotCodeBlock_2 + nativeSizeOfCode_2);
-
-        hotCodeSize_1 = nativeSizeOfCode_1;
-        hotCodeSize_2 = nativeSizeOfCode_2;
+        if (hotCodeSize_2 > 0)
+        {
+            BYTE* nativeEntry_2;
+            ULONG nativeSizeOfCode_2;
+            CorJitResult jitResult_2;
+            cr2->repCompileMethod(&nativeEntry_2, &nativeSizeOfCode_2, &jitResult_2);
+            roDataSize_2 = hotCodeSize_2 - nativeSizeOfCode_2;
+            roDataBlock_2 = hotCodeBlock_2 + nativeSizeOfCode_2;
+            orig_roDataBlock_2 = (void*)((size_t)orig_hotCodeBlock_2 + nativeSizeOfCode_2);
+            hotCodeSize_2 = nativeSizeOfCode_2;
+        }
     }
     LogDebug("HCS1 %d CCS1 %d RDS1 %d xcpnt1 %d flag1 %08X, HCB %p CCB %p RDB %p ohcb %p occb %p odb %p", hotCodeSize_1,
It is likely a regression introduced by #89654, though it is strange to me that it works on win-arm64 but not linux-arm64 if so. I will try to take a closer look once I have some free cycles. |
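For readers unfamiliar with the layout the patch adjusts, here is a minimal standalone sketch of the arithmetic, assuming a blob laid out as native code followed by the constant pool. `BlobSplit` and `SplitCodeAndRoData` are hypothetical names for illustration and are not part of SuperPMI; the check for a native size larger than the blob is also an added assumption, not something the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the code / read-only-data views used when diffing.
struct BlobSplit
{
    const uint8_t* code;
    size_t         codeSize;
    const uint8_t* roData;
    size_t         roDataSize;
};

// Splits a blob laid out as [native code][constant pool].
// blobSize plays the role of hotCodeSize_N before the adjustment and
// nativeCodeSize the role of nativeSizeOfCode_N in the patch.
BlobSplit SplitCodeAndRoData(const uint8_t* blob, size_t blobSize, size_t nativeCodeSize)
{
    // A non-empty blob must come with a valid pointer.
    assert(blob != nullptr || blobSize == 0);

    if (blobSize == 0 || nativeCodeSize > blobSize)
    {
        // Nothing was recorded, or the sizes are inconsistent (assumed policy):
        // report empty code and empty read-only data instead of doing arithmetic.
        return {blob, 0, nullptr, 0};
    }

    // The constant pool starts right after the native code.
    return {blob, nativeCodeSize, blob + nativeCodeSize, blobSize - nativeCodeSize};
}
```

With a 12-byte blob and a native size of 8, the sketch reports 8 bytes of code and 4 bytes of read-only data; with a zero-sized blob both sizes stay 0 and no pointer arithmetic is performed.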
That seems to fix it!
Running inside gdb with the patch, I still get the sigtraps, but I'm assuming that is expected behaviour.
On Windows we were running the Windows .mch files. It quite possibly falls over on Windows too if you give it the Linux .mch files.
I can use the above fix for now, and will await the real patch. Thanks. |
After dotnet#89654, SPMI replay succeeds instead of producing replay errors in expected error cases (such as BADCODE or an EE exception). To support diffing such contexts, we record zero-sized assembly for the near differ to use. However, on arm64 there is additional code that calls repCompileMethod to adjust the code blob, and in the "EE exception" cases this function cannot be replayed, resulting in a crash during asmdiff. This fixes the problem by making the adjustments only when we know there is some code. An alternative solution could be to avoid invoking the near differ at all in the succeeding error cases, but this seemed like an OK pragmatic solution. Fix dotnet#91257
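The failure mode described above can be modelled in a few lines. This is only a sketch under stated assumptions: `FakeCompileResult`, `RecordedCompile`, `ReplayNativeSize` and `RoDataSize` are hypothetical names, not SuperPMI's actual `CompileResult` API, and the code assumes C++17 for `std::optional`. It shows why a replay call with no recorded entry must be skipped when the recorded assembly is zero-sized.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of a recorded compile; not SuperPMI's CompileResult.
struct RecordedCompile
{
    uint32_t nativeSizeOfCode;
};

struct FakeCompileResult
{
    // Empty when compilation ended in an expected error (e.g. an EE exception),
    // so there is nothing to replay.
    std::optional<RecordedCompile> compileMethod;

    // Zero-sized assembly is recorded for the error cases.
    uint32_t hotCodeSize = 0;

    // Mirrors the idea behind repCompileMethod: only valid when a record exists.
    uint32_t ReplayNativeSize() const
    {
        assert(compileMethod.has_value() && "no CompileMethod record to replay");
        return compileMethod->nativeSizeOfCode;
    }
};

// Returns the read-only data size, touching the replay record only when code exists.
// Assumes nativeSizeOfCode <= hotCodeSize whenever a record is present.
uint32_t RoDataSize(const FakeCompileResult& cr)
{
    if (cr.hotCodeSize == 0)
    {
        return 0; // error case: skip the replay call entirely
    }
    return cr.hotCodeSize - cr.ReplayNativeSize();
}
```

Checking the size first keeps the guard local to the adjustment, which is the same trade-off the description calls pragmatic: the near differ still runs for error cases, it just never asks for a record that was not captured.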
Confirmed this works for me now. Thanks! |
…#91783)
Description
On Arm64 Linux, running SPMI asmdiffs causes errors:
And the script fails with no diffs.
Reproduction Steps
Expected behavior
SPMI will run to completion without any errors
Actual behavior
Full log file: superpmi.4.log
Regression?
Used to work.
Known Workarounds
None.
Configuration
runtime HEAD:
Linux Ubuntu 22.04, Arm64 Altra.
Other information
Works on: Linux Ubuntu 22.04, X64
Works on: Windows, Arm64
SPMI replay works on Linux, Arm64