GC/API/GC/GetGCMemoryInfo/GetGCMemoryInfo.sh test failing intermittently on CoreCLR Linux ARM32 #73247
Comments
Tagging subscribers to this area: @dotnet/gc
Issue Details
Runfo Creating Tracking Issue (data being generated)
@mangod9 given the hit rate, I'll disable this test while it gets investigated.
PR to disable: #73477
I have investigated it. The failure is pretty strange. We crash because the current thread's Thread::m_pFrame is 0xffffffff when the NDirectImportWorker attempts to execute the following assert: runtime/src/coreclr/vm/dllimport.cpp Lines 5852 to 5853 in 3238c2a
So the This is the call stack:
The managed frame is obviously the …
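To make the crash described above concrete, here is a minimal, self-contained C++ sketch of why an assert that inspects the top frame can fault when the frame pointer is all ones. It assumes (this is not confirmed in the thread) that 0xffffffff acts as an end-of-chain sentinel rather than a corrupted pointer. `Frame`, `Thread`, `FrameTop` and `TopFrameIsKind` are hypothetical stand-ins for illustration, not the CoreCLR definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for a runtime frame; NOT the CoreCLR type.
struct Frame {
    int kind;     // illustrative tag, e.g. 1 = "inlined call frame"
    Frame* next;  // next frame in the per-thread chain
};

// Assumption: an all-ones pointer marks the end of the frame chain.
// On a 32-bit target this value is 0xffffffff.
inline Frame* FrameTop() {
    return reinterpret_cast<Frame*>(~std::uintptr_t{0});
}

// Hypothetical stand-in for the thread object; the chain starts out empty.
struct Thread {
    Frame* m_pFrame = FrameTop();
};

// Returns true only if a top frame exists and has the expected kind.
// The sentinel check comes first because dereferencing the sentinel
// would read from an invalid address.
bool TopFrameIsKind(const Thread& t, int expectedKind) {
    if (t.m_pFrame == FrameTop()) {
        return false;  // empty chain: nothing to inspect
    }
    return t.m_pFrame->kind == expectedKind;
}
```

Calling `TopFrameIsKind` on a default-constructed `Thread` returns false instead of touching the sentinel. An unguarded `t.m_pFrame->kind` in the same situation would dereference the all-ones address, which is the kind of access violation the dump would show under this assumption.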
I don't understand how the #73491 fix would be related to what I am seeing in the dump. But it is possible there are more failure modes.
Guess we should disable it then and continue to investigate.
…lter clauses (dotnet#73032)" This reverts commit 4c07f3d. We believe it is causing recent CI failures. See dotnet#73247
The failure here is not just a Linux ARM issue. See this (sadly Microsoft-internal only) Kusto query. On Windows it shows up less frequently, and as an assert firing on win-arm (asserts are off on linux-arm). As Noah observed, there's a sharp rise on … The first few are different asserts from other PRs, most of which weren't merged. The first PR that saw the regression was #73032; after that it was the IBC PR, which already contained the former PR. Looks like Noah got this one right :)
On Windows we end up unable to unwind for the assert:
We will evaluate whether the revert resolves this issue.
Test disabled in #73477 - not blocking CI anymore. Adjusting labels accordingly.
@hoyosjs guess we should re-enable again since the pinvoke inlining PR is reverted?
Noah already enabled it again in PR #73595.
OK, nice, so good to close this issue then?
I linked the PR for the re-enable so that once merged this just gets closed.
@noahfalk can you please confirm that you believe / verified that PR #73032 by @jkoritzinsky caused this regression and made tests fail? Did the PR cause other problems in the related issues on arm32? (aka is it the GC hole @jkotas asked you to track down?) (Sorry, there are too many links and comments that I only half understand, so I'm trying to clarify it ...)
Yes, there was a large increase in failure rate shortly after 73032 was checked in, and from the looks of it there have been no failures in the last 18 hours, during which the test was re-enabled but 73032 wasn't present. 73032 also failed the GetGCMemoryInfo test in the initial PR when it was merged, and the test passed in the change that reverted it. I did not do any direct analysis of the code change in 73032; all of the conclusions are based on test results with and without 73032 present.
@hoyosjs pointed out some other failures he found that were correlated with the change.
Yes, I treat 73032 as being that issue. Jan also referenced some ARM failures which predated last week; 73032 wasn't checked in at the time those occurred, so it couldn't have caused them. I did not investigate them independently once they appeared to be disconnected from the major failure source.
No worries, hope this helps clear it up a bit : )
Failures - 4/5-8/5 (incl. PRs) via Runfo - last 120 days
Potentially related: Failures for #73433 also start on arm on 7/28, ~5 hours before this test first fails.
Runfo Tracking Issue: GC/API/GC/GetGCMemoryInfo/GetGCMemoryInfo.sh test failing intermittently on CoreCLR Linux ARM32