Fatal CLR exception in .NET 5.0.0-RC2 high async/await load application #43707
Tagging subscribers to this area: @tommcdon
@tactical-drone the issue you were describing was really different; however, the data you provided was not. The second one caused dotnet-dump to fail, which is a completely different issue altogether (@mikem8361 FYI). However, the frames that I see there only tell me that some check in […]
The dotnet-dump […]
This commit had produced the error. To reproduce, just check out the branch net5 and run on RC2. But you do need a beefy PC for those test parameters: I run my tests on a 3950X with 64 GB RAM, which is effectively a mini datacenter. Just set the total nodes to something more manageable, like 500, then bump the value until your memory runs out. 2000 nodes requires 10 GB of RAM. Wait for the error to occur. (All the program does is spawn a bunch of TCP/IP clients that connect to each other and then continuously exchange test bits; 2000 nodes cause 30K connections, so this starts to add up.)

With regards to the failure: I haven't seen it since. It occurred on a fresh boot (a boot that felt unstable and was rebooted not long after). I am 100% sure […]

The reason I initially thought […] I then thought wtf (started googling), set […]

In any case, I am not using VS 2019 or its debugger, since from RC1 those became massively unstable. Before RC1 I could regularly debug my program from VS 2019 Preview, but after RC1 only very small test parameters work. Otherwise the debug session just hangs. It hangs regardless, actually.
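The repro steps above might be sketched roughly as follows. Note this is an assumption-laden sketch: the repository URL, clone path, and run command are guesses, not stated in the report; only the branch name (net5) and the node-count tuning come from the text.

```shell
# Hypothetical repro sketch -- repo URL and run command are assumptions.
git clone https://github.com/tactical-drone/zero.git   # assumed repo location
cd zero
git checkout net5          # branch named in the report

# Lower the total node count in the test parameters (e.g. 500) before running;
# 2000 nodes reportedly needs ~10 GB RAM and ~30K TCP connections.
dotnet run -c Release
```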
Do you have any feedback on this? It is concerning, and I am wondering if it was us who caused such instability. There was a bug that had to be fixed not too long ago (1744896) that would explain this behavior. This was part of the RC2 release.
@hoyosjs RC2 has exactly the same debugger issue as RC1. I would say a debug session fairly regularly just locks up when I run fewer than 30 nodes. The lockup happens within 20 seconds of the debug session starting, or never. Between 30 and 100 nodes this fairly regularly becomes a certainty.

Failure type 1: small test
Sometimes the debug session just locks up. This means the IDE's debug area with CPU and Memory becomes unresponsive and frozen. Stopping the debug, closing the purple cmd window, and rerunning normally mends the issue. It feels like closing that window increases the probability of the next run working, and not closing it almost guarantees it failing. I could be wrong.

Failure type 2: progressively larger test
This one can go two ways: a total lockup as in failure type 1, or a "lockup" where the IDE's CPU and Memory region still works, showing no CPU activity, but the Memory region shows ~20 MB/s of consistent linear growth (with no GCs) in memory usage that looks like a runaway process in the CLR.
Another issue I am having since RC1 is the following: I am testing memory leaks. This consists of a longish run in RELEASE mode and then a teardown cycle that ends with a manual […]

I then go to VS 2019, attach the debugger to the process, and take a memory snapshot. During that heap snapshot calculation (which is a fat one that sometimes shows a terminate dialog but completes), the debugger sometimes crashes, taking the process with it. This did not happen pre-RC1, even with larger heap snapshot calculations. Furthermore, since RC1 the debugger became equally unstable in VS 2019 and in Rider; they both exhibit exactly the same issues, so I don't think it has anything to do with the IDE itself.

Something else I have seen since RC1: with these super large tests, if they do work, I see strange things in the heap after teardown. Some of my objects are still floating around, but the numbers are not right. For example, I spawned 2K nodes, and the heap shows that there is exactly ONE still in the heap after teardown. That just does not make sense. I have been doing this leak debugging for a while now, and since RC1 the large tests produce snapshots that I cannot understand. In smaller (but still large) runs everything works.
@dotnet/dotnet-diag as the last couple points seem worth investigating |
Could this be related to the addition of the pinned object heap? /cc @Maoni0 |
Is there a repro for this? |
The .NET 5 release seems to have fixed these issues I had. I can now debug again, like I could before. Good job, men!
Glad to hear the issues have gone away so far. I'll close this, but feel free to ping again if the behavior surfaces again. |
I spoke too soon. Ignore the previous post.
For some bizarre reason I am getting this error today, where the code I am running has been the same, mostly and working for a while now. Bizarre.
I don't have the debugger attached, I am in (console) RELEASE mode running massive parallel async/await load consuming many chips.
Running .NET 5 RC2 and possibly forgetting to set
$env:DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS = 1
which I have been using and playing around with extensively. I left it off; maybe that is what was different. The app is running again, but this setting has not done anything like this before. This is all the debug info I could find:
Console stack that was dumped this time:
dotnet-dump CLR stack: threads with exceptions on them
(threadid 486)
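For reference, a stack like the one above can be captured and inspected with dotnet-dump and its standard SOS commands. This is a sketch, not the reporter's exact session; the PID and dump filename are placeholders.

```shell
# Collect a dump from the running process (PID is a placeholder)
dotnet-dump collect -p 12345

# Open the dump and inspect managed threads and stacks
dotnet-dump analyze ./core_dump_file
#   > clrthreads        # list managed threads; flags threads with exceptions
#   > setthread 486     # switch to the thread id seen in the output above
#   > clrstack          # managed call stack for the current thread
#   > pe                # print the current exception on that thread, if any
```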
Originally posted by @tactical-drone in #13590 (comment)