-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modified PR for reverse diagnostics server #35850
Modified PR for reverse diagnostics server #35850
Conversation
/azp list |
/azp run runtime-coreclr outerloop |
Azure Pipelines could not run because the pipeline triggers exclude this branch/path. |
Failed leg's test failure looks to have been hit before in #32955. Rerunning... |
Socket test failures look to be existing with failures reported as recent as yesterday. It repros on my Mac for all currently shipped previews of 5.0, albeit with a slightly different failure:
Edit: Has an issue tracking it now -> #35886 Linq test failures also appear to be existing and have been failing The I'm not convinced that any of these failures are due to this change, but will do some due diligence to be more confident in that assertion. |
Ran some benchmarking (avg, max, min for 1000 measurements) for the scenario in #12991 and got the following results: 3.1 $ C:\git\scratch\benchmark.ps1 { ./ConsoleApp.exe } -Samples 1000 -Silent
Average: 70.7435069ms
Minimum: 64.9551ms
Maximum: 87.1662ms 5.0-preview3 $ C:\git\scratch\benchmark.ps1 { ./ConsoleApp.exe } -Samples 1000 -Silent
Average: 62.6742732000001ms
Minimum: 56.4682ms
Maximum: 98.195ms This PR's coreclr.dll built on top of master in a self-contained publish of preview 3:
so we have between preview 3 and this PR on top of master (AKA preview 5). If I recall, the original issue for #12991 only showed when the app was executed on a machine with Hyper-V turned off. I'm not sure if I can turn off Hyper-V on my work desktop since I'm remoting into it from home and can't touch the UEFI options, but I'll try to run these same experiments on a Windows box with Hyper-V off. |
That sounds suspect. Did we get to a root cause for #12991 ? I am wondering whether it could have been another shutdown race in eventpipe that - depending on timing - destroyed something that was still in use. Hyper-V turned off just changed the timing to trigger the problem. |
I'm trying to find explicit documentation of the issue repro steps. In the meantime, this was the code added to fix the issue: https://github.com/dotnet/coreclr/pull/25602/files#diff-776272f612a5b488defb167f8afb2333R212-R232 According to the associated PR, the root cause was a difference in time between the OS unblocking the |
This code was deleted in dotnet/coreclr#27136 because of it was again causing intermittent hangs during shutdown. |
Ah, found a more detailed description/discussion: #13563 In light of the discussion in that issue and the fact that 3.1 doesn't have this code path (it only attempts to unlink the unix domain socket on linux) I'm starting to think I should discount #12991 until I can find explicit documentation that this issue is applicable. Thoughts? @sergiy-k I believe the original report for that issue came through you from a partner team. Do you have any further information on this floating around your inbox? |
Talked with Sergiy offline, and it sounds like the original issue was specifically on x64 with "virtualization support" turned off at the BIOS/UEFI level. Additionally, the issue wasn't consistent in nature and was more that there was a variation in the execution time that sometimes had this delay. I'm working on getting a machine with virtualization support turned off to run these benchmarks on and see if this behavior is still present. |
91ecfac
to
8df50c8
Compare
Worked offline with @noahfalk to test runtimes on a machine with "Virtualization Support" turned off. We couldn't reproduce any slowdown effects. Furthermore, we concluded that there wasn't an apparent slowdown from removing the cleanup logic. I've modified the PR to only perform minimal cleanup for Diagnostics Server resources on shutdown, i.e., it will only unlink the Unix Domain Socket on non-Windows platforms. |
CI failure looks to be #35877. I'm going to rebase and rerun CI. |
8df50c8
to
08267ee
Compare
@msftbot merge after @noahfalk approves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR un-reverts the reverse feature of the Diagnostics Server and adds fixes for a HANDLE-use-after-close issue that cropped up in CI (see #35451).
I recommend looking at this PR commit by commit, to filter out the revert changes (776724f) from the fixes introduced in this PR.
I'll open this PR in draft mode to run some outer loop CI over it and then open it for further review.
Details of changes:
Add a mechanism for canceling out ofIpcStreamFactory::Poll
should prevent regression for Diagnostics Server IPC thread causes 20-40ms shutdown delays on Windows #12991CC - @tommcdon @noahfalk @jkotas @sywhang