TcpReceiveSendGetsCanceledByDispose tests fail on Fedora 38 #91543
Tagging subscribers to this area: @dotnet/ncl
Issue Details
The following tests fail on Fedora 38:
The exceptions look like:
These tests don't fail on Fedora 37, so the behavior is possibly triggered by a change in kernel behavior. I will investigate further when I have some time. cc @omajid
@tmds please do not remove |
Ok, I haven't had this feedback before. I'll stop removing the label. |
@tmds this is failing for you on x64 with Core, right? These exact failures are plaguing our partners at IBM, for their configurations (Mono on PPC64 little endian, Mono on s390x). At the very least that seems to imply it's not a runtime-specific issue. e.g. https://dev.azure.com/dnceng-public/public/_build/results?buildId=421005&view=results |
Right. As mentioned in the initial comment, I think it's in the kernel. Note that these tests don't fail on our CI when they run on ppc64le with RHEL 8. |
The IBM failures are on Ubuntu 20.04, with a 5.4 kernel. Fedora 38 is 6.2.9. That seems too wide a range (especially since Fedora 37 was OK, with kernel 6.0.7) |
It could be different kernel bugs ... Has this test ever passed on the public CI setup? If so, do you know when it started to fail? |
Nothing in the commit range stands out. I will investigate further when I upgrade to Fedora 38, which will be next month or so. |
Interesting question. According to the dpkg logs, OS updates happened on August 23rd, and September 13th - there's nothing on the OS/VM side which changed in the problem range. |
I tried on Ubuntu 20.04 with kernel version 5.4.0-162-generic and it worked there. Below is the console log: |
Seems I don't need to upgrade to Fedora 38 to investigate, as the issue is now also reproducing on my Fedora 37 system. These are my findings. runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.Unix.cs Line 218 in 9476f40
unexpectedly gets runtime/src/native/libs/System.Native/pal_networking.c Lines 3072 to 3077 in 9476f40
This is covered by the runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs Lines 1130 to 1137 in 9476f40
This reproduces on my Fedora 37 system with the 6.4.15-100.fc37.x86_64 kernel, but it does not reproduce on our Fedora 37 CI system which has an older 6.4.15-100.fc37.x86_64 kernel. I will investigate further with a kernel engineer what is causing the change in behavior from the kernel side. To unblock the CI, you can add the failing cases to this: runtime/src/libraries/System.Net.Sockets/tests/FunctionalTests/SendReceive/SendReceive.cs Lines 1034 to 1038 in 9476f40
|
This is caused by this change: torvalds/linux@4faeee0. It explicitly prevents connect from working while operations are ongoing. We prefer We'll see the test fail on newer kernels, and on kernels that back-port this change. We should set |
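For readers following the diagnosis above, here is a minimal sketch of the mechanism under discussion. It assumes the disconnect is performed by calling `connect` with an `AF_UNSPEC` address, which is the operation the kernel change affects. The helper names (`disconnect_with_af_unspec`, `disconnect_or_shutdown`, `make_loopback_pair`) are hypothetical and are not taken from the runtime sources. The sketch only exercises the idle case, where no other thread is blocked on the socket; it does not reproduce the failing test.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to disconnect a socket by connecting it to an
   AF_UNSPEC address. Returns 0 on success, otherwise the errno value. */
static int disconnect_with_af_unspec(int fd)
{
    struct sockaddr addr;
    memset(&addr, 0, sizeof(addr));
    addr.sa_family = AF_UNSPEC;
    if (connect(fd, &addr, sizeof(addr)) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: prefer the AF_UNSPEC disconnect, and fall back to
   shutdown() when the kernel refuses it. Returns 0 on success, else errno. */
static int disconnect_or_shutdown(int fd)
{
    if (disconnect_with_af_unspec(fd) == 0)
        return 0;
    if (shutdown(fd, SHUT_RDWR) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: build a connected TCP pair over the loopback
   interface. Returns 0 on success and -1 on failure. */
static int make_loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int listener;

    assert(client != NULL && server != NULL);

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(listener, 1) != 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) != 0) {
        close(listener);
        return -1;
    }

    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0) {
        close(listener);
        return -1;
    }
    if (connect(*client, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(*client);
        close(listener);
        return -1;
    }

    *server = accept(listener, NULL, NULL);
    close(listener);
    if (*server < 0) {
        close(*client);
        return -1;
    }
    return 0;
}
```

To try it, call `make_loopback_pair(&client, &server)` and then `disconnect_or_shutdown(client)`. Reproducing the reported failure would additionally need a second thread blocked in a receive or send on the same socket while the disconnect is attempted, which this sketch leaves out.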