-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix & re-enable DualMode socket tests #31923
Conversation
} | ||
|
||
// Exclude ports which are unavailable at initialization time | ||
private static ConcurrentDictionary<int, int> GetAllPortsUsedBySystem() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if any of the ports in the allowed range are bound by other tests and/or processes after the static initializer runs?
Kestrel tests used to check if certain ports were available during xunit test discovery, but would find that the port would become unavailable by the time the test ran. Now, for tests that want to verify binding to a specific port, we test if the port is available at the start of each test, and skip the test otherwise. This improved reliability, but even that logic is still sometimes flaky, and that's despite running those tests serially in their own test group so they shouldn't be running in parallel with other tests.
Can we have RentPort() redo all the IsPortUsed() checks at the last possible moment in order to double check nothing else has taken the port? I think RentPortAndBindSocket should also retry on bind failures instead of throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configured port range for TestPortPool
does not overlap with the OS ephemeral ports. I think if we make sure that this range s dedicated to the tests using TestPortPool
, and no other tests try to arbitrarily bind to it's ports, we don't need to be concerned about this in System.Net.Sockets.Tests. We may add a README.MD, to make sure no one will violate this in the future. Or am I missing something?
I wanted to avoid the creation + destroyal of temporary socket handles per rental, because I was afraid of possible esoteric OS side effects I'm not aware of. We need to create & destroy 4 sockets, according to the "unique-across-protocols" contract guaranteed by TestPortPool
. This might be a false concern though.
@@ -2489,7 +2493,8 @@ public SocketServer(ITestOutputHelper output, IPAddress address, bool dualMode, | |||
_server = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp); | |||
} | |||
|
|||
port = _server.BindToAnonymousPort(address); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't BindToAnonymousPort still preferable to BindToPoolPort in non dualMode cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class is being used by several fragile DualMode cases. We could pass a parameter / use alternative constructor to distinguish them, but it doesn't worth the additional complexity IMO.
/azp run runtime-libraries outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
@antonfirsov it's been a while since this issue was last discussed. I want to make sure I understand the problem.
|
Not in practice, only in the theoretical case when you exhaust all the ports, but this is not verified.
Both TCP and UDP tests are affected.
Currently, it's only an IPv4 vs IPv6 DualMode problem. I think no TCP vs UDP port collisions will lead to such issues, but I made |
# Conflicts: # src/libraries/Common/tests/System/Net/Configuration.Sockets.cs
I did an experiment to check the dynamic port range on all OS machines, by introducing a failing test case that reports the port range in the exception message. It turns out, that it is set to defaults on all CI machines at the moment. |
// This test case is intended to detect a potential OS configuration changes on CI machines | ||
// that may prevent TestPortPool to operate correctly. The recommended action is to alter TestPortPool range | ||
// in those cases. | ||
// Although this test is relatively long running because of the external process execution it triggers, | ||
// it's better to keep it in the Inner Loop to detect the potential issues fast. | ||
[Fact] | ||
public void ConfiguredPortRange_DoesNotOverlapWith_OsDynamicPortRange() | ||
{ | ||
PortRange poolRange = TestPortPool.ConfiguredPortRange; | ||
var osRange = PortRange.GetDefaultOsDynamicPortRange(); | ||
string info = $"TestPortPool port range: {poolRange} | OS Dynamic Port Range: {osRange}"; | ||
_output.WriteLine(info); | ||
|
||
Assert.False(PortRange.AreOverlappingRanges(poolRange, osRange), | ||
$"Overlapping port ranges may prevent correct test execution! {info}" ); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of introducing complicated logic to predict a port range that is likely to be free on a given execution environment, I have added a test case to detect conflicts early. We should be good enough with this IMO.
|
||
namespace System.Net.Sockets.Tests | ||
{ | ||
internal readonly struct PortLease : IDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of managing the ports with PortLease
, could we incrementally try ports from a range? If the range is large enough for all tests that need a unique port nr, we don't need to deal with returning (and possible issues by re-using port nrs).
There can be BindToTcpPoolPort
/BindToUdpPoolPort
methods which are similar to BindToAnonymousPort
. (TCP and UDP ports are distinct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would mean creating and destroying temporary sockets, since we need to probe both IPV4 and IPV6.
I was afraid of possible side effects which are unknown to me, and rent/return seemed less risky. If you say there are none, I believe you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would mean creating and destroying temporary sockets, since we need to probe both IPV4 and IPV6.
Why is probing needed?
If we incrementally use ports from the range, and not re-use any, do we still expect issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we incrementally try ports from a range
I thought you meant probing by "trying" here. It seemed logical to probe both protocols to make sure we are safe.
But you are right, we don't need probing at all, if the port range is big enough compared to the number of tests. In that case I would just return the next port number, and let the test fail in the theoretical case of a collision (which we don't expect to happen in practice.)
I will remove the returning logic, and it's backing dictionary. Seems like a reasonable simplification, thanks for the suggestion!
@antonfirsov, are you still working on this PR, or should it be closed for now? The last interaction with it was seven months ago. Thanks. |
Closing as stale. |
Fix #1481 by replacing the standard Linux ephemeral port distribution mechanism with a custom one, implemented in the
TestPortPool
utility class.As described in the issue description, the root cause of these failures was that ephemeral ports are not unique across protocols.
TestPortPool
is intended to guarantee that. With the current configuration it uses the port range 17000-22000, and it's configurable. I made the assumption that these is not used by any application on the CI, but maybe there is a better range to choose.@tmds @steveharter although this solution differs from the one you suggested, it seems to be robust enough according to my experiments, and doesn't require additional ceremony in individual test cases. I wonder if you have any concerns?
I also tried serial execution of test-cases, but it slowed down the execution time significantly when applied to all DualMode tests on Windows. It could be fine-tuned, but that would increase the accidental complexity of the test classes.
Additionally, the PR fine tunes which tests are in OuterLoop depending on local test execution time. Also fixes #19162.
/cc @wfurt