Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix & re-enable DualMode socket tests #31923

Closed
wants to merge 15 commits into from

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Feb 7, 2020

Fix #1481 by replacing the standard Linux ephemeral port distribution mechanism with a custom one, implemented in the TestPortPool utility class.

As described in the issue description, the root cause of these failures was that ephemeral ports are not unique across protocols. TestPortPool is intended to guarantee that. With the current configuration it uses the port range 17000-22000, and it's configurable. I made the assumption that these is not used by any application on the CI, but maybe there is a better range to choose.

@tmds @steveharter although this solution differs from the one you suggested, it seems to be robust enough according to my experiments, and doesn't require additional ceremony in individual test cases. I wonder if you have any concerns?

I also tried serial execution of test-cases, but it slowed down the execution time significantly when applied to all DualMode tests on Windows. It could be fine-tuned, but that would increase the accidental complexity of the test classes.

Additionally, the PR fine tunes which tests are in OuterLoop depending on local test execution time. Also fixes #19162.

/cc @wfurt

@antonfirsov antonfirsov added area-System.Net.Sockets test-enhancement Improvements of test source code labels Feb 7, 2020
@antonfirsov antonfirsov added this to the 5.0 milestone Feb 7, 2020
@antonfirsov antonfirsov requested a review from a team February 7, 2020 17:57
}

// Exclude ports which are unavailable at initialization time
private static ConcurrentDictionary<int, int> GetAllPortsUsedBySystem()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if any of the ports in the allowed range are bound by other tests and/or processes after the static initializer runs?

Kestrel tests used to check if certain ports were available during xunit test discovery, but would find that the port would become unavailable by the time the test ran. Now, for tests that want to verify binding to a specific port, we test if the port is available at the start of each test, and skip the test otherwise. This improved reliability, but even that logic is still sometimes flaky, and that's despite running those tests serially in their own test group so they shouldn't be running in parallel with other tests.

Can we have RentPort() redo all the IsPortUsed() checks at the last possible moment in order to double check nothing else has taken the port? I think RentPortAndBindSocket should also retry on bind failures instead of throw.

Copy link
Member Author

@antonfirsov antonfirsov Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configured port range for TestPortPool does not overlap with the OS ephemeral ports. I think if we make sure that this range s dedicated to the tests using TestPortPool, and no other tests try to arbitrarily bind to it's ports, we don't need to be concerned about this in System.Net.Sockets.Tests. We may add a README.MD, to make sure no one will violate this in the future. Or am I missing something?

I wanted to avoid the creation + destroyal of temporary socket handles per rental, because I was afraid of possible esoteric OS side effects I'm not aware of. We need to create & destroy 4 sockets, according to the "unique-across-protocols" contract guaranteed by TestPortPool. This might be a false concern though.

@@ -2489,7 +2493,8 @@ public SocketServer(ITestOutputHelper output, IPAddress address, bool dualMode,
_server = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
}

port = _server.BindToAnonymousPort(address);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't BindToAnonymousPort still preferable to BindToPoolPort in non dualMode cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class is being used by several fragile DualMode cases. We could pass a parameter / use alternative constructor to distinguish them, but it doesn't worth the additional complexity IMO.

@dotnet dotnet deleted a comment from azure-pipelines bot Feb 8, 2020
@antonfirsov
Copy link
Member Author

/azp run runtime-libraries outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tmds
Copy link
Member

tmds commented Feb 11, 2020

@antonfirsov it's been a while since this issue was last discussed. I want to make sure I understand the problem.

  • Is it also an issue on Windows?
  • Is it an issue for UDP, and TCP?
  • Is always caused by IPv4 and IPv6 being used on the same port? Or are there other causes?

@antonfirsov
Copy link
Member Author

antonfirsov commented Feb 11, 2020

@tmds

Is it also an issue on Windows?

Not in practice, only in the theoretical case when you exhaust all the ports, but this is not verified.

Is it an issue for UDP, and TCP?

Both TCP and UDP tests are affected.

Is always caused by IPv4 and IPv6 being used on the same port? Or are there other causes?

Currently, it's only an IPv4 vs IPv6 DualMode problem. I think no TCP vs UDP port collisions will lead to such issues, but I made TestPortPool a centralized static class to provide unique ports "across everything" for simplicity.

@antonfirsov
Copy link
Member Author

I did an experiment to check the dynamic port range on all OS machines, by introducing a failing test case that reports the port range in the exception message. It turns out, that it is set to defaults on all CI machines at the moment.

Comment on lines +228 to +243
// This test case is intended to detect a potential OS configuration changes on CI machines
// that may prevent TestPortPool to operate correctly. The recommended action is to alter TestPortPool range
// in those cases.
// Although this test is relatively long running because of the external process execution it triggers,
// it's better to keep it in the Inner Loop to detect the potential issues fast.
[Fact]
public void ConfiguredPortRange_DoesNotOverlapWith_OsDynamicPortRange()
{
PortRange poolRange = TestPortPool.ConfiguredPortRange;
var osRange = PortRange.GetDefaultOsDynamicPortRange();
string info = $"TestPortPool port range: {poolRange} | OS Dynamic Port Range: {osRange}";
_output.WriteLine(info);

Assert.False(PortRange.AreOverlappingRanges(poolRange, osRange),
$"Overlapping port ranges may prevent correct test execution! {info}" );
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of introducing complicated logic to predict a port range that is likely to be free on a given execution environment, I have added a test case to detect conflicts early. We should be good enough with this IMO.


namespace System.Net.Sockets.Tests
{
internal readonly struct PortLease : IDisposable
Copy link
Member

@tmds tmds Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of managing the ports with PortLease, could we incrementally try ports from a range? If the range is large enough for all tests that need a unique port nr, we don't need to deal with returning (and possible issues by re-using port nrs).
There can be BindToTcpPoolPort/BindToUdpPoolPort methods which are similar to BindToAnonymousPort. (TCP and UDP ports are distinct).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean creating and destroying temporary sockets, since we need to probe both IPV4 and IPV6.

I was afraid of possible side effects which are unknown to me, and rent/return seemed less risky. If you say there are none, I believe you :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean creating and destroying temporary sockets, since we need to probe both IPV4 and IPV6.

Why is probing needed?
If we incrementally use ports from the range, and not re-use any, do we still expect issues?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we incrementally try ports from a range

I thought you meant probing by "trying" here. It seemed logical to probe both protocols to make sure we are safe.

But you are right, we don't need probing at all, if the port range is big enough compared to the number of tests. In that case I would just return the next port number, and let the test fail in the theoretical case of a collision (which we don't expect to happen in practice.)

I will remove the returning logic, and it's backing dictionary. Seems like a reasonable simplification, thanks for the suggestion!

@stephentoub
Copy link
Member

@antonfirsov, are you still working on this PR, or should it be closed for now? The last interaction with it was seven months ago. Thanks.

@stephentoub
Copy link
Member

Closing as stale.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
@karelz karelz modified the milestones: 5.0.0, 6.0.0 Jan 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Sockets test-enhancement Improvements of test source code
Projects
None yet
5 participants