Fix & re-enable DualMode socket tests #31923

antonfirsov · 2020-02-07T17:57:32Z

Fix #1481 by replacing the standard Linux ephemeral port distribution mechanism with a custom one, implemented in the TestPortPool utility class.

As described in the issue, the root cause of these failures was that ephemeral ports are not unique across protocols. TestPortPool is intended to guarantee that uniqueness. With the current configuration it uses the port range 17000-22000, and the range is configurable. I assumed that this range is not used by any application on the CI machines, but maybe there is a better range to choose.

@tmds @steveharter although this solution differs from the one you suggested, it seems to be robust enough according to my experiments, and doesn't require additional ceremony in individual test cases. I wonder if you have any concerns?

I also tried serial execution of test cases, but it slowed down execution significantly when applied to all DualMode tests on Windows. It could be fine-tuned, but that would increase the accidental complexity of the test classes.

Additionally, the PR fine-tunes which tests are in OuterLoop, based on local test execution time. It also fixes #19162.

/cc @wfurt

halter73 · 2020-02-08T02:36:21Z

src/libraries/Common/tests/System/Net/Sockets/TestPortPool.cs

+        }
+
+        // Exclude ports which are unavailable at initialization time
+        private static ConcurrentDictionary<int, int> GetAllPortsUsedBySystem()


What if any of the ports in the allowed range are bound by other tests and/or processes after the static initializer runs?

Kestrel tests used to check if certain ports were available during xunit test discovery, but would find that the port would become unavailable by the time the test ran. Now, for tests that want to verify binding to a specific port, we test if the port is available at the start of each test, and skip the test otherwise. This improved reliability, but even that logic is still sometimes flaky, and that's despite running those tests serially in their own test group so they shouldn't be running in parallel with other tests.

Can we have RentPort() redo all the IsPortUsed() checks at the last possible moment in order to double-check that nothing else has taken the port? I think RentPortAndBindSocket should also retry on bind failures instead of throwing.

antonfirsov · 2020-02-08

The configured port range for TestPortPool does not overlap with the OS ephemeral ports. I think if we make sure that this range is dedicated to the tests using TestPortPool, and no other tests try to arbitrarily bind to its ports, we don't need to be concerned about this in System.Net.Sockets.Tests. We may add a README.md to make sure no one violates this in the future. Or am I missing something?

I wanted to avoid creating and destroying temporary socket handles per rental, because I was afraid of possible esoteric OS side effects I'm not aware of. We would need to create and destroy 4 sockets per rental to honor the "unique-across-protocols" contract guaranteed by TestPortPool. This might be a false concern though.
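For readers following the thread, the "retry on bind failure" idea raised above could look roughly like the sketch below. It is not code from this PR: `BindRetrySketch` and `BindToFirstFreePort` are hypothetical names, and the sketch assumes that a socket can be reused for another `Bind` attempt after a failed one and that a port conflict surfaces as `SocketError.AddressAlreadyInUse`.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of the PR.
internal static class BindRetrySketch
{
    // Tries consecutive ports in [firstPort, lastPort] until Bind succeeds
    // and returns the port that was bound.
    public static int BindToFirstFreePort(Socket socket, IPAddress address, int firstPort, int lastPort)
    {
        for (int port = firstPort; port <= lastPort; port++)
        {
            try
            {
                socket.Bind(new IPEndPoint(address, port));
                return port;
            }
            catch (SocketException ex) when (ex.SocketErrorCode == SocketError.AddressAlreadyInUse)
            {
                // Assumption: the socket is still usable after a failed Bind,
                // so we simply move on to the next candidate port.
            }
        }

        throw new InvalidOperationException($"No free port found in {firstPort}-{lastPort}.");
    }
}
```

Any other socket error is allowed to propagate, so an unexpected failure still shows up as a test failure instead of being silently retried.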

halter73 · 2020-02-08T02:38:52Z

src/libraries/System.Net.Sockets/tests/FunctionalTests/DualModeSocketTest.cs

@@ -2489,7 +2493,8 @@ public SocketServer(ITestOutputHelper output, IPAddress address, bool dualMode,
                    _server = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
                }

-                port = _server.BindToAnonymousPort(address);


Isn't BindToAnonymousPort still preferable to BindToPoolPort in non-DualMode cases?

antonfirsov · 2020-02-08

The class is used by several fragile DualMode cases. We could pass a parameter or use an alternative constructor to distinguish them, but it isn't worth the additional complexity IMO.

antonfirsov · 2020-02-08T13:08:56Z

/azp run runtime-libraries outerloop

azure-pipelines · 2020-02-08T13:09:08Z

Azure Pipelines successfully started running 1 pipeline(s).

tmds · 2020-02-11T10:34:24Z

@antonfirsov it's been a while since this issue was last discussed. I want to make sure I understand the problem.

Is it also an issue on Windows?
Is it an issue for UDP, and TCP?
Is it always caused by IPv4 and IPv6 being used on the same port? Or are there other causes?

antonfirsov · 2020-02-11T13:59:59Z

@tmds

> Is it also an issue on Windows?

Not in practice. It could only happen in the theoretical case where all the ports are exhausted, but I have not verified this.

> Is it an issue for UDP, and TCP?

Both TCP and UDP tests are affected.

> Is it always caused by IPv4 and IPv6 being used on the same port? Or are there other causes?

Currently, it's only an IPv4 vs IPv6 DualMode problem. I don't think TCP vs UDP port collisions lead to such issues, but for simplicity I made TestPortPool a centralized static class that provides ports that are unique "across everything".

antonfirsov · 2020-02-17T13:05:05Z

I ran an experiment to check the dynamic port range on the CI machines of every OS, by introducing a failing test case that reports the port range in the exception message. It turns out that the range is set to the OS default on all CI machines at the moment.

antonfirsov · 2020-02-17T13:08:14Z

src/libraries/Common/tests/Tests/System/Net/TestPortPoolTests.cs

+        // This test case is intended to detect potential OS configuration changes on CI machines
+        // that may prevent TestPortPool from operating correctly. The recommended action is to alter
+        // the TestPortPool range in those cases.
+        // Although this test is relatively long-running because of the external process execution it triggers,
+        // it's better to keep it in the Inner Loop to detect potential issues fast.
+        [Fact]
+        public void ConfiguredPortRange_DoesNotOverlapWith_OsDynamicPortRange()
+        {
+            PortRange poolRange = TestPortPool.ConfiguredPortRange;
+            var osRange = PortRange.GetDefaultOsDynamicPortRange();
+            string info = $"TestPortPool port range: {poolRange} | OS Dynamic Port Range: {osRange}";
+            _output.WriteLine(info);
+
+            Assert.False(PortRange.AreOverlappingRanges(poolRange, osRange),
+                $"Overlapping port ranges may prevent correct test execution! {info}" );
+        }


Instead of introducing complicated logic to predict a port range that is likely to be free in a given execution environment, I have added a test case to detect conflicts early. This should be good enough IMO.
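The overlap check that the test above relies on can be illustrated with a minimal sketch. `PortRangeSketch` is a hypothetical stand-in and not the PR's actual `PortRange` type; it assumes both bounds are inclusive.

```csharp
using System;

// Hypothetical stand-in for the PR's PortRange type; bounds are assumed to be inclusive.
internal readonly struct PortRangeSketch
{
    public int Min { get; }
    public int Max { get; }

    public PortRangeSketch(int min, int max)
    {
        if (min > max)
        {
            throw new ArgumentException("min must not be greater than max.");
        }

        Min = min;
        Max = max;
    }

    // Two inclusive ranges overlap when each one starts no later than the other one ends.
    public static bool AreOverlapping(PortRangeSketch a, PortRangeSketch b)
        => a.Min <= b.Max && b.Min <= a.Max;

    public override string ToString() => $"{Min}-{Max}";
}
```

With the pool range from the PR description (17000-22000) and the commonly documented default dynamic ranges (32768-60999 on Linux, 49152-65535 on Windows), `AreOverlapping` returns false, which is the condition the test asserts.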

tmds · 2020-02-17T15:50:30Z

src/libraries/Common/tests/System/Net/Sockets/TestPortPool.cs

+
+namespace System.Net.Sockets.Tests
+{
+    internal readonly struct PortLease : IDisposable


Instead of managing the ports with PortLease, could we incrementally try ports from a range? If the range is large enough for all tests that need a unique port number, we don't need to deal with returning ports (and possible issues caused by reusing port numbers).
There can be BindToTcpPoolPort/BindToUdpPoolPort methods which are similar to BindToAnonymousPort. (TCP and UDP ports are distinct).

antonfirsov · 2020-02-17

That would mean creating and destroying temporary sockets, since we need to probe both IPv4 and IPv6.

I was afraid of possible side effects which are unknown to me, and rent/return seemed less risky. If you say there are none, I believe you :)

tmds · 2020-02-25

> That would mean creating and destroying temporary sockets, since we need to probe both IPv4 and IPv6.

Why is probing needed?
If we incrementally use ports from the range, and don't reuse any, do we still expect issues?

antonfirsov · 2020-02-25

> could we incrementally try ports from a range

I thought you meant probing by "trying" here. It seemed logical to probe both protocols to make sure we are safe.

But you are right, we don't need probing at all if the port range is big enough compared to the number of tests. In that case I would just return the next port number, and let the test fail in the theoretical case of a collision (which we don't expect to happen in practice).

I will remove the returning logic and its backing dictionary. Seems like a reasonable simplification, thanks for the suggestion!
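The simplification agreed on here (hand out the next port number, never return or reuse one) can be sketched as follows. This is an illustration, not the code that ended up in the PR: `IncrementalPortPool` and `GetNextPort` are hypothetical names, and treating 17000-22000 as inclusive bounds is an assumption.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: every port in the range is handed out at most once per process.
internal static class IncrementalPortPool
{
    // Range from the PR description; inclusive bounds are an assumption.
    private const int MinPort = 17000;
    private const int MaxPort = 22000;

    // Starts one below the range so that the first increment yields MinPort.
    private static int s_lastPort = MinPort - 1;

    public static int GetNextPort()
    {
        // Interlocked.Increment makes the counter safe for tests running in parallel.
        int port = Interlocked.Increment(ref s_lastPort);
        if (port > MaxPort)
        {
            throw new InvalidOperationException(
                $"Port range {MinPort}-{MaxPort} is exhausted; consider configuring a larger range.");
        }

        return port;
    }
}
```

No probing happens in this sketch: if another process already holds the returned port, the subsequent `Bind` in the test fails, which matches the "let the test fail in the theoretical case of a collision" approach described above. The counter is per process, so separate test processes running concurrently would need disjoint ranges.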

stephentoub · 2020-09-14T21:24:54Z

@antonfirsov, are you still working on this PR, or should it be closed for now? The last interaction with it was seven months ago. Thanks.

stephentoub · 2020-09-18T13:47:23Z

Closing as stale.

antonfirsov added 6 commits February 7, 2020 14:53

re enable Dual Mode tests listed in dotnet#1481

477acf8

Move everything to innerloop temporarily

05a3d83

add TestPortPool

55083d2

Utilize TestPortPool in DualModeSocketTest

62f8817

Move long-running tests to OuterLoop

31500a4

Improve comments and code

844e1e0

antonfirsov added the area-System.Net.Sockets and test-enhancement labels Feb 7, 2020

antonfirsov added this to the 5.0 milestone Feb 7, 2020

antonfirsov requested a review from a team February 7, 2020 17:57

halter73 reviewed Feb 8, 2020

dotnet deleted a comment from azure-pipelines bot Feb 8, 2020

antonfirsov added 8 commits February 11, 2020 22:04

Merge branch 'master' into af/testfix/dualmode

88c6d4e

Merge branch 'master' into af/testfix/dualmode

fa36bef

# Conflicts: # src/libraries/Common/tests/System/Net/Configuration.Sockets.cs

use DOTNET_TEST prefix

d3233c9

PortRange & utility methods

d063f37

temporarily disable OuterLoop for TestPortPoolTests

f421299

fix ParseCmdletOutputLinux

6ab1791

ConfiguredPortRange_DoesNotOverlapWith_OsDynamicPortRange

4ae280a

fine-tune TestPortPoolTests, improve comments

2d97ddb

remove unnecessary newline

de666af

antonfirsov commented Feb 17, 2020

tmds reviewed Feb 17, 2020

stephentoub closed this Sep 18, 2020

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020

karelz modified the milestones: 5.0.0, 6.0.0 Jan 26, 2021
