Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

Closed
refack opened this issue May 10, 2017 · 13 comments
Closed

test: flaky parallel/test-net-connect-local-error on macOS & freeBSD #12950

refack opened this issue May 10, 2017 · 13 comments
Labels
freebsd Issues and PRs related to the FreeBSD platform. macos Issues and PRs related to the macOS platform / OSX. net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests.

Comments

@refack
Copy link
Contributor

refack commented May 10, 2017

  • Version: master
  • Platform: macOS,freeBSD
  • Subsystem: test

https://ci.nodejs.org/job/node-test-commit-freebsd/8933/nodes=freebsd11-x64/tapResults/
https://ci.nodejs.org/job/node-test-commit-osx/9696/nodes=osx1010/tapResults/

738	parallel/test-net-connect-local-error	
duration_ms	
0.162
severity	
fail
stack	
Mismatched onError function calls. Expected 1, actual 0.
    at Object.<anonymous> (/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/parallel/test-net-connect-local-error.js:15:27)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
    at Function.Module._load (module.js:458:3)
    at Function.Module.runMain (module.js:605:10)
    at startup (bootstrap_node.js:144:16)
    at bootstrap_node.js:537:3
@refack refack added macos Issues and PRs related to the macOS platform / OSX. test Issues and PRs related to the tests. net Issues and PRs related to the net subsystem. labels May 10, 2017
@refack refack changed the title test: flaky parallel/test-net-connect-local-error on macOS test: flaky parallel/test-net-connect-local-error on macOS & freeBSD May 10, 2017
@refack refack added the freebsd Issues and PRs related to the FreeBSD platform. label May 10, 2017
@sebastianplesciuc
Copy link

Can I take this one?

@refack
Copy link
Contributor Author

refack commented May 10, 2017

@sebastianplesciuc you like these ones 😉 . dibs is yours.
P.S. I've been told that the use of common.PORT is fragile in macOS / freeBSD.
There's a general goal to use 0 as port (if possible) and then figure out which port the OS assigned.

@refack
Copy link
Contributor Author

refack commented May 10, 2017

Ref: #12376

@sebastianplesciuc
Copy link

I think that's the case here as well.

I think there are two issues with this, one is that server.close() might be called before the client's callback which is the error on line 15 I believe.

Another issue might arise with port + 1. We have no guarantee that this is an empty port when this runs. A way of fixing this might be creating another server with port 0 and assigning it's port to the client socket in it's close callback. Thoughts? Is it guaranteed to be in use while the callback is running from net.Server close event?

@refack
Copy link
Contributor Author

refack commented May 10, 2017

  1. The close is definitely in a bad place
  2. When net.Server.close call the cb the server will not answer anymore, so it's a good time to try to connect to it (to get an error)
    Another option is to loop port + 1...5, if that's simpler.

Worse case we can move it to /sequential/ where it will run alone.

@refack
Copy link
Contributor Author

refack commented May 12, 2017

@sebastianplesciuc You might appreciate this. I've been looking at this issue's brother #12951.
These tests are run in parallel first this one, then the other.
In the other one there's a server that's supposed to receive 6 connections, instead it received 7.
I wonder where that 7th request comes from 🤣

@sebastianplesciuc
Copy link

@refack I'm not sure I get it :) You're suggesting that the connection in that test comes from this one? How does that work?

@sebastianplesciuc
Copy link

@refack OOOOH! I think I get it now. So when that test fails, this test fails because it connects from this test to the server from that test. So fixing this test would fix that one?

@refack
Copy link
Contributor Author

refack commented May 13, 2017

Yep

@aqrln
Copy link
Contributor

aqrln commented May 14, 2017

@refack

P.S. I've been told that the use of common.PORT is fragile in macOS / freeBSD.
There's a general goal to use 0 as port (if possible) and then figure out which port the OS assigned.

Using 0 to choose a random port can be fragile on BSD systems (including macOS) too, because of how SO_REUSEADDR works in their implementations of TCP. Recently I observed test/parallel/test-http-extra-response failing on my machine trying to communicate with the server of an autocompletion plugin for Vim instead of the server created in the test itself. The problem is that when a process binds to port 0 with SO_REUSEADDR, the kernel may give it a port that another process is already listening on if one of them binds to a concrete address (like 127.0.0.1) and the other — to a wildcard one. The logic that the kernel uses to route requests between them is not very clear to me.

/cc @nodejs/testing

@Trott
Copy link
Member

Trott commented May 14, 2017

P.S. I've been told that the use of common.PORT is fragile in macOS / freeBSD.

Hmmm, this sounds like there's a misunderstanding somewhere. There's nothing macOS or FreeBSD specific about the common.PORT stuff as far as I know. And there's nothing fragile about common.PORT.

The reason we don't typically want to use common.PORT in parallel is to prevent this scenario:

Test A uses port 0 for something. Test B uses common.PORT for something. Test A starts to run first. The OS gives it an open port that just happens to be the same value as common.PORT will be when Test B runs. Test B starts to run but Test A is already using the port it requests via common.PORT and so Test B fails.

If Test B were in sequential rather than parallel, then there wouldn't be a problem.

The port collision scenario described above may sound unlikely, but it has been observed in CI results (or at least things that look suspiciously like it).

@Trott
Copy link
Member

Trott commented May 14, 2017

Also: common.PORT and common.PORT + 1 in sequential is safe (assuming no non-Node.js processes grab the ports) so you could just consider moving to sequential if the logic gets too hairy otherwise. It's probably not anyone's first choice for a solution, but it probably shouldn't be the last choice either :-D

Also also, and this starts to enter rant territory so feel free to tune out right now: There seems to be a reluctance to move tests to sequential out of concerns for making the test suite run longer. I know I've certainly been reluctant. I believe now that these concerns are massively overblown. First, a lot of CI hosts (most?) don't run any tests in parallel and these include our absolute far-and-away slowest hosts. So moving something to sequential will have no impact whatsoever on the test suite duration for the slowest hosts. And on the faster hosts, we're generally talking about shaving off something on the order of a tenth of a second. And a lot of that benefit is probably lost by the introduction of somewhat more complicated code to make it safe to run in parallel. So we're making the code more complex but not actually getting any benefit out of it. Or at least, that's my theory. It would be fun to compare how long the test suite takes if everything in parallel were moved to sequential. I might do that right now.

UPDATE (still in rant mode, so feel free to skip): Ran the test benchmark described above. Normal parallel test run on my computer was about 90 seconds. Running the same tests sequentially took about 250 seconds. That's pretty good, but it's worth noting that with 1333 tests, each test moved from parallel to sequential will cost about 120ms. The question becomes: Is the added code complexity worth saving 120ms on a test suite that (once you add in the addon tests and tests in sequential and so on) takes about 2.5 minutes to run. For any significant complexity increase, the answer is always going to be "no".

@refack
Copy link
Contributor Author

refack commented May 15, 2017

There seems to be a reluctance to move tests to sequential ... each test moved from parallel to sequential will cost about 120ms

@Trott IMHO it's not just the perf cost; it's the implicit assumption that testing in parallel is more vigorous, hence more rigorous. which is kinda true. A good mechanism could be to add a fourth flag to parallel.status: TRY_SEQ, to flag the harness that for a particular test is it fail in parallel, decision should be relegated until the test is run alone.
[that was me being grandiloquent]

@refack refack closed this as completed in cf30d5e May 16, 2017
refack pushed a commit to refack/node that referenced this issue May 16, 2017
Fixed test-net-connect-local-error by moving the test from
parallel to sequential.

PR-URL: nodejs#12964
Fixes: nodejs#12950
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
anchnk pushed a commit to anchnk/node that referenced this issue May 19, 2017
Fixed test-net-connect-local-error by moving the test from
parallel to sequential.

PR-URL: nodejs#12964
Fixes: nodejs#12950
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit that referenced this issue Jun 22, 2017
Fixed test-net-connect-local-error by moving the test from
parallel to sequential.

PR-URL: #12964
Fixes: #12950
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 11, 2017
Fixed test-net-connect-local-error by moving the test from
parallel to sequential.

PR-URL: #12964
Fixes: #12950
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
freebsd Issues and PRs related to the FreeBSD platform. macos Issues and PRs related to the macOS platform / OSX. net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

No branches or pull requests

4 participants