-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix flaky test-http-chunked-304 on SmartOS #3903
test: fix flaky test-http-chunked-304 on SmartOS #3903
Conversation
e7efad6
to
ade9bfb
Compare
/cc @indutny since he wrote the test and it has had no substantial modification by anyone else until now. CI stress test with this change: CI stress test without this change (should show failures):
CI for this PR: |
What exactly happens on SmartOS? How could we get ECONNREFUSED after starting the server? |
@indutny: Does the explanation at the bottom of https://smartos.org/bugview/OS-2767 offer a plausible mechanism for spurious ECONNREFUSED in this situation? |
Bad news anyway: The stress test still gets failures with this fix in place.
|
Gosh, really? I think most of the tests should be broken on SmartOS then. Perhaps we should mark the whole platform as flaky, or restart tests on ECONNRESET? |
Hmmm. Now that I'm at home and looking at the code, it kinda makes sense to me that, since we're talking about "connection refused", the "connect" event isn't called at all in that case, and since the "error" listener only gets bound inside the listener... yeah. Let me just make another change. I wish this issue was easier to test. |
OK, in that case, I'm going to terminate the running CI because it's tying up other tests that are waiting for a 32-bit SmartOS machine. |
ade9bfb
to
210b623
Compare
Yeah, good call. Just made a commit, I'd say if a fail comes up, also terminate it and look for an alternate solution. |
CI stress test for new version: https://ci.nodejs.org/job/node-stress-single-test/23/nodes=smartos14-32/console |
@@ -24,20 +24,39 @@ function test(statusCode, next) { | |||
}); | |||
|
|||
server.listen(common.PORT, function() { | |||
var conn = net.createConnection(common.PORT, function() { | |||
conn.write('GET / HTTP/1.1\r\n\r\n'); | |||
var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linting fails due to trailing space on this line. Can you fix that and use make jslint
to confirm no other oddities slipped in?
Stress test looks good. Can you fix the minor linting problem? |
SmartOS has an issue where it will trigger ECONNREFUSED when it should not. See https://smartos.org/bugview/OS-2767. This change adds logic to test-http-chunked-304 to work around the issue. See also similar issue: nodejs#2663 Fixes: nodejs#3864
210b623
to
89d67c3
Compare
Oops, sorry, still getting used to the project. Fixed and jslint is coming up clear. |
@Trott why do we want to land this? Why other tests are not failing with ECONNRESET? |
@indutny I'll take your second question first:
(It's We definitely see it on other tests. In #2663 we saw it a lot on It doesn't pop up as often on other tests probably because they typically only open a few connections and not hundreds. But we do see it. Here are two more occurrences on different tests since this test was designated flaky a few days ago.
It's a way to let the test not be flaky on SmartOS. I admit that part of it doesn't pass the smell test for me. The retry logic seemed OK in the max-server-connections test mentioned above, but to move it to a bunch of other tests...
Other ideas? |
This is what I propose cc @bnoordhuis |
Even if we go that way I'd suggest keeping the change that extracts the listener bindings out of the connect listener, and also keeping the error handling - basically everything in this PR except the reconnect logic (on a new PR). |
What's the reasoning of this @fansworld-claudio ? The test appears to be working fine everywhere else. |
Making it easier to implement future changes I guess, mainly. It's true that it works right now, in the context it's executed. I guess not then. |
Well, it is a test, so I'm not sure if future-proofing it is reasonable. Sorry for this! |
Don't worry, like I said I'm still getting used to the project and it makes perfect sense. |
Unfortunately, we have two SmartOS builds in the test CI and both of them are affected. This would seem to only affect tests where a server is set up and then an attempt is made to connect to that server. That's a lot of tests under
The standard three-or-four lines I'm talking about look like this:
This will be a lot of tests in the end, and we will want to be able to roll them all back should the bug get fixed in SmartOS. So we'll want to make sure the reason provided for skipping is identical in all the tests so that we can find them easily. |
We could just retry test several times if it fails with ECONNREFUSED, maybe 2 or 3 times before failing |
Retrying the test if we're on SmartOS and get an ECONNREFUSED is basically what this PR does, isn't it? |
@Trott I mean doing this automatically on CI, regardless of particular test name. Again, there are lots of tests with similar semantics, and doing this hack for all of them seems to be pointless to me. |
That would probably mean putting the logic for it in the Python test wrapper/harness. That should work. |
Exactly! ;) |
Closing this, then. |
SmartOS has an issue where it will trigger ECONNREFUSED when it
should not. See https://smartos.org/bugview/OS-2767.
This change adds logic to test-http-chunked-304 to work around
the issue. See also similar issue: #2663
Fixes: #3864