-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix flaky test-http2-session-timeout #15338
test: fix flaky test-http2-session-timeout #15338
Conversation
I was able to run |
a6b0f34
to
c8e3767
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM if CI passes.
We should be able to run a CI stress test now that @TimothyGu's patch landed. Could someone pull the trigger, please? |
Both of them failed all times, so maybe I got something wrong in kicking off the job. Can someone else have a look and start it again? |
c8e3767
to
3bd0870
Compare
It behaved as if the patch to the test runner didn't land. Not sure why that would be. (I've rebased this just in case but I can't imagine it's running my version of the test runner... is it?) |
I think it is (or was). It was using ref |
Well, that explains that. Thanks @Trott |
Whoops, those stress tests aren't really valid because they aren't running things concurrently. (Glad their green, though.) I'll kick off some more in a minute. |
Oh, this should include Fixes: #15326 as metadata. This is a note to whoever lands this. :-D |
Stress test on freebsd10-64 where it has been observed to fail: This PR (should be green): https://ci.nodejs.org/job/node-stress-single-test/1417/ master (should be red): https://ci.nodejs.org/job/node-stress-single-test/1418/ |
The stress test still fails with these changes. (EDIT: I stopped the one for this PR after 76 failures in 76 runs.) For comparison, here's a stress test on #15328: If that stress test passes but you still want the overall changes in this PR, hopefully the solution is to simply change the timeout values from If that stress test fails too, then it's time to look into things some more... |
The problem with comparing this to the other PR is that it doesn't actually test the timeout condition, as per my comment there. It only runs 40 requests 10ms (400ms total) apart and the timeout is 1200ms. I'll see what can be done to fix this PR. |
3bd0870
to
c47516c
Compare
Ok, changed up some more stuff. Should be able to handle concurrency better now. Can we run this again? Thanks so much! |
Is there a margin of failure we consider acceptable for stress tests? Looks like it's failing about 15-20% of the time now. I'm guessing if I bump up the timeout to something like 5000 and the calls to, say, 800 then it'll probably be fine on that system but it makes the test unnecessarily long on all platforms. Not sure if that's a big deal or not. |
c47516c
to
65e5a0b
Compare
|
65e5a0b
to
3103f89
Compare
@mcollina I just checked the CI and the commit it's testing is the one from the other PR. I reverted my changes since I don't think they got tested. |
So here are the important parameters to use to kick of a CI stress test for this PR:
|
Still failing. I think moving the test from |
A lot closer this time but yeah, still failing. Will think about it a bit and see if I can revise it in any way, if not we'll just have to move it to sequential. Sorry for the hassle, everyone. |
No worries at all @apapirovski ! |
3103f89
to
0c22399
Compare
Updated to move to sequential for now. Will revisit later potentially but all the http timeout tests are there so it might not be possible anyway. |
Question that can be ignored if you wish in the interest of getting the fix landed quickly: Since it's in sequential, does it make sense to go back to a 200ms timeout or at least reduce it somewhat from 2000 to speed up the test? |
@Trott Yeah, we can definitely reduce it. I'll put it at 200 for the server & 20 for the call. Did you want to run the CI on the updated version once I put it up or is that run above enough? |
Increase server timeout, reduce frequency of calls and unbind timeout after runs are done in order to avoid race conditions. Temporarily moved to sequential. Fixes: nodejs#15326
Yeah, let's run CI again after any changes. |
0c22399
to
0113f2e
Compare
Ok, timing updates are available to test now. Thanks @Trott! |
CI (again): https://ci.nodejs.org/job/node-test-pull-request/10118/ The failure seemed unrelated, but it would be better to check. |
Failure looks unrelated, this is the test that failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Landed as de51717. |
Increase server timeout, reduce frequency of calls and unbind timeout after runs are done in order to avoid race conditions. Temporarily moved to sequential. Fixes: #15326 PR-URL: #15338 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Yuta Hiroto <[email protected]> Reviewed-By: Tobias Nießen <[email protected]>
Increase server timeout, reduce frequency of calls and unbind timeout after runs are done in order to avoid race conditions. Temporarily moved to sequential. Fixes: #15326 PR-URL: #15338 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Yuta Hiroto <[email protected]> Reviewed-By: Tobias Nießen <[email protected]>
Increase server timeout, reduce frequency of calls and unbind timeout after runs are done in order to avoid race conditions. Temporarily moved to sequential. Fixes: nodejs/node#15326 PR-URL: nodejs/node#15338 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Yuta Hiroto <[email protected]> Reviewed-By: Tobias Nießen <[email protected]>
Increase server timeout, reduce frequency of calls and unbind timeout after runs are done in order to avoid race conditions. Temporarily moved to sequential. Fixes: nodejs/node#15326 PR-URL: nodejs/node#15338 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Yuta Hiroto <[email protected]> Reviewed-By: Tobias Nießen <[email protected]>
This is an alternative solution to #15328. Increases server timeout, reduces frequency of calls and unbinds timeout after runs are done in order to avoid race conditions.
cc @Trott @mcollina @tniessen
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
test