-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: improve flaky http2-stream-destroy-event-order #31590
test: improve flaky http2-stream-destroy-event-order #31590
Conversation
This doesn't properly fix the issue, it only significantly reduces (removes?) the flakiness of the test. The setTimeout must not be needed here. Specifically if the settings frame on the client is 'delayed' and only comes after we have already called `req.close(2)` then the 'close' event will be emitted before the stream has got a chance to write the rst to the underlying socket (even though FlushRstStream and SendPendingData have been called) and since we are destroying a session here it will never be written resulting in a 'clean' stream close on the server side and therefore a timeout because there will be no error.
CI: https://ci.nodejs.org/job/node-test-pull-request/28807/ |
😆 got hit by 3 other flaky tests Edit: stress test looks ok but some of the machines failed to build, should I just restart? |
I'm not sure what the expected state of compiling |
Alternative to nodejs#31590. It appears that the issue here is that the test falsely assumed that closing the client (which also currently destroys the socket rather than gracefully shutting down the connection) would still leave enough time for the server side to receive the stream error. Address that by explicitly waiting for the server side to receive the stream error before closing the client and the connection with it. Refs: nodejs#31590 Refs: nodejs#20750
I think #31610 fully fixes this, can you take a look? If my understanding of the situation is correct, then yes, this PR improves the situation by giving the server one more event loop turn to receive the relevant stream data, but I think explicitly waiting for it to arrive is also fine and more reliable. |
Alternative to nodejs#31590. It appears that the issue here is that the test falsely assumed that closing the client (which also currently destroys the socket rather than gracefully shutting down the connection) would still leave enough time for the server side to receive the stream error. Address that by explicitly waiting for the server side to receive the stream error before closing the client and the connection with it. Refs: nodejs#31590 Refs: nodejs#20750 PR-URL: nodejs#31610 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Denys Otrishko <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Rich Trott <[email protected]>
Alternative to #31590. It appears that the issue here is that the test falsely assumed that closing the client (which also currently destroys the socket rather than gracefully shutting down the connection) would still leave enough time for the server side to receive the stream error. Address that by explicitly waiting for the server side to receive the stream error before closing the client and the connection with it. Refs: #31590 Refs: #20750 PR-URL: #31610 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Denys Otrishko <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Rich Trott <[email protected]>
Alternative to #31590. It appears that the issue here is that the test falsely assumed that closing the client (which also currently destroys the socket rather than gracefully shutting down the connection) would still leave enough time for the server side to receive the stream error. Address that by explicitly waiting for the server side to receive the stream error before closing the client and the connection with it. Refs: #31590 Refs: #20750 PR-URL: #31610 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Denys Otrishko <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Rich Trott <[email protected]>
Alternative to #31590. It appears that the issue here is that the test falsely assumed that closing the client (which also currently destroys the socket rather than gracefully shutting down the connection) would still leave enough time for the server side to receive the stream error. Address that by explicitly waiting for the server side to receive the stream error before closing the client and the connection with it. Refs: #31590 Refs: #20750 PR-URL: #31610 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Denys Otrishko <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Rich Trott <[email protected]>
Alternative to #31590. It appears that the issue here is that the test falsely assumed that closing the client (which also currently destroys the socket rather than gracefully shutting down the connection) would still leave enough time for the server side to receive the stream error. Address that by explicitly waiting for the server side to receive the stream error before closing the client and the connection with it. Refs: #31590 Refs: #20750 PR-URL: #31610 Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Denys Otrishko <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Rich Trott <[email protected]>
This doesn't properly fix the issue, it only significantly reduces
(removes?) the flakiness of the test.
The setTimeout must not be needed here. Specifically if the settings
frame on the client is 'delayed' and only comes after we have already
called
req.close(2)
then the 'close' event will be emitted before thestream has got a chance to write the rst to the underlying socket (even
though FlushRstStream and SendPendingData have been called) and since we
are destroying a session here it will never be written resulting in a
'clean' stream close on the server side and therefore a timeout because
there will be no error.
Looks like we are not correctly waiting for the rst frame to be sent and I assume the time it takes to read the settings frame delays the write just enough to not get written at all. I will try to investigate this further but at least this should improve our CI so that it won't fail with this test almost every time.
This patch reduces the flakiness from 1 failed in 10 runs to 0/2000 on my windows machine.
The debug native logs can be found:
http2-destroy-event-order-success.txt
http2-destroy-event-order-fail.txt.
/cc @nodejs/testing @jasnell @addaleax
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes