test-debug-signal-cluster.js fails sometimes (investigate flaky test) #3796
Would you mind opening this as a proper PR?
Is there a how-to? I usually use git on the CLI only (and try to avoid the chaotic GitHub UI).
@jelmd you can start by forking this repository. Then create a branch, apply your patch, push the changes to GitHub, and finally create the pull request.
Wouldn't it be easier if you just did it? Forking and doing all these curious things for a change that already exists, and then letting the fork rot, seems a bit like overkill and a waste of resources.
@jelmd ... all changes in the source are handled through pull requests and all pull requests need someone to open them ;-)
@jelmd If you could provide your fix in a gist, I can make a PR.
Something like this? https://gist.github.com/jelmd/c1c1cd7d1c49a3385adf
I haven't built a new version yet. Because newer versions lack OpenSSL 1.0.1 support, it takes me a considerable amount of time to go over the changes in each new version and make it 1.0.1-compatible. So my current plan is to wait for the version that incorporates the FIPS mode changes, so that it is worth spending time on it...
@jelmd Any chance you've had an opportunity to verify this bug with a recent version?
Well, just saw this on OS X on CI, so I guess this is now our "investigate flaky test-debug-signal-cluster on OS X" issue. :-/ https://ci.nodejs.org/job/node-test-commit-osx/4041/nodes=osx1010/console
Although it does seem that the CI failure is different from what's described here?
Stress test on all platforms: https://ci.nodejs.org/job/node-stress-single-test/793/
Stress test was clean except for a build failure.
Tested 6.3.1: seems to work.
This seems to have failed on AIX on master last night: https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/321/console
not ok 199 parallel/test-debug-signal-cluster
  # got pids [8978502,5701842,8257694]
  #
  # assert.js:89
  #   throw new assert.AssertionError({
  #   ^
  # AssertionError: test timed out.
  #     at Timeout.testTimedOut [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-debug-signal-cluster.js:53:3)
  #     at Timer.unrefdHandle (timers.js:462:14)
  # > all workers are running
  # > Starting debugger agent.
  # > Debugger listening on [::]:12347
  # > Starting debugger agent.
  # > Starting debugger agent.
  # > Debugger listening on [::]:12349Debugger listening on [::]:12348
  ---
We now have adequate AIX hardware to add AIX to the regular regression runs. However, a couple of tests are failing even though AIX was green at one point. This PR marks those tests as flaky so that we can add AIX and spot any new regressions without turning the builds red. The tests are being worked on under the following PRs:
- being worked on under #7564: test-async-wrap-post-did-throw, test-async-wrap-throw-from-callback, test-crypto-random
- being worked on under #7973: test-stdio-closed
- covered by #3796: test-debug-signal-cluster

PR-URL: #8065
Reviewed-By: joaocgreis - João Reis <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
@mhdawson Is there any chance one or both of the two

  if (setvbuf(stdout, nullptr, _IONBF, 0) != 0) {
    fprintf(stderr, "Could not unset buffering on stdout.\n");
    exit(1);
  }
  if (setvbuf(stderr, nullptr, _IONBF, 0) != 0) {
    fprintf(stderr, "Could not unset buffering on stderr.\n");
    exit(1);
  }

I'm trying to figure out how the output you pasted above could be generated, and that's my best guess without more information. (Aside, although I put this in the #node-build IRC channel already a few minutes ago, so now I'm probably just being annoying: Can we add AIX to the node-stress-single-test task so I can do stuff like this myself easily? /cc @joaocgreis)
By the way, saw the same failure this morning on SmartOS: https://ci.nodejs.org/job/node-test-commit-smartos/3917/nodes=smartos14-64/console
Stress test on SmartOS with the C++ change in my previous comment to see if
SmartOS failed the same way even though
cfe76f2 reintroduced the no buffering mode for Running
Because this test creates several concurrent processes running the debug agent, these write calls can end up being interleaved, and the output does not correspond to what the test expects. I'm not sure this is a bug in the way SmartOS handles unbuffered I/O, because passing So I would lean towards thinking that the test should be rewritten to accept output that is not properly synchronized between the processes it creates, or have these processes synchronize their output.
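One way to make the test accept unsynchronized output, sketched here under the assumption that each "Debugger listening on ..." string reaches stderr in one piece and only the newlines get displaced (as in the AIX log above), is to extract every reported port with a global regular expression and compare the sorted result against the expected set. The names `listeningPorts` and `sawAllAgents` are hypothetical and not part of the Node.js test suite.

```javascript
'use strict';
// Sketch only: checks debugger-agent output without assuming ordering or
// per-line atomicity across processes. Assumes each message string arrives
// intact and only newlines are displaced. Function names are hypothetical.

// Collect every port mentioned in a "Debugger listening on ..." message,
// regardless of order or missing/extra newlines between messages.
function listeningPorts(stderrText) {
  // Matches both the "on port 12389" and "on [::]:12347" forms seen in this issue.
  const re = /Debugger listening on (?:port |\[::\]:)(\d+)/g;
  const ports = [];
  let m;
  while ((m = re.exec(stderrText)) !== null) {
    ports.push(Number(m[1]));
  }
  // Sort numerically so the result does not depend on which process wrote first.
  return ports.sort((a, b) => a - b);
}

// True when exactly the expected ports (in any order) were reported.
function sawAllAgents(stderrText, expectedPorts) {
  const got = listeningPorts(stderrText);
  const want = [...expectedPorts].sort((a, b) => a - b);
  return got.length === want.length && got.every((p, i) => p === want[i]);
}

// Interleaved output modeled on the AIX log: two messages fused on one line.
const sample =
  'Starting debugger agent.\n' +
  'Debugger listening on [::]:12347\n' +
  'Starting debugger agent.\n' +
  'Starting debugger agent.\n' +
  'Debugger listening on [::]:12349Debugger listening on [::]:12348\n\n';

console.log(listeningPorts(sample)); // [ 12347, 12348, 12349 ]
console.log(sawAllAgents(sample, [12347, 12348, 12349])); // true
```

Sorting the ports removes any dependence on which worker wrote first. If another process's write could split a single message in the middle, this check would still fail, so it only covers the interleaving pattern reported in this thread.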
Do not assume any ordering, buffering, or atomicity of the output from the child processes' debugger agents. Fixes nodejs#3796.
See #8568 for a PR that makes
It seems that many processes write to stderr at the "same" time, so this test fails from time to time. When it fails, I always get something like 'Debugger listening on port 12389Debugger listening on port 12390\n\n', i.e. the message from one process got inserted before the trailing newline of another process's message.
To make it more reliable (in my case it now always succeeds), I changed it to collect the output without any LF/CR characters and then simply subtract the expected lines (see http://iws.cs.uni-magdeburg.de/~elkner/tmp/node5/test-concur.patch). I know that as long as writes to stderr aren't synchronized, this test may still fail occasionally; however, it now seems to work better.
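A minimal sketch of the strip-newlines-and-subtract idea described above, written independently of the linked patch (it is not that patch, and `subtractExpected` is a hypothetical name). It removes all CR/LF characters, deletes one occurrence of each expected message, and returns the remainder, which a test would then assert is empty.

```javascript
'use strict';
// Sketch only, not the linked patch. Returns whatever is left of `output`
// after removing one occurrence of each expected message; throws if a
// message is missing. An empty result means every message was seen.
function subtractExpected(output, expectedLines) {
  // Drop all CR/LF so a message whose newline was displaced by another
  // process's write still appears as one contiguous substring.
  let rest = output.replace(/[\r\n]/g, '');
  for (const line of expectedLines) {
    const idx = rest.indexOf(line);
    if (idx === -1) {
      throw new Error(`missing expected output: ${JSON.stringify(line)}`);
    }
    // Remove exactly one occurrence, so repeated messages must appear
    // as many times as they are listed.
    rest = rest.slice(0, idx) + rest.slice(idx + line.length);
  }
  return rest;
}

// The interleaved output quoted in the comment above.
const stderr =
  'Debugger listening on port 12389Debugger listening on port 12390\n\n';
const leftover = subtractExpected(stderr, [
  'Debugger listening on port 12390',
  'Debugger listening on port 12389',
]);
console.log(JSON.stringify(leftover)); // ""
```

Checking that the leftover is empty also catches unexpected extra output, which a plain "contains" check would miss.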