Investigate flaky sequential/test-child-process-execsync on AIX #24921
This was reported back in 2015 in nodejs/node-v0.x-archive#9444 and fixed in #1214; I don't know what has changed recently. I will have a look.
OK, I spent some time on this. There are two puzzles to solve. The stack message (`/bin/sh: iamabadcommand: not found.`) indicates that the failure is in this section, node/test/sequential/test-child-process-execsync.js Lines 63 to 65 in 4aabd7e
whereas the actual stack of the failure suggests it is at
at which point the actual command that is run as a child is NOT
So this warrants a question about at what point the Python parent captures the data pertinent to
I will run more with
I ran it 10K times locally with no sign of failure. I don't think running more is worthwhile, so I conclude this has to do with resources (CPU, memory, and other user limits) on CI. @Trott - is it possible for you to get me
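For anyone wanting to repeat that kind of local stress run, here is a minimal sketch of a loop that runs a Node script many times and tallies how each run ended. The helper name `runRepeatedly` and the shape of the tally object are hypothetical, not part of the Node.js test tooling; only `child_process.spawnSync` and `process.execPath` are real APIs.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Hypothetical helper: run `node <args>` `iterations` times and record
// whether each run passed, exited non-zero, or was killed by a signal.
function runRepeatedly(args, iterations) {
  const tally = { passed: 0, failedStatus: 0, killedBySignal: 0, failures: [] };
  for (let i = 0; i < iterations; i++) {
    const { status, signal } = spawnSync(process.execPath, args, { stdio: 'pipe' });
    if (signal !== null) {
      // The child was terminated by a signal; status is null in this case.
      tally.killedBySignal++;
      tally.failures.push({ run: i, status, signal });
    } else if (status !== 0) {
      // The child exited on its own with a non-zero code.
      tally.failedStatus++;
      tally.failures.push({ run: i, status, signal });
    } else {
      tally.passed++;
    }
  }
  return tally;
}
```

Something like `runRepeatedly(['test/sequential/test-child-process-execsync.js'], 10000)` from the root of a checkout would approximate the 10K run. Separating signal deaths from plain non-zero exits helps when the suspected failure mode is a crash rather than an assertion.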
Woot! Fixed my ansible problem. So here you go:
$ ssh test-osuosl-aix61-ppc64_be-1 ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2097151
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user) unlimited
$
https://ci.nodejs.org/job/node-test-commit-aix/19688/nodes=aix61-ppc64/consoleText
test-osuosl-aix61-ppc64_be-1
not ok 2383 sequential/test-child-process-execsync
---
duration_ms: 2.473
severity: fail
exitcode: 1
stack: |-
/bin/sh: iamabadcommand: not found.
assert.js:86
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
null !== 1
at Object.assert.throws (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/sequential/test-child-process-execsync.js:134:12)
at expectedException (assert.js:568:19)
at expectsError (assert.js:663:16)
at Function.throws (assert.js:694:3)
at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/sequential/test-child-process-execsync.js:127:10)
at Module._compile (internal/modules/cjs/loader.js:718:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:729:10)
at Module.load (internal/modules/cjs/loader.js:617:32)
at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
at Function.Module._load (internal/modules/cjs/loader.js:552:3)
...
And again. Probably time to mark this as flaky.
https://ci.nodejs.org/job/node-test-commit-aix/19692/nodes=aix61-ppc64/console
test-osuosl-aix61-ppc64_be-2
21:40:41 not ok 2383 sequential/test-child-process-execsync
21:40:41 ---
21:40:41 duration_ms: 5.83
21:40:41 severity: fail
21:40:41 exitcode: 1
21:40:41 stack: |-
21:40:41 /bin/sh: iamabadcommand: not found.
21:40:41 assert.js:86
21:40:41 throw new AssertionError(obj);
21:40:41 ^
21:40:41
21:40:41 AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:
21:40:41 + actual - expected
21:40:41
21:40:41 + null
21:40:41 - 'SIGILL'
21:40:41 at spawnSyncKeys.forEach (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/sequential/test-child-process-execsync.js:138:14)
21:40:41 at Array.forEach (<anonymous>)
21:40:41 at Object.assert.throws (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/sequential/test-child-process-execsync.js:136:19)
21:40:41 at expectedException (assert.js:568:19)
21:40:41 at expectsError (assert.js:663:16)
21:40:41 at Function.throws (assert.js:694:3)
21:40:41 at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/sequential/test-child-process-execsync.js:127:10)
21:40:41 at Module._compile (internal/modules/cjs/loader.js:718:30)
21:40:41 at Object.Module._extensions..js (internal/modules/cjs/loader.js:729:10)
21:40:41 at Module.load (internal/modules/cjs/loader.js:617:32)
21:40:41 ...
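As background for reading the two logs above: in Node's `child_process` API, a result's `status` is `null` when the child was terminated by a signal, and `signal` then names that signal. The errors thrown by `execSync` and `execFileSync` carry the same fields as a `spawnSync` result. A minimal sketch of that behavior on a POSIX system follows. The helper names `describeExit` and `captureExecError` are hypothetical and not taken from the test file, and `SIGKILL` is used only as a convenient stand-in for the `SIGILL` seen in the log.

```javascript
'use strict';
const { execFileSync, spawnSync } = require('child_process');

// Hypothetical helper: run `node <args>` and report how the child ended.
function describeExit(args) {
  const { status, signal } = spawnSync(process.execPath, args, { stdio: 'pipe' });
  return { status, signal };
}

// Hypothetical helper: run the same child through execFileSync and
// return the status/signal fields attached to the thrown error.
function captureExecError(args) {
  try {
    execFileSync(process.execPath, args, { stdio: 'pipe' });
    return null; // The child exited with code 0, so nothing was thrown.
  } catch (err) {
    return { status: err.status, signal: err.signal };
  }
}

// A child that exits on its own with code 1: status is 1, signal is null.
const exited = describeExit(['-e', 'process.exit(1)']);

// A child killed by a signal: status is null, signal names the signal.
const killed = describeExit(['-e', "process.kill(process.pid, 'SIGKILL')"]);

console.log(exited, killed, captureExecError(['-e', 'process.exit(1)']));
```

Read this way, a `null` status where `1` was expected is consistent with the child dying from a signal rather than exiting normally.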
Refs: nodejs#24921
PR-URL: nodejs#25031
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Bradley Farias <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
So the first core dump arrived. But before that, I want to clear up the confusion in my earlier comment.
So in summary
Now, on the failure:
(dbx) where
.() at 0x0
array-buffer-collector._ZN2v88internal20CancelableLambdaTaskIZNS0_20ArrayBufferCollector15FreeAllocationsEvEUlvE_E11RunInternalEv(??) at 0x1011d770c
_ZN2v88internal14CancelableTask3RunEv(??) at 0x100039304
node_platform._ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv(??) at 0x1001e6990
(dbx)
This is here,
and we have a wild branch: IAR is 0.
(dbx) x
Value in Link Register
(dbx) (0x1011d76fc)/10i
0x1011d76fc (_ZN2v88internal20CancelableLambdaTaskIZNS0_20ArrayBufferCollector15FreeAllocationsEvEUlvE_E11RunInternalEv+0x35c) e9490000 ld r10,0x0(r9)
0x1011d7700 (_ZN2v88internal20CancelableLambdaTaskIZNS0_20ArrayBufferCollector15FreeAllocationsEvEUlvE_E11RunInternalEv+0x360) e9690010 ld r11,0x10(r9)
0x1011d7704 (_ZN2v88internal20CancelableLambdaTaskIZNS0_20ArrayBufferCollector15FreeAllocationsEvEUlvE_E11RunInternalEv+0x364) 7d4903a6 mtctr r10
0x1011d7708 (_ZN2v88internal20CancelableLambdaTaskIZNS0_20ArrayBufferCollector15FreeAllocationsEvEUlvE_E11RunInternalEv+0x368) e8490008 ld r2,0x8(r9)
0x1011d770c (_ZN2v88internal20CancelableLambdaTaskIZNS0_20ArrayBufferCollector15FreeAllocationsEvEUlvE_E11RunInternalEv+0x36c) 4e800421 bctrl
(dbx)
Without reading through ~500 instructions it is difficult to pinpoint which construct in the source maps to this. However, a wild branch typically indicates a NULL function pointer, in this case the lambda. Is it another manifestation of the exit race with destructors, or a new one altogether? I will study more core files.
Unfortunately, all subsequent core files show the same pattern at the same location. So the reason for this type of crash is a wild branch to a NULL lambda target. To establish its relation to the exit race, I replaced the `exit` call with `_exit` in
With that, I want to establish that this has the same root cause as the issue reported in #25007.
Seeing a similar-looking failure on FreeBSD: https://ci.nodejs.org/job/node-test-commit-freebsd/23046/
* move `start` time to the point of execution (avoids counting 'throws' tests towards 'timeout' test case)
* scope cmd/ret values where possible
* use `filter` instead of manual if/return

PR-URL: nodejs#25227
Refs: nodejs#24921
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: James M Snell <[email protected]>
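To illustrate the first bullet of that commit message, here is a rough sketch of timing a timeout case with the `start` timestamp taken immediately before the call being measured, so that earlier, unrelated checks do not inflate the elapsed time. The helper name `timeTimeoutCase` and its parameters are hypothetical; this is not the code from the PR, only an example of the idea using the real `execSync` `timeout` option.

```javascript
'use strict';
const { execSync } = require('child_process');

// Hypothetical helper: run a child that sleeps for `sleepMs`, but give
// execSync only `timeoutMs` to finish, and measure just that call.
function timeTimeoutCase(timeoutMs, sleepMs) {
  const cmd = `"${process.execPath}" -e "setTimeout(() => {}, ${sleepMs})"`;

  // Take the timestamp right before the call under test, not at the
  // top of the file, so only this call counts towards the elapsed time.
  const start = Date.now();
  let threw = false;
  try {
    execSync(cmd, { timeout: timeoutMs, stdio: 'pipe' });
  } catch (err) {
    // execSync throws when the child is killed for exceeding the timeout.
    threw = true;
  }
  return { threw, elapsedMs: Date.now() - start };
}
```

With a call such as `timeTimeoutCase(200, 2000)`, the elapsed time should land well below the child's sleep duration if the timeout is honored, regardless of how long any preceding assertions took.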
The test has been active for the past week with no failures, so as expected #25061 has fixed the underlying issue. Closing.
https://ci.nodejs.org/job/node-test-commit-aix/19533/nodes=aix61-ppc64/console
Host: test-osuosl-aix61-ppc64_be-1