
Assertion in parallel/test-trace-events-dynamic-enable from libuv #23675

Closed
AdamMajer opened this issue Oct 15, 2018 · 9 comments
Labels: trace_events

Comments

@AdamMajer (Contributor)

  • Version: 10.12.0
  • Platform: Linux
  • Subsystem: unit tests / libuv

While building Node.js and then running the test-ci make target under OBS (Open Build Service), I've seen crashes, all of the form:

https://build.opensuse.org/package/live_build_log/devel:languages:nodejs/nodejs10/openSUSE_Tumbleweed/i586

[ 3814s] not ok 1835 parallel/test-trace-events-dynamic-enable
[ 3814s] ---
[ 3814s] duration_ms: 0.268
[ 3814s] severity: crashed
[ 3814s] exitcode: -6
[ 3814s] stack: |-
[ 3814s] node10: ../deps/uv/src/unix/core.c:280: uv__finish_close: Assertion `0' failed.
[ 3814s] ...

The same build in a different environment seems to succeed, for example:

https://build.opensuse.org/package/show/devel:languages:nodejs/nodejs10

So far I've been unable to isolate why this could be happening.
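For context, the assertion text ("Assertion `0' failed") points at the catch-all branch of libuv's internal uv__finish_close(), which dispatches on handle->type. Roughly paraphrased from deps/uv/src/unix/core.c (not a verbatim copy; the exact cases and line numbers vary between libuv releases):

static void uv__finish_close(uv_handle_t* handle) {
  assert(handle->flags & UV_HANDLE_CLOSING);
  assert(!(handle->flags & UV_HANDLE_CLOSED));
  handle->flags |= UV_HANDLE_CLOSED;

  switch (handle->type) {
    case UV_ASYNC:
    case UV_TIMER:
      /* ... the other known handle types are torn down here ... */
      break;

    default:
      /* A handle->type that libuv does not recognize lands here; this is
       * the "Assertion `0' failed" reported at core.c:280 above. */
      assert(0);
  }

  /* ... unref the handle, dequeue it, run its close_cb ... */
}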

@jasnell (Member)

jasnell commented Oct 15, 2018

I believe this is a race condition that @ofrobots just fixed in master last week. Can you check this against master and see if you are continuing to see the same issue?

@jasnell added the trace_events label Oct 15, 2018
@AdamMajer (Contributor, Author)

As of 3 hours ago, it's still happening with master, but only on x86_64, while 32-bit is passing...

https://build.opensuse.org/package/show/devel:languages:nodejs:staging/nodejs42

[ 769s] not ok 1852 parallel/test-trace-events-dynamic-enable
[ 769s] ---
[ 769s] duration_ms: 0.181
[ 769s] severity: crashed
[ 769s] exitcode: -6
[ 769s] stack: |-
[ 769s] node42: ../deps/uv/src/unix/core.c:280: uv__finish_close: Assertion `0' failed.

I've triggered a rebuild to see if it happens again. I've had problems reproducing this outside of the Open Build Service (generally, a QEMU build environment without network access).

@AdamMajer (Contributor, Author)

That build is based on b2e4cb7

@ofrobots (Contributor)

This seems distinct from #22865 that I fixed recently.

@AdamMajer if there is a core dump you're able to capture (with master), that would be awesome. I suspect this is flaky and timing-related, and my guess is that it would not easily reproduce outside your QEMU environment.

@AdamMajer (Contributor, Author)

I've been able to generate a consistent coredump for this problem in a regular chroot using:

for (( i=0; i<500; i++ )); do ../node parallel/test-trace-events-dynamic-enable.js & done

This generally gets me about a 1% failure rate on the test.

  • gcc (SUSE Linux) 8.2.1 20180831 [gcc-8-branch revision 264010]
  • TERM=xterm

The backtrace is always the same in case of failure:

Thread 3 (Thread 0x7f05f1b12700 (LWP 26921)):
#0  0x00007f05f0315d96 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f05f0315e88 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000557da325e0d2 in uv__sem_wait (sem=<optimized out>) at ../deps/uv/src/unix/thread.c:579
#3  uv_sem_wait (sem=0x557da416a6c0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>)
    at ../deps/uv/src/unix/thread.c:635
#4  0x0000557da31defec in node::inspector::(anonymous namespace)::StartIoThreadMain(void*) ()
    at ../deps/uv/src/unix/core.c:822
#5  0x00007f05f030d554 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f05f0040ccf in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f05f1b13740 (LWP 25622)):
#0  0x00007f05f030e9ed in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
#1  0x0000557da325db9e in uv_thread_join (tid=<optimized out>) at ../deps/uv/src/unix/thread.c:227
#2  0x0000557da31cc7c6 in node::tracing::Agent::~Agent() () at ../deps/uv/src/unix/core.c:822
#3  0x0000557da3100109 in node::Start(int, char**) () at ../deps/uv/src/unix/core.c:822
#4  0x00007f05eff68feb in __libc_start_main () from /lib64/libc.so.6
#5  0x0000557da30b312a in _start () at ../sysdeps/x86_64/start.S:120

Thread 1 (Thread 0x7f05ebda7700 (LWP 28055)):
#0  0x00007f05eff7e08b in raise () from /lib64/libc.so.6
#1  0x00007f05eff674e9 in abort () from /lib64/libc.so.6
#2  0x00007f05eff673c1 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f05eff766f2 in __assert_fail () from /lib64/libc.so.6
#4  0x0000557da32512c7 in uv__finish_close (handle=0x557da5ec6d40) at ../deps/uv/src/unix/core.c:280
#5  uv__run_closing_handles (loop=0x557da5e25668) at ../deps/uv/src/unix/core.c:302
#6  uv_run (loop=0x557da5e25668, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:372
#7  0x00007f05f030d554 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f05f0040ccf in clone () from /lib64/libc.so.6

@ofrobots (Contributor)

@AdamMajer a few more questions: 1) Do you have instructions on how you're running this in a chroot? I want to reproduce it. 2) In the core dump, on the asserting thread, do you know the value of handle->type in uv__finish_close? The stack trace has frames elided, so I am not sure which thread this is supposed to be.

@AdamMajer (Contributor, Author)

I run it in a normal chroot with /proc and /sys mounted, nothing else. The way I run it is just as seen in my example: a for loop that executes many node processes in parallel to trigger this race. If I only run 100 parallel processes, I stop seeing the problem, so you'll need to boost that number accordingly. 500 seems to work OK for a CPU with 8 concurrent threads.

It's very interesting: I've compiled node from git/master with just ./configure; make -j10 and then run the loop. The exact same executable sometimes crashes inside the chroot, but I can't reproduce the bug outside the chroot...

# mount
none on /proc type proc (rw,relatime)
none on /sys type sysfs (rw,relatime)

Anyway, that handle->type is unfortunately corrupted. I can provide you with the executable and the coredump, but I suspect that's a little useless without a reproducer.

(gdb) p *handle
$2 = {data = 0x4545454545454545, loop = 0x4545454545454545, type = 1162167621, close_cb = 0x4545454545454545, 
  handle_queue = {0x4545454545454545, 0x4545454545454545}, u = {fd = 1162167621, reserved = {0x4545454545454545, 
      0x4545454545454545, 0x4545454545454545, 0x4545454545454545}}, next_closing = 0x4545454545454545, 
  flags = 1162167623}
(gdb) info f
Stack level 4, frame at 0x7f4a387c4f00:
 rip = 0x5572f0a5b747 in uv__finish_close (../deps/uv/src/unix/core.c:280); saved rip = 0x7f4a3cd7e554
 inlined into frame 5, caller of frame at 0x7f4a387c4e90
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp is 0x7f4a387c4e90
 Saved registers:
  rbx at 0x7f4a387c4e68, rbp at 0x7f4a387c4e70, r12 at 0x7f4a387c4e78, r13 at 0x7f4a387c4e80, rip at 0x7f4a387c4e88
(gdb)

I'm investigating why this is not getting triggered outside the chroot.
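The 0x45 pattern in the dump above is consistent with the handle's memory being overwritten (for example, freed and reused, or poisoned) while the handle was still queued for closing. A minimal standalone sketch of that failure class, assuming libuv is available as a system library; this is not Node's code, and demo.c plus the poisoning memset are purely illustrative:

/* demo.c -- hypothetical illustration; build with: cc demo.c -luv */
#include <stdlib.h>
#include <string.h>
#include <uv.h>

int main(void) {
  uv_loop_t loop;
  uv_loop_init(&loop);

  uv_timer_t* timer = malloc(sizeof(*timer));
  uv_timer_init(&loop, timer);

  /* Ask libuv to close the handle; it is now queued on the loop's
   * closing-handles list and must stay intact until the close completes. */
  uv_close((uv_handle_t*) timer, NULL);

  /* Simulate a premature free / poisoning of the handle's memory with the
   * same 0x45 byte pattern seen in the core dump above. */
  memset(timer, 0x45, sizeof(*timer));

  /* The next loop iteration processes closing handles, reads the garbage
   * handle->type, and should abort in uv__finish_close with
   * "Assertion `0' failed" (assuming libuv was built with assertions on). */
  uv_run(&loop, UV_RUN_DEFAULT);

  free(timer);
  uv_loop_close(&loop);
  return 0;
}

This only illustrates the failure mode; it doesn't identify which handle in Node's shutdown path is being freed or reused early.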

@ofrobots (Contributor)

ofrobots commented Nov 8, 2018

I have tried a few different environments (chroot, Docker, Docker with gVisor) and am unable to reproduce the issue. Also, I don't have any ideas why the uv handle would be pointing to a region of memory painted with 0x45 (ASCII 'E'). I am not sure how to progress further on this. @AdamMajer, if you're able to find any more information on your end, please do let us know here.

@AdamMajer (Contributor, Author)

This seems to have now been resolved, somehow. Closing.
