Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

worker_threads: race and crash on worker exit #51129

Closed
bnoordhuis opened this issue Dec 12, 2023 · 0 comments · Fixed by #51138
Closed

worker_threads: race and crash on worker exit #51129

bnoordhuis opened this issue Dec 12, 2023 · 0 comments · Fixed by #51138
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. worker Issues and PRs related to Worker support.

Comments

@bnoordhuis
Copy link
Member

Reported with v20.9.0 but also present in ToT. Backtrace is from a debug build.

  * frame #0: 0x000055b5ae1539a2 node-ci`v8::base::OS::Abort() at platform-posix.cc:691:5
    frame #1: 0x000055b5ae13f7dc node-ci`V8_Fatal(file="", line=38, format="") at logging.cc:167:22
    frame #2: 0x000055b5ae13f7fb node-ci`v8::base::(anonymous namespace)::DefaultDcheckHandler(file=<unavailable>, line=<unavailable>, message=<unavailable>) at logging.cc:57:11
    frame #3: 0x000055b5ac38785d node-ci`v8::Isolate::GetCurrent() at isolate-inl.h:38:3
    frame #4: 0x00007f0aec58f2e4
    frame #5: 0x000055b5ac725d5a node-ci`v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks() at global-handles.cc:865:11
    frame #6: 0x000055b5ac725d27 node-ci`v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks(this=<unavailable>) at global-handles.cc:839:23
    frame #7: 0x000055b5ac840e40 node-ci`v8::internal::Heap::PerformGarbageCollection(this=0x00007f05c42d2048, collector=MARK_COMPACTOR, gc_reason=<unavailable>, collector_reason=<unavailable>) at heap.cc:2312:59
    frame #8: 0x000055b5ac841cb7 node-ci`v8::internal::Heap::CollectGarbage(this=0x00007f05c42d2048, space=<unavailable>, gc_reason=kExternalFinalize, gc_callback_flags=kGCCallbackScheduleIdleGarbageCollection) at heap.cc:1781:33
    frame #9: 0x000055b5ac842fa4 node-ci`v8::internal::Heap::FinalizeIncrementalMarkingAtomically(v8::internal::GarbageCollectionReason) [inlined] v8::internal::Heap::CollectAllGarbage(gc_callback_flags=<unavailable>, gc_reason=kExternalFinalize, flags=<unavailable>, this=0x00007f05c42d2048) at heap.cc:1477:17
    frame #10: 0x000055b5ac842f8e node-ci`v8::internal::Heap::FinalizeIncrementalMarkingAtomically(this=0x00007f05c42d2048, gc_reason=kExternalFinalize) at heap.cc:3802:20
    frame #11: 0x000055b5ac7652f9 node-ci`v8::internal::CppHeap::DetachIsolate(this=0x00007f05c40fd7b0) at cpp-heap.cc:560:59
    frame #12: 0x000055b5ac81c162 node-ci`v8::internal::Heap::DetachCppHeap(this=0x00007f05c42d2048) at heap.cc:5769:42
    frame #13: 0x000055b5abf29d0a node-ci`node::IsolateData::~IsolateData(this=0x00007f05c40ea5c0) at env.cc:583:28
    frame #14: 0x000055b5abf29d84 node-ci`node::IsolateData::~IsolateData(this=0x00007f05c40ea5c0) at env.cc:586:1
    frame #15: 0x000055b5abe6d144 node-ci`node::FreeIsolateData(isolate_data=0x00007f05c40ea5c0) at environment.cc:408:10
    frame #16: 0x000055b5abe67e10 node-ci`node::FunctionDeleter<node::IsolateData, &node::FreeIsolateData(node::IsolateData*)>::operator()(this=0x00007f0ae5bf9c50, pointer=0x00007f05c40ea5c0) const at util.h:675:47
    frame #17: 0x000055b5abe685c6 node-ci`std::__uniq_ptr_impl<node::IsolateData, node::FunctionDeleter<node::IsolateData, &node::FreeIsolateData(node::IsolateData*)>>::reset(this=0x00007f0ae5bf9c50, __p=0x0000000000000000) at unique_ptr.h:182:16
    frame #18: 0x000055b5abe66fe7 node-ci`std::unique_ptr<node::IsolateData, node::FunctionDeleter<node::IsolateData, &node::FreeIsolateData(node::IsolateData*)>>::reset(this=nullptr, __p=0x0000000000000000) at unique_ptr.h:456:12
    frame #19: 0x000055b5ac1c0d04 node-ci`node::worker::WorkerThreadData::~WorkerThreadData(this=0x00007f0ae5bf98f0) at node_worker.cc:220:26
    frame #20: 0x000055b5ac1b99aa node-ci`node::worker::Worker::Run(this=0x000055b5b2a7b480) at node_worker.cc:406:1
    frame #21: 0x000055b5ac1bba0a node-ci`operator(__closure=0x0000000000000000, arg=0x000055b5b2a7b480) at node_worker.cc:683:11
    frame #22: 0x000055b5ac1bbaae node-ci`_FUN((null)=0x000055b5b2a7b480) at node_worker.cc:693:3

The race condition is difficult to trigger but the problem is obvious once you see it. The order of events in src/node_worker.cc is as follows:

  1. Worker::Run() creates WorkerThreadData data(this)

  2. WorkerThreadData constructor creates the isolate and IsolateData

  3. Worker::Run() enters and exits said isolate (constructs and destructs a Locker and Isolate::Scope)

  4. Worker::Run() destructs the WorkerThreadData instance

  5. WorkerThreadData::~WorkerThreadData() calls IsolateData::~IsolateData()

  6. IsolateData::~IsolateData() calls Isolate::DetachCppHeap()

  7. Isolate::DetachCppHeap() runs GC

  8. GlobalHandles::InvokeFirstPassWeakCallbacks() calls Isolate::GetCurrent() but we've no longer entered the isolate (from step 3) and so it blows up violently

The solution IMO is to refactor so that the Locker stays around until the WorkerThreadData destructor finishes.

I believe the isolate can (and should) stay scoped to Worker::Run() and not leak out. Currently, WorkerThreadData "loans" the isolate to the Worker for no good reason I can see.

I can submit a patch but I'd like consensus on what approach to take before I start writing/refactoring/deleting code.

@bnoordhuis bnoordhuis added confirmed-bug Issues with confirmed bugs. c++ Issues and PRs that require attention from people who are familiar with C++. worker Issues and PRs related to Worker support. labels Dec 12, 2023
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Dec 12, 2023
MVP fix for a worker_threads crash where ~WorkerThreadData() ->
~IsolateData() -> Isolate::DetachCppHeap() kicks off a round of
garbage collection that expects an entered isolate.

No test because the crash is not reliably reproducable but the bug
is pretty clearly described in the linked issue and is obvious once
you see it, IMO.

Fixes: nodejs#51129
nodejs-github-bot pushed a commit that referenced this issue Dec 25, 2023
MVP fix for a worker_threads crash where ~WorkerThreadData() ->
~IsolateData() -> Isolate::DetachCppHeap() kicks off a round of
garbage collection that expects an entered isolate.

No test because the crash is not reliably reproducable but the bug
is pretty clearly described in the linked issue and is obvious once
you see it, IMO.

Fixes: #51129
PR-URL: #51138
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
RafaelGSS pushed a commit that referenced this issue Jan 2, 2024
MVP fix for a worker_threads crash where ~WorkerThreadData() ->
~IsolateData() -> Isolate::DetachCppHeap() kicks off a round of
garbage collection that expects an entered isolate.

No test because the crash is not reliably reproducable but the bug
is pretty clearly described in the linked issue and is obvious once
you see it, IMO.

Fixes: #51129
PR-URL: #51138
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
richardlau pushed a commit that referenced this issue Mar 25, 2024
MVP fix for a worker_threads crash where ~WorkerThreadData() ->
~IsolateData() -> Isolate::DetachCppHeap() kicks off a round of
garbage collection that expects an entered isolate.

No test because the crash is not reliably reproducable but the bug
is pretty clearly described in the linked issue and is obvious once
you see it, IMO.

Fixes: #51129
PR-URL: #51138
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Chengzhong Wu <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. confirmed-bug Issues with confirmed bugs. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant