src: refactor thread stopping mechanism #26757

addaleax · 2019-03-18T19:33:39Z

Follow style guide for naming, e.g. use lower_snake_case
for simple setters/getters.
For performance, use atomics instead of a mutex, and inline
the corresponding getter/setter pair.
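
As a rough illustration of the second point, here is a minimal sketch that places a mutex-guarded flag next to an atomic one with inlined lower_snake_case accessors. `LockedStopFlag` and `AtomicStopFlag` are hypothetical stand-ins invented for this example, not the real `AsyncRequest` class, and the initial value `false` is an assumption.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical "before" shape: every read and write takes a lock.
class LockedStopFlag {
 public:
  void SetStopped(bool flag) {
    std::lock_guard<std::mutex> lock(mutex_);
    stopped_ = flag;
  }

  bool IsStopped() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return stopped_;
  }

 private:
  mutable std::mutex mutex_;
  bool stopped_ = false;  // assumed initial value, for illustration only
};

// Hypothetical "after" shape: an atomic flag with trivial accessors that
// are defined in the class body, so the compiler is free to inline them.
class AtomicStopFlag {
 public:
  void set_stopped(bool flag) { stopped_.store(flag); }
  bool is_stopped() const { return stopped_.load(); }

 private:
  std::atomic_bool stopped_{false};  // assumed initial value
};
```

Both classes expose the same behavior to callers; the difference is only in what each access costs.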

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

nodejs-github-bot · 2019-03-18T19:33:41Z

@addaleax build started: https://ci.nodejs.org/blue/organizations/jenkins/node-test-pull-request-lite-pipeline/detail/node-test-pull-request-lite-pipeline/2974/pipeline

gireeshpunathil · 2019-03-19T05:01:27Z

src/env-inl.h

+}
+
+void AsyncRequest::set_stopped(bool flag) {
+  stopped_.store(flag, std::memory_order_relaxed);


So: the heavy use of mutexes around the worker code, wherever multi-threaded data access was expected, was almost always there to ensure data consistency by flushing cache lines (in other words, to make writes in one thread visible to other threads immediately)? If so, isn't the std::memory_order_relaxed constraint insufficient to ensure that? We might need at least memory_order_acquire?

I guess I retract the above question: when I tested with a small piece of code, I saw an mfence or sync instruction being added even with memory_order_relaxed, so please ignore.

So … my line of thinking was that the syscalls behind uv_async_send() would themselves present full memory barriers. I’ll check later and verify that that’s indeed correct.

@addaleax - that may be true, but don't we have set_stopped and is_stopped calls that threads can make without any syscall being involved? However:

int foo(bool flag) { stopped_.store(flag, std::memory_order_relaxed); }

this is what I tested and this is what I see in the generated code:

(gdb) x/30i 0x40053e
0x40053e <_ZNSt11atomic_bool5storeEbSt12memory_order>: push rbp
0x40053f <_ZNSt11atomic_bool5storeEbSt12memory_order+1>: mov rbp,rsp
0x400542 <_ZNSt11atomic_bool5storeEbSt12memory_order+4>: sub rsp,0x30
0x400546 <_ZNSt11atomic_bool5storeEbSt12memory_order+8>: mov QWORD PTR [rbp-0x28],rdi
0x40054a <_ZNSt11atomic_bool5storeEbSt12memory_order+12>: mov eax,esi
0x40054c <_ZNSt11atomic_bool5storeEbSt12memory_order+14>: mov DWORD PTR [rbp-0x30],edx
0x40054f <_ZNSt11atomic_bool5storeEbSt12memory_order+17>: mov BYTE PTR [rbp-0x2c],al
0x400552 <_ZNSt11atomic_bool5storeEbSt12memory_order+20>: movzx eax,BYTE PTR [rbp-0x2c]
0x400556 <_ZNSt11atomic_bool5storeEbSt12memory_order+24>: mov rdx,QWORD PTR [rbp-0x28]
0x40055a <_ZNSt11atomic_bool5storeEbSt12memory_order+28>: mov QWORD PTR [rbp-0x8],rdx
0x40055e <_ZNSt11atomic_bool5storeEbSt12memory_order+32>: mov BYTE PTR [rbp-0x9],al
0x400561 <_ZNSt11atomic_bool5storeEbSt12memory_order+35>: and BYTE PTR [rbp-0x9],0x1
0x400565 <_ZNSt11atomic_bool5storeEbSt12memory_order+39>: mov eax,DWORD PTR [rbp-0x30]
0x400568 <_ZNSt11atomic_bool5storeEbSt12memory_order+42>: mov DWORD PTR [rbp-0x10],eax
0x40056b <_ZNSt11atomic_bool5storeEbSt12memory_order+45>: mov eax,DWORD PTR [rbp-0x10]
0x40056e <_ZNSt11atomic_bool5storeEbSt12memory_order+48>: mov esi,0xffff
0x400573 <_ZNSt11atomic_bool5storeEbSt12memory_order+53>: mov edi,eax
0x400575 <_ZNSt11atomic_bool5storeEbSt12memory_order+55>: call 0x40052a <_ZStanSt12memory_orderSt23__memory_order_modifier>
0x40057a <_ZNSt11atomic_bool5storeEbSt12memory_order+60>: mov DWORD PTR [rbp-0x14],eax
0x40057d <_ZNSt11atomic_bool5storeEbSt12memory_order+63>: movzx edx,BYTE PTR [rbp-0x9]
0x400581 <_ZNSt11atomic_bool5storeEbSt12memory_order+67>: mov rax,QWORD PTR [rbp-0x8]
0x400585 <_ZNSt11atomic_bool5storeEbSt12memory_order+71>: mov BYTE PTR [rax],dl
0x400587 <_ZNSt11atomic_bool5storeEbSt12memory_order+73>: mfence
0x40058a <_ZNSt11atomic_bool5storeEbSt12memory_order+76>: leave
0x40058b <_ZNSt11atomic_bool5storeEbSt12memory_order+77>: ret

Please note the mfence at 0x400587; does that settle the matter?

On x64 it does, yes – to be honest, I don’t know how the different memory order modes are implemented on different platforms. It seems like this disassembled implementation simply ignores the order argument?

0x400529 <_Z3foob+35>: ret
0x40052a <_ZStanSt12memory_orderSt23__memory_order_modifier>: push rbp

@addaleax - if you look at the continuity of the instruction addresses, it looks like these (the atomic* helpers) are not static APIs but code the compiler generates on the fly, so it is possible that only the necessary code was generated, on a per-compilation-unit basis?

my line of thinking was that the syscalls behind uv_async_send() would themselves present full memory barriers.

Libuv doesn't promise that. uv_async_send() can (at least in theory) elide the system call.

I'm kind of surprised the compiler emits an mfence. It's not needed on x64 (nor any other architecture, I think?) because aligned loads and stores are always atomic. It might just be a compiler bug; I wouldn't depend on it.

@gireeshpunathil I wouldn’t think so, the linker should be able to elide multiple copies of that variable into a single one.

@bnoordhuis Yeah, thanks. I’ve removed the memory_order_relaxed bit.
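
To make the outcome of this thread concrete, here is a minimal sketch of a flag written without an explicit order argument, which means `std::memory_order_seq_cst` in standard C++. `StopState`, `publish_stop`, `wait_for_stop` and the `exit_code` payload are hypothetical names invented for this example; nothing here is taken from the Node.js sources.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for state that one thread writes before it
// raises the flag and another thread reads after it sees the flag.
struct StopState {
  int exit_code = 0;               // plain, non-atomic payload
  std::atomic_bool stopped{false};
};

// Writer side: fill in the payload first, then publish the flag.
// store() without an order argument uses std::memory_order_seq_cst.
inline void publish_stop(StopState* state, int exit_code) {
  state->exit_code = exit_code;
  state->stopped.store(true);
}

// Reader side: wait until the flag is observed, then read the payload.
// load() without an order argument also uses std::memory_order_seq_cst,
// so a reader that sees `true` also sees the earlier exit_code write.
inline int wait_for_stop(const StopState* state) {
  while (!state->stopped.load()) {
    std::this_thread::yield();
  }
  return state->exit_code;
}
```

In this sketch, if both sides used `std::memory_order_relaxed` instead, the C++ memory model would no longer guarantee that the reader sees the `exit_code` write, regardless of which fence instructions a particular compiler happens to emit. When the flag is the only shared data, relaxed ordering can be enough.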

gireeshpunathil · 2019-03-19T05:17:29Z

src/node_worker.cc

@@ -381,7 +381,8 @@ void Worker::OnThreadStopped() {
 Worker::~Worker() {
  Mutex::ScopedLock lock(mutex_);

-  CHECK(stopped_ || env_ == nullptr || env_->GetAsyncRequest()->IsStopped());
+  CHECK(stopped_);
+  CHECK_NULL(env_);


IIRC, there was a control flow that reaches the Worker destructor without nullifying env_; I am not able to figure it out now. Do you know?

@gireeshpunathil I think that would be a bug – the child thread is not allowed to exist at this point (and the next CHECK verifies that the thread has been joined), and the child thread in turn owns the Environment.

bnoordhuis

I'm curious, does the overhead of locking/unlocking the mutex show up in profiles anywhere?

(I mean, I could imagine it does but I can also imagine it doesn't. I'd like to be convinced by numbers. :-))

bnoordhuis · 2019-03-19T11:51:05Z

addaleax · 2019-03-19T17:28:18Z

I'm curious, does the overhead of locking/unlocking the mutex show up in profiles anywhere?

This PR was created because it does :) It doesn’t have a huge impact, but this code is run a lot, and it might make up 1 % or so of the runtime for some processes.
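
For readers who want rough numbers of their own, here is a hedged sketch of an uncontended micro-benchmark that times reads of a mutex-guarded flag against reads of an atomic flag. The `TimingResult` struct and both function names are hypothetical, and a loop like this is not a substitute for profiling a real Node.js process as described above.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Elapsed wall time plus how often the flag read as true. The counter
// gives the loop a result that the caller can inspect.
struct TimingResult {
  std::int64_t nanoseconds;
  std::int64_t true_reads;
};

// Times `iterations` uncontended reads of a mutex-guarded flag.
inline TimingResult time_mutex_reads(std::int64_t iterations) {
  std::mutex mutex;
  bool stopped = false;
  std::int64_t true_reads = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    std::lock_guard<std::mutex> lock(mutex);
    if (stopped) ++true_reads;
  }
  const auto end = std::chrono::steady_clock::now();
  const auto elapsed =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
  return {elapsed.count(), true_reads};
}

// Times `iterations` reads of an atomic flag (default seq_cst loads).
inline TimingResult time_atomic_reads(std::int64_t iterations) {
  std::atomic_bool stopped{false};
  std::int64_t true_reads = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    if (stopped.load()) ++true_reads;
  }
  const auto end = std::chrono::steady_clock::now();
  const auto elapsed =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
  return {elapsed.count(), true_reads};
}
```

Calling both functions with the same iteration count and comparing the `nanoseconds` fields gives a first impression only; the result depends heavily on compiler, optimization level and platform.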

mhdawson

LGTM

addaleax · 2019-03-21T10:59:56Z

CI: https://ci.nodejs.org/job/node-test-pull-request/21723/

addaleax · 2019-03-21T20:11:43Z

Resume CI: https://ci.nodejs.org/job/node-test-pull-request/21738/

BridgeAR · 2019-03-21T23:45:07Z

Landed in d812dbb 🎉

- Follow style guide for naming, e.g. use lower_snake_case
  for simple setters/getters.
- For performance, use atomics instead of a mutex, and inline
  the corresponding getter/setter pair.

PR-URL: nodejs#26757
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Franziska Hinkelmann <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>

src: refactor thread stopping mechanism

47a0854

addaleax requested a review from gireeshpunathil March 18, 2019 19:33

nodejs-github-bot added the c++ and lib / src labels Mar 18, 2019

jasnell approved these changes Mar 18, 2019

gireeshpunathil reviewed Mar 19, 2019

gireeshpunathil approved these changes Mar 19, 2019

bnoordhuis reviewed Mar 19, 2019

fixup! src: refactor thread stopping mechanism

3120670

fhinkel approved these changes Mar 19, 2019

mhdawson approved these changes Mar 19, 2019

addaleax added the author ready label Mar 21, 2019

BridgeAR closed this Mar 21, 2019

joyeecheung mentioned this pull request Mar 22, 2019

CI failures: 20190322 nodejs/reliability#21

Closed

targos mentioned this pull request Mar 27, 2019

v11.13.0 release proposal #26949

Merged
