WIP: Pluggable thread pool #22631

Closed · wants to merge 31 commits

Conversation

davisjam
Contributor

DO NOT MERGE (yet? ever?).

This is an experimental PR based on the "pluggable thread pool" concept.

This PR will eventually contain the changes necessary to use a pluggable thread pool in Node.js, as well as one or more prototype pluggable thread pools and a description of the performance impact.

This PR won't compile as presented here because it depends on the PluggableThreadPool branch of libuv that I am using on my development machine. If you want to try it yourself, check out that branch and overwrite deps/uv with it.

I am opening this PR rather prematurely in an attempt to solicit feedback from the community about the general idea, as well as particular suggestions on Node.js-level threadpool designs.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Aug 31, 2018
@addaleax addaleax added the wip Issues and PRs that are still a work in progress. label Aug 31, 2018
@davisjam
Contributor Author

davisjam commented Sep 3, 2018

Here I will track Node-specific benefits of a pluggable threadpool, as I become aware of them.

Benefit: Unified threadpool for V8 and libuv operations

A pluggable threadpool enables using the same threadpool for V8 operations and libuv operations (finally -- cf. #11855).

In #14001 @matthewloring initially tried to use the libuv threadpool for the NodePlatform implementation (src/node_platform.*). He observed here that this incurred a performance penalty in synchronous workloads (which sound to me like they are along the lines of the Web Tooling Benchmark): uv_queue_work can only be called from the event loop, so task submission from non-event-loop threads had to be deferred. I gather that V8 worker threads sometimes create new tasks.

This "only the event loop can queue work" restriction no longer holds once the threadpool is owned by Node.js. This suggests that the proposed Node.js threadpool "queue work" API should be thread-safe, and that within the NodePlatform threadpool hooks we should route calls to the Node.js threadpool.

Instead of two distinct threadpools of size n + m, we will have one threadpool of size P.
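To make that "queue work" API safe to call from V8 worker threads as well as the event loop, the core of the Post path could look like the following minimal sketch (names are illustrative, not the PR's actual API):

#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

class Task {
 public:
  virtual ~Task() = default;
  virtual void Run() = 0;
};

class Threadpool {
 public:
  // Thread-safe: callable from the event loop or from any V8 worker thread.
  void Post(std::unique_ptr<Task> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(task));
    }
    cv_.notify_one();  // wake one idle pool worker
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::unique_ptr<Task>> queue_;
};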

Extending this benefit to the cluster module

We can extend this benefit to the cluster module. The cluster module currently puts each instance in its own process, with its own NodePlatform, Isolate(s), and Worker(s). Each of the k cluster instances comes with n + m threads (V8 pool + libuv pool), and the V8 threads are likely duplicating work since the cluster instances may be running the same code. So we have k * (n + m) threads.

In #20876 @addaleax defined Workers, essentially "native threads for Node.js".

  • These Workers get their own Isolate, which runs on the same v8::Platform shared across all Isolates in a Node.js process. Thanks to src: Node implementation of v8::Platform #14001 (see the previous point), the same V8 threadpool is shared by all Workers, reducing thread contention for these CPU-bound tasks.
  • These Workers also get their own uv_loop_t. Since libuv's threadpool is global, the Workers all share the same threadpool.

Implementing cluster using Workers would permit the different instances to share the two pools (k + n + m threads in total), and unifying the two pools as described in the previous point would further reduce the thread burden to k + P threads. For example, with k = 4 instances, n = m = 4, and P = 4, the current k * (n + m) = 32 pool threads drop to 4 + 4 + 4 = 12 with shared pools, and to 4 + 4 = 8 with a unified pool.

@cjihrig
Contributor

cjihrig commented Sep 3, 2018

FWIW, I don't think cluster should be moved to Workers.

Summary:
 This commit outlines the general API for the node threadpool.
 The current node threadpool is "plugged in" to libuv, but not V8.

Thoughts:
 I think the current API will generally suffice going forward.
 For example, separate I/O and CPU pools can be implemented
 by sub-classing TaskQueue and introducing multiple separate
 queues for I/O-bound and CPU-bound Tasks.

Routing logic:
 I plug this TP into libuv during the call to Start().
 I would like to refactor out a 'LibuvExecutor' class similar
 to NodePlatform and have them both use similar task routing logic.
 I have not yet routed the v8::Platform's TP into this TP.

Tests:
 I introduced (passing) unit tests in test/cctest/test_threadpool.cc.

 In addition, this version passes much of the core test suite.
 There are some failures, e.g. because this iteration lacks uv_cancel
 support comparable to that of libuv's default executor.
@davisjam
Contributor Author

davisjam commented Sep 4, 2018

In bc0f42e I sketched a minimal src/node_threadpool.[cc,h] that is plugged into libuv. I also noted places to insert routing logic to move v8 requests submitted to NodePlatform into the Node-level threadpool.

See the commit message for details.

#include <stdio.h>
#define LOG_0(fmt) fprintf(stderr, fmt)
#define LOG_1(fmt, a1) fprintf(stderr, fmt, a1)
#define LOG_2(fmt, a1, a2) fprintf(stderr, fmt, a1, a2)
Member

@devsnek devsnek Sep 4, 2018

can you just use __VA_ARGS__ here?

Contributor Author

Ah, didn't know about that. Thank you.
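For reference, a single variadic macro can replace the three above (a sketch; ##__VA_ARGS__ is a GNU extension, also accepted by Clang and MSVC, that swallows the trailing comma in the zero-argument case):

#include <cstdio>
// One macro instead of LOG_0/LOG_1/LOG_2.
#define LOG(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)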

@vmarchaud
Contributor

@cjihrig I'm planning to give it a try; I'm interested in why it shouldn't be.

Summary:
 In this commit I rewired NodePlatform to use the threadpool::Threadpool
 I introduced in the previous commit.

Approach:
 I touched the existing NodePlatform implementation as little as possible.
 Thus I wired WorkerThreadsTaskRunner to Post to a threadpool::Threadpool.

Existing problems:
 Not all existing behaviors supported by WorkerThreadsTaskRunner are
 supported by the current threadpool::Threadpool implementation.
 Where they do not exist I replaced them with no-ops.

 Node currently runs tests correctly (I tried a few) but segfaults during cleanup.
 I believe this is because of the lack of support for a "BlockingDrain" API.

 The CreatePlatform API is externalized.
 I do not know what this is for.
 This somewhat complicates my plan to have it accept a threadpool::Threadpool as an argument.
 Maybe I should overload this function and retain the existing n_threads API too?

Next steps:
 1. Refactor out a LibuvExecutor in node_threadpool, analogous to the
    WorkerThreadsTaskRunner.
 2. threadpool::Threadpool must support the union of the APIs needed
    by its various consumers (notably WorkerThreadsTaskRunner).
 3. Possibly we could refactor out the WorkerThreadsTaskRunner
    as an optimization (?), but since it handles Delayed tasks as well as
    "do them right now" Tasks it is a useful front-end.
    I don't intend such a refactoring to be part of the eventual PR.
 4. Consider overloading MultiIsolatePlatform/NodePlatform to retain the existing 'n_threads' API.
    When used, this should create and use a threadpool::Threadpool private to the NodePlatform.
@davisjam
Contributor Author

davisjam commented Sep 6, 2018

In 7200111 I added the rewiring logic I mentioned here. Now both V8 and libuv tasks are routed to the same Node.js-land threadpool. Hooray!

Of course this is A Bad Idea without multiple queues to avoid I/O-bound vs. CPU-bound conflicts. I'll look into that soon.

On the notes in 7200111 about CreatePlatform, it appears that I should read #16981. But just in case @zcbenz is listening...
@zcbenz The CreatePlatform API lets an embedder specify the number of V8 threads. Would you be interested in a separate API to plug in your own threadpool that would receive both V8 and libuv tasks?

No functional change in this commit.
I added a standalone LibuvExecutor that I plug into libuv.

Analogous to the NodePlatform's WorkerThreadsTaskRunner,
this design decouples the duties of the Threadpool from the
interface with libuv (V8).
No functional change
Tasks can be in one of these states:
  - QUEUED
  - ASSIGNED
  - COMPLETED
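A minimal sketch of that lifecycle (names assumed, not necessarily the PR's):

#include <atomic>

// Tasks move QUEUED -> ASSIGNED (a worker dequeued it) -> COMPLETED
// (Run() returned); each transition happens exactly once.
enum class TaskStatus { QUEUED, ASSIGNED, COMPLETED };

struct TaskRecord {
  std::atomic<TaskStatus> status{TaskStatus::QUEUED};
};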
In a previous commit I was too eager in deleting threads
from the WorkerThreadsTaskRunner.

The DelayedTaskScheduler is the responsibility of the
WorkerThreadsTaskRunner, not the internal Threadpool.
Thus we should start and stop it correctly.

This was the cause of the segfault I mentioned earlier.
This completes the NodePlatform rewiring begun in a previous commit.

This BlockingDrain will wait on both V8 Tasks and libuv Tasks.
It waits on all Tasks in the Threadpool, even though NodePlatform
only cares about BlockingDrain'ing the V8 Tasks.
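A sketch of what such a BlockingDrain could look like, assuming the Threadpool counts queued-plus-running tasks (names are illustrative):

#include <condition_variable>
#include <mutex>

class Threadpool {
 public:
  // Block the caller until every Post'ed task (V8 and libuv alike)
  // has run to completion.
  void BlockingDrain() {
    std::unique_lock<std::mutex> lock(mutex_);
    drained_cv_.wait(lock, [this] { return outstanding_tasks_ == 0; });
  }

 private:
  // Post() increments outstanding_tasks_ under mutex_; workers decrement
  // it after Run() returns and notify drained_cv_ when it hits zero.
  std::mutex mutex_;
  std::condition_variable drained_cv_;
  int outstanding_tasks_ = 0;
};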
@davisjam
Contributor Author

davisjam commented Sep 6, 2018

Highlights in the commits I just pushed:

  1. 662825f: Refactor out a LibuvExecutor so that the classes Do One Thing.
  2. 8f6df02: Flesh out the Threadpool APIs to meet the needs of the NodePlatform -- namely BlockingDrain.

Still need a Cancel for libuv. I added state tracking in 8771ad8 which is a step towards this.

This mostly matches the old behavior, except
that instead of using a self-managed TP the NodePlatform
uses a private threadpool::Threadpool instance.

It's not clear whether an embedder would like
to plug in their own Threadpool, so play it safe for now.

Hopefully I can get better insight into desired behavior
from other community members.
No functional change
1. Use RAII for Threadpool. Don't have a separate Initialize phase.
   This was previously useful because the Threadpool knew about
   libuv. Now that there is a LibuvExecutor, we can use RAII.

2. Have Threadpool accept a size.
   When absent, try:
     - UV_THREADPOOL_SIZE (default libuv TP size)
     - # cores (default --v8-pool-size behavior)
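That fallback chain could be implemented roughly as follows (a sketch; the helper name is assumed):

#include <cstdlib>
#include <thread>

// Resolve the pool size: explicit request, then UV_THREADPOOL_SIZE,
// then the core count (mirroring --v8-pool-size's default).
static int ResolvePoolSize(int requested) {
  if (requested > 0) return requested;
  if (const char* env = std::getenv("UV_THREADPOOL_SIZE")) {
    int n = std::atoi(env);
    if (n > 0) return n;
  }
  return static_cast<int>(std::thread::hardware_concurrency());
}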
Feature: Ability to cancel Tasks Post'ed to the Threadpool.
Need: A LibuvExecutor would like this.

Approach:
  Fact: Threadpool::Post accepts a unique_ptr.
  Fact: In principle we can easily cancel Tasks that have not
        yet been queued.
  Fact: But it's hard to cancel a Task if we gave away our pointer
        to the Threadpool.

  Threadpool::Post now returns a shared_ptr to a TaskState object.
  You can TaskState.Cancel() and it might work.
  This is the behavior offered by the default libuv threadpool as well.
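In code, the shared-ownership scheme might look like this (a sketch with assumed details; Cancel succeeds only while the task is still queued, matching uv_cancel's semantics):

#include <atomic>
#include <memory>

class TaskState {
 public:
  enum State { QUEUED, ASSIGNED, COMPLETED, CANCELLED };

  // Best-effort: succeeds only if no worker has picked the task up yet.
  bool Cancel() {
    State expected = QUEUED;
    return state_.compare_exchange_strong(expected, CANCELLED);
  }

 private:
  friend class Threadpool;  // workers advance QUEUED -> ASSIGNED -> COMPLETED
  std::atomic<State> state_{QUEUED};
};

// Post gives away the Task but hands back a handle for cancellation:
//   std::shared_ptr<TaskState> Threadpool::Post(std::unique_ptr<Task> task);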
This lets Threadpool operate at a higher level of
abstraction.

While I was in there, I switched to using smart pointers
for the TaskQueue shared by the Workers.
This permits the use of the existing Threadpool class as a building
block for more sophisticated NodeThreadpools.

The default NodeThreadpool is just a pass-thru for a Threadpool.
Splitting by I/O and CPU is a form of PartitionedNodeThreadpool.
A PartitionedNodeThreadpool basically follows the "threadpool handles"
proposed by saghul for libuv, but implemented in the Node-land executor.
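A hedged sketch of what origin-based partitioning could look like, reusing the Task, TaskState, and Threadpool pieces sketched earlier (the enum, the Post signature, and the two-pool layout are assumptions):

#include <memory>

enum class TaskOrigin { V8, LIBUV };

class PartitionedNodeThreadpool : public NodeThreadpool {
 public:
  PartitionedNodeThreadpool(int cpu_threads, int io_threads)
      : cpu_pool_(cpu_threads), io_pool_(io_threads) {}

  // Route CPU-bound work (e.g. V8 background tasks) and I/O-bound work
  // (e.g. libuv fs requests) to separate pools so a burst of slow I/O
  // cannot starve CPU tasks, and vice versa.
  std::shared_ptr<TaskState> Post(std::unique_ptr<Task> task,
                                  TaskOrigin origin) {
    Threadpool& pool = (origin == TaskOrigin::V8) ? cpu_pool_ : io_pool_;
    return pool.Post(std::move(task));
  }

 private:
  Threadpool cpu_pool_;
  Threadpool io_pool_;
};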
@davisjam
Contributor Author

According to my traces, npm install makes no use of the libuv threadpool. This is a bit surprising since I was under the impression that it (downloads and) unpacks tarballs, which I figured would be CPU-bound. Are these operations being performed synchronously?

If anyone out there knows about npm internals, I would appreciate some insight on this. Are my traces wrong?

This tool handles the output of PrintStats.
It produces graphs of TP queue lengths sampled over time.
It also prints a summary of task counts grouped by origin and type.

In the plot of pool queue lengths, it includes a per-TP
plot of "# CPU tasks" in the queue.
When running with NODE_THREADPOOL_TYPE=SPLIT_BY_ORIGIN,
this lets us visualize the extent to which the libuv TP is
working on both CPU and I/O.
Problem:
Apparently some uses of node go through Exit without fully cleaning up,
so printing stats in the PartitionedNodeThreadpool d'tor was not
giving us stats.

This was true, for example, for 'npm install'.

Solution:
DrainAndPrintStats in both node.cc::Start and node.cc::Exit,
and only keep the first one if we see it twice.

This should not be merged. Just performance profiling.
And dumping stats in SignalExit is unsafe.
@davisjam
Contributor Author

The most recently pushed changes introduce tools/visualize-threadpool-behavior.py, which doesn't quite live up to its name. I am intending it to eventually be a threadpool tuning aid.

At the moment, if you invoke my version of node with NODE_THREADPOOL_TYPE=SPLIT_BY_ORIGIN and then feed the resulting log file to this tool, it will create some graphs (not super useful yet) and print the distribution of task types by origin (nice snapshot of how an application uses the threadpool).

For example, on a toy application it produces this summary:

Distribution of task types by origin
task-origin  task-type     count
LIBUV        CPU           10000
             DNS            5000
             FS           655000
V8           CPU            3554

At the moment I am looking for applications that use the threadpool for anything besides FS, since on FS-only workloads this PR won't have any effect.

  • Does anyone out there know of open-source application(s) that fit this description?
  • Does anyone out there have closed-source application(s) that they would be willing to benchmark on?

If so, comment here or contact me by email!

This commit should be reverted before merging the PR.

The Node.js PR should follow a separate PR bumping the libuv version
to one with my libuv PR.
@davisjam
Contributor Author

davisjam commented Sep 30, 2018

I've just included the latest version of my libuv PR as part of this PR in order to:

  • permit people to try out this PR more easily
  • run node CI.

Commits like 64ce6e1 should be removed before merging this PR.

@davisjam
Contributor Author

I've started a CI run to test this PR -- which should let me know, among other things, how my Windows changes in libuv went.

https://ci.nodejs.org/job/node-test-pull-request/17535

@Trott
Member

Trott commented Nov 20, 2018

@davisjam Still working on this one?

@davisjam
Contributor Author

davisjam commented Nov 23, 2018

@Trott I still haven't heard from @bnoordhuis on the libuv side of things.

@davisjam
Contributor Author

@addaleax

#include <algorithm>

// TODO(davisjam): DO NOT MERGE. Only for debugging.
// TODO(davisjam): There must be a better way to do this.
Member

You might want to look at debug_utils.h for some ideas?

@addaleax
Member

@davisjam Do you think you could get this PR and the libuv one up to date in terms of merge conflicts, so that it might be a bit easier to test this out?

As for feedback on the concepts introduced here:

  • My initial gut reaction to the new environment variables/different subclasses … I’m not a huge fan, tbh. I think we’d want some sensible defaults, and maybe add a bit of configurability on top of that, rather than providing completely different modes for different situations?
  • I think the v8_platform struct was a mistake, and so I’d prefer to avoid a node_threadpool one?
  • Some of the C++ code style is a bit unusual for Node.js core. If it helps, we have an autoformatter (e.g. CLANG_FORMAT_START=master make format-cpp) and a style guide, and I’m also totally willing to push changes to address this minor stuff myself if you like.

@davisjam
Contributor Author

@addaleax Great feedback, thank you.

My initial gut reaction to the new environment variables/different subclasses … I’m not a huge fan, tbh. I think we’d want some sensible defaults, and maybe add a bit of configurability on top of that, rather than providing completely different modes for different situations?

Do you have any suggestions for the class(es) and knob(s) you think we should expose?

@addaleax
Member

@davisjam I think it would be okay if, at least for an initial implementation, we could do something similar to what we had in libuv, i.e. try to keep a certain number/percentage of threads free from potentially long-running tasks? I think that’s already possible with what you have here, right?
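One way to express that policy, in the spirit of libuv's cap on concurrently running slow work (a sketch; the names and the one-half ratio are assumptions, not this PR's implementation):

#include <mutex>

class Threadpool {
 public:
  explicit Threadpool(int n_threads) : n_threads_(n_threads) {}

  // A worker calls this before taking a task flagged as potentially
  // long-running; if it fails, the worker re-queues that task and takes
  // a short one instead, keeping half the pool free for short tasks.
  bool TryBeginLongTask() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (active_long_tasks_ >= n_threads_ / 2) return false;
    ++active_long_tasks_;
    return true;
  }

  void EndLongTask() {
    std::lock_guard<std::mutex> lock(mutex_);
    --active_long_tasks_;
  }

 private:
  std::mutex mutex_;
  const int n_threads_;
  int active_long_tasks_ = 0;
};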

@BridgeAR
Member

Ping @davisjam

@davisjam
Contributor Author

@BridgeAR

The libuv half of this PR is making progress. @addaleax is generally in favor and we are working through her feedback.

@fhinkel
Member

fhinkel commented Oct 26, 2019

ping @davisjam 👋 Any updates on this?

@fhinkel
Member

fhinkel commented Nov 3, 2019

I'm cleaning out a few old PRs 🧹. I'm closing this due to inactivity. Please re-open if needed!

@fhinkel fhinkel closed this Nov 3, 2019