This repository has been archived by the owner on Mar 25, 2021. It is now read-only.

Step 1: Figure out use cases? #1

Closed
addaleax opened this issue May 23, 2017 · 66 comments

Comments

@addaleax
Member

I think one of the first things we’ll want to do is to figure out what the actual use cases for (Web)Workers in Node are, so that we have a better idea of the requirements the API and implementation have to fulfill.

I guess Workers could be applied to anything that needs a fast way to share information across some kind of parallel applications, but a lot of that can already be addressed using multiple processes and standard IPC methods, so what’s going to be most interesting is to hear what one cannot do fast right now without them.

@addaleax
Member Author

I can also go first: the one time I really wanted workers in Node was when building the coverage tooling for Node core itself. Parsing and serializing the JSON coverage information is a noticeable bottleneck, so being able to distribute that across multiple threads would be nice to have.

@Fishrock123
Contributor

The first thing that comes to my mind: off-I/O-thread template rendering.

@matthewp

Code portability.

@chrisdickinson
Contributor

chrisdickinson commented May 23, 2017

Seconding @Fishrock123's suggestion of off-I/O-thread template rendering — especially for server side rendering of React. Along the lines of JSON parsing, I suspect it would allow for much more efficient architectures for things like Babel & module bundlers, since this would allow authors to hand off the main-thread-blocking JS parse to a pool of threads (& possibly do so with a cheap object transfer?)

@vladholubiev

Parsing/generating .csv?

For example, http://papaparse.com/ does this efficiently on the client side using web workers (I guess).

@vkurchatkin
Contributor

@chrisdickinson

possibly do so with a cheap object transfer

Is that a real thing or something hypothetical?

@addaleax
Member Author

@vkurchatkin We do have an actual serialization/deserialization API now in V8, that should be a pretty good start. It’s basically what Chrome uses for their WebWorkers, so that should qualify for most people’s understanding of “cheap”.
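
For illustration, a minimal sketch of that API, assuming Node 8+ where the v8 module exposes it (the payload shape here is made up):

```js
const v8 = require('v8');

// Made-up payload standing in for the JSON coverage data above.
const payload = { file: 'a.js', hits: [1, 0, 3] };

// Serialize to a Buffer with V8's structured-clone serializer...
const buf = v8.serialize(payload);

// ...and reconstruct it, as the receiving thread would.
const copy = v8.deserialize(buf);
console.log(copy.file); // 'a.js'
```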

(I’d like to keep this thread on-topic for use cases; don’t be shy to open new issues here for anything. People who go watch the repo are opting into the noise. ;))

@vkurchatkin
Contributor

@addaleax fair enough. My problem with use cases is that unless a use case is specified well enough, you could argue that anything can be solved with the current multi-process model, including all the examples above. For me, in-process workers are about efficiency and simplicity, not use cases.

@refack
Contributor

refack commented May 23, 2017

IMHO a strong use case, as @matthewp mentioned, is code portability (a.k.a. isomorphism). Sorry to bring up cluster again, but it was homegrown, while WebWorker has wider acceptance as a standard.

@refack
Contributor

refack commented May 23, 2017

Another use case that comes to mind is low-priority tasks (even I/O-bound ones): things like cache hydration, offline batch processing, etc.
Current implementations depend on the OS to prioritize the main process vs. the workers, whereas WebWorker has the goal of keeping the main "thread" responsive.

@inikulin

For parallelization of tokenization and tree construction in parse5. This can be handy for all parsers, I guess.

@kgryte

kgryte commented May 23, 2017

Numeric computation. The ability to more cheaply distribute numeric computational tasks to multiple workers, as commonly found in machine learning and analysis of larger datasets (akin to MATLAB's parfor).

With the current model, you need to perform wholesale copying of data between multiple processes. With shared array buffers, workers can operate on the same buffer, allowing better memory efficiency and performance.

In short, better environment support for parallel map-reduce style operations would be highly beneficial as Node.js applications become more computationally intensive.
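
As a rough sketch of the kind of sharing being described (assuming a SharedArrayBuffer-enabled build; in a real setup the second view would live in a worker):

```js
// One shared allocation, two typed-array views over the same memory.
const sab = new SharedArrayBuffer(8 * 1024); // room for 1024 doubles
const here = new Float64Array(sab);
const there = new Float64Array(sab); // imagine this view inside a worker

here[0] = 3.14;
console.log(there[0]); // 3.14: same memory, no copy, no serialization
```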

@alexeagle

Compilers like TypeScript and Angular would parallelize parts of the pipeline.

@refack
Contributor

refack commented May 24, 2017

[question] For the parsing and heavy-computation use case: what do you see as the specific benefits of Workers? Multi-core utilization and/or the ability to keep a main thread responsive?

@matthewp

@refack Keeping the main thread responsive. E.g., for a server, not blocking incoming requests.

@inikulin

inikulin commented May 24, 2017

@refack Different stages of parsing (e.g. input preprocessing, tokenization, syntactic analysis) can be performed in parallel, thus reducing cumulative parsing time.

@refack
Contributor

refack commented May 24, 2017

Thank you @matthewp & @inikulin. That's what I wanted to know. 💯

@inikulin

Just another example of a worker use case off the top of my head: parallelization of gulp build tasks. Currently, computationally heavy tasks (e.g. linting, compilation) can't be performed in parallel due to their blocking nature. Workers should significantly reduce build times.

@refack
Contributor

refack commented May 25, 2017

A reference for those who don't "watch" this repo - spinoff discussion on High level architecture

@domenic
Contributor

domenic commented May 26, 2017

In jsdom, we would like this for two reasons:

We could also benefit from it for keeping the main thread responsive and parallelizing multiple files by doing background HTML and CSS parsing, as @inikulin alludes to.

@boneskull
Contributor

@domenic couldn't parallelization of file parsing be accomplished via cluster or IPC?

@domenic
Contributor

domenic commented May 26, 2017

No, because the serialization overhead of sending it over IPC outweighs the benefit.

@vkurchatkin
Contributor

@domenic the problem is that serialization is required anyway

@domenic
Contributor

domenic commented May 26, 2017

Not when using SharedArrayBuffer (or transferring normal ArrayBuffers). And also not when using strings (which are immutable and thus don't need to be serialized).
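
For illustration, a sketch of the transfer case, assuming the worker_threads API that eventually shipped in Node (./parse.js is hypothetical):

```js
const { Worker } = require('worker_threads');

const worker = new Worker('./parse.js'); // hypothetical worker script
const buf = new ArrayBuffer(16 * 1024 * 1024);

// Listing `buf` in the transfer list moves it instead of copying it;
// after this call it is detached (byteLength === 0) on this side.
worker.postMessage(buf, [buf]);
```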

@Fishrock123
Contributor

Ah yes, right, I had almost forgotten.

SharedArrayBuffer certainly makes Workers a lot more desirable.

@DronRathore

DronRathore commented May 27, 2017

At Housing.com, our Node processes listen to RabbitMQ exchanges to flush and update cached keys that we keep in memory, e.g. the list of whitelisted domains, list of cities, list of experiments, etc. It would be great if that task got offloaded from our main app, and we could update that cache through shared buffers or some other means.

This is one of the use cases where you want the background tasks to really run in the background, and not on your process's main thread.

@ljharb
Member

ljharb commented May 27, 2017

At Airbnb, we use https://npmjs.com/hypernova as a long-running Node rendering service (we only use React, but it can render anything) for our Rails app. Web Workers seem like they would make for a much more efficient sandbox than vm currently offers for rendering each request/batch of jobs in an isolated fashion.

@bnoordhuis
Member

And also not when using strings (which are immutable and thus don't need to be serialized).

@domenic Immutable but not fixed. Strings are moved around by the garbage collector. They need to be copied out before they can be used in another VM.

Your point about ArrayBuffers and SharedArrayBuffers is correct though.

@addaleax
Member Author

Web Workers seem like they would make for a much more efficient sandbox than vm

Seems like the realms proposal might be a more specific solution for isolation: https://github.com/tc39/proposal-frozen-realms

As far as I can tell, there are things you can’t do with Realms that you could do with Workers, like limiting memory usage (which you’ll usually want when running untrusted code is your goal).

@ljharb
Member

ljharb commented May 30, 2017

Effectively what I want is both control over a Worker and over a Realm, but having a Worker gives me a Realm for free (and I assume that once the Realms API lands, I'll be able to use it in conjunction with creating a Worker, whether on the web or in node).

@cpojer

cpojer commented May 30, 2017

There are a bunch of use-cases that people identified here and if we zoom out, it all comes down to one high-level feature: Map-reduce with efficient data structure sharing. Almost all problems can be reduced to that: bundlers need worker processes that offload parsing/compilation to other threads/processes. Test runners would like to parallelize test runs. Client side frameworks would like the ability to do server-rendering efficiently and in parallel. Almost all of these are CPU bound and currently slowed down by slow IPC. I'm happy to dogfood any implementation proposals in projects that we work on at Facebook.

@zxc122333

Workers may bring us some new programming models, like the Actor model of Erlang and Akka.

Some libraries are exploring it, even in single-threaded environments.

shakyShane/actor-js
untu/comedy
alexeyraspopov/actor-system

@NawarA

NawarA commented Jun 14, 2017

Today, we process billions of requests. We use Cluster with n-1 workers, so at least one CPU is available to process incoming requests at the OS level. That being said, as Node.js users, we are interested in keeping the event loops that process requests unblocked.

In a normal software model, we have a hot path (or critical path, to get to the response as fast as possible) and what can be considered background work. The ideal setup is to have worker event loops process the hot path and pass work to background workers whose sole purpose is to process non-critical yet important functions, such as logging and database reads and writes that can be done in the background. This work can be done after the hot path is complete, yet the event loop processing requests doesn't have to busy itself with it, since it's an Actor worried about a different objective.

Forgive the ASCII graphic... the model I propose looks something like this:

1 master -to- n cluster workers 
    |             |...|
n background process workers

The outcome is that the Cluster Master does its job of round-robining requests without being blocked, and of healing/spawning the Cluster Workers that process internet requests. Every Cluster Worker pushes its non-critical background logic to a Background Worker. This frees Cluster Workers to focus solely on throughput.

I recommend a postMessage design where you can pass a PID (possibly a class name?) and an object (a message): like the postMessage API, but with anycast capability.

I think this kind of capability and this specific use case would make Node.js software designs generally more efficient.
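
A rough sketch of that shape with today's primitives (the background tier here is still a forked process, and both ./background-worker.js and ./hot-path-server are hypothetical):

```js
const cluster = require('cluster');
const { fork } = require('child_process');
const os = require('os');

if (cluster.isMaster) {
  // n-1 request workers, leaving one CPU for the OS and the master.
  for (let i = 0; i < os.cpus().length - 1; i++) cluster.fork();
} else {
  // Each request worker gets a companion background worker and hands
  // it non-critical work (logging, deferred DB writes, ...).
  const background = fork('./background-worker.js'); // hypothetical
  require('./hot-path-server').start({              // hypothetical
    afterResponse(work) {
      background.send(work); // off the hot path, postMessage-style
    },
  });
}
```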

@tjconcept
Contributor

I guess I have a use case, and if not, I hope someone can point me in another direction ;)

I have some static data that is loaded and indexed (a specialized tree structure). The service spends almost all of its CPU time traversing this (static) tree.
The downside is that I waste almost 1 GB of memory per process I scale to.

The best scenario I can imagine, being a guy with zero threading experience, is if a variable could simply be shared across all instances spawned by the cluster module (maybe in a frozen state), but I guess that's not feasible.

@pemrouz

pemrouz commented Jul 16, 2017

I think @cpojer sums this up well: most of the use cases are basically map-reduce.

However, in my case (fero), I essentially have a microservice consuming one log of events, processing it and producing another to fan out to clients. I wanted to split this so the business logic is run in one thread, and another thread picks up the output and writes to the wire.

This seems like the perfect use case for shared memory, since the output is already a Buffer, so offloading the socket.write should make things faster. However, all the userland implementations I tried actually made performance worse than doing it in a single thread, so it'd be interesting to see if native SharedArrayBuffers will actually provide a performance boost.

@addaleax - in response to your question, I think some form of require would be required. I need to be able to do some I/O, and even for most of the map-reduce cases where workers just process and respond back, I think that without being able to access other files or npm modules, they would be too limited to be useful.

@refack
Contributor

refack commented Jul 27, 2017

/cc @mogill

@devsnek
Member

devsnek commented Jul 28, 2017

I had an idea to use these for sharded network communication with external services, mainly over WebSocket. I'm not sure what the benefits of offloading JSON/ETF parsing to a worker would be, mainly since I'm not sure of the actual performance of a good structured clone algorithm, but I think use cases like this should be taken into account.

@refack
Contributor

refack commented Jul 28, 2017

@gabrielschulhof

@mogill

mogill commented Jul 29, 2017

A few years ago I developed a native addon for Node that provides shared memory parallelism and addresses some of the issues being discussed here. Extended Memory Semantics (EMS) is based on the Tera MTA/Cray XMT shared memory parallel programming and execution model, so it supports both loop-level parallelism and heterogeneous fork-join multitasking, but the atomic operations are different from those used with a SharedArrayBuffer.

The shared-memory parallelism use cases discussed here all share the need for some combination of:

  • Heterogeneous multi-tasking (HTML rendering, parsing JSON and CSV, transpiling)
  • Loop-level parallel computation (Map-Reduce, Matlab's parfor)
  • Shared memory that is interoperable between any number or kind of processes (decreases messaging and redundant in-memory data)

EMS addresses the latter two better than the first; with some syntactic sugar, heterogeneous parallelism could be made more idiomatic. The EMS distribution includes examples ranging from MPI-like bulk synchronous parallelism to OpenMP-like parallel loops to a single HTTP request forking parallel workers that generate a single response. Another project implemented a conventional single async front-end process that shared memory with a parallel back-end process. EMS does not prescribe a source of parallelism.

True zero-copy data sharing is not possible in Node because by definition each JS process is running in an isolated VM and the only way for data to move in/out of the VM is via copying. The ability to reference data in another VM (read: isolate) generally creates more problems than it solves (makes garbage collection nearly impossible, requires code modifications for synchronization, etc.).

By extension, this is also true for any kind of VM (VirtualBox, Python, JVM, containers) -- if a process is not performing a copy-in/out of data from the VM it is probably breaking an assumption somewhere in the VM. Although this architectural limitation is unfortunate, it also means inter-language data sharing has no additional cost, and EMS presently implements data sharing between any number of Node, Python, and C/C++ programs. EMS relies on the OS' virtual memory mechanisms to provide data persistence, which is not traditionally considered an aspect of shared memory but will become more so as non-volatile and software defined main memory becomes mainstream.

Conversion/serialization, or at least copying, of data is almost inevitable, but there is a substantial mechanical advantage to communicating data via shared memory instead of via an OS network communication protocol. Specifically, synchronous shared memory access is faster than the overhead needed to make it asynchronous, and is many orders of magnitude faster than exchanging network communication with a server process.
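
A small sketch of that synchronous access pattern, using the Atomics operations that pair with SharedArrayBuffer (different primitives than EMS uses, as noted above):

```js
const sab = new SharedArrayBuffer(4);
const flag = new Int32Array(sab);

// Producer side (imagine this running in another thread):
Atomics.store(flag, 0, 1); // atomic write straight into shared memory
Atomics.notify(flag, 0);   // wake anyone blocked in Atomics.wait

// Consumer side: a synchronous, blocking wait until the value changes;
// no event-loop round trip or socket is involved.
// Atomics.wait(flag, 0, 0); // commented out: it would block this thread
```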

@jokeyrhyme

@p3x-robot check this out: https://gist.github.com/hellerbarde/2843375

Let's assume that the latency for shared memory as discussed here is 100 ns, whilst for Redis over a 1 Gbps network it would be 20,000 ns (a factor of 200). Given that the network stack is involved, I unscientifically imagine that Redis on the same box will be somewhere between 3,000 ns and 20,000 ns.

That's an enormous difference, and for the sorts of number-crunching tasks that you'd especially want to parallelise, this could be the difference between being able to give an answer within a few seconds versus having to wait minutes or hours.

@bnoordhuis
Member

It seems JSC is going to experiment with shared state threads at the language level:

https://webkit.org/blog/7846/concurrent-javascript-it-can-work/
https://bugs.webkit.org/show_bug.cgi?id=174276

The memory model is outlined in the blog post but it's essentially 'simple property access is atomic.'

Perhaps not relevant to Node.js short-term but still interesting.

@mogill

mogill commented Sep 4, 2017

From an application programmer's perspective, I believe the most important aspect of the WebKit proposal is that by default all variables are shared unless explicitly made thread-private. This may be desirable for new code, but given JS's history as a devoutly single-threaded execution model, we might expect few existing frameworks and libraries to work in a shared-by-default execution model.

It would be better to separate sharing data from the source of parallelism: A shared-nothing webworker/cluster model benefits from a way to share specific variables in the same way a shared-everything model benefits from a way to make specific variables private.

The difference is whether or not legacy JS code can be used in a parallel region.

It would be a lost opportunity to require multi-threaded JS applications to deal with the same challenges as languages that do not provide thread-safe standard libraries (e.g. C++ and STL/Boost).

@hax

hax commented Sep 11, 2017

Though JSC is going to experiment with threads in JS, it's far from being a language standard (not even stage 0; I don't think the other JS engine teams agree with this direction). And it seems impossible to switch the VM from V8 to JSC in Node.js. So it is totally irrelevant.

@bnoordhuis
Member

@hax Not quite. One of Node's longer-term goals is VM agnosticism. node-chakra exists and works; a node-jsc is not out of the question if there is enough interest.

@mogill

mogill commented Dec 17, 2017

While compiling a list of tools for parallel and/or shared-memory JavaScript, I paid close attention to the examples (if any) used in the documentation (if any) and source code in hopes of gaining insight into the use cases for parallel JS, but this leads to the conclusion that the Fibonacci sequence is an important and common problem, or worse, that we don't know why we want parallel JS.

Stepping back to look at how explicit parallelism is used in languages which have supported it for decades (e.g. C/C++, Java), the only commonality is that very few programs actually need multiple cores, and the ones that do always find a way regardless of the tooling. Most of those programs use a small degree of heterogeneous parallelism to achieve what Node.js programmers get from asynchronous I/O and the event loop. Vanishingly few programs parallelize loop-level computation across dozens of cores.

I wouldn't consider a lack of responses to this issue's root question of use cases as a lack of interest in parallel JS as much as an artifact of interacting with a diverse and fragmented user base.

@hax

hax commented Dec 18, 2017

But I think there are Shared Array Buffers since Node 9.

Without WebWorker APIs, SharedArrayBuffer is useless.
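
Right: the buffer only becomes shared once some worker-style API carries it to another thread. A sketch with the worker_threads API that later shipped in Node 10.5+:

```js
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  const sab = new SharedArrayBuffer(4);
  const view = new Int32Array(sab);
  const worker = new Worker(__filename, { workerData: sab });
  // The worker sees the same memory; a SAB is shared, not copied.
  worker.on('exit', () => console.log(Atomics.load(view, 0))); // 42
} else {
  Atomics.store(new Int32Array(workerData), 0, 42);
}
```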

@p3x-robot


@jokeyrhyme Just one question: how do you use threads in a server farm? Isn't Redis the fastest?

@p3x-robot

I think the creator was not thinking about threads, but a little bigger, don't you think?

@p3x-robot

p3x-robot commented May 3, 2018

Once you are happy that threads are implemented, you will create a bigger system, you will use server node clusters including Node.js and Redis (or something faster, but right now that is the best I know of), and you will forget about threads.

I think C++ is good for an addon in Node.js, but pure JS will be slow with or without web workers.

Functional: JS
Fast: C++ (or you can go down to assembly as well)

Look at IntelliJ; I love it and use it exclusively.

But still, it is slow as hell: one process, and frozen. Yes, it has tons of threads, and yet a big indexing run freezes the program. If they just used another process for indexing, communicated with it, and then reloaded the index, it would be speedy like Chrome and Node.js.

@steve2507

So I've followed these worker implementation issues for quite a while now, and am happy to see an initial implementation on the horizon.
Nonetheless, I thought I'd chip in here with a use case. Up front, I already know people will find that this is not the proper reason to seek thread workers in Node.js (and I kinda agree), but here it is anyway.

I'm the author of the https://xible.io project (https://github.com/spectrumbroad/xible). It's basically a flow-based or graphical programming environment.
Each flow currently runs in its own cluster (actually child_process.fork). The reason for that is that I do not want these flows to interfere with each other. I use xible for deployment pipeline automation, for example, which can take up quite some resources. Having multiple deployment flows in an enterprise running side-by-side is not something you want to do on a single thread.

So far, that's all fine. fork absolutely fits this purpose. However, I also use xible for home automation in combination with a solution that I have yet to publish (codename glow at the moment). In this solution, 'scenes' are defined as xible flows. This allows stuff such as dynamic lighting, closing curtains, etc., by simply defining your needs in a xible flow. Whenever I flip the light switch in a room, glow starts the xible flow for the selected scene.
The problem is that forking is a slow process, especially because all the initialization of xible has to be performed again. Requiring all the dependencies really kills startup performance. As a result, flipping the switch to turn the lights on (which in the background starts that flow) takes roughly 500 ms. You 'feel' that lag when you flip the switch.

I actually track startup performance for xible here: https://xible.io/benchmarks. Startup time is the time it takes to start the new fork. Init time is the time it takes to get to a fully initialized state (all requires done, xible initialized completely, ready to actually run the flow).

Another possible solution for my 'issue' is nodejs/node#17058. But having threads where required modules are shared would be absolutely amazing in terms of performance.

To circumvent this init time somewhat, I created a setting in xible which lets you configure that a flow always has an empty initialized fork available. This means that when the time comes to start it, a lot of the pre-work has already been done, as the fork is already up and running. This does, however, use resources (predominantly memory) while not actually being of direct use.
For me, this feels like a workaround rather than a solution.

@addaleax
Member Author

addaleax commented Jun 5, 2018

@steve2507 It might be good to hear more about this. For now, Worker startup performance might not be significantly better than what child processes deliver, because a lot of work is spent initializing V8 and Node, similar to child processes. So, for now, the recommendation would be to use Worker thread pools – how does that fit into your situation?
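
For reference, a minimal pool sketch (assumes worker_threads; ./task.js is a hypothetical worker script that answers each message with parentPort.postMessage(result)):

```js
const { Worker } = require('worker_threads');

class Pool {
  constructor(file, size) {
    // Pay the thread startup cost once, up front.
    this.idle = Array.from({ length: size }, () => new Worker(file));
    this.queue = [];
  }
  run(task) {
    return new Promise((resolve) => {
      this.queue.push({ task, resolve });
      this._drain();
    });
  }
  _drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve } = this.queue.shift();
      worker.once('message', (result) => {
        this.idle.push(worker); // return the worker before resolving
        resolve(result);
        this._drain();          // pick up any queued tasks
      });
      worker.postMessage(task);
    }
  }
}

// Usage: reuse the same initialized threads for every flow run.
const pool = new Pool('./task.js', 4); // hypothetical worker script
pool.run({ flow: 'lights-on' }).then(console.log);
```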

@steve2507

steve2507 commented Jun 6, 2018

@addaleax Thank you for your input! :-)
I could regard worker thread pools as a solution similar to what I have running right now: always have one or more initialized flow(s) on standby. The only thing is that, as far as I know, thread pools have a static number of workers.

Also, I would see a pooled worker thread as agnostic to which flow it runs, and that would miss out on some of the possible optimizations. A flow consists of nodes, which can also require() and therefore impact startup performance. So if I hung on to a single xible-wide thread pool, I could pre-init xible itself in each of the threads, but not the flow and its nodes (because it is not yet known which flow will run in the future).
The current solution in xible actually spins up one 'dummy' fork per flow so it can also have the nodes related to that flow perform whatever work they need to do, far ahead of the actual flow start. This basically boils down to a dynamic thread pool of n+1 per flow, where n is the number of running instances of the flow.

Running with this setting of forking an extra process can be memory-intensive, by the way. For the deployment pipeline use case, this setting would be off because no one cares about that half-second startup time. Even if it were on, memory is not an issue in this area, because such machines usually do not restrict the amount of memory to a point where this becomes problematic.
For the home automation, the setting is enabled in my current setup. A user can change this if they want, but I would recommend against it because of the aforementioned 'lag'. The problem here is that I run my home automation on a Raspberry Pi 3, because I expect that others who may run this solution in the future will have similar hardware available.
A fairly simple scene/flow takes up roughly 30 MB of memory per fork right now. If I have 20 scenes installed (which are not even turned on), that will take up 600 MB of the 1 GB available on a Raspberry Pi 3...

To give you an example of what I mean by a simple scene, here's one that just turns the lights on, changes the color temperature, and changes the color of the lights:
[screenshot: the xible_glow_scene flow]
The flow is not running (except for the 'dummy' fork); memory usage is on the left (maybe a bit hard to read, but the middle graph goes up to 40 MB).

So yeah, shared memory threads would be absolutely next-level for me. Negating the need for xible and node(pack)s to init/require more than once would cut down startup time and memory usage by a massive margin.

Thanks again! Really do appreciate it!

@addaleax
Member Author

addaleax commented Oct 3, 2018

I’m closing the existing issues here. If you have feedback about the existing Workers implementation in Node.js 10+, please use #6 for that!
