worker: motivating examples? #33880

Closed
ronag opened this issue Jun 14, 2020 · 10 comments
Labels
doc Issues and PRs related to the documentations. question Issues that look for answers. worker Issues and PRs related to Worker support.

Comments

@ronag
Member

ronag commented Jun 14, 2020

I'm having a hard time figuring out when I would actually want to use workers over a child process.

The docs provide the following motivation:

> Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.
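For readers new to the distinction, here is a minimal sketch of the two mechanisms the docs mention. It uses a MessageChannel from worker_threads instead of a full Worker so that it fits in one file; the helper name demoTransferVsShare is hypothetical. A transferred ArrayBuffer becomes detached (zero length) on the sending side, while a SharedArrayBuffer stays usable on both sides.

```javascript
const { MessageChannel } = require('worker_threads');

// Hypothetical helper: posts one transferred ArrayBuffer and one
// SharedArrayBuffer, then reports what the sending side can still see.
function demoTransferVsShare() {
  const channel = new MessageChannel();

  const transferred = new ArrayBuffer(1024);
  const shared = new SharedArrayBuffer(1024);
  new Uint8Array(shared)[0] = 42;

  // Listing `transferred` in the transfer list moves ownership to the
  // receiving port; `shared` is not in the list, its memory is shared.
  channel.port1.postMessage({ transferred, shared }, [transferred]);

  const result = {
    transferredLength: transferred.byteLength, // detached, so 0
    sharedLength: shared.byteLength,           // unchanged
    sharedFirstByte: new Uint8Array(shared)[0],
  };

  // Closing one port closes the whole channel, so the process can exit.
  channel.port1.close();
  return result;
}

console.log(demoTransferVsShare());
```

The "edge case" discussed later in the thread is about which buffers actually get moved by that transfer list; the sharing path is unaffected by it.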

However, I'm having a hard time finding a case where I would actually use this. Usually I have either strings or objects that I want to pass on to a worker. Maybe something like https://capnproto.org/ could take advantage of this? But from what I can see, such approaches are actually quite slow in JavaScript.

Also, from what I understand, transferring Buffers to workers has some edge cases where it can fall back to a copy.

I'm not saying it's a bad feature. I would really love to try it out and use it. I'm just having trouble finding the use case for it.

Could someone provide some ideas or examples of when this would be useful, as a point of inspiration?

ronag added the worker and question labels Jun 14, 2020
addaleax added the doc label Jun 14, 2020
@addaleax
Member

> I'm having a hard time figuring out when I would actually want to use workers over a child process.

The short answer I usually give is “for CPU-intensive work that should be offloaded from the main thread and that benefits from fast communication between threads”. I know that that’s kind of generic, but I think it describes the niche that it fits in quite well.

> The docs provide the following motivation:

> > Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.

> However, I'm having a hard time finding a case where I would actually use this. Usually I have either strings or objects that I want to pass on to a worker.

One classic example would be image processing – for example, you build an HTTP server which receives images as requests, and which returns a blurred variant of them as the response. That’s CPU-intensive, but the communication is fast because you can just transfer the image data to/from Worker threads without copying.
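A minimal sketch of that idea, assuming greyscale images stored as one byte per pixel in a Uint8Array. A 3-tap horizontal box blur stands in for a real blur, the HTTP server part is left out, and the names boxBlurHorizontal and blurInWorker are hypothetical. The pixel data is transferred into the Worker and the result is transferred back, so the image itself is never copied between threads.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical CPU-bound kernel: horizontal 3-tap box blur over a
// greyscale image stored row by row, one byte per pixel.
function boxBlurHorizontal(pixels, width) {
  const out = new Uint8Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) {
    const x = i % width;
    const left = x > 0 ? pixels[i - 1] : pixels[i];
    const right = x < width - 1 ? pixels[i + 1] : pixels[i];
    out[i] = Math.round((left + pixels[i] + right) / 3);
  }
  return out;
}

// Runs the kernel in a Worker. The input buffer is transferred in and the
// output buffer is transferred back, so neither direction copies the pixels.
function blurInWorker(pixels, width) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { parentPort } = require('worker_threads');
      const boxBlurHorizontal = ${boxBlurHorizontal.toString()};
      parentPort.once('message', ({ pixels, width }) => {
        const out = boxBlurHorizontal(pixels, width);
        parentPort.postMessage(out, [out.buffer]);
      });
    `, { eval: true });
    worker.once('message', (out) => {
      resolve(out);
      worker.terminate();
    });
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // No-op if the promise already resolved (terminate() exits with 1).
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
    // `pixels` must own its whole ArrayBuffer, e.g. new Uint8Array(n).
    worker.postMessage({ pixels, width }, [pixels.buffer]);
  });
}

const image = new Uint8Array([0, 255, 0, 255, 0, 255]); // 3x2 test pattern
blurInWorker(image, 3).then((blurred) => {
  console.log(blurred);          // blurred pixels, transferred back
  console.log(image.byteLength); // 0: the input buffer was transferred away
});
```

A real server would keep a small pool of long-lived Workers instead of spawning one per request; spawning one here only keeps the sketch short.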

> Also, from what I understand, transferring Buffers to workers has some edge cases where it can fall back to a copy.

I wouldn’t worry about this, tbh. This is not relevant for performance – the Buffers which are copied even if they are listed as transferable tend to be small in size (and have a size limit, namely Buffer.poolSize / 2), where the difference between copying and transferring is relatively small. The reason this is called out in the docs is that this is a visible behavior change on the sending side.
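To make the size limit concrete: small Buffers created with Buffer.allocUnsafe() or Buffer.from(string) may be slices of a shared pool (Buffer.poolSize, 8192 bytes by default), so their .buffer is larger than the Buffer itself. The hypothetical helpers sharesBackingStore and toTransferable below detect that case with a byteOffset/byteLength check and copy such a Buffer into a standalone Uint8Array before it is listed for transfer. This is one possible workaround, not a Node.js API.

```javascript
// Hypothetical helper: true when `buf` is only a view into a larger
// ArrayBuffer, for example a slice of Node's internal Buffer pool.
function sharesBackingStore(buf) {
  return buf.byteOffset !== 0 || buf.buffer.byteLength !== buf.byteLength;
}

// Hypothetical helper: returns a Uint8Array that owns its entire ArrayBuffer,
// so listing `result.buffer` in a transfer list moves exactly these bytes.
// For a pooled Buffer this costs one small copy up front.
function toTransferable(buf) {
  return sharesBackingStore(buf) ? new Uint8Array(buf) : buf;
}

const small = Buffer.from('hello');  // small enough that it may come from the pool
const standalone = Buffer.alloc(16); // Buffer.alloc() does not use the pool
console.log(sharesBackingStore(small), sharesBackingStore(standalone));

const payload = toTransferable(small);
console.log(payload.buffer.byteLength); // 5: safe to list payload.buffer for transfer
```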

> I'm not saying it's a bad feature. I would really love to try it out and use it. I'm just having trouble finding the use case for it.

Yeah, I feel this. It’s hard to come up with good examples, because it’s hard to find simple CPU-intensive applications – usually, there’s too much complexity in there that would add more code to the example than the Worker + message passing code itself. In the image processing case above, the blurring code would likely make up most of the example code.

For a blog post a while ago, I wrote a Sudoku-solving server – also probably not a real-world example, but maybe it’s close enough?

More real-world examples would include text processing, machine learning, data analysis, etc. – but again, that’s hard to use as examples.

@devsnek
Member

devsnek commented Jun 14, 2020

Large Discord bots spend a lot of CPU time processing incoming events from Discord. Using worker threads, one can shard Discord connections across threads.
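As a rough sketch of what that sharding can look like: Discord's gateway documentation routes a guild to shard (guild_id >> 22) % num_shards, and each shard could live in its own Worker. The helper names shardForGuild and startShards are hypothetical, no Discord library is used, and the worker body is only a placeholder where a real bot would open that shard's gateway connection and process its events.

```javascript
const { Worker } = require('worker_threads');

// Discord's documented routing rule: shard_id = (guild_id >> 22) % num_shards.
// Guild IDs are 64-bit snowflakes that can exceed Number.MAX_SAFE_INTEGER,
// so the arithmetic is done with BigInt.
function shardForGuild(guildId, numShards) {
  return Number((BigInt(guildId) >> 22n) % BigInt(numShards));
}

// Hypothetical: one Worker per shard. The worker body is a placeholder where
// a real bot would connect to the gateway as shard [shardId, shardCount].
function startShards(shardCount) {
  const ready = [];
  for (let shardId = 0; shardId < shardCount; shardId++) {
    const worker = new Worker(`
      const { parentPort, workerData } = require('worker_threads');
      parentPort.postMessage({ shardId: workerData.shardId, status: 'ready' });
    `, { eval: true, workerData: { shardId, shardCount } });
    ready.push(new Promise((resolve, reject) => {
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  }
  return Promise.all(ready);
}

console.log(shardForGuild('20971520', 2)); // (20971520 >> 22) = 5, 5 % 2 = 1
startShards(2).then((shards) => console.log(shards));
```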

@benjamingr
Member

> I'm having a hard time figuring out when I would actually want to use workers over a child process.

I've used workers for computationally expensive tasks where shared memory was important and cross-process shared memory was considerably more expensive than sharing memory between threads.

Some examples:

  • Any numerical calculation (matrix multiplication, if you want a more concrete example). If you want a toy example: calculating the Mandelbrot set multithreaded.
  • Going through a large HTML document as text and processing different parts of it in parallel.
  • Running code so that it doesn't have to "block" the thread: for example, spin up a worker thread for (trusted) user-supplied code. This is useful if you need shared concurrent memory between your code and theirs.

It's hard to give concrete examples that don't go into specific CPU-intensive problem domains, but basically anything that is:

  • CPU intensive
  • easy to parallelise with shared memory
  • hard to parallelise without shared memory
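Taking the Mandelbrot toy example from the list above, here is a sketch of the "easy to parallelise with shared memory" case: the rows of the image are split across Workers, and every Worker writes its iteration counts straight into one SharedArrayBuffer, so nothing has to be copied back or merged. The image size, the [-2, 1] x [-1.5, 1.5] region and all function names are arbitrary choices for illustration.

```javascript
const { Worker } = require('worker_threads');

// Escape-time iteration count for the point c = (cx, cy).
function mandelbrotIterations(cx, cy, maxIter) {
  let x = 0;
  let y = 0;
  let i = 0;
  while (i < maxIter && x * x + y * y <= 4) {
    const xNext = x * x - y * y + cx;
    y = 2 * x * y + cy;
    x = xNext;
    i++;
  }
  return i;
}

// Fills rows [rowStart, rowEnd) of a width x height grid covering
// [-2, 1] x [-1.5, 1.5], writing iteration counts into `out`.
function renderRows(out, width, height, maxIter, rowStart, rowEnd) {
  for (let row = rowStart; row < rowEnd; row++) {
    const cy = -1.5 + (3 * row) / height;
    for (let col = 0; col < width; col++) {
      const cx = -2 + (3 * col) / width;
      out[row * width + col] = mandelbrotIterations(cx, cy, maxIter);
    }
  }
}

// Splits the rows across `numWorkers` Workers that all write into the same
// SharedArrayBuffer; a Worker's exit is used as its completion signal.
function renderParallel(width, height, maxIter, numWorkers) {
  const shared = new SharedArrayBuffer(width * height * Uint16Array.BYTES_PER_ELEMENT);
  const rowsPerWorker = Math.ceil(height / numWorkers);
  const jobs = [];
  for (let w = 0; w < numWorkers; w++) {
    const rowStart = w * rowsPerWorker;
    const rowEnd = Math.min(height, rowStart + rowsPerWorker);
    if (rowStart >= rowEnd) break;
    const worker = new Worker(`
      const { workerData } = require('worker_threads');
      const mandelbrotIterations = ${mandelbrotIterations.toString()};
      const renderRows = ${renderRows.toString()};
      const { shared, width, height, maxIter, rowStart, rowEnd } = workerData;
      renderRows(new Uint16Array(shared), width, height, maxIter, rowStart, rowEnd);
    `, { eval: true, workerData: { shared, width, height, maxIter, rowStart, rowEnd } });
    jobs.push(new Promise((resolve, reject) => {
      worker.once('error', reject);
      worker.once('exit', (code) => (code === 0 ? resolve() : reject(new Error(`exit ${code}`))));
    }));
  }
  return Promise.all(jobs).then(() => new Uint16Array(shared));
}

renderParallel(60, 24, 100, 4).then((counts) => {
  const inside = counts.filter((n) => n === 100).length;
  console.log(`${inside} of ${counts.length} points did not escape within 100 iterations`);
});
```

Using the 'exit' event keeps the sketch short; a longer-running setup would more likely reuse a pool of Workers and post a small "done" message per chunk of rows.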

@ronag
Member Author

ronag commented Jun 15, 2020

> It's hard to give concrete examples that don't go into specific CPU-intensive problem domains, but basically anything that is:

I don't quite agree with this. As far as I understand, workers are only useful when it's possible to either take advantage of shared memory or transferable buffers. Just being CPU intensive or parallelizable without shared memory is not enough.

Are buffers from sockets and/or files transferable? I think at least file streams use a buffer pool, which makes them untransferable? If I have to make a new buffer anyway, i.e. a copy (for example, a buffer for an image I'm reading in), I might as well copy the data over and build the complete buffer in the child process?

@addaleax
Member

> > It's hard to give concrete examples that don't go into specific CPU-intensive problem domains, but basically anything that is:

> I don't quite agree with this. As far as I understand, workers are only useful when it's possible to either take advantage of shared memory or transferable buffers. Just being CPU intensive or parallelizable without shared memory is not enough.

Workers are useful for any CPU-intensive task that can be run in parallel, but yes, if sharing memory is something you can take advantage of then that’s something where Workers will really shine. If not, whether you’re better off using Workers or using child processes is more subtle.
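To make that comparison concrete, here is a sketch that runs the same CPU-bound task (a deliberately naive Fibonacci, standing in for real work) once in a Worker and once in a child process spawned with an IPC channel. The helper names fibInWorker and fibInChildProcess are hypothetical. The two APIs look similar; the difference is underneath: the Worker lives in the same process and can transfer or share memory, while every child-process message is serialized (JSON by default), written to the IPC channel and deserialized on the other side.

```javascript
const { Worker } = require('worker_threads');
const { spawn } = require('child_process');

// The same CPU-bound task as source text, so it can run in either place.
const TASK_SOURCE = `
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
`;

// Worker thread: same process; the result comes back via structured clone
// (large ArrayBuffers could be transferred or shared instead of copied).
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { parentPort, workerData } = require('worker_threads');
      ${TASK_SOURCE}
      parentPort.postMessage(fib(workerData));
    `, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Child process: a separate Node.js process with an IPC channel; each
// message is serialized, sent over the channel, and deserialized.
function fibInChildProcess(n) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', `
      ${TASK_SOURCE}
      process.send(fib(${Number(n)}), () => process.disconnect());
    `], { stdio: ['inherit', 'inherit', 'inherit', 'ipc'] });
    child.once('message', resolve);
    child.once('error', reject);
    child.once('exit', (code) => {
      if (code !== 0) reject(new Error(`child exited with code ${code}`));
    });
  });
}

Promise.all([fibInWorker(25), fibInChildProcess(25)]).then(([fromWorker, fromChild]) => {
  console.log({ fromWorker, fromChild });
});
```

For a result this small the messaging cost is negligible either way; the difference only starts to matter when large payloads move back and forth, which is the case where transferring or sharing memory in a Worker pays off.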

> Are buffers from sockets and/or files transferable?

Buffers from sockets are generally transferable currently.

> I think at least file streams use a buffer pool, which makes them untransferable?

We should probably set the untransferable-marker on those buffers, yes (similar to #32759).

> If I have to make a new buffer anyway, i.e. a copy (for example, a buffer for an image I'm reading in), I might as well copy the data over and build the complete buffer in the child process?

If you have to copy anyway, then it doesn’t really matter where the copy happens, yes.

@ronag
Member Author

ronag commented Jun 15, 2020

> We should probably set the untransferable-marker on those buffers, yes (similar to #32759).

How much do we actually gain in performance by pooling the buffers like that? It might be worth going in the other direction, i.e. making them transferable?

@addaleax
Member

> > We should probably set the untransferable-marker on those buffers, yes (similar to #32759).

> How much do we actually gain in performance by pooling the buffers like that? It might be worth going in the other direction, i.e. making them transferable?

@ronag That sounds like a good idea to me, yes. I would not expect the buffer-creation performance benefit to be significant enough to matter here.
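One rough way to check that intuition is a micro-benchmark like the sketch below. It does not exercise fs.ReadStream's own pool; it compares Buffer.allocUnsafe(), which can carve small buffers out of Node's shared pool, with Buffer.allocUnsafeSlow(), which never uses the pool. The helper name timeAllocations, the sizes and the iteration count are arbitrary, and the numbers will vary by machine and Node.js version.

```javascript
// Rough micro-benchmark sketch (hypothetical helper). Timings depend heavily
// on the machine, the Node.js version and garbage-collection behaviour.
function timeAllocations(allocate, size, iterations) {
  let sink = 0;
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    // Use each result so the allocation is less likely to be optimized away.
    sink += allocate(size).length;
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return { nsPerAlloc: Number(elapsedNs) / iterations, sink };
}

const SIZE = 1024; // below Buffer.poolSize / 2 with the default pool size
const ITERATIONS = 100000;

// Buffer.allocUnsafe() may slice small buffers out of the shared pool;
// Buffer.allocUnsafeSlow() always allocates a standalone buffer.
const pooled = timeAllocations((n) => Buffer.allocUnsafe(n), SIZE, ITERATIONS);
const standalone = timeAllocations((n) => Buffer.allocUnsafeSlow(n), SIZE, ITERATIONS);

console.log({
  pooledNsPerAlloc: pooled.nsPerAlloc.toFixed(1),
  standaloneNsPerAlloc: standalone.nsPerAlloc.toFixed(1),
});
```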

And looking at the code, I noticed that we share one pool across all fs.ReadStreams but base its size on the highWaterMark (HWM) of individual streams, which is also not great in terms of consistency.

Do you want to open a PR for that?

@ronag
Member Author

ronag commented Jun 19, 2020

> Do you want to open a PR for that?

Sure!

ronag added a commit to nxtedition/node that referenced this issue Jun 20, 2020
The performance benefits of using a custom pool are negligible.
Furthermore, it causes problems with Workers and transferable objects.
Rather than further adding complexity for compat with Workers,
just remove the pooling logic.

Refs: nodejs#33880 (comment)
Fixes: nodejs#31733
ronag added a commit that referenced this issue Jun 23, 2020
The performance benefits of using a custom pool are negligible.
Furthermore, it causes problems with Workers and transferable objects.
Rather than further adding complexity for compat with Workers,
just remove the pooling logic.

Refs: #33880 (comment)
Fixes: #31733

PR-URL: #33981
Reviewed-By: Anna Henningsen
Reviewed-By: Ben Noordhuis
@gireeshpunathil
Member

Though the issue started as a generic discussion of Worker use cases, is it fair to say it converged into #33981, which has now landed? If so, can we close this?

@jasnell
Member

jasnell commented Dec 7, 2020

I think so; the discussion seems to have run its course.

jasnell closed this as completed Dec 7, 2020
targos pushed a commit to targos/node that referenced this issue Apr 25, 2021
The performance benefits of using a custom pool are negligible.
Furthermore, it causes problems with Workers and transferable objects.
Rather than further adding complexity for compat with Workers,
just remove the pooling logic.

Refs: nodejs#33880 (comment)
Fixes: nodejs#31733

PR-URL: nodejs#33981
Reviewed-By: Anna Henningsen
Reviewed-By: Ben Noordhuis
targos pushed a commit that referenced this issue Apr 26, 2021
The performance benefits of using a custom pool are negligible.
Furthermore, it causes problems with Workers and transferable objects.
Rather than further adding complexity for compat with Workers,
just remove the pooling logic.

Refs: #33880 (comment)
Fixes: #31733

PR-URL: #33981
Backport-PR-URL: #38397
Reviewed-By: Anna Henningsen
Reviewed-By: Ben Noordhuis
6 participants