
doc,zlib: improve note on threadpool usage #20380

Closed · lpinca wants to merge 1 commit from the improve/threadpool-usage-note branch

Conversation

@lpinca (Member) commented Apr 28, 2018

Raise awareness against the catastrophic memory fragmentation that can
be created while using the asynchronous zlib APIs.

Refs: #8871


@nodejs-github-bot added the doc (Issues and PRs related to the documentations) and zlib (Issues and PRs related to the zlib subsystem) labels on Apr 28, 2018
@mscdex (Contributor) commented Apr 28, 2018

Changing the wording while still referring to the threadpool size doesn't seem to make much sense. I think the issue is twofold: too few threads could starve async zlib requests (if I understand the original text correctly), and there is the memory fragmentation issue.

So I think we should make this text clearer, with the threadpool link being associated with the former and perhaps some additional information for the latter.

@lpinca (Member, Author) commented Apr 28, 2018

Fragmentation is still caused by thread pool usage; I didn't want to go into details, as that depends on the OS/allocator.

@mscdex (Contributor) commented Apr 28, 2018

What I mean is that it's confusing when someone clicks that link expecting to find more information about the mentioned zlib issues. Is something being implied by the link now? Should users remedy the problem by lowering the thread pool size, or by increasing it, as the text at the link suggests? Or something else?

@lpinca changed the title from "zlib: improve note on threadpool usage" to "doc,zlib: improve note on threadpool usage" on Apr 28, 2018
@lpinca (Member, Author) commented Apr 28, 2018

OK, I see. The text at https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size seems fine:

Because libuv's threadpool has a fixed size, it means that if for whatever reason any of these APIs takes a long time, other (seemingly unrelated) APIs that run in libuv's threadpool will experience degraded performance. In order to mitigate this issue, one potential solution is to increase the size of libuv's threadpool by setting the 'UV_THREADPOOL_SIZE' environment variable to a value greater than 4 (its current default value). For more information, see the libuv threadpool documentation.

I'm not sure how to change that to make it clearer. Suggestions are welcome.
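
For reference, the mitigation that quoted text describes is just an environment variable set before the process starts. A minimal sketch, where the pool size of 8 and the `app.js` name are arbitrary examples, not a recommendation from the docs:

```js
// Launch with a larger libuv threadpool, e.g.:
//
//   UV_THREADPOOL_SIZE=8 node app.js
//
// The asynchronous zlib APIs then share those 8 threads with fs, dns.lookup()
// and some crypto work, instead of the default 4.
const zlib = require('zlib');

zlib.gzip(Buffer.from('hello world'), (err, compressed) => {
  if (err) throw err;
  console.log(`compressed to ${compressed.length} bytes`);
});
```

A bigger pool only helps with the starvation/latency aspect; as discussed below, it does not address the fragmentation problem.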

@lpinca force-pushed the improve/threadpool-usage-note branch from 831e238 to f313879 on April 28, 2018 at 12:03
@mscdex (Contributor) commented Apr 28, 2018

That text has nothing to do with the memory fragmentation problem being described now.

@lpinca (Member, Author) commented Apr 28, 2018

@mscdex better like this?

@mscdex (Contributor) commented Apr 28, 2018

I suppose, if we do not have any concrete suggestions for the memory issue.

@lpinca (Member, Author) commented Apr 28, 2018

The linked issue has some suggestions, like disabling transparent huge pages (THP) or using a different allocator, but they are impractical or ineffective.

Review thread on the new documentation text:

  "This can lead to surprising effects in some applications, such as subpar
  performance (which can be mitigated by adjusting the [pool size][]) and/or
  unrecoverable and catastrophic memory fragmentation."
@BridgeAR (Member) commented:

IMO this warning is a bit lax. I think it would be best to give a more detailed explanation of what is wrong and how to mitigate it. (CPU-bound tasks need their time one way or another, so one solution could be to use a separate Node.js instance as a worker, connected to the main application by a queue that sends a new task as soon as the worker is done with the previous one.)

Another Member replied:

@BridgeAR Can you expand on that? I can’t really make out a difference between what you’re describing and how the libuv event loop works right now…

@BridgeAR (Member) replied:

The way I have understood the issue so far is that the actual calls to the async functions cause the problem. The reason is that the task itself is CPU bound, and if we trigger lots of async calls, we end up with catastrophic memory fragmentation. The libuv event loop cannot prevent each call from allocating at least some memory.

I am just suggesting that we document that it is best to have only a single worker for n Node.js instances, handling all the async tasks. That single worker could actually process m tasks in parallel, where m is the number of CPU cores. That should mitigate the issue, if I am not mistaken. Besides that, we might want to re-evaluate the recommendation to always use async calls when the actual work is CPU bound; using sync calls will definitely not cause this problem.
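
A rough, self-contained sketch of that kind of setup (not from this PR; the ZLIB_WORKER flag and the payloads are made up, and for brevity the worker handles one task at a time rather than m in parallel):

```js
'use strict';
const { fork } = require('child_process');
const zlib = require('zlib');

if (process.env.ZLIB_WORKER) {
  // Worker process: compress one payload at a time and report back.
  // Sync calls are fine here, because blocking this process does not block
  // the main application's event loop.
  process.on('message', ({ payload }) => {
    const compressed = zlib.deflateSync(Buffer.from(payload));
    process.send({ length: compressed.length });
  });
} else {
  // Main application: keep a queue and send the worker a new task as soon
  // as it finishes the previous one.
  const worker = fork(__filename, [], {
    env: { ...process.env, ZLIB_WORKER: '1' }
  });
  const queue = ['first payload', 'second payload', 'third payload'];

  const sendNext = () => {
    if (queue.length === 0) {
      worker.kill();
      return;
    }
    worker.send({ payload: queue.shift() });
  };

  worker.on('message', ({ length }) => {
    console.log(`worker compressed ${length} bytes`);
    sendNext();
  });

  sendNext();
}
```

Running m tasks in parallel, as suggested above, would just mean keeping up to m messages in flight, or forking m such workers.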

@lpinca (Member, Author) replied:

@BridgeAR In ws we are doing something similar. We use a queue to limit the maximum number of concurrent calls to zlib: https://github.com/websockets/ws/blob/690b3f277c6f5c3aef8cd84792929450f516b3ae/lib/permessage-deflate.js#L67-L73.

It helps, but according to this comment even setting concurrency to 1 does not fully fix the issue. Your suggestion can help with applications, but it's a bit impractical for libraries.

Also, the point of this PR is only to make people aware of the "issue". A detailed explanation of why and where it happens and how to mitigate it is out of the scope of this PR, as that would require multiple pages of docs.
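
For readers following along, a simplified sketch of that kind of concurrency limit (not the actual ws code, and not part of this PR; the class name and the default limit are made up):

```js
'use strict';
const os = require('os');
const zlib = require('zlib');

// Cap the number of in-flight async zlib jobs and queue the rest.
class DeflateQueue {
  constructor(concurrency = os.cpus().length) {
    this.concurrency = concurrency; // maximum number of jobs in flight
    this.active = 0;                // currently running jobs
    this.pending = [];              // queued [buffer, callback] pairs
  }

  deflate(buf, callback) {
    this.pending.push([buf, callback]);
    this._drain();
  }

  _drain() {
    while (this.active < this.concurrency && this.pending.length > 0) {
      const [buf, callback] = this.pending.shift();
      this.active++;
      zlib.deflate(buf, (err, result) => {
        this.active--;
        this._drain(); // start the next queued job, if any
        callback(err, result);
      });
    }
  }
}

// Usage: many requests, but never more than `concurrency` concurrent calls.
const queue = new DeflateQueue(2);
for (let i = 0; i < 100; i++) {
  queue.deflate(Buffer.from(`payload ${i}`), (err, compressed) => {
    if (err) throw err;
    console.log(i, compressed.length);
  });
}
```

This bounds how many zlib states exist at once, but, per the comment referenced above, it does not make the fragmentation disappear entirely.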

Another commenter added:

This is really a terrible warning. It is just scary words with no explanation or advice. You may as well have said "Warning: if you use this code you may die. Good luck!"

@BridgeAR (Member) commented:

This needs a rebase.

Raise awareness against the catastrophic memory fragmentation that can
be created while using the asynchronous zlib APIs.

Refs: nodejs#8871
@lpinca force-pushed the improve/threadpool-usage-note branch from 9574a79 to 951e50c on May 29, 2018 at 20:11
@lpinca (Member, Author) commented May 29, 2018

Done.

@BridgeAR added the author ready label (PRs that have at least one approval, no pending requests for changes, and a CI started) on May 29, 2018
apapirovski pushed a commit that referenced this pull request May 30, 2018
Raise awareness against the catastrophic memory fragmentation that can
be created while using the asynchronous zlib APIs.

PR-URL: #20380
Refs: #8871
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
@apapirovski (Member) commented:

Landed in 0234068

addaleax pushed a commit that referenced this pull request May 31, 2018
Raise awareness against the catastrophic memory fragmentation that can
be created while using the asynchronous zlib APIs.

PR-URL: #20380
Refs: #8871
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
@lpinca deleted the improve/threadpool-usage-note branch on May 31, 2018 at 19:16
@MylesBorins mentioned this pull request on Jun 6, 2018
@lpinca mentioned this pull request on Feb 6, 2020