Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "module: have a single hooks thread for all workers" #53183

Merged
merged 2 commits into from
Jun 2, 2024

Conversation

mcollina
Copy link
Member

This reverts commit 22cb99d.

Given #53182 and #53097, it's better to revert this to fix the regressions instead. Note that the application is deadlocking when using register() within a worker thread.

Note that #52706 did not mention any of the limitations introduced by the change in OP.

This important change was only mentioned in the code and some internal discussionsz:
https://github.com/nodejs/node/pull/52706/files#diff-c7f1002d283d353145656887966dfbdd0ea26e13e398b21023d093c6248feaddR8.

@mcollina mcollina added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label May 28, 2024
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 28, 2024
@mcollina
Copy link
Member Author

I've added myself to @nodejs/loaders to be notified and catch those earlier. I'm sorry for the delayed review.

@mcollina
Copy link
Member Author

We could also land this revert in v22 until a follow-up PR is made to restore previous capabilities.

@mcollina
Copy link
Member Author

@dygabo you did great work for this change, but I don't think we can ship "one loader" without the support of calling register inside worker threads. That's needed for quite a lot of uses cases from running unit tests in workers, to hot reload, to just simply reusing code across the ecosystem.

@dygabo
Copy link
Member

dygabo commented May 28, 2024

@mcollina I see. just fwiw these decisions were discussed on the original PR and on slack. After review the conclusion was to explicitly have it like this until module.register is sorted out. I might be able to look into it but not sure when/if. Probably the commit itself can be reused and built upon.

@ShogunPanda
Copy link
Contributor

@dygabo
The issue I found is that if you call register within a worker thread the CustomizedLoader gets called and it will try to proxy to the main thread.
Except that on the main thread nobody called register and thus no hooks thread is responding.
Do know how we can eventually force a hooks thread to always exist (despite of register being called or not) so that we can solve this? (moreover, this will also allow us to use makeAsyncRequest to call register from a worker thread and thus lift the limitation imposed by the PR.

@dygabo
Copy link
Member

dygabo commented May 28, 2024

@ShogunPanda I have seen that but as discussed on the other PR I did not see a quick solution to that yet. That is the cause of the hanging indeed and removing it makes it not hang but that is not a solution to commit now IMO.

Current implementation starts from the premise that there is an internal worker for the customization hooks that gets created by the main thread (that is because the main thread has a --import and/or if it calls module.register to register some hooks. A solution should be to allow any thread to do that and the first worker that calls module.register would instantiate the InternalWorker. I could not find an easy way to do that yet either. Or as you proposed, make sure that it exists even if nobody on the main thread needs it. But having the 1:N in this case is still an issue that should be discussed at design level because it comes with effects. We might end up having the discussion that a worker called a module.register to register a hook but for example an import on the main thread should not be affected by that.
Having one thread to run the hooks but N chains will make the problem more complex. I especially think that any solution is opinionated and we might discuss the alternatives independent of what way we choose. Since we are talking about opinions. I consider living with this restriction maybe feasible but it definitely changes the current behavior. As alternative, having one hooks thread for all workers and allowing module.register from all user threads is nicer but IMO not straight forward. Might be worth looking into it though. It must be clear from the beginning that a worker calling that influences the way modules are being loaded on all other worker threads from that moment on. For that reason I considered main thread being responsible for the hooks chain would be a sensible thing to do.

Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change that this PR intends to revert was a bugfix, so reverting it just reintroduces the bug; so therefore we would need to land it a second time to fix the bug again. We’ve heard frequent feedback from users that changes to the hooks APIs are annoying, so we should try to avoid unnecessary churn. Before we revert this, we should investigate other solutions.

It seems clear that calling register from a worker thread and have it do nothing, or crash, is a bug. However there’s a band-aid for that: Node could error when register is called from a worker thread. That should be a simple fix that we could ship quickly without needing a revert/re-land cycle, and users can react to it by wrapping their register calls in if (isMainThread), which would not need to be reverted later. Then if we find a way to make register behave from worker threads the same way it acts on the main thread, we can enable it on worker threads and remove the error.

It’s never been part of the design to allow for different hooks to apply to different threads. From the start of the off-thread discussion, performance has always been a core concern; that’s why it was the intent that there never be more than one hooks thread per process, regardless of how many user threads there are. It’s interesting that folks have found uses for a “one hooks thread per worker thread” model, but that behavior was an accident. We could potentially support it in addition to the intended design, but I think it would be better to back up and ask what the use cases are that are satisfied by such a model and evaluate the best method for achieving them.

@ShogunPanda
Copy link
Contributor

Current implementation starts from the premise that there is an internal worker for the customization hooks that gets created by the main thread (that is because the main thread has a --import and/or if it calls module.register to register some hooks. A solution should be to allow any thread to do that and the first worker that calls module.register would instantiate the InternalWorker. I could not find an easy way to do that yet either.

Me neither, I have tried all day long.

Or as you proposed, make sure that it exists even if nobody on the main thread needs it.

I think this is the best way to solve the current starving.

But having the 1:N in this case is still an issue that should be discussed at design level because it comes with effects. We might end up having the discussion that a worker called a module.register to register a hook but for example an import on the main thread should not be affected by that.

I agree. They should not be affected.

Having one thread to run the hooks but N chains will make the problem more complex. I especially think that any solution is opinionated and we might discuss the alternatives independent of what way we choose. Since we are talking about opinions. I consider living with this restriction maybe feasible but it definitely changes the current behavior. As alternative, having one hooks thread for all workers and allowing module.register from all user threads is nicer but IMO not straight forward.

I instead think (and incurred) into the new limitation as not viable.

@GeoffreyBooth
Copy link
Member

Or as you proposed, make sure that it exists even if nobody on the main thread needs it.

I think this is the best way to solve the current starving.

This is not a viable solution. When Node is run in the default case where no hooks are registered, it needs to be running just a single thread. We can’t create the hooks thread all the time whether or not the user might use it; that would be a large performance penalty for anyone not using the hooks API.

@ShogunPanda
Copy link
Contributor

I see. It totally makes sense.

In my opinion I would keep the separate threads (as it is for instance in 22.1) and only start them when needed by a worker thread. This way we don't have to keep a single thread that might run different chains.
If we want to keep the single thread approach then we need the following:

  1. Allow register from the worker threads. The first invocation might eventually spawn the hook thread if missing.
  2. Tag hooks with the registering worker thread, so that when running the chain we can eventually filter them out. (with a logic still to determine).

Note that the need for different chains per worker is totally foreseeable, for instance:

  1. The user wants to spawn two workers, one that transpiles TypeScript and another that transpiles CoffeeScript.
  2. The user wants to spawn two workers that mock different APIs.

These are only examples, obviously.

@GeoffreyBooth
Copy link
Member

Note that the need for different chains per worker is totally foreseeable, for instance:

  1. The user wants to spawn two workers, one that transpiles TypeScript and another that transpiles CoffeeScript.
  2. The user wants to spawn two workers that mock different APIs.

That’s not a use case intended by the current design. That’s not to say it’s undesirable, but it’s a feature request for something in addition to the more common use case of “I want to write my entire app in TypeScript, and some of my app code runs in worker threads, and I want to register hooks to transpile the whole app.”

I think you should open a separate issue with the specific use case(s) you have in mind, with less focus on the implementation details of things like “the user wants to spawn two workers.” Like for example the first one, if phrased as “The user wants to transpile TypeScript and CoffeeScript” is completely possible today, by filtering on file extension: a load hook can do one thing for URLs ending in .ts and another for URLs ending in .coffee, regardless of which thread they’re referenced from. That’s a lot more direct than determining what customizations should be done based on what thread the module is on.

The second one could be phrased as “The user wants to define mocks that only apply to certain modules.” It's an implementation detail that the modules to apply the mocks to are all contained within a particular thread. I won’t propose solutions to this one here but my point is just that worker threads are only one way to approach this.

  1. Allow register from the worker threads. The first invocation might eventually spawn the hook thread if missing.

This is something we should allow, unless there’s some technical reason why it’s unachievable.

@jasnell
Copy link
Member

jasnell commented May 28, 2024

Given that this broke people, I think a revert and a second attempt at fixing the bug is most appropriate. If nothing else, revert in the release branch if we want to keep working on this in the main branch.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented May 29, 2024

Given that this broke people, I think a revert and a second attempt at fixing the bug is most appropriate.

A revert trades one bug for another. The intended behavior of calling register from a worker thread is that it behaves the same way as calling register from the main thread: a hooks thread is created if it doesn’t already exist, and the new hooks are added and apply to all threads.

But that’s not what was happening before #52706: before that PR, calling register on the main thread applied customizations only to the main thread instead of to all threads. That’s never been intentional, and #50752 was opened six months ago. This is the bug that #52706 fixed. In exchange, we got #53182, where calling register from a file referenced by new Worker execArgv --import causes a hang. (Calling register from within the worker thread’s code is a noop, as it was in 22.1.0, but that’s a separate bug.)

I agree that the process hanging is a much worse bug than the hooks applying only to one thread instead of all threads. But the options aren’t to either just revert or not, choosing one bug or the other bug; there’s another change that we should be able to land before the next release: just throw an exception if register is called from new Worker execArgv --import. That’s better than the status quo, not as good as getting to the intended behavior; but it has the advantage that it doesn’t get people accustomed to incorrect behavior (of register not applying to all threads). Ideally we can go straight to the ideal behavior, where register works as intended from anywhere, but seeing as #50752 took six months to get fixed I think we need to assume that whatever option we choose might stick around for far longer than we’d like.

@alan-agius4
Copy link
Contributor

alan-agius4 commented May 29, 2024

As requested, I have opened another issue with a feature request to describe our use case for allowing worker threads to specify custom loader hooks.

It is important to note that the current change in #52706 is not backward compatible. In previous versions, the ESM loader specified on the main process did not propagate to child workers. Consequently, from a maintenance perspective, tools now need two methods to use ESM custom loaders with thread workers and must support Node.js LTS and Active versions.

TL;DR: It would be ideal if thread workers could specify additional ESM loader hooks. We the Angular team rely on this specific behaviour.

@mcollina
Copy link
Member Author

As an additional data point, #52706 broke using the ava test framework with typescript. Note that ava runs tests inside worker threads by default.

@jasnell
Copy link
Member

jasnell commented May 29, 2024

@GeoffreyBooth:

A revert trades one bug for another...

I understand that, but that's also exactly what this PR did as well and our priority/emphasis has always been on avoiding new breaks to existing code. If there is an alternative fix that can be agreed on quickly then ok, let's see that PR. If that PR is going to take some time to work up, then the right thing to do is revert until the corrected solution can be implemented.

@GeoffreyBooth
Copy link
Member

If there is an alternative fix that can be agreed on quickly then ok, let’s see that PR.

It looks like we have one: #53200

@mcollina mcollina added request-ci Add this label to start a Jenkins CI on a PR. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. labels May 31, 2024
@mcollina
Copy link
Member Author

please keep the two commits when landing because one is a spurious formatting change needed to make the lining CI happy (no idea why it shows up in this specific PR).

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label May 31, 2024
@nodejs-github-bot

This comment was marked as outdated.

@targos
Copy link
Member

targos commented May 31, 2024

Why keep them separate? It seems better to squash.

I think we are in a case where the initial code was not linter-compliant but was just never changed since the linter was introduced or updated (we only lint modified files).

@mcollina mcollina added commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. and removed commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. labels May 31, 2024
@mcollina
Copy link
Member Author

Ok

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot
Copy link
Collaborator

@mcollina
Copy link
Member Author

mcollina commented Jun 2, 2024

I'll open the revert revert PR tomorrow.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@aduh95 aduh95 added the commit-queue Add this label to land a pull request using GitHub Actions. label Jun 2, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Jun 2, 2024
@nodejs-github-bot nodejs-github-bot merged commit 8c5c2c1 into nodejs:main Jun 2, 2024
62 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 8c5c2c1

@GeoffreyBooth
Copy link
Member

I’ll open the revert revert PR tomorrow.

Is there a benefit to opening and landing a revert-revert if it can’t get released? Maybe we should just rebase #53200 on top of the commits from this branch, and then whenever that PR lands it can go out in a release?

Then there’s no potential confusion for releasers, of needing to be aware that a particular commit (the revert-revert) should never get released without the follow-up that fixes its bug.

@ShogunPanda
Copy link
Contributor

@GeoffreyBooth I agree. If #53200 or my future PR (or both) will land this is by far the easiest way.

targos pushed a commit that referenced this pull request Jun 3, 2024
This reverts commit 22cb99d.

PR-URL: #53183
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gerhard Stöbich <[email protected]>
@RafaelGSS RafaelGSS mentioned this pull request Jun 7, 2024
sophonieb pushed a commit to sophonieb/node that referenced this pull request Jun 20, 2024
This reverts commit 22cb99d.

PR-URL: nodejs#53183
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gerhard Stöbich <[email protected]>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
This reverts commit 22cb99d.

PR-URL: nodejs#53183
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gerhard Stöbich <[email protected]>
@marco-ippolito marco-ippolito added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. tsc-agenda Issues and PRs to discuss during the meetings of the TSC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.