
Forked processes are unable to resolve node_modules/ #312

Closed
lexor90 opened this issue Sep 7, 2018 · 12 comments

Comments

@lexor90

lexor90 commented Sep 7, 2018

As reported in #304

Using cluster.fork() breaks module resolution, as the child process seems to use the original Node resolver.

Currently, if a process spawns other processes, those children don't get the patched node_loader.js. This means that every forked process starts failing because it cannot resolve any require('xyz').

An example use case is a web server where you clusterize the process to share a single port (the master accepts connections, workers handle the requests - in Node this is so easy that it's very common). This can be worked around by running Node as a single process (no master-worker architecture) and using an orchestrator to run multiple copies of the process, but then you need an external load balancer, which might not be an option for everyone.

Another valid example is a process that spawns separate worker processes for heavy data calculations, so the main thread isn't blocked and the master stays free to respond to RPC calls.
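For concreteness, a minimal sketch of the clustered-server pattern described above (plain Node, no Bazel specifics; the port and handler are illustrative). Under the current behaviour, the require() calls inside the worker processes go through the unpatched resolver and fail:

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // The master only forks workers; the workers share the listening port.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Each worker is a forked process: with the patched loader missing here,
  // any require() of an npm package (e.g. require('express')) cannot be resolved.
  http.createServer((req, res) => res.end('ok')).listen(8080);
}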

@alexeagle
Collaborator

@gregmagolan and I discussed how the approach suggested in #315 would fix this.

We could change our approach to have an action that lays out a node_modules tree with a composition of node_modules from the local repository, remote repositories, and JS outputs of rules. Then at runtime we'd only need to point nodejs at this output directory and could use vanilla nodejs resolution. It would move the complexity from runtime to build-time.

I think we'll investigate this once we're done landing fine-grained node_modules everywhere and have confirmed the performance improvements.

@nimerritt
Contributor

@alexeagle I did something very similar internally at work, but the performance was terrible, since I needed to construct the node_modules layout for every ts_module (similar to ts_library) target in order to have TypeScript resolve everything, and I couldn't figure out a good way to re-root a tree. See https://groups.google.com/forum/#!searchin/bazel-discuss/tree$20rooted%7Csort:date/bazel-discuss/JbA19d3vBis/Uv9j4TOwEQAJ. In the end, I went back to using the module_mapping functionality provided by rules_nodejs. I could have partly solved this with the path mapping in tsconfig.json, but I wanted to keep the behaviour consistent with the runtime behaviour.

It'd be good to see the custom require code go away in any case. I'd be curious to know if you come up with a good solution to this!

@lewish
Contributor

lewish commented Feb 13, 2019

@nimerritt could you elaborate on your workaround regarding using module_mapping?

@lewish
Contributor

lewish commented Feb 18, 2019

For anyone else coming across this, I have a workaround.

You can use nodejs_binary to generate a _bin_loader.js script for the entry-point JS file that you actually want to run. You then fork the loader script instead of the raw JS file. That way module mappings get loaded correctly and your modules can be resolved. You can bundle this generated output into the library itself via the data attribute so it actually gets packaged up with your code. The bin loader script also appears to be a no-op when used outside of the Bazel context, so this is working fine for us when published to npm.

Our BUILD file ended up looking something like this:

ts_library(
    name = "lib_internal",
    srcs = glob(["**/*.ts"]),
    deps = [
        ...
    ],
)

nodejs_binary(
    name = "fork_script",
    data = [
        ":lib_internal",
    ],
    entry_point = "workspace_name/fork_script.js",
)

ts_library(
    name = "lib",
    srcs = [],
    data = [
        ":fork_script_bin_loader.js",
    ],
    deps = [
        ":lib_internal",
    ],
)

And then when you want to actually fork, you can do something like:

const child = fork(require.resolve("./fork_script_bin_loader"));

@alexeagle
Collaborator

Thanks for documenting that, @lewish - this is a nice workaround.

@lewish
Contributor

lewish commented Feb 20, 2019

Looks like the bin_loader script isn't actually a no-op outside of Bazel, so I added a small hack for this:

const forkScript = process.env["BAZEL_TARGET"] ? "../fork_script_bin_loader" : "./fork_script";
const child = fork(require.resolve(forkScript));

@thesayyn
Collaborator

Actually, I don't think this is just an enhancement. Forking is a must-have feature in the Node.js world.

@Toxicable

I do think this feature is required. Even to run terser in parallel (#833) we have to hack around this, so fixing it would make things much easier.

@alexeagle suggested having a "slow" version of nodejs_binary, whereby we build a single node_modules tree that would allow the native require to resolve all modules.
The downside is that it would be slow: moving around so many files could end up being very costly in I/O.

@filipesilva suggested using NODE_OPTIONS with the --require flag.
This means that instead of patching require and then calling the user's entry point as we currently do, we would start Node with an environment variable like NODE_OPTIONS="--require ./bazel-resolve-patch.js", so that Node itself loads the patch before executing the entry point.
This works for all types of child_process, since the environment variable is passed down and re-read by the child.
However, the same does not apply to worker_threads; there the user would have to load the patch in the worker script with something like require(process.env.NODE_OPTIONS.split(' ')[1]);
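A rough sketch of how that second option could look from a user's perspective (bazel-resolve-patch.js is the hypothetical file name from above; child.js and worker.js are illustrative):

// Launched as: NODE_OPTIONS="--require ./bazel-resolve-patch.js" node main.js
// Node preloads the patch before running main.js.

const { fork } = require('child_process');
const { Worker } = require('worker_threads');

// child_process children inherit the environment, so the child's node process
// reads NODE_OPTIONS again and preloads the patch on its own.
fork('./child.js');

// worker_threads don't re-apply the --require preload, so worker.js would need
// to load the patch itself as its first line:
//   require(process.env.NODE_OPTIONS.split(' ')[1]);
new Worker('./worker.js');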

While I like the second option a lot, I think we can do better and come up with a method that is no slower than the current implementation and doesn't require changing existing worker scripts. However, it may be a reasonable solution for now, since child_process is much more common than worker_threads.

@thesayyn
Collaborator

@Toxicable in the second option, will the forked process be able to find the patch.js file, given that the forked process can have a different CWD than its parent?

@Toxicable

@thesayyn not if we reference it as a relative script, but if we place it in a node_modules directory, Node walks up through all parent directories looking for node_modules.

@thesayyn
Collaborator

In that case it's not an elegant solution, I think.

Custom loaders and similar solutions can easily run into problems in the future.

The solution should output artifacts that can run outside of the Bazel world. I guess I'll +1 the first option, where we put all the artifacts into node_modules at build time.

@alexeagle
Collaborator

@thesayyn you can +1 the new //internal/linker which puts all the artifacts into node_modules

It doesn't do it at build time though - that's too slow as observed earlier. Instead we sneakily do it as a separate program just before launching node.

I think we can close this now that the linker is in use. We'll keep #315 to track the work of actually moving all bazel node programs away from the custom resolver - but if you want to fork processes you now can.

@Toxicable let's work on getting your terser worker threads landed in //packages/terser now. Let me know how I can help!

alexeagle pushed a commit to alexeagle/rules_nodejs that referenced this issue Oct 17, 2020
alexeagle pushed a commit to alexeagle/rules_nodejs that referenced this issue Oct 18, 2020