Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fine grained dependencies on node_modules #18

Closed
kamalmarhubi opened this issue Sep 12, 2017 · 8 comments
Closed

Support fine grained dependencies on node_modules #18

kamalmarhubi opened this issue Sep 12, 2017 · 8 comments

Comments

@kamalmarhubi
Copy link

We have a monorepo with a vendored node_modules tree. If we evaluate bazel for this codebase, our ideal use would be to allow depending on individual modules in library targets. This would let our build take advantage of bazel's sandboxing and knowledge of dependencies to vastly reduce the work done, while maintaining reproducibility.

These rules appear to discourage fine-grained dependencies on modules, which will result in lots of wasted builds whenever a dependency is added, updated, or removed. Any suggestions on how to handle our use case / info on planned changes that might make it easier?

@alexeagle
Copy link
Collaborator

Just to clarify, you have a tree

repo
 - our_app
 - vendor
   - node_modules
     - <stuff in here is binary distributions layed out following the npm conventions, same as npm install would do>

Or is your vendored directory actually sources that need to be built?

vastly reduce the work done

do you have evidence that a smaller node_modules/* input to the build or to individual actions makes it any faster? I could imagine that for remote actions the input size is a problem, but I haven't seen this problem locally

As a workaround, the rules here accept a node_modules attribute which you could assign to a label that is filegroup-like, which would let you reduce which inputs the nodejs_binary uses.

@kamalmarhubi
Copy link
Author

kamalmarhubi commented Sep 17, 2017

@alexeagle we actually have

repo
├── node_modules
│   └── <stuff in here is binary distributions layed out following the npm conventions, same as npm install would do>
└── ourstuff

where we basically commit results of npm install to our repo. (This makes the few native modules we use special snowflakes, but that's something I would hope to fix via bazel.)

The concern I originally had with the monolithic node_modules dependency is that each executable / script in our repo only depends on only a subset of the hundreds of modules in node_modules. By having a single node_modules target, any update to any module will require a rebuild / retest of the whole repo, instead of only those targets dependent on the changed or added module.

I don't have any measurements, as there are no build rules for coffeescript, which is what a substantial portion of the repo is written in. However your question made me take a closer look at our commit history. It looks like node_modules changes less often than I'd originally thought, so this is probably not a huge issue during development. From an artifact size perspective, however, if we package nodejs_binary rules for distribution they'll end up being hundreds of MB if they use even one module. There may still be some value in allowing dependencies on individual modules.

Curious to hear your thoughts!

@alexeagle
Copy link
Collaborator

for nodejs_binary deployments on the server we don't care that much, a few hundred MB is the standard size for Java programs in a large monorepo with a lot of code sharing and a deep tech stack.
For a presumed npm_bundle rule (#10), none of the node_modules would be included, only the package.json file.

Note that you can have multiple node_modules_xx targets, each a filegroup with a manually curated set of glob patterns, to satisfy your use case - but I think the manual curation won't be practical over time or at scale. Maybe that can be a workaround if we do discover some reason that the whole of node_modules can't be an input.

There's a scheme in https://github.com/dropbox/rules_node to generate the bazel workspace from the package.json, which you might want to check out.

@pauldraper
Copy link

pauldraper commented Sep 5, 2018

For an idea of "how big are we talking", take https://angular.io/guide/quickstart . Following the instructions, I get a node_modules directory of 335 MB. Given that's just the recommended "getting started" Angular app, I think it's not unlikely for a combined node_modules/ for a monolithic repo to be better part of a gigabyte.

That causes several problems with rules_nodejs:

  1. Install times are lengthy if you use anything in node_modules.
  2. It disobeys the law of incrementality, which is that the time rebuild after a change is proportional to the size of the change (rather than the size of the code). (@kamalmarhubi Support fine grained dependencies on node_modules #18 (comment))
  3. It requires large uploads for distributed execution. (@alexeagle Support fine grained dependencies on node_modules #18 (comment))
  4. It doesn't facilitate well-structured code bases. For example, I cannot easily ask "What uses this dependency?"
  5. It results in large deployment targets. (@alexeagle
    Support fine grained dependencies on node_modules #18 (comment)) On the one hand, this may not be an issue since "disk space is cheap". On the other hand, as servers become more numerous and more ephemeral, many containerized deployments fight to optimize out every MB possible. "A 100 MB here or there isn't important" is not as true as we might like it to be.

(1) can only be fixed by making Bazel-native installs, in the style of https://github.com/johnynek/bazel-deps (Maven) or https://github.com/dropbox/rules_node (npm).

(2), (3), (4), and (5) can be fixed in less drastic ways. npm/yarn can still resolve and install the dependencies, compile native bindings, etc. But Bazel, via the generated lock file, could understand how those installed dependencies are linked. Rules could depend on specific dependencies and rules_nodejs would understand which files those are. This is not a very complicated or error-prone algorithm, and it would help enormously with those issues. (Or you could opt-out of the fine grained dependency management, and basic workflow is still all the same.)

Thoughts @alexeagle ?

@gregmagolan
Copy link
Collaborator

@kamalmarhubi @pauldraper Alex and I have a PR up for fine grained npm dependencies via yarn_install/npm_install based on ideas from https://github.com/pubref/rules_node.

This PR defined targets for each node module installed and will allow for the following:

        yarn_install(
          name = "npm",
          package_json = "//:package.json",
          yarn_lock = "//:yarn.lock",
        )

        nodejs_binary(
          name = "my_binary",
          ...
          data = [
              "@npm//:foo",
              "@npm//:bar",
              ...
          ],
        )

@gregmagolan
Copy link
Collaborator

@pauldraper We've seen performance issues with large node_module filegroups as well bazelbuild/bazel#5153

@pauldraper
Copy link

@gregmagolan 👍 that's exactly what I was thinking.

Looks like proposal is at https://docs.google.com/document/d/1AfjHMLVyE_vYwlHSK7k7yW_IIGppSxsQtPm9PTr1xEo/preview

@alexeagle
Copy link
Collaborator

This is in now, note that a breaking change is still coming in #337

alexeagle pushed a commit to alexeagle/rules_nodejs that referenced this issue Oct 17, 2020
It seems unneeded in the Angular repo now, and causes us to miss the pathmapping for '*'.

Fixes bazel-contrib#18

PiperOrigin-RevId: 162747459
alexeagle pushed a commit to alexeagle/rules_nodejs that referenced this issue Oct 18, 2020
It seems unneeded in the Angular repo now, and causes us to miss the pathmapping for '*'.

Fixes bazel-contrib#18

PiperOrigin-RevId: 162747459
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants