Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: npm_install modifying the source tree during the build is an antipattern #3693

Closed
joeleba opened this issue Sep 6, 2023 · 11 comments
Closed
Labels
bug Can Close? We will close this in 30 days if there is no further activity

Comments

@joeleba
Copy link

joeleba commented Sep 6, 2023

What happened?

tl,dr: I think modifying the source dir during the build is an antipattern.

Background

I ran into an issue while trying to build bazelbuild/buildtools with --experimental_merged_skyframe_analysis_and_execution (context: Project Skymeld bazelbuild/bazel#14057). Some actions were reporting missing inputs from the node_modules directory.

Without going into too much details, I managed to figure out the issue: compared to the regular version of blaze, Skymeld clears the syscall cache at a different point in time.

Take the following example:

WORKSPACE
BUILD
foo/
  a.in
  b.in

where WORKSPACE contains an instance of npm_install.

Assume that there are actions that refer to node_modules. When doing bazel build //foo/... with bazel at HEAD, this is what happens (chronological order):

Phase What happens State of syscall cache (readdir)
Loading npm_install in the WORKSPACE file is evaluated and node_modules is generated { foo -> [a.in, b.in]}
Analysis Prepare the actions cleared at the end of the analysis phase to save memory
Execution Execute the actions. Will refer to the syscall cache to get the file states. { foo -> [a.in, b.in, node_modules] }

The syscall cache is extra-Skyframe and generally assumes that the source doesn't change during the build (code). This node_modules directory is generated in the source dir and hence isn't considered "external" either.

So what happens if we don't clear the syscall cache after the analysis phase? One would expect a memory regression, but in this case, it's worse: we'll start seeing missing input errors:

Phase What happens State of syscall cache (readdir)
Loading npm_install in the WORKSPACE file is evaluated and node_modules is generated { foo -> [a.in, b.in]}
Analysis Prepare the actions No cache clearance
Execution Execute the actions. Will refer to the syscall cache to get the file states and reports that there's no node_modules directory { foo -> [a.in, b.in] }

Disabling the cache clearance: bazelbuild/bazel@54b4c39

Proposal

The syscall cache was there only for performance reason, but it accidentally tolerated this behavior. This will not be the case when we switch over to Skymeld mode, because then the syscall cache clearance happens at a different time.

IMO we have 2 options to move forward:

  1. Clear the syscall cache once after the Loading phase, or
  2. Reaffirm that bazel rules shouldn't modify the source dir during the build. This essentially means that we revert repository rule for node_modules #63.

I believe that (1) is just once more sweeping this problem under the rug and isn't a desirable option, not to mention the potential regression from clearing the syscall cache before the analysis work.

WDYT?

Version

Development (host) and target OS/architectures: Linux

Output of bazel --version: development version

Version of rules_nodejs, or other relevant rules from your
WORKSPACE or MODULE.bazel file: 4.0.0

Language(s) and/or frameworks involved:

How to reproduce

Build bazel from this branch: https://github.com/joeleba/bazel/tree/npm_install_repro


$ cd /tmp
$ git clone [email protected]:joeleba/bazel.git
$ cd bazel
$ bazel build //src:bazel && cp bazel-bin/src/bazel /tmp/bazel -f

Clone bazelbuild/buildtools and build ...:

$ cd /tmp
$ git clone [email protected]:bazelbuild/buildtools.git`
$ cd buildtools
$ /tmp/bazel build //...


### Any other information?

_No response_
@joeleba joeleba added the bug label Sep 6, 2023
@joeleba
Copy link
Author

joeleba commented Sep 6, 2023

cc @meisterT @lberki @alexeagle

@lberki
Copy link
Contributor

lberki commented Sep 6, 2023

Writing to the source tree during the build (even from repository modules) is definitely an antipattern. However, that does not answer the question as to what else to do; my understanding is that the node_modules directory is pretty much required by the Node ecosystem. There might be some leeway as to where one puts the WORKSPACE.bazel file, but I wouldn't count on it.

Historically, the answer to this was managed_directories(), which was used to tell Bazel which parts of the source tree it should not consider as "sources", but it was deprecated because supporting it required a lot of complexity (see bazelbuild/bazel#15463 and the linked design doc).

@joeleba , how does this exactly cause trouble? Disregarding the inherent ugliness of this, this would only be a problem if the build touched the node_modules/ directory or called readdir() on the top-level directory of the workspace.

@alexeagle , can you offer some guidance here? My ideas (I'm not sure what is my order of preference):

  1. Use the little-known .bazelignore functionality to ignore node_modules/ (this is similar to managed directories, but more limited). If it's really, truly necessary I'd sign off on adding bazelignore-like functionality to the WORKSPACE.bazel / MODULE.bazel file (essentially resurrecting managed directories, but in a much more limited way)
  2. Make rules_nodejs error out when node_modules does not exist, thus at least the problem with top-level readdir() is avoided.

@joeleba
Copy link
Author

joeleba commented Sep 6, 2023

@joeleba , how does this exactly cause trouble? Disregarding the inherent ugliness of this, this would only be a problem if the build touched the node_modules/ directory or called readdir() on the top-level directory of the workspace.

In bazelbuild/buildtools, there's a tsc target that contains a middleman action that refers to files under node_modules. In skymeld mode, or when we don't clear the syscall cache after analysis, the syscall cache would not know of the existence of node_modules because of a stale cache entry. Here's a snippet of the failure:

...
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: missing input file '@buildozer_npm_deps//:node_modules/typescript/loc/lcl/TRK/TypeScriptLanguageService/Microsoft.CodeAnalysis.TypeScript.EditorFeatures.dll.lcl'
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: missing input file '@buildozer_npm_deps//:node_modules/typescript/loc/lcl/TRK/TypeScriptTasks/TypeScript.Tasks.dll.lcl'
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: missing input file '@buildozer_npm_deps//:node_modules/typescript/package.json'
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: 169 input file(s) do not exist
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14 Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: 169 input file(s) do not exist
INFO: Elapsed time: 15.659s, Critical Path: 0.78s
INFO: 132 processes: 115 internal, 17 local.
ERROR: Build did NOT complete successfully

Since you mentioned the possibility of adding stuff to MODULE.bazel: cc @meteorcloudy

@meteorcloudy
Copy link
Collaborator

Since you mentioned the possibility of adding stuff to MODULE.bazel

I think I mentioned managed_directories isn't available in Bzlmod, I don't think Bzlmod can actually help with this issue.

@lberki
Copy link
Contributor

lberki commented Sep 6, 2023

managed_directories isn't a feature of Bazel anymore :)

But if we want to add a .bazelignore analogue, one would have to add that extra symbol.

@joeleba
Copy link
Author

joeleba commented Sep 6, 2023

Since you mentioned the possibility of adding stuff to MODULE.bazel: cc @meteorcloudy

FWIW, "you" here meant @lberki.

@joeleba
Copy link
Author

joeleba commented Sep 7, 2023

From reading the doc, we don't seem to have this problem with rules_js because the node_modules dir isn't under the source tree.

@alexeagle could you please confirm?

@joeleba
Copy link
Author

joeleba commented Sep 12, 2023

cc @gregmagolan

copybara-service bot pushed a commit to bazelbuild/bazel that referenced this issue Sep 13, 2023
…ase.

Some rules in Bazel (`npm_install` is a particular example) modify the source
tree after the target pattern parsing phase. This would make the syscall cache
stale.

This CL mitigate that issue by clearing the syscall cache after the target pattern
parsing phase. This would have very little impact on the overall performance,
since there's relatively quite a small number of entries in the cache at this
point.

See bazel-contrib/rules_nodejs#3693

PiperOrigin-RevId: 565007842
Change-Id: Ibc4f62aeb2f5347e0ecf2a186652944ea6fcdbe5
Copy link

This issue has been automatically marked as stale because it has had no recent activity. It will be closed if no further activity occurs in 30 days. Note as of rules_nodejs v6 the rules_nodejs repository contains only the core nodejs toolchain and @bazel/runfiles package. All rulesets are removed and unmaintained. For alternate rulesets suggestions include https://github.com/aspect-build/rules_js, https://github.com/aspect-build/rules_ts Collaborators can add a "cleanup" or "need: discussion" label to keep it open indefinitely. Thanks for your contributions to rules_nodejs!

@github-actions github-actions bot added the Can Close? We will close this in 30 days if there is no further activity label Nov 12, 2023
@alexeagle
Copy link
Collaborator

Hey sorry we never replied @joeleba - that's correct that rules_js writes only to the output tree. rules_nodejs 6.0.0 fixed this, in the sense that this package manager code no longer exists, the scope is now only to provide a nodejs toolchain.

@jin
Copy link

jin commented Sep 11, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Can Close? We will close this in 30 days if there is no further activity
Projects
None yet
Development

No branches or pull requests

5 participants