Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large number of inputs causes slowness / file handle exhaustion #5153

Closed
alexeagle opened this issue May 3, 2018 · 16 comments
Closed

Large number of inputs causes slowness / file handle exhaustion #5153

alexeagle opened this issue May 3, 2018 · 16 comments
Labels
category: misc > misc team-Core Skyframe, bazel query, BEP, options parsing, bazelrc untriaged

Comments

@alexeagle
Copy link
Contributor

Angular builds with Bazel, and uses npm to fetch dependencies. The number of files fetched is very large:

Projects/angular$ ls -R node_modules | wc -l
64545

The directory structure here has semantics that are relied on by nodejs module resolution logic, so we want to lay out this entire directory as an input to all nodejs rules.
I observe a lot of time spent creating runfiles, and also MacOS users report they run out of file handles.

Proposal: a native rule similar to filegroup except that it yields a TreeArtifact.
Note that this would potentially alleviate the problem in #374 - we wouldn't need to reference files in this directory with labels.
@dslomov observes that this is tricky because TreeArtifacts have only been outputs in the past, so they don't have any semantics for watching for changes, like source inputs have.

@buchgr
Copy link
Contributor

buchgr commented May 4, 2018

It was my understanding that we were planning to get rid of TreeArtifact's completely (correct me if I am wrong @ulfjack) to streamline the execution code. @ulfjack might have talked only about the execution phase though, keeping them in analysis.

@alexeagle
Copy link
Contributor Author

Hmm, we already rely on TreeArtifact in Angular's rules, please keep us advised about that...

@buchgr
Copy link
Contributor

buchgr commented May 4, 2018

I might be completely misinformed. I was also only referring to Bazel internals, not the API exposed via Skylark. Let's wait what @ulfjack responds :-)

@ulfjack
Copy link
Contributor

ulfjack commented May 17, 2018

We are planning to keep TreeArtifact. They are specific to outputs, and have special infrastructure for that. This special infrastructure is unnecessary for source directories.

Lukacs and my plan is to support referencing source directories. This is technically possible today, although there are some safeguards against doing so, and it's not supported in all the necessary places. E.g., filegroup can refer to a directory, and that can be in the runfiles of a test. This will work with local execution, but (maybe) not with remote caching, and certainly not with remote execution. It currently also doesn't see changes to files in the directory (breaks correctness).

I actually have a change to fix incremental builds with source directories, and that removes some of the safeguards. Additional work will be required to make it work correctly with remote caching and to make it work with remote execution at all (and possibly sandboxing).

However, it's not clear that this will solve the problems described above. In order to make incremental builds correct, we have to track all the files explicitly, and it is unclear whether we can create runfiles trees with directory symlinks. At least it avoids the label problem.

@ulfjack
Copy link
Contributor

ulfjack commented May 17, 2018

It is unclear how this is related to running out of file handles - what's opening all those file handles? Is that a Bazel bug?

@alexeagle
Copy link
Contributor Author

It's only a guess that the file handles are related to Bazel laying out runfiles. I don't have a principled way to reproduce and haven't tried to debug it at all.
I have a screenshot of the Bazel UI with 7 threads doing "creating runfiles" so I am pretty sure this is slow. When we have the ability to get profiling/diagnostics out of bazel after the fact maybe I can dump the diagnostics whenever I see that happen.

@ulfjack
Copy link
Contributor

ulfjack commented May 18, 2018

Can you post the screenshot?

We're using a highly optimized C++ binary to create the runfiles trees, so it's a bit surprising if that's slow. That said, if you have a lot of large symlink trees, it's not out of the question that this could take seconds. We have seen cases where creating symlink trees is slowing down sandboxing, which is why we recommend putting sandboxes on tmpfs.

Another option to optimize runfiles tree performance is to not use runfiles trees. We have libraries that allow accessing runfiles and they can (potentially) work without symlink trees (as long as you don't try to get to the underlying file system tree). We don't use symlink trees on Windows, and you can disable symlink tree generation entirely. I'm not sure if the libraries for non-Windows platforms support runfiles manifests, but we can find out.

@alexeagle
Copy link
Contributor Author

As an experiment, we tried including the full set of node_modules in Angular's build, rather than hand-pruning it as we do today.

bazel query 'deps(:node_modules)' | wc -l goes from 4504 to 45377
The build time on CI increases build/test time from ~22m to ~80m

We probably need some help to isolate where the time is being spent.

@EricBurnett
Copy link

Alex, are you saying that the full set of dependencies in your build is 45k files, or that every single action is taking all 45k files as input? If for every action a symlink or hardlink tree of 45k inputs is being generated I would fully expect that to push out your build time: that's not a fast enough operation at that scale. An optimized C++ binary would presumably do better, but just as a rough approximation:
$ time for i in seq 1 45000; do ln -s b.txt 45k_symlinks/$i; done
real 0m59.259s
user 0m2.080s
sys 0m14.104s

If you're thinking that every action should have that many inputs specifically, I suspect that's the wrong approach. If I've misunderstood, ignore me :).

@alexeagle
Copy link
Contributor Author

This is an intentional design tradeoff in rules_nodejs. In the current nodejs ecosystem, every invocation of node has access to all of the dependencies. Under Bazel, we could tend to the Bazel idiom, requiring every target to declare dependencies on individual nodejs packages that it expects to require, but this would make Bazel usage more onerous. We instead took the "meet users where they are" approach, and make nodejs execution under Bazel follow the node idiom where all node modules are inputs to every node binary execution. This was done in May 2017, see go/bazel-node-rules

@deadmoose
Copy link

This was done in May 2017, see go/bazel-node-rules

For those of us following along from the outside world, resolve the go link, please.

@alexeagle
Copy link
Contributor Author

The doc is google-internal but I summarized it on pubref/rules_node#19

@alexeagle
Copy link
Contributor Author

Update, we have a design doc to make rules_nodejs have smaller inputs by doing what pubref/rules_node does, while staying backwards compatible: https://docs.google.com/document/d/1AfjHMLVyE_vYwlHSK7k7yW_IIGppSxsQtPm9PTr1xEo/preview

@janakdr
Copy link
Contributor

janakdr commented Apr 20, 2020

Is there a remaining request here for Bazel?

@janakdr janakdr closed this as completed Aug 18, 2020
@pauldraper
Copy link
Contributor

Another option to optimize runfiles tree performance is to not use runfiles trees.

How? Is that possible per action?

@alexeagle
Copy link
Contributor Author

Rules could just not include inputs in runfiles and instead pass them in some runtime-specific provider

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: misc > misc team-Core Skyframe, bazel query, BEP, options parsing, bazelrc untriaged
Projects
None yet
Development

No branches or pull requests

9 participants