-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large number of inputs causes slowness / file handle exhaustion #5153
Comments
Hmm, we already rely on TreeArtifact in Angular's rules, please keep us advised about that... |
I might be completely misinformed. I was also only referring to Bazel internals, not the API exposed via Skylark. Let's wait what @ulfjack responds :-) |
We are planning to keep TreeArtifact. They are specific to outputs, and have special infrastructure for that. This special infrastructure is unnecessary for source directories. Lukacs and my plan is to support referencing source directories. This is technically possible today, although there are some safeguards against doing so, and it's not supported in all the necessary places. E.g., filegroup can refer to a directory, and that can be in the runfiles of a test. This will work with local execution, but (maybe) not with remote caching, and certainly not with remote execution. It currently also doesn't see changes to files in the directory (breaks correctness). I actually have a change to fix incremental builds with source directories, and that removes some of the safeguards. Additional work will be required to make it work correctly with remote caching and to make it work with remote execution at all (and possibly sandboxing). However, it's not clear that this will solve the problems described above. In order to make incremental builds correct, we have to track all the files explicitly, and it is unclear whether we can create runfiles trees with directory symlinks. At least it avoids the label problem. |
It is unclear how this is related to running out of file handles - what's opening all those file handles? Is that a Bazel bug? |
It's only a guess that the file handles are related to Bazel laying out runfiles. I don't have a principled way to reproduce and haven't tried to debug it at all. |
Can you post the screenshot? We're using a highly optimized C++ binary to create the runfiles trees, so it's a bit surprising if that's slow. That said, if you have a lot of large symlink trees, it's not out of the question that this could take seconds. We have seen cases where creating symlink trees is slowing down sandboxing, which is why we recommend putting sandboxes on tmpfs. Another option to optimize runfiles tree performance is to not use runfiles trees. We have libraries that allow accessing runfiles and they can (potentially) work without symlink trees (as long as you don't try to get to the underlying file system tree). We don't use symlink trees on Windows, and you can disable symlink tree generation entirely. I'm not sure if the libraries for non-Windows platforms support runfiles manifests, but we can find out. |
As an experiment, we tried including the full set of node_modules in Angular's build, rather than hand-pruning it as we do today.
We probably need some help to isolate where the time is being spent. |
Alex, are you saying that the full set of dependencies in your build is 45k files, or that every single action is taking all 45k files as input? If for every action a symlink or hardlink tree of 45k inputs is being generated I would fully expect that to push out your build time: that's not a fast enough operation at that scale. An optimized C++ binary would presumably do better, but just as a rough approximation: If you're thinking that every action should have that many inputs specifically, I suspect that's the wrong approach. If I've misunderstood, ignore me :). |
This is an intentional design tradeoff in rules_nodejs. In the current nodejs ecosystem, every invocation of |
For those of us following along from the outside world, resolve the go link, please. |
The doc is google-internal but I summarized it on pubref/rules_node#19 |
Update, we have a design doc to make rules_nodejs have smaller inputs by doing what pubref/rules_node does, while staying backwards compatible: https://docs.google.com/document/d/1AfjHMLVyE_vYwlHSK7k7yW_IIGppSxsQtPm9PTr1xEo/preview |
Is there a remaining request here for Bazel? |
How? Is that possible per action? |
Rules could just not include inputs in runfiles and instead pass them in some runtime-specific provider |
Angular builds with Bazel, and uses npm to fetch dependencies. The number of files fetched is very large:
The directory structure here has semantics that are relied on by nodejs module resolution logic, so we want to lay out this entire directory as an input to all nodejs rules.
I observe a lot of time spent creating runfiles, and also MacOS users report they run out of file handles.
Proposal: a native rule similar to
filegroup
except that it yields a TreeArtifact.Note that this would potentially alleviate the problem in #374 - we wouldn't need to reference files in this directory with labels.
@dslomov observes that this is tricky because TreeArtifacts have only been outputs in the past, so they don't have any semantics for watching for changes, like source inputs have.
The text was updated successfully, but these errors were encountered: