Bazel query silently thrashes the analysis cache #10902
Comments
@kastiglione Please provide more information. You're certainly on the right track that @irengrig Please find a different assignee. I don't have time to look at this. |
Removing P1, we should indeed get more information. /cc @lberki can you help investigate the problem? |
@kastiglione - is it possible that you're setting flags in a |
@gregestren yes, that's a possible cause. After posting this, we learned about #10961. In particular, we had one flag that had to be moved to the I'll retry query and see if that issue was the cause/solution to this issue. |
I just did a It seems to me that this case should result in a user visible warning. |
This issue is not specific to query. Bazel is supposed to print "... changed, discarding analysis cache". However, it doesn't appear to do that for options in StarlarkSemanticOptions. Can easily be tested by doing |
StarlarkSemanticOptions isn't in a build's "configuration", hence it not getting caught by the message-printing code. Recognizing and changing that class' impact is much more awkward than "normal" flags. Next step for this should be to summarize exactly which code paths recognize these changes and how they can link to the message generating code. |
Annoyingly, bazel/src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java Lines 686 to 690 in f15d08d
|
I don't think that's accurate. Just because there are options specified on the build command doesn't mean that they cause a discard of the analysis phase. Let us know if you can repro an analysis phase discard due to remote options changes. |
@ulfjack Below is a minimalistic example. Notice that I had INFO: 0 processes before the query, but got INFO: 1 process: 1 remote cache hit after the query. The observed behaviour is the same as if the analysis cache has been discarded, but it might be something else. I might be wrong, but my suspicion is bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java Lines 1484 to 1490 in 41a3cb0
bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java Lines 2685 to 2688 in 41a3cb0
|
@buchgr, your commit d480c5f broke Skyframe caching of actions across query invocations, even for builds that don't use the functionality (as long as they use remote execution). I think it's the injection of the remote default exec properties. The --remote_download_outputs defaults to ALL, and ALL is injected if the flag isn't set or remote execution is disabled. |
Just to clarify: I was able to reproduce the issue brought up by @moroten. I also audited the code for remote options, and it's only two options that are intentionally injected into the Skyframe dependency graph, and one of them ( My understanding is that it intends to cause action re-execution on changes to the command-line-specified remote exec properties. Unfortunately, because they're only injected on build commands and not on query, that causes a skyframe invalidation of all actions (query implicitly clears the injected nodes if nothing is specified otherwise). There are a few options for what to do, but none are very appealing. |
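To make the mechanism above concrete, here is a toy model of it. This is only a sketch and not Bazel's Skyframe code: `ToyGraph`, `build`, `query`, and the key name `remote_default_exec_properties` are hypothetical names chosen for illustration. It assumes that a build command injects a value, that query resets it to "unset", and that any change to an injected value drops every cached action that read it.

```python
class ToyGraph:
    """Tiny stand-in for an incremental build graph (not Bazel's Skyframe)."""

    def __init__(self):
        self.injected = {}   # injected key -> current value
        self.cache = {}      # action name -> (injected key it read, result)
        self.executions = 0  # number of times an action really ran

    def inject(self, key, value):
        # Re-injecting an identical value is a no-op.
        if key in self.injected and self.injected[key] == value:
            return
        self.injected[key] = value
        # A changed value invalidates every cached action that read this key.
        self.cache = {
            name: entry for name, entry in self.cache.items() if entry[0] != key
        }

    def run_action(self, name, dep_key):
        if name in self.cache:
            return self.cache[name][1]
        self.executions += 1
        result = f"{name}({dep_key}={self.injected.get(dep_key)!r})"
        self.cache[name] = (dep_key, result)
        return result


KEY = "remote_default_exec_properties"  # hypothetical key name


def build(graph, exec_properties):
    """A build injects the command-line value, then runs the action."""
    graph.inject(KEY, exec_properties)
    return graph.run_action("compile", KEY)


def query(graph):
    """Assumption: query does not pass the value, so it is reset to None."""
    graph.inject(KEY, None)
```

In this model two identical builds in a row execute the action once, while build, query, build executes it twice, even though both builds used the same value. That matches the symptom reported in this thread, although the real invalidation logic is more involved.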
Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build. I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well. Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aforementioned flag). I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as bazelbuild#10902. Fixes bazelbuild#11532. Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921
I think build-without-the-bytes should be rewritten to separate download of outputs from action execution. In the current implementation, outputs can only be downloaded when an action executes, which requires the injection of these flags into skyframe in order to trigger action re-execution (and which causes this exact bug). |
This bug should be assigned to |
Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build. I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well. Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aforementioned flag). I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as #10902. Fixes #11532. Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921 Closes #11536. Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427 PiperOrigin-RevId: 332939828
cc @coeuvre |
It's possible I incidentally fixed this with f6f8dfe. Can someone check? There is still one injection of remote options (whether remote execution is enabled), so it may not have been enough. |
I also ran into this issue. Taking the approach that @kastiglione suggested worked for me with moving applicable cmd line options in the .bazelrc files from |
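For anyone auditing their own rc files for this pattern, a rough sketch of a checker follows. It rests on several assumptions: each .bazelrc line starts with a command word followed by options, options on `common` lines also reach query, and options on plain `build` lines do not. It skips `import` lines and named configs such as `build:ci`, does not handle trailing comments, and the function names and the sample flag are hypothetical.

```python
import shlex
from collections import defaultdict


def options_by_command(rc_text):
    """Group options by the command word that starts each rc line."""
    grouped = defaultdict(set)
    for raw in rc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        command, *options = shlex.split(line)
        # Skip imports and named configs like "build:ci" in this sketch.
        if command in ("import", "try-import") or ":" in command:
            continue
        grouped[command].update(options)
    return grouped


def build_only_options(rc_text):
    """Options set on `build` lines but on neither `query` nor `common`."""
    grouped = options_by_command(rc_text)
    seen_by_query = grouped["query"] | grouped["common"]
    return sorted(grouped["build"] - seen_by_query)


if __name__ == "__main__":
    sample = "build --incompatible_example_flag\ncommon --color=yes\n"
    for option in build_only_options(sample):
        print(f"set for build but not for query: {option}")
```

Not every option reported this way causes invalidation, so the output is a list of candidates to review rather than a list of bugs.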
@ulfjack I am looking into separating download from execution. Any suggestions for the implementation details? |
I was also just bitten by this and had to change some of the flags in the Has the section of code that causes this problem been identified -- could we at least quickly add a warning to check your .bazelrc flags there if a fix isn't coming soon? It turns out I'm also affected by the missing |
More communication from the tool is perfectly reasonable. I don't know how hard the implementation would be. The main reason is different flags invalidate the graph in different places. Some of the bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java Line 1140 in a01e94a
So checking for changes there and reporting changes could work... for those flags. Other flags are injected in other places. I suspect remote-related flags have their own hooks. So there's a bit of whack-a-mole to capture everything. There's an important subtlety here that it's not necessarily just analysis being invalidated, but the entire graph (including package semantics, which are an even deeper dependency). So I don't know... is the whack-a-mole approach reasonable for this? |
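As a sketch of what the reporting side of the whack-a-mole approach could look like, the snippet below compares option snapshots from two invocations, one group at a time. Nothing here reflects Bazel's actual code: the group names (`configuration`, `starlark_semantics`), the function names, the sample flag, and the message wording are all assumptions for illustration.

```python
def changed_options(previous, current):
    """Names of options whose values differ between two snapshots."""
    names = set(previous) | set(current)
    return sorted(n for n in names if previous.get(n) != current.get(n))


def invalidation_warnings(previous_groups, current_groups):
    """One warning per option group that changed between invocations.

    Each argument maps a group name to a dict of option name -> value.
    Every place that injects options would need to contribute its own group.
    """
    warnings = []
    for group in sorted(set(previous_groups) | set(current_groups)):
        diff = changed_options(
            previous_groups.get(group, {}), current_groups.get(group, {})
        )
        if diff:
            warnings.append(
                f"{group} options {', '.join(diff)} changed, discarding cached state"
            )
    return warnings


if __name__ == "__main__":
    after_build = {"starlark_semantics": {"--incompatible_example_flag": True}}
    after_query = {"starlark_semantics": {}}
    for message in invalidation_warnings(after_build, after_query):
        print(message)
```

The hard part raised above remains: each injection site has to register its options with something like this, and a site that is missed stays silent.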
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team ( |
This is not stale. Actually, with my recent internal work, this might be resolved. I will get back to this soon once those changes are submitted. |
Fixed by d426b3d. |
Description of the problem / feature request:
bazel query invalidates the analysis cache for subsequent builds, and neither the output of bazel build nor that of bazel query indicates the problem.
The best solution would be to call bazel query in a way that uses and preserves the current analysis cache. A secondary solution is to have one or both commands log about the problem.
bazel build does output a message to say that the analysis cache is being discarded when build flags are changed. However, bazel query silently discards the analysis cache, so we didn't know this was happening.
Chrome traces were showing over 10s of overhead at the start of each build, and we didn't know why. Removing our use of bazel query between builds solved this problem.
Feature requests: what underlying problem are you trying to solve with this feature?
Allow bazel query to reuse and preserve the analysis cache from the recent bazel build, if applicable.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
In the second build, the output should say:
Now, to see the bug:
In the second build, the output will show that bazel has had to load >0 packages and configure >0 targets.
What operating system are you running Bazel on?
macOS
What's the output of bazel info release?
release 2.1.0