Bazel query silently thrashes the analysis cache #10902

kastiglione · 2020-03-04T17:29:22Z

Description of the problem / feature request:

bazel query invalidates the analysis cache for subsequent builds, and neither the output of bazel build nor bazel query indicate the problem.

The best solution would be to call bazel query in a way that uses and preserves the current analysis cache. A secondary solution is to have one or both commands log about the problem.

bazel build does output a message to say that the analysis cache is being discarded – when build flags are changed. However bazel query is silently causing discarding the analysis cache, so we didn't know this was happening.

Chrome traces were showing over 10s of overhead at the start of each build, and we didn't know why. Removing our use of bazel query between builds solved this problem.

Feature requests: what underlying problem are you trying to solve with this feature?

Allow bazel query to reuse and preserve the analysis cache from the recent bazel build, if applicable.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Do a build
Change a file
Do the same build again

In the second build, the output should say:

INFO: Analyzed target //package:target (0 packages loaded, 0 targets configured).

Now, to see the bug:

Do a build
Do a bazel query
Change a file
Do the same build again

In the second build, the output will show that bazel has had to load >0 packages and configure >0 targets.

What operating system are you running Bazel on?

macOS

What's the output of `bazel info release`?

release 2.1.0

The text was updated successfully, but these errors were encountered:

haxorz · 2020-03-06T15:16:15Z

@kastiglione Please provide more information. You're certainly on the right track that bazel flags are the culprit, but haven't told us anything about your flag usage. I think bazel's INFO log suffices to fully debug this. If you can't upload the full file for sensitivity reasons, please read over it and copy/paste the relevant parts. Someone on the Bazel team can help you decipher the log if you need help.

@irengrig Please find a different assignee. I don't have time to look at this.

irengrig · 2020-03-06T15:46:02Z

Removing P1, we should indeed get more information.

/cc @lberki can you help investigate the problem?

gregestren · 2020-03-17T16:00:59Z

@kastiglione - is it possible that you're setting flags in a .bazelrc that apply to build but not query?

kastiglione · 2020-03-17T16:31:16Z

@gregestren yes, that's a possible cause. After posting this, we learned about #10961. In particular, we had one flag that had to be moved to the common settings, and this was the cause of thrashing with some bazel run usage we had. See #10961 (comment)

I'll retry query and see if that issue was the cause/solution to this issue.

kastiglione · 2020-03-17T17:11:02Z

I just did a build, then query, then build again, and I no longer see re-analysis. This was solved by moving --incompatible_allow_tags_propagation from build to common in our .bazelrc.

It seems to me that this case should result in a user visible warning.

janakdr · 2020-04-20T14:59:23Z

This issue is not specific to query. Bazel is supposed to print "... changed, discarding analysis cache". However, it doesn't appear to do that for options in StarlarkSemanticOptions. Can easily be tested by doing for opt in "" "no"; do bazel build --${opt}debug_depset_depth --nobuild //target ; done.

gregestren · 2020-04-20T18:30:00Z

StarlarkSemanticOptions isn't in a build's "configuration", hence it not getting caught by the message-printing code.

Recognizing and changing that class' impact is much more awkward than "normal" flags. Next step for this should be to summarize exactly which code paths recognize these changes and how they can link to the message generating code.

moroten · 2020-04-25T18:54:36Z

Annoyingly, RemoteOptions are not supported by query and cannot be moved from build to common. So when using remote execution, a bazel query discard the analysis cache. See also

bazel/src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java

Lines 686 to 690 in f15d08d

    
           public Iterable<Class<? extends OptionsBase>> getCommandOptions(Command command) { 
        
             return ImmutableList.of("build", "test", "fetch").contains(command.name()) 
        
                 ? ImmutableList.of(RemoteOptions.class, AuthAndTLSOptions.class) 
        
                 : ImmutableList.of(); 
        
           }

ulfjack · 2020-04-25T20:11:50Z

I don't think that's accurate. Just because there are options specified on the build command doesn't mean that they cause a discard of the analysis phase. Let us know if you can repro an analysis phase discard due to remote options changes.

moroten · 2020-04-26T21:05:35Z

@ulfjack Below is a minimalistic example. Notice that I had INFO: 0 processes before the query, but got INFO: 1 process: 1 remote cache hit after the query. The observed behaviour is the same as if the analysis cache has been discarded, but it might be something else.

I might be wrong, but my suspicion is

bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java

Lines 1484 to 1490 in 41a3cb0

    
           void setRemoteOutputsMode(RemoteOutputsMode remoteOutputsMode) { 
        
             PrecomputedValue.REMOTE_OUTPUTS_MODE.set(injectable(), remoteOutputsMode); 
        
           } 
        
           private void setRemoteExecutionEnabled(boolean enabled) { 
        
             PrecomputedValue.REMOTE_EXECUTION_ENABLED.set(injectable(), enabled); 
        
           }

called from

bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java

Lines 2685 to 2688 in 41a3cb0

    
             /** 
        
              * Initializes and syncs the graph with the given options, readying it for the next evaluation. 
        
              */ 
        
             public void sync(

$ cat <<EOF >BUILD.bazel
cc_binary(
    name = "foo",
    srcs = ["foo.c"],
)
EOF

$ cat <<EOF >foo.c
int main() {
    return 0;
}
EOF

$ bazel build -s --remote_instance_name=developer --remote_download_toplevel --remote_executor=grpc://remote-execution:12345 --genrule_strategy=remote,sandboxed,local --spawn_strategy=remote,sandboxed,local //:foo
SUBCOMMAND: # //:foo [action 'Compiling foo.c', configuration: a4e1f2caf6a8f50757c8ef3548a6910ec759e8fbd1f331986e9a8d0f83ca072c]
SUBCOMMAND: # //:foo [action 'Linking foo', configuration: a4e1f2caf6a8f50757c8ef3548a6910ec759e8fbd1f331986e9a8d0f83ca072c]
INFO: 2 processes: 1 remote cache hit, 1 remote.

$ bazel build -s --remote_instance_name=developer --remote_download_toplevel --remote_executor=grpc://remote-execution:12345 --genrule_strategy=remote,sandboxed,local --spawn_strategy=remote,sandboxed,local //:foo
INFO: 0 processes.

$ bazel query //:foo
Loading: 0 packages loaded
//:foo

$ bazel build -s --remote_instance_name=developer --remote_download_toplevel --remote_executor=grpc://remote-execution:12345 --genrule_strategy=remote,sandboxed,local --spawn_strategy=remote,sandboxed,local //:foo
SUBCOMMAND: # //:foo [action 'Compiling foo.c', configuration: a4e1f2caf6a8f50757c8ef3548a6910ec759e8fbd1f331986e9a8d0f83ca072c]
INFO: 1 process: 1 remote cache hit.

$ bazel build -s --remote_instance_name=developer --remote_download_toplevel --remote_executor=grpc://remote-execution:12345 --genrule_strategy=remote,sandboxed,local --spawn_strategy=remote,sandboxed,local //:foo
INFO: 0 processes.

ulfjack · 2020-04-27T11:34:52Z

@buchgr, your commit d480c5f broke Skyframe caching of actions across query invocations, even for builds that don't use the functionality (as long as they use remote execution).

I think it's the injection of the remote default exec properties. The --remote_download_outputs defaults to ALL, and ALL is injected if the flag isn't set or remote execution is disabled.

ulfjack · 2020-04-29T10:00:44Z

Just to clarify: I was able to reproduce the issue brought up by @moroten. I also audited the code for remote options, and it's only two options that are intentionally injected into the Skyframe dependency graph, and one of them (remote_download_outputs) appears to be benign.

My understanding is that it intends to cause action re-execution on changes to the command-line-specified remote exec properties. Unfortunately, because they're only injected on build commands and not on query, that causes a skyframe invalidation of all actions (query implicitly clears the injected nodes if nothing is specified otherwise). There are a few options for what to do, but none are very appealing.

Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build. I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well. Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aformentioned flag). I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as bazelbuild#10902. Fixes bazelbuild#11532. Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921

ulfjack · 2020-06-04T10:27:13Z

I think build-without-the-bytes should be rewritten to separate download of outputs from action execution. In the current implementation, outputs can only be downloaded when an action executes, which requires the injection of these flags into skyframe in order to trigger action re-execution (and which causes this exact bug).

ulfjack · 2020-06-04T10:28:29Z

This bug should be assigned to team-remote-exec or whoever is responsible for build-without-the-bytes.

Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build. I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well. Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aformentioned flag). I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as bazelbuild#10902. Fixes bazelbuild#11532. Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921

Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build. I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well. Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aformentioned flag). I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as #10902. Fixes #11532. Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921 Closes #11536. Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427 PiperOrigin-RevId: 332939828

Top-level output files can be symlinks to non-top-level output files, such as in the case of a cc_binary linking against a .so built in the same build. I reuse the existing mayInsensitivelyPropagateInputs() method, which was originally introduced for action rewinding, but seems to have exactly the intended semantics here as well. Unfortunately, this requires a small change to the BlazeModule API to pass in the full analysis result (so we can access the action graph, so we can read the aformentioned flag). I considered using execution-phase information on whether an output is a symlink. However, at that point it's too late! The current design only allows pulling an output file *when its generating action runs*. It would be better if this was changed to pull output files independently of action execution. That would avoid a lot of problems with the current design, such as bazelbuild#10902. Fixes bazelbuild#11532. Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921 Closes bazelbuild#11536. Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427 PiperOrigin-RevId: 332939828

meisterT · 2020-10-20T06:01:06Z

cc @coeuvre

janakdr · 2020-10-20T14:25:54Z

It's possible I incidentally fixed this with f6f8dfe. Can someone check? There is still one injection of remote options (whether remote execution is enabled), so it may not have been enough.

olschofkah · 2020-11-05T14:34:06Z

I also ran into this issue. Taking the approach that @kastiglione suggested worked for me with moving applicable cmd line options in the .bazelrc files from build to common. If it helps solve the issue, this behavior started happening for me specifically with version 2.2.0. It would be a big improvement if query just logged that it had to discard the analysis due to options specified.

coeuvre · 2020-12-08T09:30:42Z

I think build-without-the-bytes should be rewritten to separate download of outputs from action execution. In the current implementation, outputs can only be downloaded when an action executes, which requires the injection of these flags into skyframe in order to trigger action re-execution (and which causes this exact bug).

@ulfjack I am looking into separating download from execution. Any suggestions for the implementation details?

bjacklyn · 2021-02-11T22:28:28Z

I was also just bitten by this and had to change some of the flags in the .bazelrc to be under common rather than under build so that they are shared with query.

Has the section of code that causes this problem been identified -- could we at least quickly add a warning to check your .bazelrc flags there if a fix isn't coming soon?

It turns out I'm also affected by the missing --repo_env param for query which causes this same problem with the thrashed analysis cache - #11943 - although there seems to be a (highly discouraged) workaround for that one.

gregestren · 2021-02-11T22:58:03Z

More communication from the tool is perfectly reasonable.

I don't know how hard the implementation would be. The main reason is different flags invalidate the graph in different places. Some of the --incompatible flags mentioned are injected into the build graph at

bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java

Line 1140 in a01e94a

PrecomputedValue.STARLARK_SEMANTICS.set(injectable(), starlarkSemantics);

So checking for changes there and reporting changes could work... for those flags.

Other flags are injected in other places. I suspect remote-related flags have their own hooks. So there's a bit of whack-a-mole to capture everything.

There's an important subtlety here that it's not necessarily just analysis being invalidated, but the entire graph (including package semantics, which are an even deeper dependency).

So I don't know... is the whack-a-mole approach reasonable for this?

github-actions · 2023-04-26T01:33:14Z

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

coeuvre · 2023-04-26T08:34:12Z

This is not stale. Actually, with my recent internal work, this might be resolved. I will get back to this soon once those changes are submitted.

coeuvre · 2023-07-05T13:07:13Z

Fixed by d426b3d.

irengrig added P1 I'll work on this now. (Assignee required) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc untriaged labels Mar 6, 2020

irengrig assigned haxorz Mar 6, 2020

haxorz assigned irengrig and unassigned haxorz Mar 6, 2020

irengrig added more data needed and removed P1 I'll work on this now. (Assignee required) labels Mar 6, 2020

dslomov unassigned irengrig Mar 12, 2020

janakdr added team-Configurability platforms, toolchains, cquery, select(), config transitions and removed more data needed team-Core Skyframe, bazel query, BEP, options parsing, bazelrc untriaged labels Apr 20, 2020

janakdr assigned gregestren Apr 20, 2020

gregestren added the P3 We're not considering working on this, but happy to review a PR. (No assignee) label Apr 20, 2020

ulfjack mentioned this issue Jun 2, 2020

Fetching symlink inputs for top-level output #11536

Closed

ulfjack mentioned this issue Jun 4, 2020

Tracking issue for "Remote Builds without the Bytes" #6862

Closed

19 tasks

gregestren added the type: bug label Jan 22, 2021

gregestren removed their assignment Feb 11, 2021

irfansharif mentioned this issue Oct 21, 2021

dev,bazel: actions/build cache partially discarded when switching between dev and bazel cockroachdb/cockroach#71835

Closed

github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 26, 2023

coeuvre added P2 We'll consider working on this in future. (Assignee optional) and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Apr 26, 2023

coeuvre self-assigned this Apr 26, 2023

sgowroji added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Apr 26, 2023

coeuvre closed this as completed Jul 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bazel query silently thrashes the analysis cache #10902

Bazel query silently thrashes the analysis cache #10902

kastiglione commented Mar 4, 2020 •

edited

Loading

haxorz commented Mar 6, 2020 •

edited

Loading

irengrig commented Mar 6, 2020

gregestren commented Mar 17, 2020

kastiglione commented Mar 17, 2020 •

edited

Loading

kastiglione commented Mar 17, 2020

janakdr commented Apr 20, 2020

gregestren commented Apr 20, 2020

moroten commented Apr 25, 2020

ulfjack commented Apr 25, 2020

moroten commented Apr 26, 2020

ulfjack commented Apr 27, 2020

ulfjack commented Apr 29, 2020

ulfjack commented Jun 4, 2020

ulfjack commented Jun 4, 2020

meisterT commented Oct 20, 2020

janakdr commented Oct 20, 2020

olschofkah commented Nov 5, 2020

coeuvre commented Dec 8, 2020

bjacklyn commented Feb 11, 2021

gregestren commented Feb 11, 2021 •

edited

Loading

github-actions bot commented Apr 26, 2023

coeuvre commented Apr 26, 2023

coeuvre commented Jul 5, 2023

Bazel query silently thrashes the analysis cache #10902

Bazel query silently thrashes the analysis cache #10902

Comments

kastiglione commented Mar 4, 2020 • edited Loading

Description of the problem / feature request:

Feature requests: what underlying problem are you trying to solve with this feature?

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

What operating system are you running Bazel on?

What's the output of bazel info release?

haxorz commented Mar 6, 2020 • edited Loading

irengrig commented Mar 6, 2020

gregestren commented Mar 17, 2020

kastiglione commented Mar 17, 2020 • edited Loading

kastiglione commented Mar 17, 2020

janakdr commented Apr 20, 2020

gregestren commented Apr 20, 2020

moroten commented Apr 25, 2020

ulfjack commented Apr 25, 2020

moroten commented Apr 26, 2020

ulfjack commented Apr 27, 2020

ulfjack commented Apr 29, 2020

ulfjack commented Jun 4, 2020

ulfjack commented Jun 4, 2020

meisterT commented Oct 20, 2020

janakdr commented Oct 20, 2020

olschofkah commented Nov 5, 2020

coeuvre commented Dec 8, 2020

bjacklyn commented Feb 11, 2021

gregestren commented Feb 11, 2021 • edited Loading

github-actions bot commented Apr 26, 2023

coeuvre commented Apr 26, 2023

coeuvre commented Jul 5, 2023

kastiglione commented Mar 4, 2020 •

edited

Loading

What's the output of `bazel info release`?

haxorz commented Mar 6, 2020 •

edited

Loading

kastiglione commented Mar 17, 2020 •

edited

Loading

gregestren commented Feb 11, 2021 •

edited

Loading