Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache misses when using volatile keys in x_defs #2102

Closed
robbertvanginkel opened this issue Jun 19, 2019 · 6 comments · Fixed by #2110
Closed

Cache misses when using volatile keys in x_defs #2102

robbertvanginkel opened this issue Jun 19, 2019 · 6 comments · Fixed by #2110

Comments

@robbertvanginkel
Copy link
Contributor

I'm running into unexpected caching behavior when using rules_go with stamping trough x_defs. Mainly based on experience with c binaries I'd expect to be able to get 100% cache hit, but the go binaries keep re-linking when using stamping with variables from the workspace status.

What did you do?

$ cat WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "io_bazel_rules_go",
    urls = ["https://storage.googleapis.com/bazel-mirror/github.com/bazelbuild/rules_go/releases/download/0.18.6/rules_go-0.18.6.tar.gz"],
    sha256 = "f04d2373bcaf8aa09bccb08a98a57e721306c8f6043a2a0ee610fd6853dcde3d",
)
load("@io_bazel_rules_go//go:deps.bzl", "go_rules_dependencies", "go_register_toolchains")
go_rules_dependencies()
go_register_toolchains()
$ cat BUILD.bazel 
load("@io_bazel_rules_go//go:def.bzl", "go_binary")
go_binary(
    name = "go",
    srcs = ["main.go"],
    x_defs = {"main.buildtime" : "{BUILD_TIMESTAMP}"}
)
$ cat main.go 
package main

import "fmt"

var buildtime = ""

func main() {
	fmt.Printf("Hello from Go at %s\n", buildtime)
}

Also available at https://gist.github.com/robbertvanginkel/925b10198e98010d0ca7ad722b5acbef

What did you expect to see?

Expected 100% cache hit when doing:

$ bazel build --disk_cache ... //go_binary
$ bazel clean
$ bazel build --disk_cache ... //go_binary

What did you see instead?

Only a partial cache hit, with recompilation of

$ bazel build --disk_cache /tmp/examplecache //:go
INFO: Invocation ID: 8a7942c6-2371-4a47-8132-b23a05e5d37d
INFO: Analyzed target //:go (24 packages loaded, 6583 targets configured).
INFO: Found 1 target...
Target //:go up-to-date:
  bazel-bin/darwin_amd64_stripped/go
INFO: Elapsed time: 1.798s, Critical Path: 0.30s
INFO: 3 processes: 2 remote cache hit, 1 darwin-sandbox.
INFO: Build completed successfully, 7 total actions

When diffing the execution logs according to the process described at https://docs.bazel.build/versions/master/remote-execution-caching-debug.html#comparing-the-execution-logs, the root cause seems to be the volatile-status.txt file:

$ diff -C 3 *.log.txt
*** go.log.txt	2019-06-19 12:19:02.000000000 -0700
--- go_after_clean.log.txt	2019-06-19 12:19:06.000000000 -0700
***************
*** 3828,3834 ****
  inputs {
    path: "bazel-out/volatile-status.txt"
    digest {
!     hash: "4835a9b695649ddd095b9a107914ead01ce809b910da3eacb10d7c230d7c28a5"
      size_bytes: 27
      hash_function_name: "SHA-256"
    }
--- 3828,3834 ----
  inputs {
    path: "bazel-out/volatile-status.txt"
    digest {
!     hash: "339f338a031a16b62ce2e864af7483e4bc28f9dd7fd38fa6aa796eb949ecf8ec"
      size_bytes: 27
      hash_function_name: "SHA-256"
    }
***************
*** 5585,5591 ****
  actual_outputs {
    path: "bazel-out/darwin-fastbuild/bin/darwin_amd64_stripped/go"
    digest {
!     hash: "224388c2265d9f83b9c126a86a0b23fa58c9b940bbb1fb7dece0fe24537324a9"
      size_bytes: 1661880
      hash_function_name: "SHA-256"
    }
--- 5585,5591 ----
  actual_outputs {
    path: "bazel-out/darwin-fastbuild/bin/darwin_amd64_stripped/go"
    digest {
!     hash: "9468c7bb6f3994f1606edf74715d1cf6900e2dd0e9a185bf31ac0f45297d874a"
      size_bytes: 1661880
      hash_function_name: "SHA-256"
    }

This is unexpected, as the docs on volatile-status describe it should not trigger rebuilds.

For reference, the gist also contains a stamping example for a c binary, where this caching works as expected:

$ bazel build --disk_cache /tmp/examplecache //:c
INFO: Invocation ID: 8ef8a212-1f96-417f-a5a2-3f0b8e537676
INFO: Analyzed target //:c (14 packages loaded, 192 targets configured).
INFO: Found 1 target...
Target //:c up-to-date:
  bazel-bin/c
INFO: Elapsed time: 1.817s, Critical Path: 0.03s
INFO: 3 processes: 3 remote cache hit.
INFO: Build completed successfully, 7 total actions
$ bazel clean
INFO: Invocation ID: 95e55dc7-012e-466d-9694-6647dda1e3a0
INFO: Starting clean. 
$ bazel build --disk_cache /tmp/examplecache //:c
INFO: Invocation ID: 8d31f38a-7435-485b-a4e2-c4868d7d7acd
INFO: Analyzed target //:c (14 packages loaded, 190 targets configured).
INFO: Found 1 target...
Target //:c up-to-date:
  bazel-bin/c
INFO: Elapsed time: 1.668s, Critical Path: 0.03s
INFO: 3 processes: 3 remote cache hit.
INFO: Build completed successfully, 7 total actions

Which leads me to believe this is an issue with rules_go.

In searching around a little bit I've found bazelbuild/bazel#1054, which describes that starlark rules cannot access the workspace status. This seems to be fixed now though, however it seems like --[no]stamp cannot be interpreted/read from starlark, which might contribute to what I'm experiencing. (As what I'm seeing with go is very similar as with c when running with --stamp).

@robbertvanginkel
Copy link
Contributor Author

After examining the c example a little bit better, I realised that when --nostamp is used they don't depend on the workspace status files but instead on a different dummy file which sets all values to redacted.

Looking through the rules_go code, it seems like the dependency on the status files is only added when the {} syntax is used to refer to a workspace status variable. As a workaround, I'm now using:

x_defs = select({
    "//:stamped": {
        "main.buildtime" : "{BUILD_TIMESTAMP}",
    },
    "//conditions:default": {},
}),

Where //:stamped refers to a config_setting tied to the --stamp flag:

config_setting(
    name = "stamped",
    values = {
        "stamp": "true",
    },
)

This works for now, but it'd be great if --[no]stamp would work for the go rules by default.

@jayconrod
Copy link
Contributor

Hmm, I tried your example. Bazel seems to relink both the Go and C binaries only after a bazel clean. That seems reasonable to me.

$ bazel run --stamp :c
Hello from C at 1561404477!
$ bazel run --stamp :c
Hello from C at 1561404477!
$ bazel clean
$ bazel run --stamp :c
Hello from C at 1561404521!
$ bazel run :go
Hello from Go at 1561404538
$ bazel run :go
Hello from Go at 1561404538
$ bazel clean
$ bazel run :go
Hello from Go at 1561404574

Given that the behavior is consistent between Starlark and native rules, I'm not sure there's anything to do here.

In terms of rules_go implementation, the --stamp flag is not exposed to Starlark rules. Instead, we add dependencies on ctx.info_file and ctx.version_file (volatile-status.txt and stable-status.txt) if x_defs contains anything that needs to be read from those files. Bazel has some special logic for handling these files. I'm not sure how that interacts with shared or remote caches.

@robbertvanginkel
Copy link
Contributor Author

robbertvanginkel commented Jun 24, 2019

The clean was mostly to simulate the behavior experienced when filling the cache from one machine, and then consuming it from a different one. We have a sizable (thousands) number of linked binary/test targets that depend on a x_defs referencing a workspace attribute. Some benchmarks with tests showed that we did not get as much cross machine cache hit because nearly all binaries got relinked for the workspace status attributes.

Reading a little more through the c rules, I think building with --nostamp is what is supposed to give more cache hits across machines/cleans:

$ bazel run --nostamp :c
Hello from C at 0!
$ bazel run --nostamp :c
Hello from C at 0!
$ bazel clean
$ bazel run --nostamp :c
Hello from C at 0!

Which currently isn't how rules_go behaves:

$ bazel run --nostamp :go
Hello from Go at 1561405741
$ bazel run --nostamp :go
Hello from Go at 1561405741
$ bazel clean
$ bazel run --nostamp :go
Hello from Go at 1561405792

(unless resorting to the config_value as described above).

What I'd like is that rules_go has the same --nostamp behavior as the c rules, where redacted/static values are always provided and nostamp always avoids taking a dependency on the workspace status contents.

If that isn't Starlark accessible, then this is more motivation to fix bazelbuild/bazel#1054. However, it seems like a little boilerplate with config_setting might make this doable?

@jayconrod
Copy link
Contributor

Ah, okay, I think I misunderstood earlier.

I'm not sure if it occurred to me before that we could access the --stamp flag via config_setting (or if it was supported when x_defs was originally implemented). But we could do that. We do something similar with strip already.

If you're open to sending a PR for this, I'd be happy to merge it. I think the right way to implement this is to add a bool attribute to go_context_data indicating whether we should stamp. It can be set on @io_bazel_rules_go//:go_context_data using a select as you've described. It can be propagated through the _GoContextData provider and set in the object returned by go_context. Then anything that calls go.binary can decide whether to pass in ctx.version_file and ctx.info_file.

@robbertvanginkel
Copy link
Contributor Author

I only recently learned about config_setting and its usage with select. I'll see if I can come up with a general PR later this week, thanks for the pointers!

jayconrod pushed a commit that referenced this issue Jun 28, 2019
This adds support for rules_go to make stamping of workspace variables conditional to Bazel's --nostamp flag.

Fixes #2102.
@siddharthab
Copy link
Contributor

Just came across this issue.

I do not get cache misses when the values in volatile status change, and values in stable status remain the same. This is because bazel makes a special case for volatile-status.txt and does not include it in the action digest.

The trick is to ensure that your stable-status.txt does not change, even if you don't use any variables from stable-status.txt. We do this by using the value of an environment variable in the workspace status script. If that environment variable is set, we emit constant stable status variables, similar to the redacted mechanism you saw in C++ rules.

The stamp flag also is not well documented, and C++ stamping can only be achieved if you were lucky enough to find examples, because the linkstamp attribute and what it does is undocumented.

I think users managing stamping or no stamping through their workspace status script will be the way to go forward in general, because the redaction mechanism can not be generalized to arbitrary status variables. So far, they only work when your variables are from a limited set.

Even if we start respecting the stamp configuration in rules_go, there are many other bazel rules that don't respect it, so you only get limited mileage out of it. But I do believe that it is a good idea to explicitly control stamping both at the rule level and globally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants