Cache misses when using volatile keys in x_defs #2102

robbertvanginkel · 2019-06-19T21:07:47Z

I'm running into unexpected caching behavior when using rules_go with stamping trough x_defs. Mainly based on experience with c binaries I'd expect to be able to get 100% cache hit, but the go binaries keep re-linking when using stamping with variables from the workspace status.

What did you do?

$ cat WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "io_bazel_rules_go",
    urls = ["https://storage.googleapis.com/bazel-mirror/github.com/bazelbuild/rules_go/releases/download/0.18.6/rules_go-0.18.6.tar.gz"],
    sha256 = "f04d2373bcaf8aa09bccb08a98a57e721306c8f6043a2a0ee610fd6853dcde3d",
)
load("@io_bazel_rules_go//go:deps.bzl", "go_rules_dependencies", "go_register_toolchains")
go_rules_dependencies()
go_register_toolchains()
$ cat BUILD.bazel 
load("@io_bazel_rules_go//go:def.bzl", "go_binary")
go_binary(
    name = "go",
    srcs = ["main.go"],
    x_defs = {"main.buildtime" : "{BUILD_TIMESTAMP}"}
)
$ cat main.go 
package main

import "fmt"

var buildtime = ""

func main() {
	fmt.Printf("Hello from Go at %s\n", buildtime)
}

Also available at https://gist.github.com/robbertvanginkel/925b10198e98010d0ca7ad722b5acbef

What did you expect to see?

Expected 100% cache hit when doing:

$ bazel build --disk_cache ... //go_binary
$ bazel clean
$ bazel build --disk_cache ... //go_binary

What did you see instead?

Only a partial cache hit, with recompilation of

$ bazel build --disk_cache /tmp/examplecache //:go
INFO: Invocation ID: 8a7942c6-2371-4a47-8132-b23a05e5d37d
INFO: Analyzed target //:go (24 packages loaded, 6583 targets configured).
INFO: Found 1 target...
Target //:go up-to-date:
  bazel-bin/darwin_amd64_stripped/go
INFO: Elapsed time: 1.798s, Critical Path: 0.30s
INFO: 3 processes: 2 remote cache hit, 1 darwin-sandbox.
INFO: Build completed successfully, 7 total actions

When diffing the execution logs according to the process described at https://docs.bazel.build/versions/master/remote-execution-caching-debug.html#comparing-the-execution-logs, the root cause seems to be the volatile-status.txt file:

$ diff -C 3 *.log.txt
*** go.log.txt	2019-06-19 12:19:02.000000000 -0700
--- go_after_clean.log.txt	2019-06-19 12:19:06.000000000 -0700
***************
*** 3828,3834 ****
  inputs {
    path: "bazel-out/volatile-status.txt"
    digest {
!     hash: "4835a9b695649ddd095b9a107914ead01ce809b910da3eacb10d7c230d7c28a5"
      size_bytes: 27
      hash_function_name: "SHA-256"
    }
--- 3828,3834 ----
  inputs {
    path: "bazel-out/volatile-status.txt"
    digest {
!     hash: "339f338a031a16b62ce2e864af7483e4bc28f9dd7fd38fa6aa796eb949ecf8ec"
      size_bytes: 27
      hash_function_name: "SHA-256"
    }
***************
*** 5585,5591 ****
  actual_outputs {
    path: "bazel-out/darwin-fastbuild/bin/darwin_amd64_stripped/go"
    digest {
!     hash: "224388c2265d9f83b9c126a86a0b23fa58c9b940bbb1fb7dece0fe24537324a9"
      size_bytes: 1661880
      hash_function_name: "SHA-256"
    }
--- 5585,5591 ----
  actual_outputs {
    path: "bazel-out/darwin-fastbuild/bin/darwin_amd64_stripped/go"
    digest {
!     hash: "9468c7bb6f3994f1606edf74715d1cf6900e2dd0e9a185bf31ac0f45297d874a"
      size_bytes: 1661880
      hash_function_name: "SHA-256"
    }

This is unexpected, as the docs on volatile-status describe it should not trigger rebuilds.

For reference, the gist also contains a stamping example for a c binary, where this caching works as expected:

$ bazel build --disk_cache /tmp/examplecache //:c
INFO: Invocation ID: 8ef8a212-1f96-417f-a5a2-3f0b8e537676
INFO: Analyzed target //:c (14 packages loaded, 192 targets configured).
INFO: Found 1 target...
Target //:c up-to-date:
  bazel-bin/c
INFO: Elapsed time: 1.817s, Critical Path: 0.03s
INFO: 3 processes: 3 remote cache hit.
INFO: Build completed successfully, 7 total actions
$ bazel clean
INFO: Invocation ID: 95e55dc7-012e-466d-9694-6647dda1e3a0
INFO: Starting clean. 
$ bazel build --disk_cache /tmp/examplecache //:c
INFO: Invocation ID: 8d31f38a-7435-485b-a4e2-c4868d7d7acd
INFO: Analyzed target //:c (14 packages loaded, 190 targets configured).
INFO: Found 1 target...
Target //:c up-to-date:
  bazel-bin/c
INFO: Elapsed time: 1.668s, Critical Path: 0.03s
INFO: 3 processes: 3 remote cache hit.
INFO: Build completed successfully, 7 total actions

Which leads me to believe this is an issue with rules_go.

In searching around a little bit I've found bazelbuild/bazel#1054, which describes that starlark rules cannot access the workspace status. This seems to be fixed now though, however it seems like --[no]stamp cannot be interpreted/read from starlark, which might contribute to what I'm experiencing. (As what I'm seeing with go is very similar as with c when running with --stamp).

The text was updated successfully, but these errors were encountered:

robbertvanginkel · 2019-06-20T18:11:16Z

After examining the c example a little bit better, I realised that when --nostamp is used they don't depend on the workspace status files but instead on a different dummy file which sets all values to redacted.

Looking through the rules_go code, it seems like the dependency on the status files is only added when the {} syntax is used to refer to a workspace status variable. As a workaround, I'm now using:

x_defs = select({
    "//:stamped": {
        "main.buildtime" : "{BUILD_TIMESTAMP}",
    },
    "//conditions:default": {},
}),

Where //:stamped refers to a config_setting tied to the --stamp flag:

config_setting(
    name = "stamped",
    values = {
        "stamp": "true",
    },
)

This works for now, but it'd be great if --[no]stamp would work for the go rules by default.

jayconrod · 2019-06-24T19:42:14Z

Hmm, I tried your example. Bazel seems to relink both the Go and C binaries only after a bazel clean. That seems reasonable to me.

$ bazel run --stamp :c
Hello from C at 1561404477!
$ bazel run --stamp :c
Hello from C at 1561404477!
$ bazel clean
$ bazel run --stamp :c
Hello from C at 1561404521!

$ bazel run :go
Hello from Go at 1561404538
$ bazel run :go
Hello from Go at 1561404538
$ bazel clean
$ bazel run :go
Hello from Go at 1561404574

Given that the behavior is consistent between Starlark and native rules, I'm not sure there's anything to do here.

In terms of rules_go implementation, the --stamp flag is not exposed to Starlark rules. Instead, we add dependencies on ctx.info_file and ctx.version_file (volatile-status.txt and stable-status.txt) if x_defs contains anything that needs to be read from those files. Bazel has some special logic for handling these files. I'm not sure how that interacts with shared or remote caches.

robbertvanginkel · 2019-06-24T21:16:31Z

The clean was mostly to simulate the behavior experienced when filling the cache from one machine, and then consuming it from a different one. We have a sizable (thousands) number of linked binary/test targets that depend on a x_defs referencing a workspace attribute. Some benchmarks with tests showed that we did not get as much cross machine cache hit because nearly all binaries got relinked for the workspace status attributes.

Reading a little more through the c rules, I think building with --nostamp is what is supposed to give more cache hits across machines/cleans:

$ bazel run --nostamp :c
Hello from C at 0!
$ bazel run --nostamp :c
Hello from C at 0!
$ bazel clean
$ bazel run --nostamp :c
Hello from C at 0!

Which currently isn't how rules_go behaves:

$ bazel run --nostamp :go
Hello from Go at 1561405741
$ bazel run --nostamp :go
Hello from Go at 1561405741
$ bazel clean
$ bazel run --nostamp :go
Hello from Go at 1561405792

(unless resorting to the config_value as described above).

What I'd like is that rules_go has the same --nostamp behavior as the c rules, where redacted/static values are always provided and nostamp always avoids taking a dependency on the workspace status contents.

If that isn't Starlark accessible, then this is more motivation to fix bazelbuild/bazel#1054. However, it seems like a little boilerplate with config_setting might make this doable?

jayconrod · 2019-06-24T21:54:55Z

Ah, okay, I think I misunderstood earlier.

I'm not sure if it occurred to me before that we could access the --stamp flag via config_setting (or if it was supported when x_defs was originally implemented). But we could do that. We do something similar with strip already.

If you're open to sending a PR for this, I'd be happy to merge it. I think the right way to implement this is to add a bool attribute to go_context_data indicating whether we should stamp. It can be set on @io_bazel_rules_go//:go_context_data using a select as you've described. It can be propagated through the _GoContextData provider and set in the object returned by go_context. Then anything that calls go.binary can decide whether to pass in ctx.version_file and ctx.info_file.

robbertvanginkel · 2019-06-24T23:24:18Z

I only recently learned about config_setting and its usage with select. I'll see if I can come up with a general PR later this week, thanks for the pointers!

This adds support for rules_go to make stamping of workspace variables conditional to Bazel's --nostamp flag. Fixes #2102.

siddharthab · 2019-07-11T07:48:15Z

Just came across this issue.

I do not get cache misses when the values in volatile status change, and values in stable status remain the same. This is because bazel makes a special case for volatile-status.txt and does not include it in the action digest.

The trick is to ensure that your stable-status.txt does not change, even if you don't use any variables from stable-status.txt. We do this by using the value of an environment variable in the workspace status script. If that environment variable is set, we emit constant stable status variables, similar to the redacted mechanism you saw in C++ rules.

The stamp flag also is not well documented, and C++ stamping can only be achieved if you were lucky enough to find examples, because the linkstamp attribute and what it does is undocumented.

I think users managing stamping or no stamping through their workspace status script will be the way to go forward in general, because the redaction mechanism can not be generalized to arbitrary status variables. So far, they only work when your variables are from a limited set.

Even if we start respecting the stamp configuration in rules_go, there are many other bazel rules that don't respect it, so you only get limited mileage out of it. But I do believe that it is a good idea to explicitly control stamping both at the rule level and globally.

jayconrod added enhancement go labels Jun 24, 2019

robbertvanginkel mentioned this issue Jun 25, 2019

Add configurable stamp support with --[no]stamp and stamp attrs #2110

Merged

jayconrod closed this as completed in #2110 Jun 28, 2019

jayconrod pushed a commit that referenced this issue Jun 28, 2019

Add configurable stamp support with --[no]stamp and stamp attrs (#2110)

37896e3

This adds support for rules_go to make stamping of workspace variables conditional to Bazel's --nostamp flag. Fixes #2102.

siddharthab mentioned this issue Oct 6, 2019

rules_go upgrade from v0.18.6 => 0.19.4 results in stamping not expanding variables #2224

Closed

siddharthab mentioned this issue Feb 2, 2020

fix: r_binary should respect --stamp flag grailbio/rules_r#37

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache misses when using volatile keys in x_defs #2102

Cache misses when using volatile keys in x_defs #2102

robbertvanginkel commented Jun 19, 2019

robbertvanginkel commented Jun 20, 2019

jayconrod commented Jun 24, 2019

robbertvanginkel commented Jun 24, 2019 •

edited

Loading

jayconrod commented Jun 24, 2019

robbertvanginkel commented Jun 24, 2019

siddharthab commented Jul 11, 2019

Cache misses when using volatile keys in x_defs #2102

Cache misses when using volatile keys in x_defs #2102

Comments

robbertvanginkel commented Jun 19, 2019

What did you do?

What did you expect to see?

What did you see instead?

robbertvanginkel commented Jun 20, 2019

jayconrod commented Jun 24, 2019

robbertvanginkel commented Jun 24, 2019 • edited Loading

jayconrod commented Jun 24, 2019

robbertvanginkel commented Jun 24, 2019

siddharthab commented Jul 11, 2019

robbertvanginkel commented Jun 24, 2019 •

edited

Loading