Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] Try to use GCP Bazel cache for more jobs #20836

Merged
merged 4 commits into from
Jan 17, 2024

Conversation

jwnrt
Copy link
Contributor

@jwnrt jwnrt commented Jan 15, 2024

Summary

This change lets more jobs read from the cache, and also allows more jobs to write to the cache when running for the master branch.

It's being tested in this Azure run which will write to the cache. A second run will need to be triggered to see if we actually read from the cache.

Details

This change:

  1. Switches usage of ./bazelisk.sh to ci/bazelisk.sh in pipelines.
    • This script connects Bazel to our GCP bucket containing a Bazel cache.
    • This allows CI to read from the cache.
    • If the GCP_BAZEL_CACHE_KEY variable is set, it also writes to the cache.
  2. Adds downloads for the file containing cache write credentials to more jobs.
    • This file is stored encrypted in Azure and cannot be downloaded by PRs from forks.
    • The location to this file is automatically stored in the GCP_BAZEL_CACHE_KEY_SECUREFILEPATH environment variable.
    • The file is only downloaded for the master branch, however I have changed this for this test run to show that the cache can be written to.
    • In summary: having this downloaded allows the master branch to write to the cache.

@jwnrt jwnrt marked this pull request as ready for review January 15, 2024 15:42
@jwnrt jwnrt requested a review from rswarbrick as a code owner January 15, 2024 15:42
@jwnrt jwnrt requested review from nbdd0121 and pamaury January 15, 2024 15:55
@pamaury
Copy link
Contributor

pamaury commented Jan 15, 2024

Thanks @jwnrt for looking at this, this looks like a great idea. It seems that now we will have many instances of

  - task: DownloadSecureFile@1
    condition: eq(variables['Build.SourceBranchName'], 'master')
    name: GCP_BAZEL_CACHE_KEY
    inputs:
      secureFile: "bazel_cache_gcp_key.json"

which is quite error-prone and very specific to our workflow. What do you think about having a yaml file that we can include so we can do instead something like:

- template: ci/use-bazel-remove-cache.yaml

and maybe add a parameter to specific whether access is just read, or read-write?

@jwnrt
Copy link
Contributor Author

jwnrt commented Jan 16, 2024

I've extracted this step into a template but haven't added an r/rw parameter since these credentials are only for writing. The GCP bucket is always read if using the ci/bazelisk.sh.

I've tried to make this clearer by giving the template the name load-bazel-cache-write-creds.yml so it's more obvious they just for writing.

@jwnrt
Copy link
Contributor Author

jwnrt commented Jan 16, 2024

Oh also I should mention that even with this change, I don't think the Bazel cache is working in CI at all.

Before this PR, the sw_build job was set up correctly to use the cache (it had the GCP credentials to write and it was using ci/bazelisk.sh to read) but as far as I can tell has always rebuilt.

I tried to set up a local environment to read from the cache too, but no success. These are the (convoluted) steps I used (thanks to @nbdd0121 for helping with this):

  1. Added the following to ~/.bazelrc (and ~root/.bazelrc for good measure) to replicate what ci/bazelisk.sh builds:
build --remote_cache=https://storage.googleapis.com/opentitan-bazel-cache
build --remote_upload_local_results=false
build --remote_default_exec_properties=OSVersion="Ubuntu 20.04.6 LTS"
  1. Ensured my local environment variables matched CI for those set in rules/nonhermetic.bzl:
    • Set HOME="/root" - this is what the CI uses and the HOME env var is non-hermetic, so may have influenced the cache?
    • Set PATH=/tools/verilator/v4.210/bin:/tools/verible/bin:/root/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin which is what sw_build uses - we extracted this by running env in this PR's CI run.
    • $XILINXD_LICENSE_FILE already matched, and the other vars were unset.
  2. Execute the same build command that sw_build does but using ./bazelisk.sh instead of ci/bazelisk.sh to ensure our own .bazelrc is used.

We did not observe the cache being used at all. There must be something else contributing to the cache that we didn't account for.

@jwnrt jwnrt force-pushed the ci-bazel-cache branch 3 times, most recently from d236951 to 73b8d77 Compare January 17, 2024 15:37
@@ -15,12 +15,13 @@ echo "Running bazelisk in $(pwd)."

# An additional bazelrc must be synthesized to specify precisely how to use the
# GCP bazel cache.
GCP_CREDS_FILE="$GCP_BAZEL_CACHE_KEY_SECUREFILEPATH"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that the _SECUREFILEPATH is added by DownloadSecureFile to the name? It's not really clear in the documentation https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/download-secure-file-v1?view=azure-pipelines

Copy link
Contributor Author

@jwnrt jwnrt Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DownloadSecureFile will store the path to the file in a variable called $(GCP_BAZEL_CACHE_KEY.secureFilePath), and Azure also has a rule that you can access $(AZURE_VARIABLES.WITH.SCOPES) variables through environment variables by replacing .s with _s

Copy link
Contributor

@pamaury pamaury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jwnrt, this looks good to me. I am no bash expert though so I might have missed something.

@jwnrt
Copy link
Contributor Author

jwnrt commented Jan 17, 2024

Thanks for reviewing, I asked @nbdd0121 to look over the Bash and we simplified it a bit more

These flags need to come _before_ the subcommand, i.e.:

```
bazel --bazelrc=... cquery # VALID
bazel cquery --bazelrc=... # INVALID
```

which is done in this script by pushing flags that come before the
subcommand name into an array and re-applying them in the correct order
later.

If these Bash arrays are confusing and you've come across this commit in
a `git blame` then hopefully this will help:

1. Bash supports arrays defined using parentheses: `array=("foo" "bar")`
2. Arrays can't be passed as arguments to functions, they are expanded
   and get mixed into other arguments, so we use a global instead.
4. the syntax for expanding an array into its elements with correct
   quoting on each is: `"${array[@]}"`.

Signed-off-by: James Wainwright <[email protected]>
@jwnrt jwnrt merged commit 4d88b95 into lowRISC:master Jan 17, 2024
32 checks passed
@pamaury pamaury mentioned this pull request Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants