Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directory artifact not present in sandbox using Bazel 6.0.0rc2 #16789

Closed
gregmagolan opened this issue Nov 18, 2022 · 11 comments
Closed

Directory artifact not present in sandbox using Bazel 6.0.0rc2 #16789

gregmagolan opened this issue Nov 18, 2022 · 11 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@gregmagolan
Copy link
Contributor

gregmagolan commented Nov 18, 2022

Description of the bug:

Using Bazel 6.0.0rc1 or rc2 and with --remote_download_outputs=toplevel (or minimal), remote_cache (or disk_cache) & sandboxed execution, directory artifacts are not present in runfiles in the sandbox.

Conditions for issue

  • Bazel 6.0.0rc1 or rc2
  • directory artifact (created with ctx.actions.declare_directory) in runfiles of a test or binary target
  • sandboxed execution
  • --remote_cache (or --disk_cache enabled)
  • --remote_download_outputs=toplevel (or minimal)

Issue

The output directory is not present in the sandbox when the test or binary target is run.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Clone https://github.com/gregmagolan/bazel_6_remote_download_outputs_repro

Run ./repro.sh to run a test that fails if the directory artifact is not present in runfiles. Removing the --remote_download_outputs flag -or- running with --spawn_strategy=local makes the test pass.

Which operating system are you running Bazel on?

MacOS (but the issue has also been observed on Linux CI)

What is the output of bazel info release?

release 6.0.0rc2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

[email protected]:gregmagolan/bazel_6_remote_download_outputs_repro.git
master
fatal: ambiguous argument 'master': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
ac7517ea0ef1dad51847b40899c25079b427fb6e

Have you found anything relevant by searching the web?

No

Any other information, logs, or outputs that you want to share?

First observed when updating rules_js to Bazel 6.0.0rc1. rules_js makes heavy use of directory artifacts for the node_modules virtual store

@gregmagolan
Copy link
Contributor Author

cc @fmeum

@gregmagolan
Copy link
Contributor Author

cc @alexeagle

@sgowroji sgowroji added team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged type: bug labels Nov 18, 2022
@fmeum
Copy link
Collaborator

fmeum commented Nov 18, 2022

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 18, 2022
@meteorcloudy
Copy link
Member

@bazel-io fork 6.0.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 18, 2022
@meteorcloudy
Copy link
Member

Does this issue also happen with Bazel@HEAD? I wonder what change we need to make/cherry-pick to fix this.

@gregmagolan
Copy link
Contributor Author

I tried last_green in .bazelversion and that still fails:

$ bazel version
2022/11/18 08:03:14 Using unreleased version at commit 44918c540250a9e87eba8088e5676e3346695f29
Bazelisk version: development
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Jan 01 00:00:00 1970 (0)
Build timestamp: Thu Jan 01 00:00:00 1970 (0)
Build timestamp as int: 0

I'll try a local build @Head.

@gregmagolan
Copy link
Contributor Author

@meteorcloudy I tested with a local built at HEAD:

commit 8349c95fc98beb4008085942a67a57f0c4da074b (HEAD -> master, origin/master, origin/HEAD)
Author: Fabian Meumertzheim <[email protected]>
Date:   Fri Nov 18 08:00:19 2022 -0800

    Also collect clang resource directory with `-no-canonical-prefixes`
    
    `clang -print-resource-dir` without `-no-canonical-prefixes` returns a different path than is actually used to include `asan_blacklist.txt` on macOS with non-Apple clang.
    
    Closes #16730.
    
    PiperOrigin-RevId: 489475662
    Change-Id: If17f347d76f86e0ec5804f9e8789f44f46ab27b4

and the repro is still failing

@meteorcloudy
Copy link
Member

/cc @coeuvre Can you look into this issue?

@coeuvre
Copy link
Member

coeuvre commented Nov 21, 2022

bisect points me to dfccbf9. Working on the fix.

@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) and removed untriaged labels Nov 22, 2022
@coeuvre coeuvre self-assigned this Nov 22, 2022
@coeuvre coeuvre reopened this Nov 29, 2022
coeuvre added a commit to coeuvre/bazel that referenced this issue Nov 29, 2022
…rame later

Currently, they are symlinks.

For other outputs, let skyframe read action file system to construct the metadata.

Before this change, we inject metadata of symlink outputs, tree outputs and file outputs inside `RemoteActionFileSystem#flush()` if these outputs are stored inside the in-memory fs. If the outputs are somehow generated in the local fs, skyframe will read the fs later to construct metadata for them.

However, for tree outputs, skyframe always create an empty directory before action execution in the in-memory fs. So inside the `flush`, we always inject metadata for it. It means local tree outputs are ignored. We could update the code to also read local file system when reading tree files, but then the problem is how to construct metadata from file status which is well done by skyframe.

So instead of injecting metadata by traversal the filesystem inside `flush`, we just let skyframe to do the job.

Fixes bazelbuild#16789.

Closes bazelbuild#16812.

PiperOrigin-RevId: 491622005
Change-Id: I10434e6856a1b2a207f39e07122a9b646edf518c
ShreeM01 added a commit that referenced this issue Nov 29, 2022
… by skyf… (#16879)

* Update RemoteActionFileSystem to apply permission changes to local files even if remote file exists.

Previously, it only applies permission changes to local files when the remote one doesn't exist. It is fine because the call sites only use these method when remote file are missing. However, this isn't true with future changes.

PiperOrigin-RevId: 490872822
Change-Id: I7a19d99cd828294cbafa7b5f3fdc368d64e556ec

* Fix permission operations for the case of only remote directory.

PiperOrigin-RevId: 491280334
Change-Id: I30afef9f069eca8aee4d983664f42b3961e95adf

* Only inject metadata for outputs that cannot be reconstructed by skyframe later

Currently, they are symlinks.

For other outputs, let skyframe read action file system to construct the metadata.

Before this change, we inject metadata of symlink outputs, tree outputs and file outputs inside `RemoteActionFileSystem#flush()` if these outputs are stored inside the in-memory fs. If the outputs are somehow generated in the local fs, skyframe will read the fs later to construct metadata for them.

However, for tree outputs, skyframe always create an empty directory before action execution in the in-memory fs. So inside the `flush`, we always inject metadata for it. It means local tree outputs are ignored. We could update the code to also read local file system when reading tree files, but then the problem is how to construct metadata from file status which is well done by skyframe.

So instead of injecting metadata by traversal the filesystem inside `flush`, we just let skyframe to do the job.

Fixes #16789.

Closes #16812.

PiperOrigin-RevId: 491622005
Change-Id: I10434e6856a1b2a207f39e07122a9b646edf518c

Co-authored-by: kshyanashree <[email protected]>
@gregmagolan
Copy link
Contributor Author

Confirmed that this change fixes the problem observed on rules_js with Bazel 6.0.0rc1 & rc2. aspect-build/rules_js#659 updates rules_js to last_green and CI is green with --remote_download_toplevel set.

Thanks for the quick fix @coeuvre !

@coeuvre
Copy link
Member

coeuvre commented Nov 30, 2022

Thanks for confirming!

@coeuvre coeuvre closed this as completed Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants