Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Skymeld] Symlink planting for external repos races against action execution #22073

Closed
fmeum opened this issue Apr 22, 2024 · 7 comments
Closed
Assignees
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc

Comments

@fmeum
Copy link
Collaborator

fmeum commented Apr 22, 2024

Description of the bug:

When Skymeld is enabled, execroot symlinks for the root of external repositories are created when a TopLevelTargetReadyForSymlinkPlanting event is fired and received by IncrementalPackageRoots#topLevelTargetReadyForSymlinkPlanting. Notably, there is no happens-before relationship between the actual filesystem operation creating the symlink and the execution of dependent actions, which results in a race that can result in the symlink not being available during action execution, thus causing an IOException.

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. Increase the likelihood of action execution winning the race by inserting Thread.sleep(10000) before line https://cs.opensource.google/bazel/bazel/+/d8613c222f233406165c623b1a92179faed142ba:src/main/java/com/google/devtools/build/lib/buildtool/SymlinkForest.java;l=474.
  2. Run bazel build //... in https://github.com/bazelbuild/examples/tree/main/frontend.

This should reliably reproduce bazelbuild/examples#421.

Which operating system are you running Bazel on?

Any

What is the output of bazel info release?

HEAD

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 22, 2024

Cc @joeleba

@joeleba
Copy link
Member

joeleba commented Apr 22, 2024

Thanks for filing the bug. I'll look into this. Incidentally I was looking at aspect-build/rules_js#1412 today and was about to ask for a minimal repro.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 22, 2024

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 22, 2024
@iancha1992
Copy link
Member

@bazel-io fork 7.2.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 22, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 23, 2024

It's not unlikely that this is also the cause of #20886.

@joeleba
Copy link
Member

joeleba commented Apr 24, 2024

Update: I expect to have a fix submitted sometime this week.

bazel-io pushed a commit to bazel-io/bazel that referenced this issue Apr 25, 2024
Consider the following scenario:

- Top level targets `A` and `B`
- To execute actions in `A`, we need to plant symlinks for `NestedSet<Package>: [A1, C1, [D1]]`,
- To execute actions in `B`, we need to plant symlinks for `NestedSet<Package>: [B1, C1, [D1]]`

In the end, we expect to see symlinks to `A1`, `B1`, `C1` and `D1`.

**What went wrong**

With the current code, there are 2 possible race conditions:

1. Transitive NestedSet level:
- Start to plant symlinks for `A`
- `NestedSet [D1]` added to `handledPackageNestedSets`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `NestedSet [D1]` seen as "handled" and immediately skipped. Planted `B1` and `C1`. `B` moves on to execution.
=> actions from `B` that requires `D1` would fail (no such file or directory).

2. Individual symlink level:
- Start to plant symlinks for `A`
- `C1` added to `lazilyPlantedSymlinks`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped. Planted `B1` and `D1`. `B` moves on to execution.
=> actions from `B` that requires `C1` would fail (no such file or directory).

**The Solution**

In order to prevent this race condition, we can plant the symlinks for top level targets `A` and `B` sequentially. This gives us the guarantee that: for an action `foo` under a top level target `A`, `foo` is only executed when all the necessary symlinks for `A` are already planted.

The above scenario would look like:
- Start to plant symlinks for `A`
- The `TopLevelTargetReadyForSymlinkPlanting` event for `B` arrived and is held in the sequential event queue
- Plant all symlinks. `lazilyPlantedSymlinks: [A1, C1, D1]`. `A` moves on to execution.
- Start to plant symlinks for `B`
- `NestedSet [D1]` already seen in `handledPackageNestedSets` and immediately skipped.
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped.
- Planted `B1`. `B` moves on to execution.

As an (hopefully not premature) optimization, the symlinks under a single top level target are planted in parallel.

Fixes bazelbuild#22073

Verified locally with something similar to the repro in bazelbuild#22073 (comment).

PiperOrigin-RevId: 628080361
Change-Id: Ic6c1a6606d26400c46aa98bfeddc844abd075d0a
github-merge-queue bot pushed a commit that referenced this issue Apr 26, 2024
Consider the following scenario:

- Top level targets `A` and `B`
- To execute actions in `A`, we need to plant symlinks for
`NestedSet<Package>: [A1, C1, [D1]]`,
- To execute actions in `B`, we need to plant symlinks for
`NestedSet<Package>: [B1, C1, [D1]]`

In the end, we expect to see symlinks to `A1`, `B1`, `C1` and `D1`.

**What went wrong**

With the current code, there are 2 possible race conditions:

1. Transitive NestedSet level:
- Start to plant symlinks for `A`
- `NestedSet [D1]` added to `handledPackageNestedSets`. _No symlink
planted yet_.
- Start to plant symlinks for `B`
- `NestedSet [D1]` seen as "handled" and immediately skipped. Planted
`B1` and `C1`. `B` moves on to execution.
=> actions from `B` that requires `D1` would fail (no such file or
directory).

2. Individual symlink level:
- Start to plant symlinks for `A`
- `C1` added to `lazilyPlantedSymlinks`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped.
Planted `B1` and `D1`. `B` moves on to execution.
=> actions from `B` that requires `C1` would fail (no such file or
directory).

**The Solution**

In order to prevent this race condition, we can plant the symlinks for
top level targets `A` and `B` sequentially. This gives us the guarantee
that: for an action `foo` under a top level target `A`, `foo` is only
executed when all the necessary symlinks for `A` are already planted.

The above scenario would look like:
- Start to plant symlinks for `A`
- The `TopLevelTargetReadyForSymlinkPlanting` event for `B` arrived and
is held in the sequential event queue
- Plant all symlinks. `lazilyPlantedSymlinks: [A1, C1, D1]`. `A` moves
on to execution.
- Start to plant symlinks for `B`
- `NestedSet [D1]` already seen in `handledPackageNestedSets` and
immediately skipped.
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped.
- Planted `B1`. `B` moves on to execution.

As an (hopefully not premature) optimization, the symlinks under a
single top level target are planted in parallel.

Fixes #22073

Verified locally with something similar to the repro in
#22073 (comment).

PiperOrigin-RevId: 628080361
Change-Id: Ic6c1a6606d26400c46aa98bfeddc844abd075d0a

Commit
52adf0b

Co-authored-by: Googler <[email protected]>
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
Consider the following scenario:

- Top level targets `A` and `B`
- To execute actions in `A`, we need to plant symlinks for `NestedSet<Package>: [A1, C1, [D1]]`,
- To execute actions in `B`, we need to plant symlinks for `NestedSet<Package>: [B1, C1, [D1]]`

In the end, we expect to see symlinks to `A1`, `B1`, `C1` and `D1`.

**What went wrong**

With the current code, there are 2 possible race conditions:

1. Transitive NestedSet level:
- Start to plant symlinks for `A`
- `NestedSet [D1]` added to `handledPackageNestedSets`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `NestedSet [D1]` seen as "handled" and immediately skipped. Planted `B1` and `C1`. `B` moves on to execution.
=> actions from `B` that requires `D1` would fail (no such file or directory).

2. Individual symlink level:
- Start to plant symlinks for `A`
- `C1` added to `lazilyPlantedSymlinks`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped. Planted `B1` and `D1`. `B` moves on to execution.
=> actions from `B` that requires `C1` would fail (no such file or directory).

**The Solution**

In order to prevent this race condition, we can plant the symlinks for top level targets `A` and `B` sequentially. This gives us the guarantee that: for an action `foo` under a top level target `A`, `foo` is only executed when all the necessary symlinks for `A` are already planted.

The above scenario would look like:
- Start to plant symlinks for `A`
- The `TopLevelTargetReadyForSymlinkPlanting` event for `B` arrived and is held in the sequential event queue
- Plant all symlinks. `lazilyPlantedSymlinks: [A1, C1, D1]`. `A` moves on to execution.
- Start to plant symlinks for `B`
- `NestedSet [D1]` already seen in `handledPackageNestedSets` and immediately skipped.
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped.
- Planted `B1`. `B` moves on to execution.

As an (hopefully not premature) optimization, the symlinks under a single top level target are planted in parallel.

Fixes bazelbuild#22073

Verified locally with something similar to the repro in bazelbuild#22073 (comment).

PiperOrigin-RevId: 628080361
Change-Id: Ic6c1a6606d26400c46aa98bfeddc844abd075d0a
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.2.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.2.0rc1. Thanks!

sfc-gh-mgalindo pushed a commit to Snowflake-Labs/bazel that referenced this issue Jun 21, 2024
Consider the following scenario:

- Top level targets `A` and `B`
- To execute actions in `A`, we need to plant symlinks for `NestedSet<Package>: [A1, C1, [D1]]`,
- To execute actions in `B`, we need to plant symlinks for `NestedSet<Package>: [B1, C1, [D1]]`

In the end, we expect to see symlinks to `A1`, `B1`, `C1` and `D1`.

**What went wrong**

With the current code, there are 2 possible race conditions:

1. Transitive NestedSet level:
- Start to plant symlinks for `A`
- `NestedSet [D1]` added to `handledPackageNestedSets`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `NestedSet [D1]` seen as "handled" and immediately skipped. Planted `B1` and `C1`. `B` moves on to execution.
=> actions from `B` that requires `D1` would fail (no such file or directory).

2. Individual symlink level:
- Start to plant symlinks for `A`
- `C1` added to `lazilyPlantedSymlinks`. _No symlink planted yet_.
- Start to plant symlinks for `B`
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped. Planted `B1` and `D1`. `B` moves on to execution.
=> actions from `B` that requires `C1` would fail (no such file or directory).

**The Solution**

In order to prevent this race condition, we can plant the symlinks for top level targets `A` and `B` sequentially. This gives us the guarantee that: for an action `foo` under a top level target `A`, `foo` is only executed when all the necessary symlinks for `A` are already planted.

The above scenario would look like:
- Start to plant symlinks for `A`
- The `TopLevelTargetReadyForSymlinkPlanting` event for `B` arrived and is held in the sequential event queue
- Plant all symlinks. `lazilyPlantedSymlinks: [A1, C1, D1]`. `A` moves on to execution.
- Start to plant symlinks for `B`
- `NestedSet [D1]` already seen in `handledPackageNestedSets` and immediately skipped.
- `C1` already seen in `lazilyPlantedSymlinks` and immediately skipped.
- Planted `B1`. `B` moves on to execution.

As an (hopefully not premature) optimization, the symlinks under a single top level target are planted in parallel.

Fixes bazelbuild#22073

Verified locally with something similar to the repro in bazelbuild#22073 (comment).

PiperOrigin-RevId: 628080361
Change-Id: Ic6c1a6606d26400c46aa98bfeddc844abd075d0a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc
Projects
None yet
Development

No branches or pull requests

4 participants