Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asynchronous sandbox cleanup leads to inode exhaustion with defaulted --experimental_sandbox_async_tree_delete_idle_threads=4 #20965

Closed
jeremiahar opened this issue Jan 20, 2024 · 26 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@jeremiahar
Copy link

jeremiahar commented Jan 20, 2024

Description of the bug:

It seems that Bazel does not clean up the sandbox directories fast enough, which eventually leads to running out of inodes when performing a build.

Which category does this issue belong to?

Local Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I'm not sure what the simplest way to reproduce this bug is. It might be because I am running a 96 core machine with

build --jobs=92 --local_cpu_resources=92 --sandbox_base=/dev/shm

and lots of symlinks from c++ compilation with tmpfs. The documentation hints that deletion threads are limited to 1 during the build which is probably the problem.

Which operating system are you running Bazel on?

Linux 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux

What is the output of bazel info release?

release 7.0.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Probably d169556 which changed the default of experimental_sandbox_async_tree_delete_idle_threads to 4. I did not actually run the bisection, but I will note that 7.0.0-pre.20231011.2 did not have this problem and 7.0.1 does have the issue.

Have you found anything relevant by searching the web?

#19266 (comment)

Any other information, logs, or outputs that you want to share?

It fails with an error message such as:

ERROR: /home/jeremiah/projects/REDACTED/BUILD:3:18: Compiling REDACTED.cpp failed: I/O exception during sandboxed execution: /dev/shm/bazel-sandbox.215dc078d692e667b99e0907ffd4a9aa9a5f6df28dfed5a7e2f2dcdc3fa54f19/linux-sandbox/8034/execroot/_main/external/boost/boost/metaparse/v1/error (No space left on device)

Inodes after failure:

Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 95M 95M 1.7K 100% /dev/shm

Inodes after manually removing sandbox directory:

Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 95M 543 95M 1% /dev/shm

I am using --sandbox_base=/dev/shm which is mounted as tmpfs.

Filesystem Size Used Avail Use% Mounted on
tmpfs 378G 746M 377G 1% /dev/shm

After the build fails, the sandbox directory is slowly cleaned up, unless bazel is shutdown, which is what originally alerted me to the cause.

It seems that this was probably caused by d169556 which enabled asynchronous sandbox cleanup by changing the default of experimental_sandbox_async_tree_delete_idle_threads to 4.

I confirmed that adding experimental_sandbox_async_tree_delete_idle_threads=0 fixes the issue for me.

@fmeum
Copy link
Collaborator

fmeum commented Jan 20, 2024

Cc @oquenchil

@dzbarsky
Copy link

We experienced the same thing with JS tests with big runfiles trees and a /dev/shm sandbox base

@sgowroji sgowroji added the team-Rules-CPP Issues for C++ rules label Jan 22, 2024
@ewianda
Copy link

ewianda commented Jan 23, 2024

Experiencing same issue

@fmeum
Copy link
Collaborator

fmeum commented Jan 23, 2024

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jan 23, 2024
@fmeum
Copy link
Collaborator

fmeum commented Jan 23, 2024

@sgowroji This sounds more like an issue for Team-Local-Exec as it isn't language-specific, but related to sandboxing in general. I mentioned @oquenchil since he was involved in the flag "flip".

@sgowroji sgowroji added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Rules-CPP Issues for C++ rules labels Jan 23, 2024
@joeleba joeleba added P1 I'll work on this now. (Assignee required) and removed untriaged labels Jan 23, 2024
@iancha1992
Copy link
Member

@bazel-io fork 7.1.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jan 23, 2024
@fmeum
Copy link
Collaborator

fmeum commented Jan 23, 2024

@iancha1992 This may need to be picked into a 7.0.2 depending on assessed severity.

@oquenchil
Copy link
Contributor

How many inodes are we talking about? I can't tell from the error which number is the inodes.

Depending on how widespread this is we have two options, if few users are affected they can pass -- experimental_sandbox_async_tree_delete_idle_threads=0. If it's more widespread we should un-flip the flag and make the default 0.

The long term solution would be to keep track of how many inodes are free and block before planting any more symlinks. I won't rush that solution for 7.0.2 though.

@jeremiahar
Copy link
Author

The number of inodes on /dev/shm is 95M. Before removing the sandbox directory, there were 1700 free (it's not 0 because Bazel cleans some up before I check the number.) After manually removing the sandbox directory, there are 95M free.

@jeremiahar
Copy link
Author

It takes around 65 seconds to run out of inodes when the build is running. Once the build fails, if I just leave Bazel alone to clean up the sandbox itself, it takes around 554 seconds to clean up the sandbox with --experimental_sandbox_async_tree_delete_idle_threads not mentioned in bazelrc (so defaulted to 4 at version 7.0.1).

@dzbarsky
Copy link

dzbarsky commented Jan 24, 2024 via email

@jeremiahar
Copy link
Author

In general it seems off to me that we generate inodes based on jobs (typically CPU count) but don't scale cleanup based on CPUs. So the more powerful your machine, the easier it is to have creation outpace deletion, right?

On Tue, Jan 23, 2024, 6:44 PM Jeremiah Rodriguez @.> wrote: It takes around 65 seconds to run out of inodes when the build is running. Once the build fails, if I just leave Bazel alone to clean up the sandbox itself, it takes around 554 seconds to clean up the sandbox with --experimental_sandbox_async_tree_delete_idle_threads not mentioned in bazelrc (so defaulted to 4 at version 7.0.1). — Reply to this email directly, view it on GitHub <#20965 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAL6JETCT4UMCCSCTVVLSFDYQBDM5AVCNFSM6AAAAABCDPDUH6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBXGA4TSNZQHE . You are receiving this because you commented.Message ID: @.>

Agreed. With tmpfs, there would be a cpu bottleneck from in memory operations. So the rate of symlink creation during the build will scale up to the cpu and kernel limits, but the rate of symlink deletion during the build is capped at 1 core (according to the documentation).

The docs state:

[--experimental_sandbox_async_tree_delete_idle_threads]
If 0, delete sandbox trees as soon as an action completes (causing completion of the action to be delayed). If greater than zero, execute the deletion of such threes on an asynchronous thread pool that has size 1 when the build is running and grows to the size specified by this flag when the server is idle.

@oquenchil
Copy link
Contributor

The reason you give sounds logical. A single deletion thread cannot keep up.

Something else I was wondering, do you have --reuse_sandbox_directories enabled? That should in theory reduce the number of deletions necessary. It was also enabled by default for 7.0 so I guess you do unless you disabled it explicitly.

I still don't have a sense of how much of an edge case your set up is in order to evaluate whether the flag should be unflipped immediately or have users affected turn it off by themselves. I can start working from today on a proper fix but I don't want this to be blocking the release.

Would this affect most JS users?

@fmeum
Copy link
Collaborator

fmeum commented Jan 30, 2024

@oquenchil The 7.0.2 release has already happened, so there is no risk of blocking it. :⁠-⁠)

@jeremiahar
Copy link
Author

The reason you give sounds logical. A single deletion thread cannot keep up.

Something else I was wondering, do you have --reuse_sandbox_directories enabled? That should in theory reduce the number of deletions necessary. It was also enabled by default for 7.0 so I guess you do unless you disabled it explicitly.

I still don't have a sense of how much of an edge case your set up is in order to evaluate whether the flag should be unflipped immediately or have users affected turn it off by themselves. I can start working from today on a proper fix but I don't want this to be blocking the release.

Would this affect most JS users?

I do not have --reuse_sandbox_directories disabled. Perhaps the others can add some additional information to give a sense of the scope of the problem.

@dzbarsky
Copy link

dzbarsky commented Feb 1, 2024

We also have the reuse flag enabled. (As an aside, I am a little confused by the sandbox reuse. It works per-mnemonic but all tests share one so it seems easy to ping-pong a sandbox between different runfiles trees, potentially with little overlap. I'd imagine it works better for build actions where you have different mnemonics and a large toolchain gets consistently reused for every action with given mnemonic.)

@oquenchil
Copy link
Contributor

oquenchil commented Feb 1, 2024

between different runfiles trees,

Coincidentally I realized this yesterday evening and I'm implementing a fix now.

There is another issue in the current code. The second parameter here should be the sandbox base. Otherwise in cases like yours where you pass a different filesystem to the sandbox base, the stashing becomes copying.

Once I get these two changes in, the execution should be much faster apart from fixing the inode exhaustion problem.

@meteorcloudy
Copy link
Member

Any update? Should this still be a release blocker for 7.1?

@oquenchil
Copy link
Contributor

Almost done. Is this the last blocker?

@meteorcloudy
Copy link
Member

Nope, there are still a number of them, I'm just doing a round of review.

copybara-service bot pushed a commit that referenced this issue Feb 14, 2024
Because the sandbox_base can be set via the configuration flag `--sandbox_base`, it might be that the stashes end up in a different filesystem than the actual sandbox, which makes `reuse_sandbox_directories` inefficient.

Apart from making sure that the stash is under `sandboxBase`, this change refactors the logic about which directories to delete from `sandboxBase` and when. The stash will persist between incremental builds but after a bazel server restart it will be deleted.

This change also refactors the tests since we had three different tests which weren't testing anything different. The change also includes more profiling spans related to reuse_sandbox_directories.

Works towards fixing #20965.

PiperOrigin-RevId: 607023622
Change-Id: I59f9d2066e966dd60dd11f7d82815d0c89f2ce23
oquenchil added a commit that referenced this issue Feb 20, 2024
Because the sandbox_base can be set via the configuration flag `--sandbox_base`, it might be that the stashes end up in a different filesystem than the actual sandbox, which makes `reuse_sandbox_directories` inefficient.

Apart from making sure that the stash is under `sandboxBase`, this change refactors the logic about which directories to delete from `sandboxBase` and when. The stash will persist between incremental builds but after a bazel server restart it will be deleted.

This change also refactors the tests since we had three different tests which weren't testing anything different. The change also includes more profiling spans related to reuse_sandbox_directories.

Works towards fixing #20965.

PiperOrigin-RevId: 607023622
Change-Id: I59f9d2066e966dd60dd11f7d82815d0c89f2ce23
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

@jeremiahar
Copy link
Author

I tested a full rebuild of my project with Bazel 7.1.0 RC1. The result is that the build is actually able to complete now without specifying --experimental_sandbox_async_tree_delete_idle_threads=0. The inode usage gets up to 51%, but at least they are not fully exhausted now before the build completes. For comparison, when disabling async tree deletion, the inode usage only gets up to around 12%. So the situation has improved a lot, at least in my use case.

@oquenchil
Copy link
Contributor

Glad to hear Jeremiah. Could you please comment on whether the build/tests got any faster? With the reuse changes it should have got much faster so it'd be disappointing to hear it didn't.

@oquenchil
Copy link
Contributor

You can build/test with --profile=/tmp/profile.gz.

Then get the time with:

zcat /tmp/profile.gz | jq '.traceEvents | .[] | select(.name == "sandbox.createFileSystem") | .dur' | paste -sd+ - |  bc	

At least there I'd expect to see an improvement.

More info about profiling: https://bazel.build/advanced/performance/json-trace-profile

@jeremiahar
Copy link
Author

I ran a few full rebuilds of my project:

  • Bazel 7.0.2 + --experimental_sandbox_async_tree_delete_idle_threads=0

    • Completes in 2254 seconds
    • Output of your command: 20925239343
  • Bazel 7.1.0rc1 + --experimental_sandbox_async_tree_delete_idle_threads=0

    • Completes in 1920 seconds
    • Output of your command: 7356170441
  • Bazel 7.1.0rc1

    • Completes in 1858 seconds
    • Output of your command: 5730664498

I'm not sure what the units Bazel's profiling is using, but I pasted the output at your request.

As far as I can tell, it shaved a few minutes off of my build. That's pretty good, considering I have a 1167 second critical path consisting of a clang compiler build + emscripten cache build. So I'm sure it helps the incremental builds even more. Nice work!

@oquenchil
Copy link
Contributor

Thank you Jeremiah. That's still quite a bit of overhead but we'll add more improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug
Projects
None yet
Development

No branches or pull requests