Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use local_termination_grace_seconds when testing LinuxSandbox availability #18151

Closed
wants to merge 2 commits into from

Conversation

tsawada
Copy link
Contributor

@tsawada tsawada commented Apr 20, 2023

A 1s timeout was introduced in checking whether LinuxSandbox is available, to prevent a complete hangup on broken systems. However, it turned out that it occasionally results in misjudging that linux-sandbox being not available.
local_termination_grace_seconds defaults to 15s, which hopefully gives more headroom and configurability in various setups.

Fixes #18071

@tsawada
Copy link
Contributor Author

tsawada commented Apr 20, 2023

CC @meisterT

@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Local-Exec Issues and PRs for the Execution (Local) team labels Apr 20, 2023
@meisterT
Copy link
Member

cc @larsrc-google

This whole situation sounds like a dilemma to me. Perhaps we need to take a step back and think about other ways to detect whether sandboxing is available?

@tsawada
Copy link
Contributor Author

tsawada commented Apr 26, 2023

IMHO, some form of timeout is necessary to prevent complete freezes even when we use other ways to detect sandboxes, but simply it should have large enough headroom so that it'd never affect the normal execution paths on properly setup environments.

@meisterT
Copy link
Member

meisterT commented May 3, 2023

If we keep the check as is, then yes we need a timeout and your case shows that it should be higher.

However, I was wondering if there is any other way to check whether this check makes sense...

@tsawada
Copy link
Contributor Author

tsawada commented May 9, 2023

I agree there could be a way and that'd be cleaner, but this 1s timeout is currently blocking us from upgrading to v6.0+, and I believe this is affecting more users (by occasionally falling back to weaker sandboxes).
Would it be possible to merge this for now? If you think 30s is excessive, maybe even 10s would make the situation better for us.

@larsrc-google
Copy link
Contributor

Actually, let's just do the 30s and see if it saves you. It's pretty rare that the check actually hangs and needs the timeout (#15373 is the only case we know of). That said, @sluongng's advice on the bug is valid - your system is in a bad shape if running a no-op command takes >1s. One could argue that the 1s timeout works as a warning that something else needs to be fixed, it's just a pretty obscure warning.

@sluongng
Copy link
Contributor

sluongng commented May 9, 2023

This might cause Bazel to start up taking up to 30 seconds to complete though.

At the very least, switch to use local_termination_grace_seconds value would make it configurable.

@sgowroji
Copy link
Member

Hi @larsrc-google, Since I can see that this PR has been approved, please let me know whether I should proceed with importing it. Thanks

@sluongng
Copy link
Contributor

My opinion is that this is one of those "user is holding it wrong (because the tool is hard to hold)" kinda case.

I object to merging the change as is and I would urge the author to follow my comments at #18071.

For this change to merge, I would prefer if we (a) allow the value to be configurable and (b) leave the default value as-is.
The flag key should be either local_termination_grace_seconds or a new key that's close to it as the functionality of the 2 flags is somewhat similar: wait for timeout and kill the sandbox.

@larsrc-google
Copy link
Contributor

local_termination_grace_seconds is not unreasonable here. The difference between its default and this PR is relatively small. I'll send in a change for that.

@tsawada tsawada changed the title Extend the timeout for testing the sandbox on Linux to 30s Use local_termination_grace_seconds for LinuxSandbox timeout May 21, 2023
@tsawada tsawada changed the title Use local_termination_grace_seconds for LinuxSandbox timeout Use local_termination_grace_seconds when testing LinuxSandbox availability May 21, 2023
@tsawada
Copy link
Contributor Author

tsawada commented May 21, 2023

@larsrc-google PTAL
Edited the PR title and description. Lmk if this should be another PR.

@tsawada
Copy link
Contributor Author

tsawada commented May 30, 2023

@larsrc-google ping 😃

@tsawada
Copy link
Contributor Author

tsawada commented Jun 1, 2023

@larsrc-google @sgowroji Thanks! It'd be great if this is taken / backported into 6.x releases. Please let me know if there's anything else I should do.

@sgowroji
Copy link
Member

sgowroji commented Jun 1, 2023

Hi @larsrc-google, , Since I can see that this PR has been approved, please let me know whether I should proceed with importing it.Thanks!

@larsrc-google
Copy link
Contributor

Yes, please import it, and if possible backport to the latest 6.X release.

@sgowroji sgowroji added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Jun 1, 2023
@copybara-service copybara-service bot closed this in d0e19ae Jun 1, 2023
@sgowroji sgowroji removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Jun 1, 2023
@iancha1992
Copy link
Member

@bazel-io fork 6.3.0

@iancha1992
Copy link
Member

@larsrc-google @sluongng @tsawada
There is a merge conflict when the cherry-pick to release-6.3.0 was attempted (from src/main/java/com/google/devtools/build/lib/sandbox/BUILD). Could you please submit a PR or let us know what commits to include?

@tsawada
Copy link
Contributor Author

tsawada commented Jun 2, 2023

@iancha1992 For release-6.3.0 we only need to cherry-pick 5cac33f . Please skip 39b7d6d

@iancha1992 iancha1992 reopened this Jun 2, 2023
@iancha1992 iancha1992 closed this Jun 2, 2023
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Jun 2, 2023
…ility

A 1s timeout was introduced in checking whether LinuxSandbox is available, to prevent a complete hangup on broken systems. However, it turned out that it occasionally results in misjudging that linux-sandbox being not available.
`local_termination_grace_seconds` defaults to 15s, which hopefully gives more headroom and configurability in various setups.

Fixes bazelbuild#18071

Closes bazelbuild#18151.

PiperOrigin-RevId: 536953768
Change-Id: I5d344ee5bf06cb9b13a2cba9d077f0981f4430a3
iancha1992 added a commit that referenced this pull request Jun 6, 2023
…ility (#18568)

A 1s timeout was introduced in checking whether LinuxSandbox is available, to prevent a complete hangup on broken systems. However, it turned out that it occasionally results in misjudging that linux-sandbox being not available.
`local_termination_grace_seconds` defaults to 15s, which hopefully gives more headroom and configurability in various setups.

Fixes #18071

Closes #18151.

PiperOrigin-RevId: 536953768
Change-Id: I5d344ee5bf06cb9b13a2cba9d077f0981f4430a3

Co-authored-by: Takeo Sawada <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Local-Exec Issues and PRs for the Execution (Local) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

linux-sandbox is not available occasionally since Bazel 6.0.0
6 participants