Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite error message when Pantsd is shut down during run #12107

Merged
merged 9 commits into from
May 24, 2021

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented May 21, 2021

Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy.

Before:

native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server's result could be returned."

After:

native_engine.PantsdClientException: the pantsd process was killed during the run.

If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option --pantsd-max-memory-usage to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run ./pants help-advanced global). You can also disable pantsd with the global option --pantsd to avoid persisting memory across Pants runs, although you will miss out on additional caching.

If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help).

[ci skip-build-wheels]

@Eric-Arellano Eric-Arellano requested a review from tdyas May 21, 2021 21:26
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Comment on lines -158 to +155
.map_err(|err| NailgunClientError::PreConnect(format!("Failed to start remote task: {}", err)))?;
.map_err(|err| NailgunClientError::PreConnect(format!("Failed to start: {}", err)))?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this wording should be. Really, it's a failure to start the Pants command, meaning sending the sys.argv to Nailgun like ./pants lint ::.

@Eric-Arellano
Copy link
Contributor Author

Given that multiple users have reported this issue in 2.5 and 2.4, I plan to cherry-pick to at least 2.5. This seems low risk and will hopefully make this error less mysterious.

@Eric-Arellano
Copy link
Contributor Author

Hm, actually the "Pantsd client error" is redundant because the exception class already can indicate that. I think we should rename NailgunClientException to be PantsdClientException

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Contributor

@tdyas tdyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "client" and "server" in the latter part of the exception message is probably also confusing to users. Approving since this message is still better than the existing message.

@Eric-Arellano
Copy link
Contributor Author

You mean this part?

Client exited before the server's result could be returned.

I totally agree. What do you think about us "catching" this error and transforming it our own, like this?

Pantsd was shut down during the run. If this was not intentionally done by you, Pants may have been OOM-killed (out of memory). You can set the global option --pantsd-max-memory-usage to reduce Pantsd's memory consumption.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano changed the title Clarify Pantsd error message wording Rewrite error message when Pantsd is shut down during run May 21, 2021
@Eric-Arellano Eric-Arellano requested a review from tdyas May 21, 2021 23:23
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@benjyw
Copy link
Contributor

benjyw commented May 24, 2021

FWIW, since 100% of the people reading these errors are engineers, succinct jargon might be preferable over laborious exposition...

@Eric-Arellano
Copy link
Contributor Author

since 100% of the people reading these errors are engineers, succinct jargon might be preferable over laborious exposition...

I originally only said OOM-killed, but I agreed with Tom's recommendation to use more than the acronym.

I agree most senior engineers know what OOM manager is, but many junior engineers don't. I only learned about in the past ~1-2 years while working on Pants, and I started programming 6 years ago.

@tdyas
Copy link
Contributor

tdyas commented May 24, 2021

I agree most senior engineers know what OOM manager is, but many junior engineers don't. I only learned about in the past ~1-2 years while working on Pants, and I started programming 6 years ago.

Also some engineers (e.g., data scientists) are experts in their particular field, but should not be expected to also be experts in how the kernel manages memory.

@Eric-Arellano
Copy link
Contributor Author

#12115 verifies that pantsd indeed can get killed by OOM Manager, specifically on Linux. So the advice in this new error message is accurate.

Although, I'm going to do one more pass on this. That PR shows how Pantsd can fail regardless of --pantsd-max-memory-usage. The failure happened in a single run with a fresh Pantsd. If Pants is simply overconsuming resources, the option is irrelevant.

1) disable pantsd, which is appropriate in CI
2) report it to us

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano requested a review from tdyas May 24, 2021 20:57
@cczona
Copy link
Member

cczona commented May 24, 2021

I agree it's verbose. Normally I'd expect the pros/cons left to a link to the docs. But appreciate a touch of over-explaining here, given the reader might not be a Pants user. If it's, say, someone running the CI server then they probably are going to appreciate that Pants' error message makes it easy for them to be responsive to the problem without having to learn about Pants.

@Eric-Arellano Eric-Arellano merged commit cd0c20d into pantsbuild:main May 24, 2021
@Eric-Arellano Eric-Arellano deleted the pantsd-failures branch May 24, 2021 23:18
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this pull request May 24, 2021
…#12107)

Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy.

Before:

> native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."

After:

> native_engine.PantsdClientException: the pantsd process was killed during the run.
>
> If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.
>
> If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help).

[ci skip-build-wheels]
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this pull request May 24, 2021
…#12107)

Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy.

Before:

> native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."

After:

> native_engine.PantsdClientException: the pantsd process was killed during the run.
>
> If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.
>
> If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help).

[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this pull request May 25, 2021
…k of #12107) (#12116)

Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy.

Before:

> native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."

After:

> native_engine.PantsdClientException: the pantsd process was killed during the run.
>
> If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.
>
> If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help).

[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this pull request May 25, 2021
…12117)

Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy.

Before:

> native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned."

After:

> native_engine.PantsdClientException: the pantsd process was killed during the run.
>
> If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.
>
> If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help).

[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this pull request May 26, 2021
…` units (#12123)

Given that we've found this option can be crucial to avoiding the OOM killer, we want to make it more ergonomic to work with: #12107. 

Most users don't think in terms of bytes, but rather MB/GB - we should save them from having to pull out online calculators.

We use the library Pint to robustly support memory size conversions, including to properly support floats without floating-point arithmetic. A major benefit of Pint is that it has a function `.to_compact()` that allows us to generate a meaningful representation of the number of bytes, e.g. knowing when to say `12MB` vs `1GB` based on how many bytes there are. Pint has no dependencies, has a wheel, and is actively maintained.

[ci skip-build-wheels]
@wisechengyi wisechengyi mentioned this pull request May 30, 2021
Eric-Arellano pushed a commit that referenced this pull request Oct 27, 2022
#12107 rewrote the error message associated with OOM-killed events. In my experience following the prescribed advice, I was completely unable to prevent Pants from crashing. Only after I exited all applications did the crashing go away. Based on my experience, I revised the error message to suggest exiting other applications because other applications can starve Pants of the memory it needs to run.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants