-
-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite error message when Pantsd is shut down during run #12107
Conversation
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
f4c5659
to
ab5a1d8
Compare
.map_err(|err| NailgunClientError::PreConnect(format!("Failed to start remote task: {}", err)))?; | ||
.map_err(|err| NailgunClientError::PreConnect(format!("Failed to start: {}", err)))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this wording should be. Really, it's a failure to start the Pants command, meaning sending the sys.argv to Nailgun like ./pants lint ::
.
Given that multiple users have reported this issue in 2.5 and 2.4, I plan to cherry-pick to at least 2.5. This seems low risk and will hopefully make this error less mysterious. |
Hm, actually the "Pantsd client error" is redundant because the exception class already can indicate that. I think we should rename |
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "client" and "server" in the latter part of the exception message is probably also confusing to users. Approving since this message is still better than the existing message.
You mean this part?
I totally agree. What do you think about us "catching" this error and transforming it our own, like this?
|
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-build-wheels]
FWIW, since 100% of the people reading these errors are engineers, succinct jargon might be preferable over laborious exposition... |
I originally only said OOM-killed, but I agreed with Tom's recommendation to use more than the acronym. I agree most senior engineers know what OOM manager is, but many junior engineers don't. I only learned about in the past ~1-2 years while working on Pants, and I started programming 6 years ago. |
Also some engineers (e.g., data scientists) are experts in their particular field, but should not be expected to also be experts in how the kernel manages memory. |
[ci skip-rust] [ci skip-build-wheels]
#12115 verifies that pantsd indeed can get killed by OOM Manager, specifically on Linux. So the advice in this new error message is accurate. Although, I'm going to do one more pass on this. That PR shows how Pantsd can fail regardless of |
1) disable pantsd, which is appropriate in CI 2) report it to us # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
I agree it's verbose. Normally I'd expect the pros/cons left to a link to the docs. But appreciate a touch of over-explaining here, given the reader might not be a Pants user. If it's, say, someone running the CI server then they probably are going to appreciate that Pants' error message makes it easy for them to be responsive to the problem without having to learn about Pants. |
…#12107) Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy. Before: > native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned." After: > native_engine.PantsdClientException: the pantsd process was killed during the run. > > If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching. > > If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help). [ci skip-build-wheels]
…#12107) Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy. Before: > native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned." After: > native_engine.PantsdClientException: the pantsd process was killed during the run. > > If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching. > > If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help). [ci skip-build-wheels]
…k of #12107) (#12116) Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy. Before: > native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned." After: > native_engine.PantsdClientException: the pantsd process was killed during the run. > > If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching. > > If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help). [ci skip-build-wheels]
…12117) Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy. Before: > native_engine.NailgunClientException: Nailgun client error: "Nailgun client error: Client exited before the server\'s result could be returned." After: > native_engine.PantsdClientException: the pantsd process was killed during the run. > > If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching. > > If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (https://www.pantsbuild.org/docs/getting-help). [ci skip-build-wheels]
…` units (#12123) Given that we've found this option can be crucial to avoiding the OOM killer, we want to make it more ergonomic to work with: #12107. Most users don't think in terms of bytes, but rather MB/GB - we should save them from having to pull out online calculators. We use the library Pint to robustly support memory size conversions, including to properly support floats without floating-point arithmetic. A major benefit of Pint is that it has a function `.to_compact()` that allows us to generate a meaningful representation of the number of bytes, e.g. knowing when to say `12MB` vs `1GB` based on how many bytes there are. Pint has no dependencies, has a wheel, and is actively maintained. [ci skip-build-wheels]
#12107 rewrote the error message associated with OOM-killed events. In my experience following the prescribed advice, I was completely unable to prevent Pants from crashing. Only after I exited all applications did the crashing go away. Based on my experience, I revised the error message to suggest exiting other applications because other applications can starve Pants of the memory it needs to run.
Users don't know what Nailgun is, which is an implementation detail. We should mention Pantsd so that they have some context what's going on, along with suggesting the possibility of OOM kills and a possible remedy.
Before:
After:
[ci skip-build-wheels]