-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extra way to check OOM of a command ran inside Docker #3294
Conversation
Easy way to reproduce this error: 1. Import a project that compiles lxml (Read the Docs, for example) 2. The `pip` command will be killed and Docker won't return a specific status code, it will just return 1 3. In these cases, the `Killed` word will probably appear in the last 15 lines
I think Docker is not reporting a proper OOM exit status code because the command executed was In fact, I can see in the output:
|
# not use the specific exit code, so we check if the word `Killed` | ||
# is in the last 15 lines of the command's output | ||
killed_in_output = 'Killed' in '\n'.join(self.output.splitlines()[-15:]) | ||
if self.exit_code == DOCKER_OOM_EXIT_CODE or killed_in_output: | ||
self.output = _('Command killed due to excessive memory ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like to override the output of the command here, I think that it should be good to have a specific exception for this and show the error similar than other errors (like timeout, for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe 'Killed' is normally the last line of output. Was this not the case for your build? We might be able to severely reduce the number of false positives here by limiting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we know if this string changes if you run Sphinx with another locale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also also, perhaps logic should be:
if self.exit_code == DOCKER_OOM_EXIT_CODE or self.exit_code == -1 and killed_in_output
I'm not sure if -1 is correct here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe 'Killed' is normally the last line of output. Was this not the case for your build?
No.
My last lines of the output are:
building 'lxml.etree' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
creating build/temp.linux-x86_64-2.7/src/lxml
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/libxml2 -I/tmp/pip-build-_MvJYS/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
x86_64-linux-gnu-gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 4
----------------------------------------
Command "/home/humitos/rtfd/code/readthedocs.org/user_builds/rtd/envs/latest/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-_MvJYS/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-3RRzJB-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/humitos/rtfd/code/readthedocs.org/user_builds/rtd/envs/latest/include/site/python2.7/lxml" failed with error code 1 in /tmp/pip-build-_MvJYS/lxml/
Also, do we know if this string changes if you run Sphinx with another locale?
This is a good question. But I think the locale would be the one used by pip, since it's a command ran by pip but at the end it's x86_64-linux-gnu-gcc
what's ran and if the LANG is not exported I would say that it's the system's default one used.
I'm not sure if -1 is correct here
Nop. My exit_code
was 1
for this particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, pip rather. Not sure if there is a good way to get errors out of pip.
We should check for exit_code == 1 in this logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add that.
Do you think that it's better to append our error message to the self.output
instead of replacing it?
So, in my case, instead of just Command killed due to excessive memory consumption
it would have some information where the user can see that it gets killed because of lxml
and maybe it's something that's not needed to build the docs, so the user can mock it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had to do similar hacks for PDF output in the past, but I agree, it's probably necessary.
I didn't think about why this isn't being caught, that's an interesting thought though -- that it's not bubbling up to Docker because its a subprocess. There might be something kernel level we can do here to make this more consistent.
# not use the specific exit code, so we check if the word `Killed` | ||
# is in the last 15 lines of the command's output | ||
killed_in_output = 'Killed' in '\n'.join(self.output.splitlines()[-15:]) | ||
if self.exit_code == DOCKER_OOM_EXIT_CODE or killed_in_output: | ||
self.output = _('Command killed due to excessive memory ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe 'Killed' is normally the last line of output. Was this not the case for your build? We might be able to severely reduce the number of false positives here by limiting this.
# not use the specific exit code, so we check if the word `Killed` | ||
# is in the last 15 lines of the command's output | ||
killed_in_output = 'Killed' in '\n'.join(self.output.splitlines()[-15:]) | ||
if self.exit_code == DOCKER_OOM_EXIT_CODE or killed_in_output: | ||
self.output = _('Command killed due to excessive memory ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we know if this string changes if you run Sphinx with another locale?
# not use the specific exit code, so we check if the word `Killed` | ||
# is in the last 15 lines of the command's output | ||
killed_in_output = 'Killed' in '\n'.join(self.output.splitlines()[-15:]) | ||
if self.exit_code == DOCKER_OOM_EXIT_CODE or killed_in_output: | ||
self.output = _('Command killed due to excessive memory ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also also, perhaps logic should be:
if self.exit_code == DOCKER_OOM_EXIT_CODE or self.exit_code == -1 and killed_in_output
I'm not sure if -1 is correct here
Another example at .org : https://readthedocs.org/projects/jsk-docs/builds/6301911/ It's clear that |
Also, the exit code of 137 is special, it means the process got a signal of SIGKILL, and usually means an OOM condition in our case -- we won't kill the process otherwise. |
Yesterday I took a look at the documentation of I think this it's kind of enought for now. |
* 'master' of github.com:rtfd/readthedocs.org: Add GLOBAL_PIP_CACHE setting (readthedocs#3299) Remove invalid attribute `exception` (readthedocs#3298) Update build from database at __exit__ (readthedocs#3292) Extra way to check OOM of a command ran inside Docker (readthedocs#3294) Add supervisord contrib example (readthedocs#3262) Allow promos to be placed in the docs footer (readthedocs#3267) Do not convert to bytes the `refresh_token` (readthedocs#3273)
Easy way to reproduce this error:
pip
command will be killed and Docker won't return a specificstatus code, it will just return 1
Killed
word will probably appear in the last15 lines
If I increase the DOCKER_LIMITS memory in my local instance I don't get this error.