"process ID out of range" error when using rez-env from within a Docker container with namespaced PIDs #1732

Closed
darkvertex opened this issue Apr 24, 2024 · 3 comments · Fixed by #1735
Labels
bug shell Shell related issues

@darkvertex

At my studio we have a Rocky image that resembles our workstation configuration and certain pipeline services use rez-env within containers to run some things.

I noticed while transitioning to use Rez in my containerized environment that it seems unhappy about namespaced PIDs, which is the default when you docker-run.

Environment

  • OS: Rocky 8.9 in Docker
  • Rez version: 2.112.0
  • Rez python version: Python 3.9.18

To Reproduce

I prepped a minimalist Dockerfile to reproduce the issue. Save this Dockerfile below:

FROM rockylinux:8.9

# Install python, bash, wget, locale stuff, lsb_release (for Rez) and do any security updates:
RUN yum install -y python3.9 bash glibc-langpack-en wget redhat-lsb-core && yum update -y && yum clean all
# Symlink "python3" as "python" so that "rez-bind --quickstart" does not complain:
RUN ln -s /usr/bin/python3 /usr/bin/python
# Set locale to English:
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
# Python things:
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONFAULTHANDLER=1
ENV PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring

# Install Rez:
ARG REZ_VERSION=2.112.0
ENV REZ_VERSION=${REZ_VERSION}
ENV REZ_INSTALL_ROOT=/usr/local/rez
RUN wget -O /tmp/rez.tar.gz https://github.com/AcademySoftwareFoundation/rez/archive/refs/tags/${REZ_VERSION}.tar.gz && \
    mkdir --parents --mode=777 /tmp/rez && \
    tar -xzvf /tmp/rez.tar.gz -C /tmp/rez/ && \
    python3 /tmp/rez/rez-${REZ_VERSION}/install.py $REZ_INSTALL_ROOT/ && \
    rm -rf /tmp/rez /tmp/rez.tar.gz && \
    echo -e "export PATH=\$PATH:${REZ_INSTALL_ROOT}/bin/rez\nsource ${REZ_INSTALL_ROOT}/completion/complete.sh" >> /etc/profile.d/fps.sh
RUN $REZ_INSTALL_ROOT/bin/rez/rez-bind --quickstart
ENV PATH "$PATH:$REZ_INSTALL_ROOT/bin/rez"

Then run docker build . --tag=rocky_rez and brew some tea or coffee, because it will take a minute or two.

To see the issue, run any rez-env command. Since this is a vanilla environment, use the python package created by rez-bind --quickstart:

$ docker run --rm -it rocky_rez rez-env python -- python --version
error: process ID out of range

Usage:
 ps [options]

 Try 'ps --help <simple|list|output|threads|misc|all>'
  or 'ps --help <s|l|o|t|m|a>'
 for additional help text.

For more details see ps(1).
Python 3.9.18

As you can see, the command still runs, so the error is harmless, just scary-looking.

One workaround is to pass --pid=host, which disables PID namespace isolation so the container shares the host's PIDs. This isn't great, though, because the container is no longer process-isolated, which weakens its security:

$ docker run --rm -it --pid=host rocky_rez rez-env python -- python --version
Python 3.9.18

...but it runs without an error.

I suspect it is due to this command in the codebase:

args = ['ps', '-o', 'args=', '-p', str(os.getppid())]

and I guess it's because, by default, the container's main process runs as PID 1 in its own PID namespace, so it has no visible parent and os.getppid() returns 0.

Could it be detected in the shell code so it does not print a scary error?

Expected behavior

I would expect this error-free output:

$ docker run --rm -it rocky_rez rez-env python -- python --version
Python 3.9.18

Actual behavior

$ docker run --rm -it rocky_rez rez-env python -- python --version
error: process ID out of range

Usage:
 ps [options]

 Try 'ps --help <simple|list|output|threads|misc|all>'
  or 'ps --help <s|l|o|t|m|a>'
 for additional help text.

For more details see ps(1).
Python 3.9.18
@darkvertex darkvertex added the bug label Apr 24, 2024
@JeanChristopheMorinPerso JeanChristopheMorinPerso added the shell Shell related issues label Apr 27, 2024
@JeanChristopheMorinPerso
Member

Hi @darkvertex, thanks for the report and the detailed reproduction steps. I think we could probably check if the ppid is zero before running the ps command.
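
For illustration only, here is a rough sketch of what such a check could look like. It is not the actual change made in #1735. The function name parent_command_line and its ppid parameter are hypothetical; only the ps arguments come from the snippet quoted in the report. It assumes Python 3.7+ and relies on os.getppid() returning 0 when the parent process lives outside the current PID namespace.

```python
import os
import subprocess


def parent_command_line(ppid=None):
    """Return the parent process's command line, or None if unavailable.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch, not part of Rez.
    """
    if ppid is None:
        ppid = os.getppid()

    # In a container with a namespaced PID 1, the parent is outside the
    # namespace and getppid() reports 0. 'ps -p 0' fails with
    # "process ID out of range", so skip the call entirely.
    if ppid <= 0:
        return None

    # Same ps invocation as the snippet quoted in the report.
    args = ['ps', '-o', 'args=', '-p', str(ppid)]
    try:
        proc = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # keep any ps usage text off the terminal
            text=True,
            check=False,
        )
    except OSError:
        # ps is not installed or could not be executed.
        return None

    if proc.returncode != 0:
        return None
    return proc.stdout.strip() or None
```

With a guard like this, a caller would get None inside a default docker run container and could fall back to whatever default it already uses, instead of letting ps print its usage message.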

@JeanChristopheMorinPerso
Member

I created #1735 which should fix it. Whenever you have time, can you try it out and see if it fixes the problem on your side please?

@darkvertex
Author

@JeanChristopheMorinPerso Your PR #1735 appears to resolve the bug. 😄
