Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Linux) The stage-1 bootstrap script does not propagate certain signals to the stage-2 bootstrap script. #2043

Closed
chowder opened this issue Jul 6, 2024 · 5 comments · Fixed by #2047

Comments

@chowder
Copy link
Contributor

chowder commented Jul 6, 2024

🐞 bug report

Affected Rule

The issue is caused by the rule: py_binary / py_test

Is this a regression?

Yes.

Description

(This bug report is specific to Linux)

In the old 1-stage bootstrap script, the script eventually runs the underlying program with os.execv, which replaces the process entirely, so that's ok - any signals get propagated directly to our program.

def _RunExecv(python_program, main_filename, args, env):
  # type: (str, str, list[str], dict[str, str]) -> ...
  """Executes the given Python file using the various environment settings."""
  os.environ.update(env)
  PrintVerbose("RunExecv: environ:", os.environ)
  argv = [python_program, main_filename] + args
  PrintVerbose("RunExecv: argv:", python_program, argv)
  os.execv(python_program, argv)

https://github.com/bazelbuild/rules_python/blob/main/python/private/python_bootstrap_template.txt#L379-L386

In the new 2-stage boostrap script(s), the first stage invokes the second stage directly:

env \
  "${interpreter_env[@]}" \
  "$python_exe" \
  "${interpreter_args[@]}" \
  "$stage2_bootstrap" \
  "$@"

exit $?

https://github.com/bazelbuild/rules_python/blob/main/python/private/stage1_bootstrap_template.sh#L111-L116

I'm not sure if it's platform depedent - but at least on Debian and CentOS, bash will start a child process, and signals like SIGINT don't get handled nor propagated.

I think this is a simple fix, changing the stage-2 invocation to:

exec env \
  "${interpreter_env[@]}" \
  "$python_exe" \
  "${interpreter_args[@]}" \
  "$stage2_bootstrap" \
  "$@"

I'm happy to raise a PR for the above if it is agreed to be OK.

🔬 Minimal Reproduction

I don't have one at hand, but I can provide one if needed.

🌍 Your Environment

Operating System:

Linux AMD64 - CentOS 7, and Debian 11

Output of bazel version:

Bazel 7.2.1

Rules_python version:

0.34.0

@rickeylev
Copy link
Collaborator

Oh yeah, I vaguely recall this being fixed in the original bootstrap. Yeah, I see a bunch of code for this in the original.

Yeah, this is platform specific in some way. In my own system (some ubuntu/debian thing), I don't see this behavior (ctrl+c is received by python)

call exec to fix the issue

This SGTM. It just needs to special case the zip case to not use exec -- trap is used to cleanup the extracted files; I'm pretty sure that won't survive exec ?

Is there a way to tell the shell to forward on signals to the child process? That would make this a lot easier.

@rickeylev
Copy link
Collaborator

Posterity: My repro was just a simple shell script and python program:

# /tmp/bin.sh
#!/bin/bash

env python3 /tmp/bin.py
exit $?

# /tmp/bin.py

import time
while True:
  try:
    print("sleep")
    time.sleep(100)
  except KeyboardInterrupt:
    print("keyboard interrupt")

Does the above program also exhibit the issue of signals not being received by the child process?

Searching around a bit, I couldn't find a mention of this divergent behavior for centos + debian.

@chowder
Copy link
Contributor Author

chowder commented Jul 8, 2024

This SGTM. It just needs to special case the zip case to not use exec -- trap is used to cleanup the extracted files; I'm pretty sure that won't survive exec ?

Ah yeah - I didn't notice that was being handled in stage 1.

How does this even support Windows at the moment though, is Bash on Windows? I would've thought Windows users would have to use the original 1-stage bootstrap wrapper.

(EDIT: sorry, don't know why I started talking about Windows)

Does the above program also exhibit the issue of signals not being received by the child process?

It'll propagate a CTRL+C, but try sending a kill -2 <pid> (SIGINT), this is where it trips up.

@rickeylev
Copy link
Collaborator

rickeylev commented Jul 8, 2024

Thanks for the PR. I'll have a look in the next day or so.

I think I understand what's going on now. This is some shell arcana that seems WAI. exec is indeed the easiest fix. Other solutions are much more tedious.

From what I can find, CTRL+C (the keyboard event) is slightly different than kill. Both do send SIGINT. However, the one originating from the keyboard sends it to the "process group" (I think this means all processes in the group?). Meanwhile, kill only sends it to that specific process.

If you have e.g. a shell script that runs a stage1-based program, then, when it asks what the child process pid is, it'll get the pid of the stage1 bash process ("pid1"). That there is a subprocess for the python program ("pid2") is hidden from it. kill -2 $pid1 won't work because the bash process is busy waiting for the child process to finish. kill -2 $pid2 or kill -2 -$pid1 (note the extra hyphen) will work; the former sends it to the child directly, the later sends it to the process group (emulating keyboard-initiated ctrl+c).

So, a workaround is to use kill -2 -$pid in your outer program (or otherwise send to the process group).

I looked around to see if bash can be made to forward a signal along. It sort of can, but sigint in particular looks tricky. The basic trick is to use trap, and background process, and wait:

trap 'kill -2 $child' SIGINT
python3 prog.py &
child=$!
wait $child
wait $child # have to wait twice so non-exiting handling works

This will work for e.g. sigterm, but not sigint. Why? Because when bash starts a background job (foo &), it tells that process to ignore sigint. Using set -m can prevent this (but this also makes background jobs trickier to manage). On the python side, import signal; signal.signal(2, signal.SIG_DFL) fixes the problem.

Ignoring this case for zips I think is fine for now.

It would be nice if zips didn't have this problem, too, though. I don't like having to jump through hoops on the shell side, though. This is a relatively obscure behavior. If we could get move into the stage2 logic, then this would probably work better overall. The __main__.py entry point might come in handy here? If stage1 calls python.exe foo.zip, we'll end up in foo.zip/__main__.py. That code could then extract itself and finish stage2 execution.

[1] I'm not sure if this is bash specific or most shells

github-merge-queue bot pushed a commit that referenced this issue Jul 10, 2024
When the two-stage bootstrap is used, the parent shell process runs
python as a child
process, which changes how signals are propagated. Specifically, if a
signal is sent
_directly_ to the parent (e.g. `kill $parent`), the child process
(python) won't receive
it and it will appear to be ignored. This is because the parent process
is busy waiting
for the child process.

To fix, invoke the python process using `exec` instead. Because the
process is entirely
replaced, signals are sent directly to the replacement. This can't be
used for zip files,
though, because they rely on a catching the exit signal to perform
cleanup of the extracted
files.

Fixes #2043

---------

Co-authored-by: Richard Levasseur <[email protected]>
@rickeylev
Copy link
Collaborator

Posterity: Zip builds will still ignore signals. I don't think I'll come back around to addressing that any time soon (if ever).

That said, after #2052, it should be possible to have the second stage bootstrap handle deleting the temporary directory the zip files are extracted into, which allows stage1 to call exec on the extracted files in temp. It just requires doing something like this in the second stage code:

try:
  ...
finally:
  shutil.rmtree(sys._xoptions["RULES_PYTHON_ZIP_DIR"], ignore_errors=True)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants