Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to opt-out of PYTHONSAFEPATH #2060

Open
allsey87 opened this issue Jul 12, 2024 · 13 comments · Fixed by #2076
Open

Add an option to opt-out of PYTHONSAFEPATH #2060

allsey87 opened this issue Jul 12, 2024 · 13 comments · Fixed by #2076

Comments

@allsey87
Copy link

allsey87 commented Jul 12, 2024

🚀 feature request

Relevant Rules

py_binary

Description

py_binary sets PYTHONSAFEPATH=1 which while being a sensible default does cause problems. Most notably, in rules_foreign_cc where py_binary is used to wrap up the Meson build system (written in Python). While this is not an issue for the build system itself, it is an issue when Meson runs compilers or build scripts that are written in Python. The issue here is that Meson is starting processes which inherit Meson's environment variables including PYTHONSAFEPATH. This often breaks scripts which rely on importing modules from the directory in which they are contained.

Describe the solution you'd like

I would like an option on py_binary that allows me to opt out of PYTHONSAFEPATH.

py_binary(
   name = "mypybinary",
   safe_path = False, # defaults to True
)

Describe alternatives you've considered

  • Using --action_env=PYTHONSAFEPATH=: no effect
  • Specifying the env argument on py_binary as follows: no effect
py_binary(
   name = "mypybinary",
   env = {
      "PYTHONSAFEPATH": ""
   }
)
@rickeylev
Copy link
Collaborator

setting PYTHONSAFEPATH in env has no effect

Yeah, we should respect an outer env setting at least.

Using env to set this does raise a bit of a design question. Normally that attribute is only respected when bazel run or bazel test is used with the target. That behavior has always struck me as a bit surprising. But, that behavior is also what most rules do, I think? So it's not surprising insofar as its the existing convention.

Anyways, at the least, we can update the logic so it doesn't always force the value.

@allsey87
Copy link
Author

allsey87 commented Jul 15, 2024

setting PYTHONSAFEPATH in env has no effect

Yeah, we should respect an outer env setting at least.

I would lean towards a dedicated attribute which also clearly documents that this environment variable is set by default, but allowing it to be unset via the env attribute would also be sufficient.

Using env to set this does raise a bit of a design question. Normally that attribute is only respected when bazel run or bazel test is used with the target.

Do you mean env taking effect during bazel build on a py_binary?

@allsey87
Copy link
Author

Any thoughts @aignas? If we can agree on disabling via env or a dedicated attribute, I can put a PR together.

@sbc100
Copy link

sbc100 commented Jul 18, 2024

How about using -P flag when starting the python interpreter instead of PYTHONSAFEPATH? That way only the py_binary itself would be effected and not all of its transitive sub-processes?

@allsey87
Copy link
Author

How about using -P flag when starting the python interpreter instead of PYTHONSAFEPATH? That way only the py_binary itself would be effected and not all of its transitive sub-processes?

That sounds like quite a good solution. I am just wondering how this would interact with things like the multiprocessing library which starts up other interpreters. I guess arguments such as -P would be forwarded to those interpreters?

@rickeylev
Copy link
Collaborator

Do you mean env taking effect during bazel build on a py_binary?

Yes. So given py_binary(env={"FOO": "value"}), the generated executable (bash script, in this case) ultimately calls e.g. env FOO=value python3 main.py

use dedicated attribute

The issue with dedicated attributes for these sort of settings is the API surface is very large because it grows to encompass all the env vars and interpreter flags, of which there are many: https://docs.python.org/3/using/cmdline.html

I am +1 on a general interpreter_args attribute, though. IIRC, not all the interpreter args have environment variables and vice-versa.

use -P to set this because it won't propagate to subprocesses

This does sound appealing. Actually, that sounds necessary if one py_binary has another py_binary as a data dependency and they have different settings.

It'll be a bit annoying to code since the shell code will need to reconcile the outer environment, the env attribute values, and the interpreter_args values, but quite doable.

interaction with multiprocessing

Multiprocessing will propagate the interpreter args (sys.flags) either directly when spawn is used, or implicitly when fork is used. For environment variables, it'll inherit them.

So, insofar as multiprocessing is concerned, these settings will propagate regardless.

github-merge-queue bot pushed a commit that referenced this issue Jul 19, 2024
By default, PYTHONSAFEPATH is enabled to help prevent imports being
found where they
shouldn't be. However, this behavior can't be disabled, which makes it
harder to use
a py_binary when the non-safe path behavior is explicitly desired.

To fix, the bootstrap now respects the caller environment's
PYTHONSAFEPATH environment variable, if set. This allows the callers to
set `PYTHONSAFEPATH=` (empty string) to
override the default behavior that enables it.

Fixes #2060
@rickeylev
Copy link
Collaborator

Reopening because:

The PR fixes it for bootstrap=script, but that isn't the default yet. While it'll become the default in a couple releases, it won't be used for Windows. Also, I suspect that zip invocations going through foo.zip/__main__.py will still be forcing safe path.

@allsey87 if you want to send a PR to update https://github.com/bazelbuild/rules_python/blob/main/python/private/python_bootstrap_template.txt, SGTM. All it probably requires is modifying L500 to conditionally set PYTHONSAFEPATH based on the existing environment. You can probably copy/paste the //tests/base_rules:inherit_pythonsafepath_env_test that PR 2076 added and just set bootstrap_impl="system_python" and it should Just Work to verify behavior.

I probably won't have time to fix this myself before next release, but can probably fit in reviewing a PR.

@rickeylev rickeylev reopened this Jul 19, 2024
@allsey87
Copy link
Author

I'm travelling at the moment so the earliest I can get to this will be on the 29th

@allsey87
Copy link
Author

allsey87 commented Aug 5, 2024

@rickeylev looking into this now but I have stumbled upon what might be another bug. It seems the environment variables set via the env attribute of py_binary are not visible from python_bootstrap_template.txt?

diff --git a/python/private/python_bootstrap_template.txt b/python/private/python_bootstrap_template.txt
index 0f9c90b..083b880 100644
--- a/python/private/python_bootstrap_template.txt
+++ b/python/private/python_bootstrap_template.txt
@@ -497,7 +497,8 @@ def Main():
 
   # Don't prepend a potentially unsafe path to sys.path
   # See: https://docs.python.org/3.11/using/cmdline.html#envvar-PYTHONSAFEPATH
-  new_env['PYTHONSAFEPATH'] = '1'
+  if os.environ.get('PYTHONSAFEPATH') != '':
+    new_env['PYTHONSAFEPATH'] = '1'
 
   main_filename = os.path.join(module_space, main_rel_path)
   main_filename = GetWindowsPathWithUNCPrefix(main_filename)

os.environ does contain environment variables, but setting env like this does not seem to affect those variables:

""" Rule for building meson from source. """

load("@rules_python//python:defs.bzl", "py_binary")

def meson_tool(name, main, data, requirements = [], **kwargs):
    py_binary(
        name = name,
        srcs = [main],
        data = data,
        deps = requirements,
        python_version = "PY3",
        main = main,
        env = {
            "PYTHONSAFEPATH": ""
        },
        **kwargs
    )

Is this the expected behaviour?

@rickeylev
Copy link
Collaborator

env only affects the environment when bazel run or bazel test is used; I'm guessing that's what you're seeing.

This is why I pointed to the //tests/base_rules:inherit_pythonsafepath_env_test target -- it runs the py_binary manually with the envvar set to the different values to affect behavior.

@allsey87
Copy link
Author

allsey87 commented Aug 14, 2024

env only affects the environment when bazel run or bazel test is used; I'm guessing that's what you're seeing.

This is why I pointed to the //tests/base_rules:inherit_pythonsafepath_env_test target -- it runs the py_binary manually with the envvar set to the different values to affect behavior.

This all sounds a bit hacky to me. Considering that env variables are only supposed to apply when bazel run and bazel test are used, it seems to me that it would be better to just add a parameter called safe_path which defaults to True and applies to build, run, and test.

The issue with dedicated attributes for these sort of settings is the API surface is very large because it grows to encompass all the env vars and interpreter flags, of which there are many: https://docs.python.org/3/using/cmdline.html

I see where you are coming from, but would say that PYTHONSAFEPATH is an exception here since it is not the default behaviour of the Python interpreter and it is being explicitly set by rules_python without a way to opt out of it.

In my opinion, either adding the attribute safe_path or just using -P instead as @sbc100 pointed out are the best the solutions to this problem. Using env to "undo" this just feels counterintuitive to me. This being said, I am fine with proceeding in whichever direction you think is best.

@rickeylev
Copy link
Collaborator

Considering that env variables are only supposed to apply when bazel run and bazel test are used

Are they really, though? It's "merely" a convention today, and I couldn't find anything authoritative saying they must be ignored outside of run/test. Over the years, I've seen people (myself included) confused and annoyed that the env and args attributes aren't respected when bazel-bin/foo is used. Hm, I wonder if it simply derives from e.g. C-style executables: such programs are native executables and they typically don't have a clean way to "inject" startup code before the user code (main) runs.

In any case, we're skirting this design question for now by just making the startup respect the existing environment variable, if present. It will already respect other Python environment variables (e.g. PYTHONHASHSEED, PYTHONIOENCODING, etc), so it makes it a bit more consistent in that regard. (Whether they're set via the py_test.env attribute, ctx.actions.run(env=...), or manually in the command line invocation).

I'm pretty strongly -1 on a dedicated attribute. There's simply too many other PYTHON* environment variables and Python command line options for a tenable attribute-based API, and the set of them changes outside our control. When I look at the environment variables and command line flags, I can imagine a few more that would make sense to enable by default (the site package ones, hash randomization (when exec mode), maybe dont-write-byte-code, ...). I also don't like having a third way to control these settings. It's already confusing that there's an environment variable and interpreter arg that control it.

Use -P to control it

I'd be fine with -P being used instead of using the environment variable. I'd approve a PR changing things to pass -P instead. However, it still leaves the problem of how to opt out.

@allsey87
Copy link
Author

Use -P to control it

I'd be fine with -P being used instead of using the environment variable. I'd approve a PR changing things to pass -P instead. However, it still leaves the problem of how to opt out.

Well, I think this is actually an XY problem. What I actually need is to just stop PYTHONSAFEPATH propagating to child processes, which -P would achieve. I don't actually see the need for opting out of safe path for a given py_binary. Sorry if I didn't communicate this well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants