Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embed.h Python 3.11 config.use_environment=1 + PYTHONPATH test #4119

Merged
merged 10 commits into from
Aug 21, 2022

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Aug 5, 2022

Description

With pybind11 v2.10.0 & Python 3.11, the PyConfig in embed.h does not update sys.path from PYTHONPATH, which is incompatible with the behavior for all Python versions before 3.11. This PR changes embed.h so that PYTHONPATH is used also with Python 3.11.

Suggested changelog entry:

embed.h was changed so that `PYTHONPATH` is used also with Python 3.11 (established behavior).

@rwgk rwgk added the python dev Working on development versions of Python label Aug 5, 2022
@rwgk
Copy link
Collaborator Author

rwgk commented Aug 5, 2022

@vstinner I saw you worked on safe_path. I'm wondering if you have clues that could help me understand a difference in behavior between Python 3.10.5 and 3.11.0b5:

  • 3.10.5: pybind11 test_embed updates sys.path from PYTHONPATH
  • 3.11.0b5: PYTHONPATH is ignored in the exact same situation***

Does that ring any bells? Some details are below, JIC.

@henryiii I wonder if this could have to do with PR #3923, which introduced a new initialize_interpreter() implementation specifically for 3.11:

https://github.com/pybind/pybind11/pull/3923/files#diff-80008385afb03df2a6e6d29a0b500fbaf9593d1baba5a5bda112998ecdfdb131


*** The "exact same situation" is possibly unusual (interactively building Python from sources with configure; make install; using my own scons-based tools to build the pybind11 tests), but the only variable is the Python version.

I added fprintf to pybind11 test_embed sources: 198c9c2

And patched the Python sources like this:

--- Objects/unicodeobject.c.orig        2022-08-05 08:18:54.186431792 -0700
+++ Objects/unicodeobject.c     2022-08-04 17:36:36.375312426 -0700
@@ -16005,13 +16005,14 @@
 init_fs_encoding(PyThreadState *tstate)
 {
     PyInterpreterState *interp = tstate->interp;
+    _Py_DumpPathConfig(tstate);

     /* Update the filesystem encoding to the normalized Python codec name.
        For example, replace "ANSI_X3.4-1968" (locale encoding) with "ascii"
        (Python codec name). */
     PyConfig *config = (PyConfig*)_PyInterpreterState_GetConfig(interp);
     if (config_get_codec_name(&config->filesystem_encoding) < 0) {
-        _Py_DumpPathConfig(tstate);
+        //_Py_DumpPathConfig(tstate);
         return _PyStatus_ERR("failed to get the Python codec "
                              "of the filesystem encoding");
     }

I'm seeing this with 3.10.5:

Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
  program name = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/bin/python3'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  sys._base_executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/bin/python3'
  sys.base_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.base_exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.platlibdir = 'lib'
  sys.executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/bin/python3'
  sys.prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5'
  sys.path = [
    '/usr/local/google/home/rwgk/forked/build_py3110b5/lib',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/lib/python310.zip',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/lib/python3.10',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.10.5/lib/python3.10/lib-dynload',
  ]
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed":

LOOOK PYTHONPATH /usr/local/google/home/rwgk/forked/build_py3105/lib:/foo/bar

LOOOK sys.path ['', '/usr/local/google/home/rwgk/forked/build_py3105/lib', '/foo/bar', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.10/dist-packages']
===============================================================================
All tests passed (1553 assertions in 11 test cases)

And this with 3.11.0b5:

Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
  program name = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/bin/python3'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11'
  sys._base_executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/bin/python3'
  sys.base_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.base_exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.platlibdir = 'lib'
  sys.executable = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/bin/python3'
  sys.prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.exec_prefix = '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5'
  sys.path = [
    '/usr/local/google/home/rwgk/forked/build_py3110b5/lib',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python311.zip',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11',
    '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11/lib-dynload',
  ]
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed":

LOOOK PYTHONPATH /usr/local/google/home/rwgk/forked/build_py3110b5/lib:/foo/bar

LOOOK sys.path ['', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python311.zip', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11/lib-dynload', '/usr/local/google/home/rwgk/usr_local_like/Python-3.11.0b5/lib/python3.11/site-packages']

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_embed is a Catch v2.12.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
Restart the interpreter
-------------------------------------------------------------------------------
/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed/test_interpreter.cpp:174
...............................................................................

/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed/test_interpreter.cpp:192: FAILED:
  REQUIRE( py::module_::import("external_module").attr("A")(123).attr("value").cast<int>() == 123 )
due to unexpected exception with message:
  ModuleNotFoundError: No module named 'external_module'

===============================================================================
test cases:   11 |   10 passed | 1 failed
assertions: 1537 | 1536 passed | 1 failed

@vstinner
Copy link
Contributor

vstinner commented Aug 5, 2022

pybind11 seems to use a different API on Python 3.10 and Python 3.11.

PySys_SetArgvEx() called after Python init on Python 3.10 versus Py_InitializeFromConfig() on Python 3.11.

Your dump says:

  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
...
  environment = 1

So PYTHONPATH was used. It's part of the computed sys.path.

pybind11 doesn't call PySys_SetArgvEx() on Python 3.11. That's the function which changes sys.path.

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 5, 2022

@vstinner Thanks a lot for the explanation and the pointers!

I see PySys_SetArgvEx() is deprecated in 3.11.

So PYTHONPATH was used. It's part of the computed sys.path.

But evidently the computed sys.path is not in effect in test_embed. — What's the recommended way to restore the established behavior?

I decided to try something straightforward, using PyRun_SimpleString(), similar to what we do already for add_program_dir_to_path (I believe @henryiii did this based on your recommendation here):

--- a/include/pybind11/embed.h
+++ b/include/pybind11/embed.h
@@ -166,6 +166,9 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
         throw std::runtime_error(PyStatus_IsError(status) ? status.err_msg
                                                           : "Failed to init CPython");
     }
+    // Emulates established behavior (prior to Python 3.11 implemented by PySys_SetArgvEx()).
+    PyRun_SimpleString("import sys, os; "
+                       "sys.path[:0] = os.environ.get('PYTHONPATH', []).split(os.pathsep)");
     if (add_program_dir_to_path) {
         PyRun_SimpleString("import sys, os.path; "
                            "sys.path.insert(0, "

With that I get back the established behavior and test_embed passes in my situation. Is that approach OK?

@vstinner
Copy link
Contributor

vstinner commented Aug 9, 2022

But evidently the computed sys.path is not in effect in test_embed.

Sorry, I don't get how you manage to get PYTHONPATH environment variable being ignored. If you are looking for my help, it would help if you could write a short C or C++ program reproducing your issue. pybind11 is a very large code base that I don't know.

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 9, 2022

There seems to be some kind of conflict in expectations & observations:

Sorry, I don't get how you manage to get PYTHONPATH environment variable being ignored.

But earlier:

Your dump says:

  PYTHONPATH = '/usr/local/google/home/rwgk/forked/build_py3110b5/lib'
...
  environment = 1

So PYTHONPATH was used. It's part of the computed sys.path.

=========================================================================

it would help if you could write a short C or C++ program reproducing your issue.

At the moment I'm leaning more towards root-causing why environment = 1 but sys.path is not updated.

@vstinner Would you have a pointer to the cpython sources that update sys.path? (I looked around previously but didn't get lucky. I'll try again.)

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 10, 2022

At the moment I'm leaning more towards root-causing why environment = 1 but sys.path is not updated.

Interestingly — and luckily — I'm seeing a behavior change between 3.11.0b5 and 3.11.0rc1, both installed from tar files.

Everything except 3.11.0b5 vs 3.11.0rc1 being equal, when running test_embed with

  • 3.11.0b5: The Python path configuration: output appears 1 time.
  • 3.11.0rc1: The Python path configuration: output appears 13 times.

I believe it is enlightening, because only the 1st Python path configuration: has environment = 1, but the other 12 have environment = 0. As it happens, the failing unit test is after the 5th and before the 6th Python path configuration: output.

This observation got me to look again at the new 3.11-specific initialize_interpreter() implementation. We are using PyConfig_InitIsolatedConfig() which is actually documented to ignore environment variables:

I tried adding config.use_environment = 1; instead of my previous PyRun_SimpleString() diff, but that didn't help.

What do we want to do?

A. Document the pybind11 behavior change? (What do we want to suggest as a solution for people who relied on the old behavior?)
B. Switch to PyConfig_InitPythonConfig() instead? (What are the pros and cons?)
C. Add a bool isolated = false; to select between PyConfig_InitPythonConfig() or PyConfig_InitIsolatedConfig()?
D. Keep my PyRun_SimpleString() diff? Maybe behind a bool add_environ_pythonpath_to_path = true option?

@henryiii @Skylion007 @vstinner what is your vote or recommendation?

I don't like A but don't have enough background to decide between B, C, D. D doesn't look very general. Maybe B or C?

@vstinner
Copy link
Contributor

PyConfig_InitIsolatedConfig() sets isolated to 1 and isolated=1 implies use_environment=0.

@vstinner
Copy link
Contributor

B. Switch to PyConfig_InitPythonConfig() instead? (What are the pros and cons?)

That's closer to the regular "python" program. You can then tweak configuration from these defaults. For example, it uses isolated=0 and use_environment=1. Example:

@vstinner
Copy link
Contributor

@vstinner Would you have a pointer to the cpython sources that update sys.path? (I looked around previously but didn't get lucky. I'll try again.)

If you are talking about PySys_SetArgvEx() that you only call on Python 3.10 and older:

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 15, 2022

I only now got a chance to poke some more at this, trying to use PyConfig_InitPythonConfig() instead:

-    PyConfig_InitIsolatedConfig(&config);
+    PyConfig_InitPythonConfig(&config);

With that as the only diff against current master, the PYTHONPATH-related failure is resolved, but it is failing here instead:

TEST_CASE("sys.argv gets initialized properly") {
    py::finalize_interpreter();
    {
        py::scoped_interpreter default_scope;
        auto module = py::module::import("test_interpreter");
        auto py_widget = module.attr("DerivedWidget")("The question");
        const auto &cpp_widget = py_widget.cast<const Widget &>();
        REQUIRE(cpp_widget.argv0().empty());
    }

    std::string cpp_widget_argv0;
    {
        char *argv[] = {strdup("a.out")};
        py::scoped_interpreter argv_scope(true, 1, argv);
        std::free(argv[0]);
        auto module = py::module::import("test_interpreter");
        auto py_widget = module.attr("DerivedWidget")("The question");
        const auto &cpp_widget = py_widget.cast<const Widget &>();
        cpp_widget_argv0 = cpp_widget.argv0();
    }
    REQUIRE(cpp_widget_argv0 == "a.out"); // FAILURE
    py::initialize_interpreter();
}

Note that I moved the REQUIRE out, otherwise it segfaults.

With all other tested removed, this simply fails. With others tests present, it turns into a segfault (possibly triggered by the following test) even with the REQUIRE moved out.

Commenting out the std::free(argv[0]); call didn't help.

I have to give up for now. We don't use embedding, I cannot spend more time on this. For my local testing I can just use my PyRun_SimpleString() workaround.

If nobody else gets to this before the next pybind11 release, we should at least document that PYTHONPATH is not used anymore with Python 3.11 and put a "help wanted" tag on it.

@vstinner
Copy link
Contributor

Python still uses PYTHONPATH env var. I cannot explain why pybind11 no longer uses it since you didn't provide a short example reproducing your issue. Python initialization is very complicated, they are like 50+ options, configuration files, etc.

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 18, 2022

Python still uses PYTHONPATH env var. I cannot explain why pybind11 no longer uses it since you didn't provide a short example reproducing your issue.

It's purely a pybind11 issue, because we're using PyConfig_InitIsolatedConfig() for Python >= 3.11. I think it just escaped our attention and standard testing that this introduced a pybind11 behavior change.

Python initialization is very complicated, they are like 50+ options, configuration files, etc.

That's probably why I got stuck when trying PyConfig_InitPythonConfig().

To summarize:

  • With PyConfig_InitIsolatedConfig() PYTHONPATH is not used as documented (i.e. the config is working as intended).
  • With PyConfig_InitPythonConfig() the unit test that depends on PYTHONPATH — in my environment (!) — is working, but a different unit test is failing. I don't know why.

@vstinner
Copy link
Contributor

With PyConfig_InitIsolatedConfig() PYTHONPATH is not used as documented (i.e. the config is working as intended).

You should be able to use PYTHONPATH if you set isolated=0 and use_environment=1.

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 18, 2022

You should be able to use PYTHONPATH if you set isolated=0 and use_environment=1.

Confirmed, thanks @vstinner!
(Previously I only tried use_environment=1 without also setting isolated=0.)

With that figured out, this PR boils down to just adding two lines in embed.h (I just need to remove the debug printf code).

But I also want to add a unit test that fails if PYTHONPATH is not used. I haven't worked on the embedding tests before, therefore that's a little adventure for me.

@vstinner
Copy link
Contributor

Confirmed, thanks @vstinner!

Nice! Maybe the effect of isolated=1 is not well explained in the doc.

@@ -173,6 +173,18 @@ bool has_pybind11_internals_static() {

TEST_CASE("Restart the interpreter") {
// Verify pre-restart state.
fprintf(stdout,
"\nLOOOK PYTHONPATH %s\n",
py::module_::import("os")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be cleaner to just use the py::print API here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is/was just for debugging, I'll remove this before taking this PR out of draft mode, after adding a unit test for PYTHONPATH processing.

I think py::print could do indirections. For debugging I generally try to go straight down to the bare metal.

@rwgk rwgk changed the title WIP Python 3.11 related embed.h Python 3.11 config.use_environment=1 + PYTHONPATH test Aug 20, 2022
@rwgk rwgk marked this pull request as ready for review August 20, 2022 04:27
@rwgk
Copy link
Collaborator Author

rwgk commented Aug 20, 2022

@Skylion007 @henryiii

This (small) PR is ready for review now.

@vstinner
Copy link
Contributor

I like the final change, short and simple :-D

@rwgk rwgk requested a review from Skylion007 August 20, 2022 20:21
@rwgk rwgk merged commit 68e6fda into pybind:master Aug 21, 2022
@rwgk rwgk deleted the py311_related branch August 21, 2022 16:44
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Aug 21, 2022
@vstinner
Copy link
Contributor

Congrats for fixing your issue!

@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Oct 20, 2022
rwgk added a commit to rwgk/pybind11 that referenced this pull request Oct 21, 2022
…ybind#4119)

* Add debug fprintf to test_interpreter.cpp

* Update `sys.path` from `PYTHONPATH` in Python >= 3.11 branch of `initialize_interpreter()`

* Use `config.isolated = 0; config.use_environment = 1;`

As suggsted by @vstinner here: pybind#4119 (comment)

* Add `TEST_CASE("PYTHONPATH is used to update sys.path")`

* Fix clang-tidy error.

* Use `_putenv_s()` under Windows.

* Fix clang-tidy error: argument name ... in comment does not match parameter name

* Remove slash from PYTHONPATH addition, to work around Windows slash-vs-backslash issue.

* Use `py::str(...)` instead of `.attr("__str__")` as suggested by @Skylion007

Co-authored-by: Aaron Gokaslan <[email protected]>

Co-authored-by: Aaron Gokaslan <[email protected]>
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Oct 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python dev Working on development versions of Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants