Show segfault in embed scoped_interpreter test #4500

PhilipDeegan · 2023-02-07T22:38:05Z

Hey, we've noticed we're getting a segfault in the destructor of the scoped_interpreter.

I've updated a test of yours to force a full lifecycle of the scoped_interpreter class before anything else.

I've run this on my own fork and it does segfault.

It's possible I have misunderstood something so please let me know if this is not a smart thing to be doing.

rwgk · 2023-02-08T00:01:57Z

Hey, we've noticed we're getting a segfault in the destructor of the scoped_interpreter.

Interesting ... I don't have good ideas (but I'm also not an embedding expert, although I fixed up a thing here and there recently).

We have lots of py::scoped_interpreter that go out of scope in test_interpreter.cpp, I wonder what's special about the situation you created with this PR.

Another question: did this work before? With an older version of pybind11? If yes, when did it break? (git bisect?)

But maybe the first thing to try: run in a debugger and look at the stack trace. Could you please try and post the stack trace here?

PhilipDeegan · 2023-02-08T00:07:25Z

hi,

did this work before?

It was working as of a yesterday.

locally with the following

#include "pybind11/embed.h"

int main(){
    pybind11::scoped_interpreter g;
}

I get this stack trace

801 Python/pystate.c: No such file or directory.
#0  0x00007ffff7d002a6 in new_threadstate (interp=0x0) at Python/pystate.c:801
#1  0x00007ffff7d00267 in PyThreadState_New (interp=<optimized out>) at Python/pystate.c:855
#2  0x00007ffff7bc0a56 in PyGILState_Ensure () at Python/pystate.c:1685
#3  0x000055555555fe75 in pybind11::detail::get_internals()::gil_scoped_acquire_local::gil_scoped_acquire_local() (this=0x7fffffffdc94) at p/include/pybind11/detail/../detail/internals.h:438
#4  0x0000555555560969 in pybind11::detail::get_internals () at p/include/pybind11/detail/../detail/internals.h:443
#5  0x0000555555560e54 in pybind11::detail::local_internals::local_internals (this=0x5555555af3d0) at p/include/pybind11/detail/../detail/internals.h:529
#6  0x0000555555560faa in pybind11::detail::get_local_internals () at p/include/pybind11/detail/../detail/internals.h:548
#7  0x000055555556931a in pybind11::finalize_interpreter () at p/include/pybind11/embed.h:264
#8  0x0000555555569444 in pybind11::scoped_interpreter::~scoped_interpreter (this=0x7fffffffdeef, __in_chrg=<optimized out>) at p/include/pybind11/embed.h:313
#9  0x000055555555af66 in main () at cpp.cpp:5

If I try a commit from Dec 1st 2022 it works

Author: Aaron Gokaslan <[email protected]>
Date:   Thu Dec 1 15:15:47 2022 -0500

    chore: Convert direct multiprocessing.set_start_method("forkserver") call to a pytest fixture. (#4377)
    
    * chore: convert multiprocessing set_spawn to fixture in pytest
    
    * Switch to early return

PhilipDeegan · 2023-02-08T00:10:38Z

If I pop off the top two commits it no longer segfaults

rwgk · 2023-02-08T00:15:00Z

If I pop off the top two commits it no longer segfaults

Then I'd say it's pretty much certain, #4459 is the culprit.

That one came without a test. I'd say let's roll it back until we understand why it's causing the segfault here.

rwgk · 2023-02-08T00:24:30Z

Probably the trouble really started with #3744.

Apparently that broke atexit, which led to #4459.

The question to solve: how can we make this PR and atexit work?

We need a volunteer to work on this, ideally this time around with full unit testing, even if it's not easy.

@Skylion007 I'll create the rollback PR.

…ybind#4459 (pybind#4486)" This reverts commit b2c1978. See pybind#4500 for background.

…4459 (#4486)" (#4501) This reverts commit b2c1978. See #4500 for background.

PhilipDeegan · 2023-02-08T10:38:13Z

Do you want to merge this, or something similar to prevent these kinds of issues in future?
You can see things are passing now that that commit has been reverted.

Feel free to close otherwise.

Skylion007 · 2023-02-08T15:17:14Z

@PhilipDeegan Absolutely, if you could rebase and this doesn't SEGFAULT, then it would be a nice addition to the test suite (although please add a commit referencing the issue).

Skylion007 · 2023-02-08T15:44:45Z

So I did some digging through the stack trace and found what's going on a bit @rwgk. The issue is the detail::get_local_internals() cannot be called safely after Py_Finalize():

pybind11/include/pybind11/embed.h

Line 259 in b8f2855

detail::get_local_internals().registered_types_cpp.clear();

There is some serious nasty static variable behavior in our code right now. We construct local_internals and store it in a static variable.

pybind11/include/pybind11/detail/internals.h

Line 548 in b8f2855

static auto *locals = new local_internals();

The ctor for local_internals() has multiple calls to the Python API though which is not safe when the interpreter is not initialized! It seems in practice to be initialized on the first call to get_local_internals(), which with atexit fix is after Py_Finalize(). The first thing it will do is call get_internals() which will try to do a GIL lock before calling Python calls. It can't do this though because the interpreter isn't initialized! Hence the segfault:

pybind11/include/pybind11/detail/internals.h

Line 438 in b8f2855

gil_scoped_acquire_local() : state(PyGILState_Ensure()) {}

The real issue is that the initial construction of local_internals is unsafe, UB and this is exposing it. The hotfix for this is just to put that previous change back in, but cache a reference to the output of get_local_internals() before Py_Finalize() is called then clear the C++ type_info after Py_Finalize(). Or safer yet, cache references to the C++ maps themselves and only clear them.

Skylion007 · 2023-02-08T16:11:57Z

Actually @PhilipDeegan @rwgk, I introduced your test case into our test suite and fixed the atexit bug in #4505.

Skylion007 · 2023-02-08T16:59:47Z

Hmm, I spoke too soon. Seems like some compilers are exploiting the UB for an optimization which is causing them to still be broken.

Show segfault in embed scoped_interpreter test

9c71fbc

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Feb 8, 2023

Revert "bugfix: Keep registered types until after Py_Finalize(). Fix p…

4efdec8

…ybind#4459 (pybind#4486)" This reverts commit b2c1978. See pybind#4500 for background.

rwgk mentioned this pull request Feb 8, 2023

Revert PR #4486 (until we understand the segfault under PR #4500) #4501

Merged

rwgk pushed a commit that referenced this pull request Feb 8, 2023

Revert "bugfix: Keep registered types until after Py_Finalize(). Fix #…

f8713ec

…4459 (#4486)" (#4501) This reverts commit b2c1978. See #4500 for background.

Merge branch 'pybind:master' into scoped_segfault

ed9b38a

Skylion007 mentioned this pull request Feb 8, 2023

Fix #4459 (cpp atexit callbacks) without segfault (#4500) #4505

Open

PhilipDeegan closed this Feb 8, 2023

rwgk mentioned this pull request Feb 11, 2023

FWD pybind11 google/pybind11clif#4500

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Show segfault in embed scoped_interpreter test #4500

Show segfault in embed scoped_interpreter test #4500

PhilipDeegan commented Feb 7, 2023

rwgk commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023 •

edited

Loading

rwgk commented Feb 8, 2023

rwgk commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023 •

edited

Loading

Skylion007 commented Feb 8, 2023

Skylion007 commented Feb 8, 2023 •

edited

Loading

Skylion007 commented Feb 8, 2023

Skylion007 commented Feb 8, 2023

Show segfault in embed scoped_interpreter test #4500

Show segfault in embed scoped_interpreter test #4500

Conversation

PhilipDeegan commented Feb 7, 2023

rwgk commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023 • edited Loading

rwgk commented Feb 8, 2023

rwgk commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023 • edited Loading

Skylion007 commented Feb 8, 2023

Skylion007 commented Feb 8, 2023 • edited Loading

Skylion007 commented Feb 8, 2023

Skylion007 commented Feb 8, 2023

PhilipDeegan commented Feb 8, 2023 •

edited

Loading

PhilipDeegan commented Feb 8, 2023 •

edited

Loading

Skylion007 commented Feb 8, 2023 •

edited

Loading