[bug] Python 3.9 macOS segfault #2558

Closed
henryiii opened this issue Oct 8, 2020 · 10 comments · Fixed by #2576
Labels
bug, ci flake (Known CI issues. Failures on CI should be recorded in an issue, but not block PRs.)

Comments

@henryiii
Collaborator

henryiii commented Oct 8, 2020

The following code segfaults on Python 3.9 and macOS:

from pybind11_tests import gil_scoped as m

def _python_to_cpp_to_python():
    """Calls different C++ functions that come back to Python."""
    class ExtendedVirtClass(m.VirtClass):
        def virtual_func(self):
            pass

        def pure_virtual_func(self):
            pass

    extended = ExtendedVirtClass()
    m.test_callback_py_obj(lambda: None)
    m.test_callback_std_func(lambda: None)
    m.test_callback_virtual_func(extended)
    m.test_callback_pure_virtual_func(extended)


for x in range(1000):
    _python_to_cpp_to_python()

Debug output:

* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x74736e6f43)
  * frame #0: 0x000000010017434a Python`meth_dealloc + 120
    frame #1: 0x000000010014645a Python`instancemethod_dealloc + 65
    frame #2: 0x0000000100168feb Python`PyDict_Clear + 409
    frame #3: 0x00000001001866ec Python`type_clear + 59
    frame #4: 0x0000000100233f00 Python`collect + 2558
    frame #5: 0x00000001002334f1 Python`_PyGC_CollectNoFail + 67
    frame #6: 0x000000010020619c Python`_PyImport_Cleanup + 1545
    frame #7: 0x0000000100216680 Python`Py_FinalizeEx + 160
    frame #8: 0x000000010023279e Python`Py_RunMain + 1408
    frame #9: 0x0000000100232c86 Python`pymain_main + 306
    frame #10: 0x0000000100232cd4 Python`Py_BytesMain + 42
    frame #11: 0x00007fff6910dcc9 libdyld.dylib`start + 1

Originally posted by @henryiii in #2391 (comment)
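
The backtrace above shows the crash inside `Py_FinalizeEx`, that is, while the interpreter is shutting down, so an in-process test that simply calls the functions may pass and still be followed by a segfault at exit. One way to observe that from a test harness is to run the reproducer in a child interpreter and inspect how it exited. The following is a minimal sketch, not part of pybind11; `run_in_child` is a hypothetical helper name, and the signal decoding relies on POSIX behaviour, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_in_child(source, timeout=60.0):
    """Run `source` in a fresh interpreter and report how it exited.

    Returns (returncode, signal_name). signal_name is None unless the
    child was killed by a signal, which POSIX reports as a negative code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    signal_name = None
    if proc.returncode < 0:
        # Example: -11 maps to SIGSEGV on Linux and macOS.
        signal_name = signal.Signals(-proc.returncode).name
    return proc.returncode, signal_name
```

Passing the reproducer's source text to `run_in_child` would then let a harness fail on a `SIGSEGV` result even when no Python exception is raised.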

@rwgk
Collaborator

rwgk commented Oct 8, 2020

> from pybind11_tests import gil_scoped as m

IIUC gil_scoped is a submodule, which is a little trickier for me to get going in the Google environment. Is there a chance you could try isolating the 4 callback functions in a minimal main module? I could do that myself, but then I would not be sure that it still fails in your environment.

@henryiii
Collaborator Author

henryiii commented Oct 8, 2020

That'll take a little time, since I have to pull the cross module utils out to build separately.

@rwgk
Collaborator

rwgk commented Oct 8, 2020

> That'll take a little time, since I have to pull the cross module utils out to build separately.

Oh, please don't worry then for now. I'll try to get it going as is, maybe it's not as tricky as I'm thinking.

@henryiii
Collaborator Author

henryiii commented Oct 8, 2020

It's fairly well isolated as far as I can tell; it's two cpp files, and you should be able to convert TEST_SUBMODULE(gil_scoped, m) -> PYBIND11_MODULE(gil_scoped, m). I think it's just setting up the surrounding build and import of "cross_module_gil_utils".

henryiii added the bug and ci flake labels on Oct 8, 2020
@henryiii
Collaborator Author

henryiii commented Oct 8, 2020

Possibly related: #2422

@YannickJadoul
Collaborator

Related to https://bugs.python.org/issue41237

@bstaletic and I managed to reproduce with valgrind on Linux :-)

==31640== Invalid read of size 1
==31640==    at 0x62B0D0: meth_dealloc (methodobject.c:169)
==31640==    by 0x464169: _Py_Dealloc (object.c:2209)
==31640==    by 0x4556A2: _Py_DECREF (object.h:430)
==31640==    by 0x4556A2: _Py_XDECREF (object.h:497)
==31640==    by 0x4556A2: insertdict (dictobject.c:1123)
==31640==    by 0x454F4B: PyDict_SetItem (dictobject.c:1573)
==31640==    by 0x462E23: _PyModule_ClearDict (moduleobject.c:612)
==31640==    by 0x462A6E: _PyModule_Clear (moduleobject.c:560)
==31640==    by 0x4F7D25: _PyImport_Cleanup (import.c:606)
==31640==    by 0x50BFCA: Py_FinalizeEx (pylifecycle.c:1431)
==31640==    by 0x4206D0: Py_RunMain (main.c:679)
==31640==    by 0x4204A2: pymain_main (main.c:707)
==31640==    by 0x42047D: Py_BytesMain (main.c:731)
==31640==    by 0x42044F: main (python.c:15)
==31640==  Address 0x8a3e161 is 17 bytes inside a block of size 32 free'd
==31640==    at 0x4C3123B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31640==    by 0xA92B4A5: pybind11::cpp_function::destruct(pybind11::detail::function_record*) (pybind11.h:469)
==31640==    by 0xA929877: pybind11::cpp_function::initialize_generic(pybind11::detail::function_record*, char const*, std::type_info const* const*, unsigned long)::{lambda(void*)#1}::operator()(void*) const (pybind11.h:359)
==31640==    by 0xA929897: pybind11::cpp_function::initialize_generic(pybind11::detail::function_record*, char const*, std::type_info const* const*, unsigned long)::{lambda(void*)#1}::_FUN(void*) (pybind11.h:360)
==31640==    by 0xA92171F: pybind11::capsule::capsule(void const*, void (*)(void*))::{lambda(_object*)#1}::operator()(_object*) const (pytypes.h:1218)
==31640==    by 0xA92173F: pybind11::capsule::capsule(void const*, void (*)(void*))::{lambda(_object*)#1}::_FUN(_object*) (pytypes.h:1219)
==31640==    by 0x618402: capsule_dealloc (capsule.c:261)
==31640==    by 0x464169: _Py_Dealloc (object.c:2209)
==31640==    by 0x62B11F: _Py_DECREF (object.h:430)
==31640==    by 0x62B11F: _Py_XDECREF (object.h:497)
==31640==    by 0x62B11F: meth_dealloc (methodobject.c:167)
==31640==    by 0x464169: _Py_Dealloc (object.c:2209)
==31640==    by 0x4556A2: _Py_DECREF (object.h:430)
==31640==    by 0x4556A2: _Py_XDECREF (object.h:497)
==31640==    by 0x4556A2: insertdict (dictobject.c:1123)
==31640==    by 0x454F4B: PyDict_SetItem (dictobject.c:1573)
==31640==  Block was alloc'd at
==31640==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31640==    by 0xA92A69F: pybind11::cpp_function::initialize_generic(pybind11::detail::function_record*, char const*, std::type_info const* const*, unsigned long) (pybind11.h:352)
==31640==    by 0xAF61B8F: void pybind11::cpp_function::initialize<test_submodule_virtual_functions(pybind11::module_&)::{lambda(ExampleVirt*, int)#1}, int, ExampleVirt*, int, pybind11::name, pybind11::scope, pybind11::sibling>(test_submodule_virtual_functions(pybind11::module_&)::{lambda(ExampleVirt*, int)#1}&&, int (*)(ExampleVirt*, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&) (pybind11.h:209)
==31640==    by 0xAF600EA: pybind11::cpp_function::cpp_function<test_submodule_virtual_functions(pybind11::module_&)::{lambda(ExampleVirt*, int)#1}, pybind11::name, pybind11::scope, pybind11::sibling, void>(test_submodule_virtual_functions(pybind11::module_&)::{lambda(ExampleVirt*, int)#1}&&, pybind11::name const&, pybind11::scope const&, pybind11::sibling const&) (pybind11.h:76)
==31640==    by 0xAF5E620: pybind11::module_& pybind11::module_::def<test_submodule_virtual_functions(pybind11::module_&)::{lambda(ExampleVirt*, int)#1}>(char const*, test_submodule_virtual_functions(pybind11::module_&)::{lambda(ExampleVirt*, int)#1}&&) (pybind11.h:893)
==31640==    by 0xAF5DCD4: test_submodule_virtual_functions(pybind11::module_&) (test_virtual_functions.cpp:217)
==31640==    by 0xA91C04A: test_initializer::test_initializer(char const*, void (*)(pybind11::module_&))::{lambda(pybind11::module_&)#1}::operator()(pybind11::module_&) const (pybind11_tests.cpp:41)
==31640==    by 0xA91E86F: std::_Function_handler<void (pybind11::module_&), test_initializer::test_initializer(char const*, void (*)(pybind11::module_&))::{lambda(pybind11::module_&)#1}>::_M_invoke(std::_Any_data const&, pybind11::module_&) (std_function.h:316)
==31640==    by 0xA93560C: std::function<void (pybind11::module_&)>::operator()(pybind11::module_&) const (std_function.h:706)
==31640==    by 0xA91C88B: pybind11_init_pybind11_tests(pybind11::module_&) (pybind11_tests.cpp:90)
==31640==    by 0xA91C36B: PyInit_pybind11_tests (pybind11_tests.cpp:65)
==31640==    by 0x4F8943: _PyImport_LoadDynamicModuleWithSpec (importdl.c:164)

@tdegeus
Contributor

tdegeus commented Oct 13, 2020

Possibly related: conda-forge/python-goosefem-feedstock#7

@YannickJadoul
Collaborator

YannickJadoul commented Oct 13, 2020

@tdegeus, if you have a stack trace of the segfault, we should be able to (reasonably) quickly judge if that's the case.
At any rate, #2576 is almost ready, I believe, and 2.6.0 will still contain this fix. Meanwhile, Python has fixed this for 3.9.1 :-)
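
For projects that want to guard a test run against the affected interpreter until an upgrade is possible, a version check is one option. This is a minimal sketch under the assumption that only the 3.9.0 release is affected, which is inferred from the statement above that Python fixed this for 3.9.1; `AFFECTED_RELEASE` and `is_affected` are hypothetical names, not pybind11 or CPython API.

```python
import sys

# Assumption: only the 3.9.0 release is affected (fix reported for 3.9.1).
AFFECTED_RELEASE = (3, 9, 0)


def is_affected(version_info=sys.version_info):
    """Return True if the (major, minor, micro) version is exactly 3.9.0."""
    return tuple(version_info[:3]) == AFFECTED_RELEASE
```

A test suite could use the result to skip or mark the callback tests as expected failures on that release only.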

@tdegeus
Contributor

tdegeus commented Oct 13, 2020

@YannickJadoul I'm not sure how I can easily get a stack trace on conda's CI. I tried on my own Mac, but I don't have the problem there. The easiest option is probably to wait for the new release and see if the problem is solved.

@YannickJadoul
Collaborator

@tdegeus Hmmm, not sure. I'm not a conda user and it took us a while to figure out how to best debug this issue as well :-/

Alternatively, can you try with #2576?
