Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault on pytest exit when creating a SerializableObjectWithMetadata #1474

Open
ssteinbach opened this issue Nov 1, 2022 · 10 comments · May be fixed by #1490
Open

Segmentation fault on pytest exit when creating a SerializableObjectWithMetadata #1474

ssteinbach opened this issue Nov 1, 2022 · 10 comments · May be fixed by #1490
Labels
bug A problem, flaw, or broken functionality.

Comments

@ssteinbach
Copy link
Collaborator

A user reported that there is a segmentation fault when running pytest + objects with metadata + the "fs" (fake filesystem) capture system.

Repro:

import opentimelineio as otio

# Does not segfault if you comment out this "test"
def test_empty_fs(fs):
    pass


def test_create_clip(capsys):
    # can be child classes of SerializableObjectWithMetadata (Clip, Gap, etc),
    # however other otio C++ classes appear to be ok (RationalTime, TimeRange,
    # TimeTransform)
    otio.core.SerializableObjectWithMetadata(
        # also does not segfault if you comment this out
        metadata={}
    )

Output:

❯ pytest otio_pytest_seg_fault.py --disable-warnings
============================================== test session starts ===============================================
platform darwin -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /private/var/tmp
plugins: pyfakefs-5.0.0
collected 2 items

otio_pytest_seg_fault.py ..                                                                                [100%]

========================================== 2 passed, 1 warning in 0.09s ==========================================
zsh: segmentation fault  pytest otio_pytest_seg_fault.py --disable-warnings

(The warning is related to our use of imp and appears to be a red herring)

@ssteinbach ssteinbach added the bug A problem, flaw, or broken functionality. label Nov 1, 2022
@ssteinbach ssteinbach changed the title Segmentation fault on pytest exit when creating a clip with metadata Segmentation fault on pytest exit when creating a SerializableObjectWithMetadata Nov 1, 2022
@JeanChristopheMorinPerso
Copy link
Member

When OTIO is built in debug mode, we get a little bit more details:

Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: finalizing (tstate=0x000055649b90a060)

Aborted (core dumped)

@JeanChristopheMorinPerso
Copy link
Member

The fact that it crashes it metadata={} is given as an argument kind of suggests me it has to do with the round-trip in https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp#L99-L121. This part of the code is pretty painful to debug...

@JeanChristopheMorinPerso
Copy link
Member

JeanChristopheMorinPerso commented Nov 1, 2022

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7e3d6b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff7ded958 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7dd753d in __GI_abort () at abort.c:79
#4  0x00007ffff7af561f in fatal_error_exit (status=-1) at Python/pylifecycle.c:2553
#5  fatal_error (fd=<optimized out>, header=header@entry=1, prefix=prefix@entry=0x7ffff7ce76d0 <__func__.6.lto_priv.2> "PyThreadState_Get", 
    msg=msg@entry=0x7ffff7c8e200 "the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)", status=status@entry=-1)
    at Python/pylifecycle.c:2664
#6  0x00007ffff7af59fe in _Py_FatalErrorFunc (func=0x7ffff7ce76d0 <__func__.6.lto_priv.2> "PyThreadState_Get", 
    msg=0x7ffff7c8e200 "the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)") at Python/pylifecycle.c:2750
#7  0x00007ffff7af5a51 in _Py_FatalError_TstateNULL (func=<optimized out>) at Python/ceval.c:271
#8  0x00007ffff7a63c30 in _Py_EnsureFuncTstateNotNULL (tstate=0x0, func=0x7ffff7ce76d0 <__func__.6.lto_priv.2> "PyThreadState_Get") at ./Include/internal/pycore_pystate.h:95
#9  PyThreadState_Get () at Python/pystate.c:1015
#10 tupledealloc (op=0x7ffff5f1be60) at Objects/tupleobject.c:272
#11 0x00007ffff7b57c1c in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2300
#12 _Py_DECREF (op=<optimized out>) at ./Include/object.h:500
#13 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:567
#14 code_dealloc (co=0x7ffff5f3f9f0) at Objects/codeobject.c:658
#15 0x00007ffff7b53557 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2300
#16 _Py_DECREF (op=<optimized out>) at ./Include/object.h:500
#17 func_clear (op=0x7ffff5f1e710) at Objects/funcobject.c:626
#18 func_dealloc (op=0x7ffff5f1e710) at Objects/funcobject.c:648
#19 0x00007ffff581a6a5 in _Py_DECREF (op=0x7ffff5f1e710) at /usr/include/python3.10/object.h:500
#20 0x00007ffff581bbca in _Py_XDECREF (op=0x7ffff5f1e710) at /usr/include/python3.10/object.h:567
#21 0x00007ffff581cfcf in pybind11::handle::dec_ref() const & (this=0x7ffff5b0a130 <_value_to_any>)
    at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:260
#22 0x00007ffff581d0b0 in pybind11::object::~object (this=0x7ffff5b0a130 <_value_to_any>, __in_chrg=<optimized out>)
    at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:323
#23 0x00007ffff7deffa5 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff7f8d760 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
    at exit.c:113
#24 0x00007ffff7df0120 in __GI_exit (status=status@entry=0) at exit.c:143
#25 0x00007ffff7aa8699 in Py_Exit (sts=0) at Python/pylifecycle.c:2862
#26 0x00007ffff7a9f910 in handle_system_exit () at Python/pythonrun.c:775
#27 0x00007ffff7c1861c in _PyErr_PrintEx (set_sys_last_vars=1, tstate=0x55555555ed50) at Python/pythonrun.c:785
#28 PyErr_PrintEx (set_sys_last_vars=1) at Python/pythonrun.c:880
#29 0x00007ffff7aa10e7 in PyErr_Print () at Python/pythonrun.c:886
#30 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=<optimized out>, flags=0x7fffffffe308) at Python/pythonrun.c:462
#31 0x00007ffff7aa1a6e in _PyRun_AnyFileObject (fp=0x55555555a470, filename=0x7ffff74424f0, closeit=1, flags=0x7fffffffe308) at Python/pythonrun.c:90
#32 0x00007ffff7c0ea6d in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff74424f0, program_name=0x7ffff74b4810) at Modules/main.c:353
#33 pymain_run_file (config=0x5555555855a0) at Modules/main.c:372
#34 pymain_run_python (exitcode=0x7fffffffe304) at Modules/main.c:587
#35 Py_RunMain () at Modules/main.c:666
#36 0x00007ffff7be3c9b in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:720
#37 0x00007ffff7dd8290 in __libc_start_call_main (main=main@entry=0x555555555120 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffe538) at ../sysdeps/nptl/libc_start_call_main.h:58
#38 0x00007ffff7dd834a in __libc_start_main_impl (main=0x555555555120 <main>, argc=3, argv=0x7fffffffe538, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe528) at ../csu/libc-start.c:381
#39 0x0000555555555045 in _start () at ../sysdeps/x86_64/start.S:115

Alright, with GDB atatched, we get this. The interesting part of that backtrace is:

#21 0x00007ffff581cfcf in pybind11::handle::dec_ref() const & (this=0x7ffff5b0a130 <_value_to_any>)
    at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:260
#22 0x00007ffff581d0b0 in pybind11::object::~object (this=0x7ffff5b0a130 <_value_to_any>, __in_chrg=<optimized out>)
    at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/deps/pybind11/include/pybind11/pytypes.h:323

Notice the _value_to_any, which confirms my suspicion, or it's at least the region of code I highlighted in my previous reply. _value_to_any is the python function that is called from the C+++ side to convert the python type given as the metadata argument into a PyANy (Python) and then returned which then converts the PyAny into a C++ PyAny (automatically by pybind11).

@JeanChristopheMorinPerso
Copy link
Member

Some more information:

https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio/core/_core_utils.py#L61 is executed properly, which means that at least opentimelineio.core._core_utils._value_to_any was called and it returns.

Also found pytest-dev/pyfakefs#693 which kind of means that pyfakefs can break stuff. Looking at the pyfakefs code, it seems extremely intrusive.

@JeanChristopheMorinPerso
Copy link
Member

JeanChristopheMorinPerso commented Nov 2, 2022

By doing a small change, we can verify if _value_to_any actually returns something or it it crashes while converting the return value:

diff --git a/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp b/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp
index b3338eb..4f37429 100644
--- a/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp
+++ b/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp
@@ -188,7 +188,8 @@ static void define_bases1(py::module m) {
     py::class_<SOWithMetadata, SerializableObject,
                managing_ptr<SOWithMetadata>>(m, "SerializableObjectWithMetadata", py::dynamic_attr())
         .def(py::init([](std::string name, py::object metadata) {
-                    return new SOWithMetadata(name, py_to_any_dictionary(metadata));
+                    auto asd = py_to_any_dictionary(metadata);
+                    return new SOWithMetadata(name, asd);
                 }),
             py::arg_v("name"_a = std::string()),
             py::arg_v("metadata"_a = py::none()))
diff --git a/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp b/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp
index c327374..55ed8ea 100644
--- a/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp
+++ b/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp
@@ -102,7 +102,8 @@ static void py_to_any(py::object const& o, any* result) {
         _value_to_any = core.attr("_value_to_any");
     }
 
-    result->swap(_value_to_any(o).cast<PyAny*>()->a);
+    py::object tmp = _value_to_any(o);
+    result->swap(tmp.cast<PyAny*>()->a);
 }
 
 AnyDictionary py_to_any_dictionary(py::object const& o) {

Then in GDB:

(gdb) b otio_utils.cpp:105
(gdb) b otio_utils.cpp:106
(gdb) b otio_utils.cpp:121
(gdb) b otio_serializableObjects.cpp:192
(gdb) run .venv/bin/pytest tests/test_asd.py
Starting program: /home/jcmorin/jcmenv/aswf/OpenTimelineIO/.venv/bin/python .venv/bin/pytest tests/test_asd.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
================================================================================== test session starts ==================================================================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/jcmorin/jcmenv/aswf/OpenTimelineIO
plugins: cov-4.0.0, pyfakefs-5.0.0
collected 2 items                                                                                                                                                                       

tests/test_asd.py .
Breakpoint 1, py_to_any (o=..., result=0x7fffffff9400) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp:105
105	    py::object tmp = _value_to_any(o);
(gdb) l
100	    if (_value_to_any.is_none()) {
101	        py::object core = py::module::import("opentimelineio.core");
102	        _value_to_any = core.attr("_value_to_any");
103	    }
104	
105	    py::object tmp = _value_to_any(o);
106	    result->swap(tmp.cast<PyAny*>()->a);
107	}
108	
109	AnyDictionary py_to_any_dictionary(py::object const& o) {
(gdb) c
Continuing.

Breakpoint 2, py_to_any (o=..., result=0x7fffffff9400) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp:106
106	    result->swap(tmp.cast<PyAny*>()->a);
(gdb) l
101	        py::object core = py::module::import("opentimelineio.core");
102	        _value_to_any = core.attr("_value_to_any");
103	    }
104	
105	    py::object tmp = _value_to_any(o);
106	    result->swap(tmp.cast<PyAny*>()->a);
107	}
108	
109	AnyDictionary py_to_any_dictionary(py::object const& o) {
110	    if (o.is_none()) {
(gdb) c
Continuing.

Breakpoint 3, py_to_any_dictionary (o=...) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_utils.cpp:121
121	    return safely_cast_any_dictionary_any(a);
(gdb) l
116	    if (!compare_typeids(a.type(), typeid(AnyDictionary))) {
117	        throw py::type_error(string_printf("Expected an AnyDictionary (i.e. metadata); got %s instead",
118	                                           type_name_for_error_message(a).c_str()));
119	    }
120	
121	    return safely_cast_any_dictionary_any(a);
122	}
123	
124	py::object any_to_py(any const& a, bool top_level) {
125	    std::type_info const& tInfo = a.type();
(gdb) c
Continuing.

Breakpoint 4, operator() (__closure=0x5555559f63d8, name="", metadata=...) at /home/jcmorin/jcmenv/aswf/OpenTimelineIO/src/py-opentimelineio/opentimelineio-bindings/otio_serializableObjects.cpp:192
192	                    return new SOWithMetadata(name, asd);
(gdb) l
187	
188	    py::class_<SOWithMetadata, SerializableObject,
189	               managing_ptr<SOWithMetadata>>(m, "SerializableObjectWithMetadata", py::dynamic_attr())
190	        .def(py::init([](std::string name, py::object metadata) {
191	                    auto asd = py_to_any_dictionary(metadata);
192	                    return new SOWithMetadata(name, asd);
193	                }),
194	            py::arg_v("name"_a = std::string()),
195	            py::arg_v("metadata"_a = py::none()))
196	        .def_property_readonly("metadata", [](SOWithMetadata* s) {

So _value_to_any returns a py::object as expected and the cast works, then safely_cast_any_dictionary_any is called and seems to be working. Then the breakpoint at line 192 is hit. So it either crashes while creating SOWithMetadata or hile returning SOWithMetadata. Or something like that.

@mrbean-bremen
Copy link

Stumbled over this issue today, and created a respective issue in pyfakefs.
I cannot promise a fix so far (have to understand the problem first), but it seems to have something to with the caching of patched modules. Clearing the fs cache before running the problematic code can serve as an (ugly) workaround at the moment:

def test_create_clip(capsys, fs):
    fs.clear_cache()
    otio.core.SerializableObjectWithMetadata(
        metadata={}
    )

@jminor
Copy link
Collaborator

jminor commented Aug 15, 2023

Thanks! We will test this workaround to see if it helps.

@mrbean-bremen
Copy link

Alternatively, you could use a fixture without caching instead of fs, e.g.:

import pytest
from pyfakefs.fake_filesystem_unittest import Patcher

@pytest.fixture
def fs_no_cache():
    with Patcher(use_cache=False) as patcher:
        yield patcher.fs

...
def test_empty_fs(fs_no_cache):
    pass

Caching is a performance feature that is only relevant if you run a larger number of tests with fs.
This is the first time I see a side-effect from caching, but I had added the possibility to switch it off at the time just in case...

@mrbean-bremen
Copy link

mrbean-bremen commented Aug 18, 2023

Ok, I've made a patch release, please check if your tests work with the current version!

@JeanChristopheMorinPerso
Copy link
Member

Nice, thanks @mrbean-bremen !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A problem, flaw, or broken functionality.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants