[BUG]: Segmentation Fault 11 w/ Conda + Pybind11 #3907

Closed
3 tasks done
coreyjadams opened this issue Apr 26, 2022 · 5 comments
Labels
triage New bug, unverified

@coreyjadams

coreyjadams commented Apr 26, 2022

Required prerequisites

Problem description

I have a segmentation fault on macOS that only appears with the conda builds of Python. I haven't been able to solve this one myself, sorry.

In short: when using the package I've built with pybind11, I cannot import the libraries from Python without a segfault. I've reproduced this with Python 3.6, 3.9, and 3.10, using the latest version of pybind11. I have a standalone repository that reproduces the bug.

Here is the stack trace when running under lldb; the crash appears to be in `take_gil`:

```
>>> import larcv
Process 24818 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x00000001041afc17 libpython3.10.dylib`take_gil + 71
libpython3.10.dylib`take_gil:
->  0x1041afc17 <+71>: movq   0x10(%rax), %r13
    0x1041afc1b <+75>: leaq   0x1b0(%r13), %r12
    0x1041afc22 <+82>: movq   %r12, %rdi
    0x1041afc25 <+85>: callq  0x1042e1212               ; symbol stub for: pthread_mutex_lock
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x00000001041afc17 libpython3.10.dylib`take_gil + 71
    frame #1: 0x0000000104226230 libpython3.10.dylib`PyGILState_Ensure + 48
    frame #2: 0x0000000101e495df pylarcv.cpython-310-darwin.so`___lldb_unnamed_symbol1$$pylarcv.cpython-310-darwin.so + 63
    frame #3: 0x0000000101e490a6 pylarcv.cpython-310-darwin.so`PyInit_pylarcv + 118
    frame #4: 0x00000001001fd17e python`_imp_create_dynamic + 1486
    frame #5: 0x00000001000e75a5 python`cfunction_vectorcall_FASTCALL + 85
    frame #6: 0x00000001001b2b9a python`_PyEval_EvalFrameDefault + 2986
    frame #7: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #8: 0x00000001001c16ee python`call_function + 174
    frame #9: 0x00000001001b8fec python`_PyEval_EvalFrameDefault + 28668
    frame #10: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #11: 0x00000001001c16ee python`call_function + 174
    frame #12: 0x00000001001b79b2 python`_PyEval_EvalFrameDefault + 22978
    frame #13: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #14: 0x00000001001c16ee python`call_function + 174
    frame #15: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #16: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #17: 0x00000001001c16ee python`call_function + 174
    frame #18: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #19: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #20: 0x00000001001c16ee python`call_function + 174
    frame #21: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #22: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #23: 0x000000010008577b python`object_vacall + 427
    frame #24: 0x0000000100085a29 python`_PyObject_CallMethodIdObjArgs + 249
    frame #25: 0x00000001001f8a64 python`PyImport_ImportModuleLevelObject + 3076
    frame #26: 0x00000001001b8410 python`_PyEval_EvalFrameDefault + 25632
    frame #27: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #28: 0x00000001001aa979 python`builtin_exec + 345
    frame #29: 0x00000001000e75a5 python`cfunction_vectorcall_FASTCALL + 85
    frame #30: 0x00000001001b2b9a python`_PyEval_EvalFrameDefault + 2986
    frame #31: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #32: 0x00000001001c16ee python`call_function + 174
    frame #33: 0x00000001001b8fec python`_PyEval_EvalFrameDefault + 28668
    frame #34: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #35: 0x00000001001c16ee python`call_function + 174
    frame #36: 0x00000001001b79b2 python`_PyEval_EvalFrameDefault + 22978
    frame #37: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #38: 0x00000001001c16ee python`call_function + 174
    frame #39: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #40: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #41: 0x00000001001c16ee python`call_function + 174
    frame #42: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #43: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #44: 0x000000010008577b python`object_vacall + 427
    frame #45: 0x0000000100085a29 python`_PyObject_CallMethodIdObjArgs + 249
    frame #46: 0x00000001001f8a64 python`PyImport_ImportModuleLevelObject + 3076
    frame #47: 0x00000001001b8410 python`_PyEval_EvalFrameDefault + 25632
    frame #48: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #49: 0x00000001002277a9 python`PyRun_InteractiveOneObjectEx + 1049
    frame #50: 0x000000010022640a python`_PyRun_InteractiveLoopObject + 122
    frame #51: 0x0000000100225cbf python`_PyRun_AnyFileObject + 63
    frame #52: 0x000000010022a106 python`PyRun_AnyFileExFlags + 118
    frame #53: 0x0000000100250f2f python`pymain_run_stdin + 175
    frame #54: 0x000000010025057d python`pymain_run_python + 509
    frame #55: 0x0000000100250335 python`Py_RunMain + 37
    frame #56: 0x0000000100251910 python`pymain_main + 64
    frame #57: 0x00000001000026d8 python`main + 56
    frame #58: 0x000000010049a51e dyld`start + 462
```
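Reading the backtrace above: the interpreter's own frames (`main`, `_imp_create_dynamic`, ...) resolve to the `python` executable, while `PyGILState_Ensure` and `take_gil`, called from `PyInit_pylarcv`, resolve to `libpython3.10.dylib`. One plausible reading, consistent with the fix found later in this thread, is that the extension bound to a second, uninitialized copy of the Python runtime. A minimal sketch that tallies the images in an lldb `bt` listing, assuming the frame format shown above; the function names are hypothetical, and the check assumes the executable image is named exactly `python`:

```python
import re
from collections import Counter

# Matches lldb `bt` lines such as:
#   frame #1: 0x0000000104226230 libpython3.10.dylib`PyGILState_Ensure + 48
# The image name is everything before the first backtick.
FRAME_RE = re.compile(r"frame #(\d+): 0x[0-9a-fA-F]+ (\S+?)`(\S+)")


def parse_frames(backtrace):
    """Return a list of (frame number, image, symbol) tuples from `bt` output."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def frames_per_image(frames):
    """Count how many frames each image (executable or library) contributes."""
    return Counter(image for _, image, _ in frames)


def mixes_python_runtimes(frames):
    """True if frames come from both a `python` executable and a libpython dylib.

    Assumption: the executable image is named exactly `python`, as in the
    backtrace above. Seeing both hints that two copies of the runtime are loaded.
    """
    images = {image for _, image, _ in frames}
    has_exe = "python" in images
    has_lib = any(image.startswith("libpython") for image in images)
    return has_exe and has_lib
```

Pasting the `bt` output into a string and calling `frames_per_image(parse_frames(text))` gives a per-image frame count; `mixes_python_runtimes` is only a heuristic, not a diagnosis.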

Reproducible example code

The repository below reproduces the bug. Sorry if you wanted something smaller; this is about as small as I can make it, and it is nearly standalone (obviously, you need conda to run it).

`git@github.com:coreyjadams/larcv3-pybind11-example.git`

To replicate the bug, you need to be on macOS (I am on Monterey, the latest release) and use Miniconda. I created a separate environment for each test:


```bash
conda create -n test-env-python-3.10 # Accept any questions, etc
conda activate test-env-python-3.10 # Activate the environment
conda install python=3.10 cmake scikit-build # The dependencies are just build systems.
```


Then, after cloning the repository I linked above, one can do:
```bash
git submodule update --init # pybind11 is a submodule here
python setup.py build # Trigger scikit-build to run cmake
python setup.py install
```

From a different directory (otherwise Python imports the `larcv` folder in the repository instead of the installed package), run:

```pycon
>>> import larcv
```

This should reproduce the crash.
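The import check can also be scripted so that it runs outside an interactive session. A minimal sketch, standard library only, that imports a module in a fresh interpreter whose working directory is an empty temporary directory (so the repository's `larcv` folder is not picked up) and reports whether the child was killed by signal 11 (SIGSEGV, the "Segmentation Fault 11" in the title). `import_in_clean_dir` is a hypothetical helper name:

```python
import signal
import subprocess
import sys
import tempfile


def import_in_clean_dir(module_name, python=sys.executable):
    """Import `module_name` in a child interpreter started in an empty directory.

    Returns "ok", "segfault", or "exit <code>".
    """
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [python, "-c", f"import {module_name}"],
            cwd=workdir,  # keeps a local ./larcv folder off sys.path[0]
            capture_output=True,
        )
    code = result.returncode
    if code == 0:
        return "ok"
    # On POSIX, subprocess reports death by a signal as a negative return code.
    if code == -signal.SIGSEGV:
        return "segfault"
    return f"exit {code}"
```

With the environment from the steps above activated, `print(import_in_clean_dir("larcv"))` would be expected to print `segfault` while the bug is present; an uncaught `ImportError` shows up as `exit 1` instead.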

@coreyjadams coreyjadams added the triage New bug, unverified label Apr 26, 2022
@henryiii
Collaborator

Conda doesn't support building from Python (such as running `python setup.py` directly), only from conda-build. You are likely mixing the system compilers and the conda compilers, causing the crash. Try `conda install compilers`; that might get the build to use the conda compilers (make sure you remove any cached build state, like `_skbuild`).
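Two details behind this advice can be checked from Python. CMake records the compiler it detects on the first configure in `CMakeCache.txt`, so switching compilers afterwards has no effect until the cached build directory is deleted; and conda's compiler packages typically announce themselves through the `CC`/`CXX` environment variables on activation. A minimal sketch, standard library only; `clean_build_cache` and `compiler_report` are hypothetical helpers, and treating `build` as a cache directory alongside `_skbuild` is an assumption:

```python
import os
import shutil
import sysconfig
from pathlib import Path

# Directories that may hold cached CMake state. `_skbuild` comes from the
# comment above; `build` is an assumption (setuptools' default output dir).
CACHE_DIRS = ("_skbuild", "build")


def clean_build_cache(project_root):
    """Delete cached build directories so CMake re-detects the compilers.

    Returns the names of the directories that were removed.
    """
    removed = []
    for name in CACHE_DIRS:
        path = Path(project_root) / name
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(name)
    return removed


def compiler_report():
    """Compare the compilers requested by the environment with Python's own build."""
    return {
        "CC (environment)": os.environ.get("CC"),
        "CXX (environment)": os.environ.get("CXX"),
        # Compiler recorded when this Python was built (None on some platforms).
        "CC (Python build)": sysconfig.get_config_var("CC"),
    }
```

Running `compiler_report()` inside the activated environment before rebuilding gives a quick hint as to whether the system or the conda toolchain will be picked up.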

@wolfv

wolfv commented Jun 29, 2022

I see this issue as well on macOS x64, but I am pretty sure I am using the conda compilers :)

I tried adding `-undefined dynamic_lookup`, which has helped in the past, and I tried removing the `CMAKE_STRIP` step, but neither has helped so far. Will investigate further.

It's failing for us in rclpy, a dependency of ROS (the Robot Operating System), with exactly the same error.

@wolfv

wolfv commented Jun 29, 2022

Hm, I managed to replicate the issue with your example larcv code.
The fix seems to boil down to not linking Python explicitly in the lower-level libraries (or anywhere) and relying on `-undefined dynamic_lookup` instead.

I've added

```cmake
set_target_properties(larcv3 PROPERTIES
                      LINK_FLAGS "-undefined dynamic_lookup")
```

and removed every instance of linking to `${Python_LIBRARIES}`, and things then seem to work. I think `pybind11_add_module` already sets that linker flag automatically.
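To confirm that a rebuilt extension no longer links libpython, its dependencies can be listed with macOS's `otool -L`. A minimal sketch, assuming the usual `otool -L` layout (the first line names the inspected file followed by a colon, then one indented dependency per line with version information in parentheses); the helper names are hypothetical:

```python
import subprocess


def linked_libraries(otool_output):
    """Parse `otool -L` output into a list of dependency paths."""
    libs = []
    # The first line names the file being inspected, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # "/path/lib.dylib (compatibility version ..., current version ...)"
            libs.append(line.split(" (", 1)[0])
    return libs


def python_links(libs):
    """Return the dependencies whose path mentions libpython."""
    return [lib for lib in libs if "libpython" in lib]


def check_extension(path):
    """Run `otool -L` on a built extension (macOS only); list any libpython links."""
    out = subprocess.run(
        ["otool", "-L", path], capture_output=True, text=True, check=True
    ).stdout
    return python_links(linked_libraries(out))
```

For example, `check_extension("pylarcv.cpython-310-darwin.so")` returns an empty list when no libpython dependency is recorded in the binary.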

@coreyjadams
Author

@wolfv, thanks for this tip! I will test it tomorrow and get back to you; it would be awesome to have this resolved.

@wolfv

wolfv commented Jun 29, 2022

In my case, `pybind11_add_module(blabla SHARED ...)` did not work, but `pybind11_add_module(blabla MODULE ...)` works.
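On macOS, CMake links a `MODULE` library as a Mach-O bundle (`-bundle`) and a `SHARED` library as a dylib, so the two options produce different kinds of binary. If it is unclear which kind a build produced, the Mach-O header records it. A minimal sketch, assuming a thin (non-universal) 64-bit little-endian binary; the constants come from Apple's `<mach-o/loader.h>`, and `macho_filetype`/`extension_kind` are hypothetical helpers:

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic, read as little-endian
FILETYPES = {
    0x2: "executable (MH_EXECUTE)",
    0x6: "dylib (MH_DYLIB)",
    0x8: "bundle (MH_BUNDLE)",
}


def macho_filetype(header):
    """Classify the first 16 bytes of a thin, little-endian 64-bit Mach-O file."""
    if len(header) < 16:
        raise ValueError("need at least 16 header bytes")
    # mach_header_64 starts with: magic, cputype, cpusubtype, filetype.
    magic, _cputype, _cpusubtype, filetype = struct.unpack("<IiiI", header[:16])
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a thin 64-bit little-endian Mach-O (magic {magic:#x})")
    return FILETYPES.get(filetype, f"other ({filetype:#x})")


def extension_kind(path):
    """Read a built extension from disk and report its Mach-O file type."""
    with open(path, "rb") as f:
        return macho_filetype(f.read(16))
```

Universal (fat) binaries start with a different magic number and are rejected here rather than unpacked.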
