TensorRT crashes on Windows unless it is the first imported module #2853
Comments
@oxana-nvidia ^ ^ |
also @pranavm-nvidia may know more about this, I have a vague memory that this could be caused by pycuda or something, so the import order does matter. |
We tested TensorRT 8.6 with PyTorch versions up to 1.13. Please see the release notes: https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-8-6-0-EA For your example, you can try to replace
with
|
This issue also happens with PyTorch 1.13. |
I'm seeing a strange behavior that looks related to this. I have a completely unrelated pybind11 module that crashes the python process when trying to throw an Exception. But only if the |
@pwuertz any repro steps you can provide so we can test it on our side? |
@oxana-nvidia Yes, it's fairly simple. My pybind11 module is
import tensorrt
import dioptic.profileparser
dioptic.profileparser.Profile("syntax error") # Crash

import dioptic.profileparser
import tensorrt
dioptic.profileparser.Profile("syntax error") # Crash

import dioptic.profileparser
dioptic.profileparser.Profile("syntax error") # Ok, raises `RuntimeError` |
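Because a hard crash takes the whole interpreter down, the import orders in the repro above are easiest to compare when each one runs in its own process. The following is a minimal sketch of that idea, not something posted in the thread: the helper names `run_in_fresh_interpreter` and `compare_import_orders` are hypothetical, and the demo uses standard-library modules as stand-ins for `tensorrt` and the pybind11 module.

```python
import subprocess
import sys


def run_in_fresh_interpreter(code, timeout=120):
    """Run `code` in a new Python process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode


def compare_import_orders(orders, statement):
    """Map each import order (a tuple of module names) to the exit code
    observed after importing in that order and running `statement`."""
    outcomes = {}
    for order in orders:
        imports = "\n".join(f"import {name}" for name in order)
        outcomes[order] = run_in_fresh_interpreter(f"{imports}\n{statement}\n")
    return outcomes


# Stand-in demo with stdlib modules; replace them with the modules from the
# repro (assumption: they are installed in the current environment).
demo = compare_import_orders([("json", "math"), ("math", "json")], "pass")
for order, code in demo.items():
    print(order, "exit code", code)
```

An uncaught Python exception such as `RuntimeError` ends the child process with exit code 1, while a native crash typically produces a different, platform-dependent code, so the two outcomes can be told apart without losing the parent process.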
Thanks for providing the repro. |
Oh, it's a fundamental bug in pybind11 isn't it? Pybind11 is using a global C++ data-structure for exception handling, and it is shared across all pybind11-based modules regardless of compiler or standard-lib version. What we are seeing is probably an ABI induced crash/corruption. |
Confirmed, the problem is fixed by preventing global data sharing between multiple pybind11 modules, which pybind11 does by default for some reason. The workaround is to make sure that
Here is a diff for TensorRT
diff --git a/python/CMakeLists.txt b/python/CMakeLists.txt
index 35ae486d..4dcb775d 100644
--- a/python/CMakeLists.txt
+++ b/python/CMakeLists.txt
@@ -113,6 +113,12 @@ message(STATUS "PY_CONFIG_INCLUDE: ${PY_CONFIG_INCLUDE}")
include_directories(${TENSORRT_ROOT}/include ${PROJECT_SOURCE_DIR}/include ${CUDA_INCLUDE_DIRS} ${PROJECT_SOURCE_DIR}/docstrings ${ONNX_INC_DIR} ${PYBIND11_DIR})
link_directories(${TENSORRT_LIBPATH})
+if (MSVC)
+ # Prevent pybind11 from sharing resources with other, potentially ABI incompatible modules
+ # https://github.com/pybind/pybind11/issues/2898
+ add_definitions(-DPYBIND11_COMPILER_TYPE="_${PROJECT_NAME}_abi")
+endif()
+
if (MSVC)
message(STATUS "include_dirs: ${MSVC_COMPILER_DIR}/include ${MSVC_COMPILER_DIR}/../ucrt/include ${NV_WDKSDK_INC}/um ${NV_WDKSDK_INC}/shared")
message(STATUS "link dirs: ${PY_LIB_DIR} ${NV_WDKSDK_LIB}/um/x64 ${MSVC_COMPILER_DIR}/lib/amd64 ${MSVC_COMPILER_DIR}/../ucrt/lib/x64") |
Thanks for providing the solution! We will verify it and add it to the next release if there are no issues. |
The original issue has been fixed, so I am closing this issue. |
Description
The exception mechanism in pybind11 causes a crash in TensorRT if it is not the first module imported.
If another module throws an exception, then it will cause TensorRT to crash.
This issue seems similar to onnx/onnx#3493, but I was not able to build TensorRT with debug symbols, so I can't be sure.
Environment
TensorRT Version: TensorRT-8.6.0.12
NVIDIA GPU: RTX 3060 Laptop GPU
NVIDIA Driver Version: 526.56
CUDA Version: 11.8
CUDNN Version: 8.5.0
Operating System: Windows 11
Python Version (if applicable): 3.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.0
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
Two modules have this issue with TensorRT. One is torch, but I also created a small module, https://github.com/mantaionut/python_example, that has the same issue.
repro 1:
repro 2:
git clone https://github.com/mantaionut/python_example
cd python_example
pip install .