
TensorRT C++ Example Error #322

Closed
AnnetGdd opened this issue Feb 17, 2022 · 11 comments · Fixed by #326
Labels
documentation (Improvements or additions to documentation)

Comments

@AnnetGdd

🐛 Describe the bug

When running the C++ TRT example, using
./yolort_trt --image ../../../test/assets/zidane.jpg --model_path ../../../../yolov5n6.onnx --class_names ../../../notebooks/assets/coco.names
I get the following error:

Platform:
DLACores: 0
INT8: YES
FP16: YES
onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
Inference data type: FP32.
4: [pluginV2Builder.cpp::makeRunner::476] Error Code 4: Internal Error (Internal error: plugin node batched_nms requires 48960768 bytes of scratch space, but only 41943040 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
buildSerializedNetwork fail!
Segmentation fault (core dumped)

Also, the README says we should generate the TRT engine file beforehand, but the example code converts an ONNX model into an engine file. Could you please confirm what the correct procedure is? If we wanted to use the engine file created from the Python example tutorial directly with the C++ example, would that be possible?

Lastly, is this part from the README
from yolort.runtime.yolo_graphsurgeon import YOLOGraphSurgeon
still supported?
Thanks.

Versions

PyTorch version: 1.10.2+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration: GPU 0: NVIDIA RTX A4000 Laptop GPU
Nvidia driver version: 470.103.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.2
[pip3] pytorch-lightning==1.5.10
[pip3] torch==1.10.2+cu113
[pip3] torchaudio==0.10.2+cu113
[pip3] torchmetrics==0.7.2
[pip3] torchvision==0.11.3+cu113
[conda] Could not collect

AnnetGdd changed the title from "TensorRT C++ example Error" to "TensorRT C++ Example Error" on Feb 17, 2022
@AnnetGdd (Author) commented Feb 17, 2022

Update: I was able to solve this problem by modifying this line in the code:
config->setMaxWorkspaceSize(40 * (1U << 20));
to
config->setMaxWorkspaceSize(40 * (1U << 25));
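(For reference: 1U << 20 is 1,048,576 bytes, so 40 * (1U << 20) is exactly the 41,943,040 bytes reported as available in the error message, while 1U << 25 is 33,554,432 bytes, so 40 * (1U << 25) allows roughly 1.34 GB, well above the 48,960,768 bytes the batched_nms plugin requested.)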

@AnnetGdd (Author)

Would you be able to provide some guidance on how to bypass the ONNX->TRT conversion portion of the example and load the engine file directly for inference? It seems that I should replace the CreateCudaEngineFromOnnx functions with something that reads the engine file? Any help would be appreciated.

@zhiqwang (Owner) commented Feb 17, 2022

Hi @AnnetGdd, thanks for reporting this issue to us.

4: [pluginV2Builder.cpp::makeRunner::476] Error Code 4: Internal Error (Internal error: plugin node batched_nms requires 48960768 bytes of scratch space, but only 41943040 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().

We should increase the workspace size to at least 60 MB in that case, and maybe we should expose this value as a command-line parameter to make it easier to debug:
https://github.com/zhiqwang/yolov5-rt-stack/blob/345a77e2f7430196993635a757931b18cde92bb8/deployment/tensorrt/main.cpp#L220
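A minimal sketch of what such a flag could look like, assuming a hand-rolled parse; the --workspace flag name and the helper below are illustrative rather than the project's actual argument handling:

// Illustrative sketch: read an optional "--workspace <MiB>" flag and apply it to the builder config.
// The flag name and the manual parsing are assumptions, not yolort's actual argument handling.
#include <cstddef>
#include <cstdlib>
#include <string>

std::size_t ParseWorkspaceMiB(int argc, char** argv, std::size_t default_mib = 60) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "--workspace") {
            return static_cast<std::size_t>(std::strtoull(argv[i + 1], nullptr, 10));
        }
    }
    return default_mib;
}

// When building the engine:
//   config->setMaxWorkspaceSize(ParseWorkspaceMiB(argc, argv) * (1ULL << 20));  // MiB -> bytes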

Also, the README says we should generate the TRT engine file beforehand, but the example code converts an ONNX model into an engine file. Could you please confirm what the correct procedure is? If we wanted to use the engine file created from the Python example tutorial directly with the C++ example, would that be possible?

Yes, the docs are a little outdated, and we will fix this.

Lastly, is this part from the README from yolort.runtime.yolo_graphsurgeon import YOLOGraphSurgeon still supported?

We renamed it to YOLOTRTGraphSurgeon in #312. We expect the functions in the yolort.relay directory to be more generic. We will provide a CLI tool to export the ONNX and TRT engine models in a follow-up PR, and we hope to release 0.6.0 this week; this interface will be finalized at that time.
https://github.com/zhiqwang/yolov5-rt-stack/blob/345a77e2f7430196993635a757931b18cde92bb8/yolort/relay/trt_graphsurgeon.py#L27

@zhiqwang (Owner)

Would you be able to provide some guidance on how to bypass the ONNX->TRT conversion portion of the example and load the engine file directly for inference? It seems that I should replace the CreateCudaEngineFromOnnx functions with something that reads the engine file? Any help would be appreciated.

Hi @AnnetGdd, we're fixing the docs.

@zhiqwang (Owner) commented Feb 17, 2022

Update: I was able to solve this problem by modifying this line in the code:
config->setMaxWorkspaceSize(40 * (1U << 20));
to
config->setMaxWorkspaceSize(40 * (1U << 25));

Yep, that's the key to solving the workspace size problem. By the way, we fixed a bug in pre-processing in #321, so make sure you're using the latest code.

zhiqwang added the documentation label on Feb 17, 2022
@zhiqwang (Owner) commented Feb 17, 2022

Hi @AnnetGdd

Also, the README says we should generate the TRT engine file beforehand, but the example code converts an ONNX model into an engine file. Could you please confirm what the correct procedure is? If we wanted to use the engine file created from the Python example tutorial directly with the C++ example, would that be possible?

Would you be able to provide some guidance on how to bypass the ONNX->TRT conversion portion of the example and load the engine file directly for inference? It seems that I should replace the CreateCudaEngineFromOnnx functions with something that reads the engine file? Any help would be appreciated.

We added support for loading a serialized TRT engine in #323. Now yolort_trt determines whether it needs to build the serialized engine from ONNX based on the file suffix: serialization is only done when the --model_path argument has an .onnx suffix; all other suffixes are treated as a serialized engine.
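For reference, deserializing a prebuilt engine in TensorRT 8 looks roughly like the sketch below; the helper name, error handling, and the file name in the usage comment are illustrative rather than the exact code in yolort_trt:

// Illustrative sketch of loading a serialized TensorRT engine from disk.
#include <NvInfer.h>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

nvinfer1::ICudaEngine* LoadEngine(const std::string& path, nvinfer1::IRuntime& runtime) {
    std::ifstream infile(path, std::ios::binary | std::ios::ate);
    if (!infile) {
        return nullptr;  // could not open the engine file
    }
    const std::streamsize size = infile.tellg();
    infile.seekg(0, std::ios::beg);
    std::vector<char> blob(static_cast<std::size_t>(size));
    infile.read(blob.data(), size);
    // Rebuild the engine from the serialized bytes.
    return runtime.deserializeCudaEngine(blob.data(), blob.size());
}

// Usage (runtime created elsewhere, e.g. with nvinfer1::createInferRuntime(logger)):
//   nvinfer1::ICudaEngine* engine = LoadEngine("yolov5n6.engine", *runtime);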

@AnnetGdd (Author) commented Feb 17, 2022

Thank you very much for the detailed responses; I was able to successfully run the example using an engine file.
Side note: I had to add #include <fstream> in the main file to get rid of the following error when building:
error: variable ‘std::ifstream infile’ has initializer but incomplete type
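(That error typically means std::ifstream was only forward-declared, for example via <iosfwd> pulled in by another standard header, so the full definition from <fstream> is required wherever an ifstream object is constructed.)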

Also, is there a minimum TRT version requirement? I am currently using 8.2.3.0 and it works fine, but I just wanted to clarify. And are there any plans to update the PyPI installation to include newer changes? Thanks!

@zhiqwang (Owner) commented Feb 18, 2022

Side note: I had to add #include <fstream> in the main file to get rid of the following error when building: error: variable ‘std::ifstream infile’ has initializer but incomplete type

Thanks @AnnetGdd for reporting this issue; we'll test this. UPDATED: done as suggested in #324.

Also, is there a minimum TRT version requirement? I am currently using 8.2.3.0 and it works fine, but I just wanted to clarify.

The minimal TRT version is 8.2, because the EfficientNMS plugin we rely on was introduced in TensorRT 8.2. Check out https://zhiqwang.com/yolov5-rt-stack/notebooks/onnx-graphsurgeon-inference-tensorrt.html#TensorRT-Installation-Instructions for more details.

Are there any plans to update the PyPI installation to include newer changes?

Yep, we plan to release 0.6.0 with the recent changes this week.

@zhiqwang (Owner)

Hi @AnnetGdd, we will provide a Python CLI tool to export the TensorRT serialized engine in #326:

python tools/export_model.py --checkpoint_path [path/to/your/best.pt] --include engine
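
Once the serialized engine is produced, it should then be possible to pass it straight to the C++ example via --model_path; the engine path below is a placeholder:

./yolort_trt --image ../../../test/assets/zidane.jpg --model_path [path/to/yolov5n6.engine] --class_names ../../../notebooks/assets/coco.names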

And we will update the docs for using the TensorRT C++ interface. As such, I'm closing this ticket; feel free to create another ticket if you have more questions.

@zhiqwang (Owner)

Just FYI @AnnetGdd,

yolort 0.6.0 is released; try pip install yolort==0.6.0 to use the TensorRT conversion of YOLOv5 with yolort directly from PyPI!

@AnnetGdd (Author)

Thank you very much, will try it out!
