
TensorRT C++ Example Error #322

Closed
AnnetGdd opened this issue Feb 17, 2022 · 11 comments · Fixed by #326
Labels
documentation (Improvements or additions to documentation)

Comments

@AnnetGdd

🐛 Describe the bug

When running the C++ TRT example, using
./yolort_trt --image ../../../test/assets/zidane.jpg --model_path ../../../../yolov5n6.onnx --class_names ../../../notebooks/assets/coco.names
I get the following error:

Platform:
DLACores: 0
INT8: YES
FP16: YES
onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
Inference data type: FP32.
4: [pluginV2Builder.cpp::makeRunner::476] Error Code 4: Internal Error (Internal error: plugin node batched_nms requires 48960768 bytes of scratch space, but only 41943040 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
buildSerializedNetwork fail!
Segmentation fault (core dumped)

Also, the README says we should generate the TRT engine file beforehand, but the example code converts an ONNX model into an engine file. Could you please confirm what the correct procedure is? If we wanted to use the engine file created from the Python example tutorial directly with the C++ example, would that be possible?

Lastly, is this part from the README
from yolort.runtime.yolo_graphsurgeon import YOLOGraphSurgeon
still supported?
Thanks.

Versions

PyTorch version: 1.10.2+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration: GPU 0: NVIDIA RTX A4000 Laptop GPU
Nvidia driver version: 470.103.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.2
[pip3] pytorch-lightning==1.5.10
[pip3] torch==1.10.2+cu113
[pip3] torchaudio==0.10.2+cu113
[pip3] torchmetrics==0.7.2
[pip3] torchvision==0.11.3+cu113
[conda] Could not collect

AnnetGdd changed the title from "TensorRT C++ example Error" to "TensorRT C++ Example Error" on Feb 17, 2022
@AnnetGdd (Author) commented Feb 17, 2022

Update: I was able to solve this problem by modifying this line in the code:
config->setMaxWorkspaceSize(40 * (1U << 20));
to
config->setMaxWorkspaceSize(40 * (1U << 25));
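(For reference: 1U << 20 is 1,048,576 bytes, so 40 * (1U << 20) is exactly the 41,943,040 bytes reported as available in the error message, while 1U << 25 is 33,554,432 bytes, so 40 * (1U << 25) allows roughly 1.34 GB, well above the 48,960,768 bytes the batched_nms plugin requested.)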

@AnnetGdd (Author)

Would you be able to provide some guidance on how to bypass the ONNX->TRT conversion portion of the example and load the engine file directly for inference? It seems that I should replace the CreateCudaEngineFromOnnx functions with something that reads the engine file? Any help would be appreciated.

@zhiqwang (Owner) commented Feb 17, 2022

Hi @AnnetGdd, thanks for reporting this issue to us.

4: [pluginV2Builder.cpp::makeRunner::476] Error Code 4: Internal Error (Internal error: plugin node batched_nms requires 48960768 bytes of scratch space, but only 41943040 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().

We should increase the workspace size to at least 60 MB in that case, and maybe we should expose this value as a command-line parameter to make it easier to debug:
https://github.com/zhiqwang/yolov5-rt-stack/blob/345a77e2f7430196993635a757931b18cde92bb8/deployment/tensorrt/main.cpp#L220
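A minimal sketch of what such a flag could look like, assuming a hand-rolled parse; the --workspace flag name and the helper below are illustrative rather than the project's actual argument handling:

// Illustrative sketch: read an optional "--workspace <MiB>" flag and apply it to the builder config.
// The flag name and the manual parsing are assumptions, not yolort's actual argument handling.
#include <cstddef>
#include <cstdlib>
#include <string>

std::size_t ParseWorkspaceMiB(int argc, char** argv, std::size_t default_mib = 60) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "--workspace") {
            return static_cast<std::size_t>(std::strtoull(argv[i + 1], nullptr, 10));
        }
    }
    return default_mib;
}

// When building the engine:
//   config->setMaxWorkspaceSize(ParseWorkspaceMiB(argc, argv) * (1ULL << 20));  // MiB -> bytes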

Also, the README says we should generate the TRT engine file beforehand, but the example code converts an ONNX model into an engine file. Could you please confirm what the correct procedure is? If we wanted to use the engine file created from the Python example tutorial directly with the C++ example, would that be possible?

Yes, the docs are a little outdated, and we will fix this.

Lastly, is this part from the README from yolort.runtime.yolo_graphsurgeon import YOLOGraphSurgeon still supported?

We renamed it to YOLOTRTGraphSurgeon in #312. We expect the functions in the yolort.relay directory to be more generic. We will provide a CLI tool to export the ONNX and TRT engine models in a follow-up PR, and we hope to release 0.6.0 this week; this interface will be finalized at that time.
https://github.com/zhiqwang/yolov5-rt-stack/blob/345a77e2f7430196993635a757931b18cde92bb8/yolort/relay/trt_graphsurgeon.py#L27

@zhiqwang (Owner)

Would you be able to provide some guidance on how to bypass the ONNX->TRT conversion portion of the example and load the engine file directly for inference? It seems that I should replace the CreateCudaEngineFromOnnx functions with something that reads the engine file? Any help would be appreciated.

Hi @AnnetGdd, we're fixing the docs.

@zhiqwang (Owner) commented Feb 17, 2022

Update: I was able to solve this problem by modifying this line in the code:
config->setMaxWorkspaceSize(40 * (1U << 20));
to
config->setMaxWorkspaceSize(40 * (1U << 25));

Yep, that's the key to solving the workspace size problem. By the way, we fixed a bug in pre-processing in #321, so make sure you're using the latest code.

zhiqwang added the documentation label on Feb 17, 2022
@zhiqwang (Owner) commented Feb 17, 2022

Hi @AnnetGdd

Also, the README says we should generate the TRT engine file beforehand, but the example code converts an ONNX model into an engine file. Could you please confirm what the correct procedure is? If we wanted to use the engine file created from the Python example tutorial directly with the C++ example, would that be possible?

Would you be able to provide some guidance on how to bypass the ONNX->TRT conversion portion of the example and load the engine file directly for inference? It seems that I should replace the CreateCudaEngineFromOnnx functions with something that reads the engine file? Any help would be appreciated.

We added support for loading a serialized TRT engine in #323. Now yolort_trt determines whether it needs to build the serialized engine from ONNX based on the file suffix: serialization is only done when the --model_path argument has an .onnx suffix; all other suffixes are treated as a serialized engine.
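For reference, deserializing a prebuilt engine in TensorRT 8 looks roughly like the sketch below; the helper name, error handling, and the file name in the usage comment are illustrative rather than the exact code in yolort_trt:

// Illustrative sketch of loading a serialized TensorRT engine from disk.
#include <NvInfer.h>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

nvinfer1::ICudaEngine* LoadEngine(const std::string& path, nvinfer1::IRuntime& runtime) {
    std::ifstream infile(path, std::ios::binary | std::ios::ate);
    if (!infile) {
        return nullptr;  // could not open the engine file
    }
    const std::streamsize size = infile.tellg();
    infile.seekg(0, std::ios::beg);
    std::vector<char> blob(static_cast<std::size_t>(size));
    infile.read(blob.data(), size);
    // Rebuild the engine from the serialized bytes.
    return runtime.deserializeCudaEngine(blob.data(), blob.size());
}

// Usage (runtime created elsewhere, e.g. with nvinfer1::createInferRuntime(logger)):
//   nvinfer1::ICudaEngine* engine = LoadEngine("yolov5n6.engine", *runtime);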

@AnnetGdd (Author) commented Feb 17, 2022

Thank you very much for the detailed responses; I was able to successfully run the example using an engine file.
Side note: I had to add #include <fstream> in the main file to get rid of the following error when building:
error: variable ‘std::ifstream infile’ has initializer but incomplete type
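(That error typically means std::ifstream was only forward-declared, for example via <iosfwd> pulled in by another standard header, so the full definition from <fstream> is required wherever an ifstream object is constructed.)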

Also, is there a minimum TRT version requirement? I am currently using 8.2.3.0 and it works fine, but I just wanted to clarify. And are there any plans to update the PyPI installation to include newer changes? Thanks!

@zhiqwang (Owner) commented Feb 18, 2022

Side note: I had to add #include <fstream> in the main file to get rid of the following error when building: error: variable ‘std::ifstream infile’ has initializer but incomplete type

Thanks @AnnetGdd for reporting this issue; we'll test this. UPDATED: done as suggested in #324.

Also, is there a minimum TRT version requirement? I am currently using 8.2.3.0 and it works fine, but I just wanted to clarify.

The minimal TRT version is 8.2, because the EfficientNMS plugin we rely on was introduced in TensorRT 8.2. Check out https://zhiqwang.com/yolov5-rt-stack/notebooks/onnx-graphsurgeon-inference-tensorrt.html#TensorRT-Installation-Instructions for more details.

Are there any plans to update the PyPI installation to include newer changes?

Yep, we plan to release 0.6.0 with the recent changes this week.

@zhiqwang (Owner)

Hi @AnnetGdd, we will provide a Python CLI tool to export the TensorRT serialized engine in #326:

python tools/export_model.py --checkpoint_path [path/to/your/best.pt] --include engine
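
Once the serialized engine is produced, it should then be possible to pass it straight to the C++ example via --model_path; the engine path below is a placeholder:

./yolort_trt --image ../../../test/assets/zidane.jpg --model_path [path/to/yolov5n6.engine] --class_names ../../../notebooks/assets/coco.names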

And we will update the docs for using the TensorRT C++ interface. As such, I'm closing this ticket; feel free to create another ticket if you have more questions.

@zhiqwang (Owner)

Just FYI @AnnetGdd,

yolort 0.6.0 is released; try pip install yolort==0.6.0 to use the TensorRT conversion of YOLOv5 with yolort directly from PyPI!

@AnnetGdd (Author)

Thank you very much, will try it out!
