
Add TensorRT infer support #57

Open
wants to merge 13 commits into main

Conversation


@xiang-wuu xiang-wuu commented Jul 8, 2022

This PR exports the model from PyTorch to ONNX and then serializes the exported ONNX model to a native TRT engine, which is used for inference with TensorRT, i.e.:

  • Implement onnx_to_tensorrt.py script
  • Export the ONNX model to a TensorRT engine (a minimal build sketch follows the list)
  • Implement a Python module to run inference from the serialized TRT engine
  • Integrate the pre-process & post-process functions from the detect.py script into the TensorRT inference script
  • Draw bounding box detections over a sample image
  • Custom Detect plugin integration for TensorRT < 8.0
  • Implement an INT8 calibrator script for INT8 serialization
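For reference, a minimal sketch of the ONNX-to-engine step using the TensorRT Python API (TRT 8.x style; the file names, workspace size and FP16 flag here are illustrative, and the actual logic lives in onnx_to_tensorrt.py):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path="yolov7.onnx", engine_path="yolov7.engine", fp16=True):
    builder = trt.Builder(TRT_LOGGER)
    # ONNX models require an explicit-batch network
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX file")
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB; deprecated in newer TRT releases
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)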

@philipp-schmidt
Contributor

philipp-schmidt commented Jul 10, 2022

When exported with --grid:
python models/export.py --weights yolov7.pt --grid

Building the TensorRT engine fails:

root@3aa30b614471:/workspace/yolov7# python deploy/TensoorRT/onnx_to_tensorrt.py --onnx yolov7.onnx --fp16 --explicit-batch -o yolov7.engine
Namespace(calibration_batch_size=128, calibration_cache='calibration.cache', calibration_data=None, debug=False, explicit_batch=True, explicit_precision=False, fp16=True, gpu_fallback=False, int8=False, max_batch_size=None, max_calibration_size=2048, onnx='yolov7.onnx', output='yolov7.engine', refittable=False, simple=False, strict_types=False, verbosity=None)
2022-07-10 07:53:52 - __main__ - INFO - TRT_LOGGER Verbosity: Severity.ERROR
2022-07-10 07:53:52 - __main__ - INFO - Setting BuilderFlag.FP16
[TensorRT] ERROR: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Mul_378: broadcast dimensions must be conformable
)
ERROR: Failed to parse the ONNX file: yolov7.onnx
In node 378 (parseGraph): INVALID_NODE: Invalid Node - Mul_378
[graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Mul_378: broadcast dimensions must be conformable
)

Any idea how to fix that @xiang-wuu ?

@philipp-schmidt
Contributor

I have the same issue when using trtexec for conversion, so this is definitely a TensorRT / ONNX issue.
Here: #66

@xiang-wuu
Author

@philipp-schmidt that could be an issue with the PyTorch and ONNX versions; try upgrading both to their latest releases. However, I am still working on the post-processing part for the --grid option, which returns a primary output node with shape (1, 25200, 85); a rough decoding sketch follows.
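For context, the (1, 25200, 85) tensor from the --grid export follows the usual YOLO layout: 4 box coordinates, 1 objectness score and 80 class scores per candidate. A rough decoding sketch, assuming that layout and leaving NMS to the existing detect.py utilities:

import numpy as np

def decode(pred, conf_thres=0.25):
    # pred: (1, 25200, 85) -> xywh boxes, objectness, class scores
    pred = pred[0]
    boxes_xywh = pred[:, :4]
    scores = pred[:, 4:5] * pred[:, 5:]          # objectness * class confidence
    cls_ids = scores.argmax(axis=1)
    best = scores[np.arange(len(scores)), cls_ids]
    keep = best > conf_thres
    return boxes_xywh[keep], best[keep], cls_ids[keep]  # then run NMS on these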

@philipp-schmidt
Contributor

Yes, it was the PyTorch version. I also had to run onnx-simplifier, otherwise TensorRT had issues with a few Resize operations.

Looking forward to trying your implementation.
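For anyone hitting the same Resize issues, a minimal simplification pass with the onnx-simplifier Python API (a sketch; the file names are placeholders):

import onnx
from onnxsim import simplify

model = onnx.load("yolov7.onnx")
model_simp, ok = simplify(model)
assert ok, "simplified ONNX model could not be validated"
onnx.save(model_simp, "yolov7-sim.onnx")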

@xiang-wuu
Author

Almost done, with some final typos to be resolved.

@philipp-schmidt
Contributor

philipp-schmidt commented Jul 12, 2022

Quickly scanned the code and it looks really good!

A few questions / remarks:

  1. You use yolov7.cache for INT8; how do you put that together? Still a todo? I'm actually curious about the yolov7 INT8 performance-to-accuracy tradeoff, so that would be cool to see!

  2. Conversion from ONNX to TensorRT can also be done with TensorRT directly without any additional code. The NGC TensorRT docker images come with a precompiled tool "trtexec" which will happily turn ONNX into an engine.

  3. I'm looking into making the batch size dynamic so that e.g. Triton Inference Server can combine smaller requests into larger batch sizes via a feature called Dynamic Batching (e.g. pack multiple simultaneously arriving batch-1 requests into one larger batch 4).
    While coding this, did you somehow manage to make the input batch size of the TensorRT engine dynamic up to a maximum batch size (see the optimization-profile sketch below)?
    So basically the input shape will be either [-1,640,640,3] for explicit batching or [640,640,3] with implicit batching. In the past ONNX was unable to support implicit batching (this still seems to be the case), and custom plugins were a little hard to make work with dynamic (-1) + explicit batching.
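On point 3, the usual way to get a dynamic batch dimension with explicit batching is a TensorRT optimization profile. A sketch, assuming an NCHW input tensor named "images" and a builder/config as in the build sketch above (the input name and batch limits are placeholders):

profile = builder.create_optimization_profile()
profile.set_shape("images",                      # input tensor name (assumed)
                  min=(1, 3, 640, 640),
                  opt=(4, 3, 640, 640),
                  max=(8, 3, 640, 640))
config.add_optimization_profile(profile)
# The ONNX export must also mark the batch axis as dynamic,
# e.g. dynamic_axes={"images": {0: "batch"}} in torch.onnx.export.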

@albertfaromatics

Hi, sorry to write here. I've tried your branch with TensorRT and a custom-trained yolov7-tiny on an Nvidia Jetson Xavier NX. I converted the trained PyTorch model with no problem after some tries, but when testing the results, both mAP and FPS are much lower:

  • PyTorch + CUDA: ~40 fps, 78 mAP
  • TRT: ~24 fps, 64 mAP

Is this normal? Am I doing something wrong?

@xiang-wuu
Author

Replying to @philipp-schmidt's questions above:

  1. I will add a calibration script for PTQ; a minimal calibrator sketch follows.
  2. Yes, serialization with trtexec is possible, but if you are using TRT < 8.0 the custom plugin needs to be preloaded.
  3. I haven't tested a maximum dynamic batch size, but as far as I know dynamic batching is effectively abstracted by Triton, and exporting the ONNX model with implicit batching could make it work with Triton; still subject to trial & error!
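A minimal INT8 calibrator sketch for point 1 (TensorRT Python API; the batch size, input shape, file names and data loading are illustrative, and the actual calibration script may differ):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import tensorrt as trt

class Yolov7Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="yolov7.cache", batch_size=8):
        super().__init__()
        self.batches = iter(batches)  # iterable of (batch_size, 3, 640, 640) float32 arrays
        self.cache_file = cache_file
        self.batch_size = batch_size
        self.device_input = cuda.mem_alloc(batch_size * 3 * 640 * 640 * 4)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # tells TensorRT the calibration data is exhausted
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# Hook it into the builder config:
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = Yolov7Calibrator(my_batches)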

@xiang-wuu
Author

Replying to @albertfaromatics above: optimization is out of scope for this PR. This PR is intended to provide a minimal, deployable TRT implementation; optimization is altogether subject to further contribution.

@philipp-schmidt
Contributor

@albertfaromatics
How do you test FPS and mAP?
There is very little chance that your TensorRT engine is slower than PyTorch directly, especially on Jetson.

@albertfaromatics

@philipp-schmidt
For PyTorch + CUDA, I simply adapted the detect.py here to read a folder of images (around 200 of them), compute the prediction time (inference + NMS) and compute FPS.
For TensorRT, I followed the README in the repo, with export, simplify, onnx_to_tensorrt (I'm using TensorRT 8.4) and run.

These steps gave me the FPS (40ish vs 25ish). For mAP I used the test here and adapted code to get the detections from TensorRT and "manually" compute mAP.
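One thing worth double-checking is how the time is measured: CUDA launches are asynchronous, so timing without a synchronize call can misreport either path. A rough sketch for the PyTorch side (model and img are placeholders for your loaded model and a preprocessed tensor):

import time
import torch

def fps(model, img, warmup=10, iters=100):
    with torch.no_grad():
        for _ in range(warmup):      # warm up CUDA kernels / autotuning
            model(img)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(iters):
            model(img)
        torch.cuda.synchronize()     # wait for all GPU work before stopping the clock
    return iters / (time.time() - t0)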

@philipp-schmidt
Contributor

philipp-schmidt commented Jul 13, 2022

Try to run your engine with trtexec instead; it will give you a very good indication of the actual compute latency.

Last few steps of this: https://github.com/isarsoft/yolov4-triton-tensorrt#build-tensorrt-engine
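For example, something along these lines benchmarks a prebuilt engine (exact flags vary by TensorRT version; check trtexec --help):
trtexec --loadEngine=yolov7.engine --warmUp=500 --avgRuns=100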

@philipp-schmidt
Contributor

I don't think it comes prebuilt in the Linux 4 Tegra TensorRT docker images for Jetson, though.

@albertfaromatics

@philipp-schmidt I'll give it a try. I can compile it myself from the tensorrt/samples folder, but I have never used it before.

I'll try and see why I get these results.
Thanks!

@xiang-wuu
Author

@WongKinYiu good to merge.

@ccqedq

ccqedq commented Jul 14, 2022

It works, but no bounding box is drawn.

@xiang-wuu
Author

It works, but no bounding box is drawn.

Can you share the environment details?

@ccqedq

ccqedq commented Jul 15, 2022

torch 1.11.0+cu113, onnx 1.12.0, tensorrt 8.4.1.5.
I used the built-in ScatterND op plugin to run the code, but found that no bounding boxes are drawn.
Considering the built-in plugin is used, could this be a data preprocessing problem?
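A common cause of an engine that runs but draws nothing is a preprocessing mismatch. A minimal sketch of the usual YOLOv7 input pipeline (letterbox to 640, BGR to RGB, HWC to CHW, scale to 0-1); the exact resize/padding behaviour should be checked against detect.py:

import cv2
import numpy as np

def preprocess(bgr, size=640):
    h, w = bgr.shape[:2]
    r = min(size / h, size / w)                              # letterbox scale factor
    nh, nw = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(bgr, (nw, nh))
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)   # gray padding, as in detect.py
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    img = canvas[:, :, ::-1].transpose(2, 0, 1)              # BGR -> RGB, HWC -> CHW
    img = np.ascontiguousarray(img, dtype=np.float32) / 255.0
    return img[None], r, (left, top)                         # batch dim + scale/pad for rescaling boxes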

@ccqedq

ccqedq commented Jul 15, 2022

I used the deploy_onnx_trt branch to generate yolov7.onnx. To get yolov7.engine, I ran the following command:
python3 onnx_to_tensorrt.py --explicit-batch --onnx yolov7-sim.onnx -o yolov7.engine

@xiang-wuu
Author

@dongdengwei, try without building the plugin if you are using TRT > 8.0.

@ccqedq

ccqedq commented Jul 15, 2022

I ran the following command to do the inference:
python3 yolov7_trt.py video1.mp4
Still no bounding boxes.

@ccqedq

ccqedq commented Jul 15, 2022

It seems that I should replace "return x if self.training else (torch.cat(z, 1), x)" with "return x if self.training else (torch.cat(z, 1), ) if not self.export else (torch.cat(z, 1), x)" in yolo.py.
But in the environment torch 1.10.1+cu111, onnx 1.8.1, tensorrt 7.2.3.4, it gives the following error:
2022-07-15 18:18:59 - main - INFO - TRT_LOGGER Verbosity: Severity.ERROR
getFieldNames
createPlugin
[TensorRT] ERROR: Mul_378: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [1,3,80,80,2] and [1,1,1,3,2]).
Should I upgrade torch 1.10.1 to 1.11.0 and onnx 1.8.1 to 1.12.0?

@xiang-wuu
Author

@dongdengwei PyTorch > 1.11.0 is required to make it work; 1.12.0 is recommended.

@akashAD98
Contributor

@xiang-wuu @philipp-schmidt @AlexeyAB @Linaom1214 can you share the mAP performance of the converted model? Is the accuracy the same after conversion, or how much does it drop? Also, it would be great if you added support for checking the mAP of the .trt model and its inference on video. Thanks.

@akashAD98
Contributor

akashAD98 commented Aug 10, 2022

Linaom1214/TensorRT-For-YOLO-Series#26: not able to do inference on videos.

@Stoooner

Stoooner commented Sep 5, 2022

Hi, replying to @albertfaromatics above: I have tested the yolov7-tiny TensorRT model on a Jetson Xavier NX with my own code, and the result is shown in issue #703; maybe you can check it.

@Linaom1214
Contributor

Linaom1214/TensorRT-For-YOLO-Series#26 not able to do inference on videos

The reason is that the Colab environment doesn't support the OpenCV imshow function.
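A common workaround, assuming the notebook is run in Colab, is to use the Colab display patch or to write the annotated frames to disk instead of calling cv2.imshow:

from google.colab.patches import cv2_imshow  # Colab-only replacement for cv2.imshow
cv2_imshow(frame)
# or skip display entirely and save the output with cv2.VideoWriter / cv2.imwrite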

@9friday

9friday commented Oct 14, 2023

Hi @xiang-wuu,
I'm using an Nvidia Jetson Xavier AGX with JetPack version 4.6.1 and CUDA version 10.2, and I would like to recreate the Xavier NX results quoted earlier in this thread (~40 fps / 78 mAP for PyTorch vs ~24 fps / 64 mAP for TRT) for both the .pt and TRT formats.
We have tried converting to .engine files using the 'trtexec' binary already present with the L4T installation on the Jetson device, but the inference timings are not good. For inference we used the 'official yolov7 deepstream inference script' from NVIDIA.

Environment setup:

Should the requirements.txt from the yolov7 repo be used on the Xavier AGX as it is?
Should we install PyTorch from 'PyTorch for Jetson'? The PyTorch wheel corresponding to JetPack 4.6.1 is this.

Inference on Jetson device:

Is the original detect.py sufficient for inference using .pt weights on Jetson devices?
Is the YOLOv7ONNXandTRT.ipynb file sufficient for inference using TRT format weights on Jetson devices?

Looking forward to your response.

Cheers :)
