
Add end2end yolov7 onnx export for TensorRT8.0+ and onnxruntime(testing now) #273

Merged · 9 commits into WongKinYiu:main · Jul 23, 2022

Conversation

@triple-Mu
Contributor

End-to-end object detection has always been a hot topic in this field.
How to send pictures into the network, and get clean output, so that we do not need to do NMS is a problem that developers often discuss.
This pr uses pytorch's symbolic and designs a global NMS for batch. Registering the NMS operater with the network can speed up global detection and reduce data copying.
Image input, result output!
Real end-to-end detection will make yolov7 even greater !

You can get more information in end2end_example.ipynb !
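For readers who have not used PyTorch's symbolic registration before, the general pattern looks roughly like the sketch below. This is a minimal illustration under assumptions, not the exact code of this PR: the class name, the chosen attributes, and the dummy output shapes are illustrative, and the real EfficientNMS_TRT plugin accepts more attributes (background_class, box_coding, plugin_version, ...) than shown here.

import torch


class TRT_NMS(torch.autograd.Function):
    # Hypothetical helper: emits an EfficientNMS_TRT node during ONNX export.

    @staticmethod
    def forward(ctx, boxes, scores, iou_threshold=0.65, score_threshold=0.35, max_output_boxes=100):
        # Fake outputs with plausible shapes so ONNX tracing can proceed;
        # the real NMS only runs inside TensorRT at inference time.
        batch = boxes.shape[0]
        num_dets = torch.randint(0, max_output_boxes, (batch, 1), dtype=torch.int32)
        det_boxes = torch.randn(batch, max_output_boxes, 4)
        det_scores = torch.randn(batch, max_output_boxes)
        det_classes = torch.randint(0, 80, (batch, max_output_boxes), dtype=torch.int32)
        return num_dets, det_boxes, det_scores, det_classes

    @staticmethod
    def symbolic(g, boxes, scores, iou_threshold=0.65, score_threshold=0.35, max_output_boxes=100):
        # The node is written under the TRT domain; TensorRT's ONNX parser maps it
        # to the EfficientNMS_TRT plugin when the engine is built.
        return g.op('TRT::EfficientNMS_TRT', boxes, scores,
                    iou_threshold_f=iou_threshold,
                    score_threshold_f=score_threshold,
                    max_output_boxes_i=max_output_boxes,
                    outputs=4)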

@philipp-schmidt
Contributor

ONNX export with the Efficient NMS Plugin included has already been merged just a few hours ago.

Use --include-nms with export.py

@triple-Mu
Contributor Author

@philipp-schmidt
The usage of this PR is different from the PR you mentioned: this PR does not require additional dependencies such as onnx-graphsurgeon; PyTorch and ONNX alone are enough to register the NMS node.
Besides, this approach is more compatible with new models and has been tested on yolov5 and yolov6. In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime. It is also very convenient to add preprocessing steps such as BGR-to-RGB conversion and normalization into the ONNX graph.
Convenience and efficiency are the main goals of this PR.
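As an illustration of folding preprocessing into the exported graph, a thin wrapper module like the one below could be placed in front of the detector before calling torch.onnx.export. This is a hedged sketch; the wrapper name and the exact scaling are assumptions, not code from this PR.

import torch
import torch.nn as nn


class PreprocessWrapper(nn.Module):
    # Hypothetical wrapper: bakes BGR->RGB conversion and normalization into the graph.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        # x: (N, 3, H, W) image tensor in BGR channel order
        x = x[:, [2, 1, 0], :, :]   # BGR -> RGB channel swap
        x = x / 255.0               # scale pixel values to [0, 1]
        return self.model(x)

# Usage sketch: torch.onnx.export(PreprocessWrapper(model), dummy_input, 'yolov7-e2e.onnx', ...)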

@AlexeyAB
Collaborator

@triple-Mu Thanks!

Is your export and NMS implementation better than the current one in most/all cases?

@triple-Mu
Contributor Author

@triple-Mu Thanks!

Is your export and NMS implementation better than the current one in most/all cases?

The method proposed by this PR exposes more options, such as the IoU threshold and confidence threshold. In terms of accuracy, since COCO mAP calculation requires a lower confidence threshold, this part cannot be customized if the combined method is used. Apart from that, this PR makes it easier to add new modules such as preprocessing and letterbox. In terms of inference speed, I think it is consistent with the other PR.

@AlexeyAB
Collaborator

@triple-Mu Great!

since COCO mAP calculation requires a lower confidence threshold, this part cannot be customized if the combined method is used.

As I understand, for COCO mAP calculation we can use your export.py --conf-thres 0.001

Could you please fix the merge conflicts and provide a Google Colab showing how to convert the model to ONNX / TRT and use these ONNX / TRT models for inference?

@triple-Mu
Contributor Author

@triple-Mu Great!

since COCO mAP calculation requires a lower confidence threshold, this part cannot be customized if the combined method is used.

As I understand, for COCO mAP calculation we can use your export.py --conf-thres 0.001

Could you please fix the merge conflicts and provide a Google Colab showing how to convert the model to ONNX / TRT and use these ONNX / TRT models for inference?

The example is here:
https://github.com/triple-Mu/yolov7/blob/end2end/end2end_example.ipynb
The conflicts are not easy to resolve; we made some changes to the model part.

@AlexeyAB
Collaborator

AlexeyAB commented Jul 22, 2022

In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime.

Do you mean that the current implementation in the https://github.com/WongKinYiu/yolov7 main branch doesn't allow running ONNX Runtime inference on the exported ONNX?

@triple-Mu
Contributor Author

In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime.

Do you mean that the current implementation in the https://github.com/WongKinYiu/yolov7 main branch doesn't allow running ONNX Runtime inference on the exported ONNX?

Sure. I will add an example for ONNX Runtime.

@triple-Mu
Contributor Author

In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime.

Do you mean that the current implementation in the https://github.com/WongKinYiu/yolov7 main branch doesn't allow running ONNX Runtime inference on the exported ONNX?

I have now added two example notebooks:
ONNX Runtime end2end detection: https://github.com/triple-Mu/yolov7/blob/end2end/end2end_onnxruntime.ipynb
TensorRT end2end detection: https://github.com/triple-Mu/yolov7/blob/end2end/end2end_tensorrt.ipynb

There is a slight difference between the two usages because they rely on two different NMS implementations.
The --max-wh flag determines which NMS we use:
If we pass a positive integer, we get an ONNX model for ONNX Runtime whose NMS op is non-agnostic.
If we pass zero, we get an ONNX model for ONNX Runtime whose NMS op is class-agnostic.
If we keep the default None, we get an ONNX model for TensorRT whose NMS is a plugin.
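For the ONNX Runtime model, a minimal inference sketch could look like the following. The file name, test image, and input size are assumptions; the notebook linked above is the authoritative example, and the exact output layout depends on the export options, so this sketch simply prints whatever outputs the graph declares.

import cv2
import numpy as np
import onnxruntime as ort

# Hypothetical file names; adjust to your own export and test image.
session = ort.InferenceSession('yolov7-end2end.onnx', providers=['CPUExecutionProvider'])

img = cv2.imread('horses.jpg')                      # BGR, HWC, uint8
img = cv2.resize(img, (640, 640))
blob = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).transpose(2, 0, 1)[None].astype(np.float32) / 255.0

# The end2end graph already contains NMS, so the outputs are final detections.
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]
results = session.run(output_names, {input_name: blob})

for name, value in zip(output_names, results):
    print(name, value.shape)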

@AlexeyAB
Collaborator

If we keep the default None, we get an ONNX model for TensorRT whose NMS is a plugin.

Does that mean such (default None) NMS will only work for TRT inference, but will not work for ONNX Runtime inference?

@triple-Mu
Contributor Author

If we keep the default None, we get an ONNX model for TensorRT whose NMS is a plugin.

Does that mean such (default None) NMS will only work for TRT inference, but will not work for ONNX Runtime inference?

Yes! It is a TensorRT plugin, not an ONNX op.
It cannot be parsed by ONNX Runtime.

AlexeyAB merged commit 1c59e43 into WongKinYiu:main on Jul 23, 2022
triple-Mu deleted the end2end branch on July 23, 2022, 02:43
@philipp-schmidt
Contributor

philipp-schmidt commented Jul 27, 2022

Hi @triple-Mu, I have a few questions about this PR:

If we pass a positive integer, we get an ONNX model for ONNX Runtime whose NMS op is non-agnostic.
If we pass zero, we get an ONNX model for ONNX Runtime whose NMS op is class-agnostic.

What's the actual difference between the two? Can't tell from the code.


parser.add_argument('--max-wh', type=int, default=None, help='None for tensorrt nms, int value for onnx-runtime nms')

What's the meaning of the int value? How do I pick the correct one?
If it doesn't matter, why not e.g. a boolean "--tensorrt-nms-plugin"?


'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic and not opt.end2end else None)

Why does opt.end2end disable all dynamic axes?
Dynamic batch input is an important optimization step (e.g. deployment on Triton Inference Server with TensorRT can gain up to +25% throughput).

If I set the dynamic axes to what they actually should be here (the previous code was copy-pasted from a different model, I believe; it makes no sense to make the input x and y dynamic if there is no letterbox preprocessing op present):

shapes = ["batch", 1, "batch", opt.topk_all, 4, "batch", opt.topk_all, "batch", opt.topk_all]
dynamic_axes = {'images': {0: 'batch'},
                'num_dets': {0: 'batch'},
                'det_boxes': {0: 'batch'},
                'det_scores': {0: 'batch'},
                'det_classes': {0: 'batch'}}


Any reason to disable this? I will make a PR (or change my current PR #280), so I'm curious whether you know of any blockers for dynamic input in your implementation?


From export.py options:
--include-nms: export end2end onnx
--end2end: export end2end onnx

This is very ambiguous now; should the --include-nms option have been removed with this PR?

@triple-Mu
Contributor Author

triple-Mu commented Jul 27, 2022

Hi @triple-Mu, I have a few questions about this PR:

If we pass a positive integer, we get an ONNX model for ONNX Runtime whose NMS op is non-agnostic.
If we pass zero, we get an ONNX model for ONNX Runtime whose NMS op is class-agnostic.

What's the actual difference between the two? Can't tell from the code.

@philipp-schmidt
Question 1:

For those using ONNX Runtime, in order to achieve the same NMS behavior as https://github.com/WongKinYiu/yolov7/blob/main/utils/general.py#L677, I provide the --max-wh flag to control the max_wh behavior with similar code. It is the same implementation.

For those using the TensorRT plugin, the Efficient NMS Plugin is a non-agnostic NMS, so we don't need max-wh; keeping the default None is fine.
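For context, the max_wh trick in non_max_suppression works by offsetting every box by class_index * max_wh before a single class-agnostic NMS call, so boxes of different classes can never overlap and the agnostic NMS effectively becomes class-aware; with max_wh = 0 the offset vanishes and the NMS stays truly class-agnostic. A rough sketch of the idea (in eager PyTorch, not the PR's export code, where the same logic is expressed through the ONNX NonMaxSuppression op):

import torch
import torchvision

def nms_with_max_wh(boxes, scores, classes, iou_thres=0.65, max_wh=4096):
    # boxes: (N, 4) in xyxy format, scores: (N,), classes: (N,)
    # max_wh > 0: shift each box by class * max_wh so one agnostic NMS call behaves class-aware.
    # max_wh = 0: no shift, i.e. truly class-agnostic NMS.
    offsets = classes.to(boxes.dtype) * max_wh
    keep = torchvision.ops.nms(boxes + offsets[:, None], scores, iou_thres)
    return boxes[keep], scores[keep], classes[keep]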

parser.add_argument('--max-wh', type=int, default=None, help='None for tensorrt nms, int value for onnx-runtime nms')

What's the meaning of the int value? How do I pick the correct one? If it doesn't matter, why not e.g. a boolean "--tensorrt-nms-plugin"?

Question 2:

I did not want to add extra options, because too many options are already included. If we add a boolean flag --tensorrt-nms-plugin, shall we also add a boolean flag --onnx-nms-operator?

'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic and not opt.end2end else None)

Why does opt.end2end disable all dynamic axes? Dynamic batch input is an important optimization step (e.g. deployment on Triton Inference Server with TensorRT can gain up to +25% throughput).

Question 3:

Because export.py already contains the --dynamic option, in which all output axes are dynamic, supporting dynamic batch dimensions here would require modifying a lot of code.

Beyond that, while TensorRT supports dynamic batch well, for most people deploying a static model is sufficient.

If I set the dynamic axes to what they actually should be here (the previous code was copy-pasted from a different model, I believe; it makes no sense to make the input x and y dynamic if there is no letterbox preprocessing op present):

shapes = ["batch", 1, "batch", opt.topk_all, 4, "batch", opt.topk_all, "batch", opt.topk_all]
dynamic_axes = {'images': {0: 'batch'},
                'num_dets': {0: 'batch'},
                'det_boxes': {0: 'batch'},
                'det_scores': {0: 'batch'},
                'det_classes': {0: 'batch'}}


Any reason to disable this? I will make a PR (or change my current PR #280), so I'm curious whether you know of any blockers for dynamic input in your implementation?

From export.py options:
--include-nms: export end2end onnx
--end2end: export end2end onnx

This is very ambiguous now; should the --include-nms option have been removed with this PR?

Question 4:

--include-nms has nothing to do with me. It is also useful.

Summary: the reason this PR makes --end2end conflict with the --dynamic setting is not that dynamic batch is unsupported, but that the required code changes would be larger.
Dynamic batch is feasible and can also be implemented on top of this PR.

I think as long as your PR is strong enough, yolov7 will not reject you!
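For what it's worth, a dynamic-batch export on top of an end2end wrapper could look roughly like the sketch below, reusing the dynamic_axes mapping from the earlier comment. The DummyEnd2End module is only a stand-in so the export call runs; the real detector-plus-NMS wrapper from this PR would take its place.

import torch
import torch.nn as nn


class DummyEnd2End(nn.Module):
    # Stand-in for the real detector + NMS wrapper, just to make the export call runnable.
    def forward(self, x):
        b = x.shape[0]
        return (torch.zeros(b, 1, dtype=torch.int32),     # num_dets
                torch.zeros(b, 100, 4),                   # det_boxes
                torch.zeros(b, 100),                      # det_scores
                torch.zeros(b, 100, dtype=torch.int32))   # det_classes


model = DummyEnd2End().eval()
dummy = torch.zeros(1, 3, 640, 640)

dynamic_axes = {'images': {0: 'batch'},
                'num_dets': {0: 'batch'},
                'det_boxes': {0: 'batch'},
                'det_scores': {0: 'batch'},
                'det_classes': {0: 'batch'}}

torch.onnx.export(model, dummy, 'yolov7-end2end-dynamic.onnx',
                  input_names=['images'],
                  output_names=['num_dets', 'det_boxes', 'det_scores', 'det_classes'],
                  dynamic_axes=dynamic_axes,
                  opset_version=12)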

wheemyungshin-nota pushed a commit to wheemyungshin/yolov7 that referenced this pull request on Dec 1, 2023:
Add end2end yolov7 onnx export for TensorRT8.0+ and onnxruntime(testing now) (WongKinYiu#273)

* Add end2end yolov7 onnx export for TensorRT8.0+

* Add usage in README

* Update yolo.py

* Update yolo.py

* Add tensorrt onnxruntime examples

* Add usage in README

Co-authored-by: Alexey <[email protected]>