
Add end2end yolov7 onnx export for TensorRT8.0+ and onnxruntime(testing now) #273

Merged · 9 commits into WongKinYiu:main · Jul 23, 2022

Conversation

@triple-Mu
Contributor

End-to-end object detection has always been a hot topic in this field.
How to send pictures into the network, and get clean output, so that we do not need to do NMS is a problem that developers often discuss.
This pr uses pytorch's symbolic and designs a global NMS for batch. Registering the NMS operater with the network can speed up global detection and reduce data copying.
Image input, result output!
Real end-to-end detection will make yolov7 even greater !

You can get more information in end2end_example.ipynb !
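For readers who have not used PyTorch's symbolic registration before, the general pattern looks roughly like the sketch below. This is a minimal illustration under assumptions, not the exact code of this PR: the class name, the chosen attributes, and the dummy output shapes are illustrative, and the real EfficientNMS_TRT plugin accepts more attributes (background_class, box_coding, plugin_version, ...) than shown here.

import torch


class TRT_NMS(torch.autograd.Function):
    # Hypothetical helper: emits an EfficientNMS_TRT node during ONNX export.

    @staticmethod
    def forward(ctx, boxes, scores, iou_threshold=0.65, score_threshold=0.35, max_output_boxes=100):
        # Fake outputs with plausible shapes so ONNX tracing can proceed;
        # the real NMS only runs inside TensorRT at inference time.
        batch = boxes.shape[0]
        num_dets = torch.randint(0, max_output_boxes, (batch, 1), dtype=torch.int32)
        det_boxes = torch.randn(batch, max_output_boxes, 4)
        det_scores = torch.randn(batch, max_output_boxes)
        det_classes = torch.randint(0, 80, (batch, max_output_boxes), dtype=torch.int32)
        return num_dets, det_boxes, det_scores, det_classes

    @staticmethod
    def symbolic(g, boxes, scores, iou_threshold=0.65, score_threshold=0.35, max_output_boxes=100):
        # The node is written under the TRT domain; TensorRT's ONNX parser maps it
        # to the EfficientNMS_TRT plugin when the engine is built.
        return g.op('TRT::EfficientNMS_TRT', boxes, scores,
                    iou_threshold_f=iou_threshold,
                    score_threshold_f=score_threshold,
                    max_output_boxes_i=max_output_boxes,
                    outputs=4)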

@philipp-schmidt
Contributor

ONNX export with the Efficient NMS Plugin included has already been merged just a few hours ago.

Use --include-nms with export.py

@triple-Mu
Contributor Author

@philipp-schmidt
The usage of this PR is different from the PR you mentioned: this PR does not require additional dependencies such as onnx-graphsurgeon; PyTorch and ONNX alone are enough to register the NMS node.
Besides, this approach is more compatible with new models and has been tested on yolov5 and yolov6. In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime. It is also very convenient to add preprocessing steps such as BGR-to-RGB conversion and normalization into the ONNX graph.
Convenience and efficiency are the main goals of this PR.
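As an illustration of folding preprocessing into the exported graph, a thin wrapper module like the one below could be placed in front of the detector before calling torch.onnx.export. This is a hedged sketch; the wrapper name and the exact scaling are assumptions, not code from this PR.

import torch
import torch.nn as nn


class PreprocessWrapper(nn.Module):
    # Hypothetical wrapper: bakes BGR->RGB conversion and normalization into the graph.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        # x: (N, 3, H, W) image tensor in BGR channel order
        x = x[:, [2, 1, 0], :, :]   # BGR -> RGB channel swap
        x = x / 255.0               # scale pixel values to [0, 1]
        return self.model(x)

# Usage sketch: torch.onnx.export(PreprocessWrapper(model), dummy_input, 'yolov7-e2e.onnx', ...)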

@AlexeyAB
Collaborator

@triple-Mu Thanks!

Is your export and NMS implementation better than the current one in most/all cases?

@triple-Mu
Contributor Author

@triple-Mu Thanks!

Is your export and NMS implementation better than the current one in most/all cases?

The method proposed by this PR exposes more options, such as the IoU threshold and confidence threshold. In terms of accuracy, since COCO mAP calculation requires a lower confidence threshold, this part cannot be customized if the combined method is used. Apart from that, this PR makes it easier to add new modules such as preprocessing and letterbox. In terms of inference speed, I think it is consistent with the other PR.

@AlexeyAB
Collaborator

@triple-Mu Great!

since COCO mAP calculation requires a lower confidence threshold, this part cannot be customized if the combined method is used.

As I understand, for COCO mAP calculation we can use your export.py --conf-thres 0.001

Could you please fix the merge conflicts and provide a Google Colab showing how to convert the model to ONNX / TRT and use these ONNX / TRT models for inference?

@triple-Mu
Contributor Author

@triple-Mu Great!

since COCO mAP calculation requires a lower confidence threshold, this part cannot be customized if the combined method is used.

As I understand, for COCO mAP calculation we can use your export.py --conf-thres 0.001

Could you please fix the merge conflicts and provide a Google Colab showing how to convert the model to ONNX / TRT and use these ONNX / TRT models for inference?

The example is here:
https://github.com/triple-Mu/yolov7/blob/end2end/end2end_example.ipynb
The conflicts are not easy to resolve; we made some changes to the model part.

@AlexeyAB
Collaborator

AlexeyAB commented Jul 22, 2022

In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime.

Do you mean that the current implementation in the https://github.com/WongKinYiu/yolov7 main branch doesn't allow running ONNX Runtime inference on the exported ONNX?

@triple-Mu
Contributor Author

In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime.

Do you mean that the current implementation in the https://github.com/WongKinYiu/yolov7 main branch doesn't allow running ONNX Runtime inference on the exported ONNX?

Sure. I will add an example for ONNX Runtime.

@triple-Mu
Contributor Author

In addition to adding NMS for TensorRT, you can also run the exported ONNX with ONNX Runtime.

Do you mean that the current implementation in the https://github.com/WongKinYiu/yolov7 main branch doesn't allow running ONNX Runtime inference on the exported ONNX?

I have now added two example notebooks:
ONNX Runtime end2end detection: https://github.com/triple-Mu/yolov7/blob/end2end/end2end_onnxruntime.ipynb
TensorRT end2end detection: https://github.com/triple-Mu/yolov7/blob/end2end/end2end_tensorrt.ipynb

There is a slight difference between the two usages because they rely on two different NMS implementations.
The --max-wh flag determines which NMS we use:
If we pass a positive integer, we get an ONNX model for ONNX Runtime whose NMS op is non-agnostic.
If we pass zero, we get an ONNX model for ONNX Runtime whose NMS op is class-agnostic.
If we keep the default None, we get an ONNX model for TensorRT whose NMS is a plugin.
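For the ONNX Runtime model, a minimal inference sketch could look like the following. The file name, test image, and input size are assumptions; the notebook linked above is the authoritative example, and the exact output layout depends on the export options, so this sketch simply prints whatever outputs the graph declares.

import cv2
import numpy as np
import onnxruntime as ort

# Hypothetical file names; adjust to your own export and test image.
session = ort.InferenceSession('yolov7-end2end.onnx', providers=['CPUExecutionProvider'])

img = cv2.imread('horses.jpg')                      # BGR, HWC, uint8
img = cv2.resize(img, (640, 640))
blob = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).transpose(2, 0, 1)[None].astype(np.float32) / 255.0

# The end2end graph already contains NMS, so the outputs are final detections.
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]
results = session.run(output_names, {input_name: blob})

for name, value in zip(output_names, results):
    print(name, value.shape)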

@AlexeyAB
Collaborator

If we keep the default None, we get an ONNX model for TensorRT whose NMS is a plugin.

Does that mean such (default None) NMS will only work for TRT inference, but will not work for ONNX Runtime inference?

@triple-Mu
Contributor Author

If we keep the default None, we get an ONNX model for TensorRT whose NMS is a plugin.

Does that mean such (default None) NMS will only work for TRT inference, but will not work for ONNX Runtime inference?

Yes! It is a TensorRT plugin, not an ONNX op.
It cannot be parsed by ONNX Runtime.

AlexeyAB merged commit 1c59e43 into WongKinYiu:main on Jul 23, 2022
triple-Mu deleted the end2end branch on July 23, 2022, 02:43
@philipp-schmidt
Contributor

philipp-schmidt commented Jul 27, 2022

Hi @triple-Mu, I have a few questions about this PR:

If we pass a positive integer, we get an ONNX model for ONNX Runtime whose NMS op is non-agnostic.
If we pass zero, we get an ONNX model for ONNX Runtime whose NMS op is class-agnostic.

What's the actual difference between the two? Can't tell from the code.


parser.add_argument('--max-wh', type=int, default=None, help='None for tensorrt nms, int value for onnx-runtime nms')

What's the meaning of the int value? How do I pick the correct one?
If it doesn't matter, why not e.g. a boolean "--tensorrt-nms-plugin"?


'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic and not opt.end2end else None)

Why does opt.end2end disable all dynamic axes?
Dynamic batch input is an important optimization step (e.g. deployment on Triton Inference Server with TensorRT can gain up to +25% throughput).

If I set the dynamic axes to what they actually should be here (the previous code was copy-pasted from a different model, I believe; it makes no sense to make the input x and y dynamic if there is no letterbox preprocessing op present):

shapes = ["batch", 1, "batch", opt.topk_all, 4, "batch", opt.topk_all, "batch", opt.topk_all]
dynamic_axes = {'images': {0: 'batch'},
                'num_dets': {0: 'batch'},
                'det_boxes': {0: 'batch'},
                'det_scores': {0: 'batch'},
                'det_classes': {0: 'batch'}}


Any reason to disable this? I will make a PR (or change my current PR #280), so I'm curious whether you know of any blockers for dynamic input in your implementation?


From export.py options:
--include-nms: export end2end onnx
--end2end: export end2end onnx

This is very ambiguous now; should the --include-nms option have been removed with this PR?

@triple-Mu
Contributor Author

triple-Mu commented Jul 27, 2022

Hi @triple-Mu, I have a few questions about this PR:

If we pass a positive integer, we get an ONNX model for ONNX Runtime whose NMS op is non-agnostic.
If we pass zero, we get an ONNX model for ONNX Runtime whose NMS op is class-agnostic.

What's the actual difference between the two? Can't tell from the code.

@philipp-schmidt
Question 1:

For those using ONNX Runtime, in order to achieve the same NMS behavior as https://github.com/WongKinYiu/yolov7/blob/main/utils/general.py#L677, I provide the --max-wh flag to control the max_wh behavior with similar code. It is the same implementation.

For those using the TensorRT plugin, the Efficient NMS Plugin is a non-agnostic NMS, so we don't need max-wh; keeping the default None is fine.
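For context, the max_wh trick in non_max_suppression works by offsetting every box by class_index * max_wh before a single class-agnostic NMS call, so boxes of different classes can never overlap and the agnostic NMS effectively becomes class-aware; with max_wh = 0 the offset vanishes and the NMS stays truly class-agnostic. A rough sketch of the idea (in eager PyTorch, not the PR's export code, where the same logic is expressed through the ONNX NonMaxSuppression op):

import torch
import torchvision

def nms_with_max_wh(boxes, scores, classes, iou_thres=0.65, max_wh=4096):
    # boxes: (N, 4) in xyxy format, scores: (N,), classes: (N,)
    # max_wh > 0: shift each box by class * max_wh so one agnostic NMS call behaves class-aware.
    # max_wh = 0: no shift, i.e. truly class-agnostic NMS.
    offsets = classes.to(boxes.dtype) * max_wh
    keep = torchvision.ops.nms(boxes + offsets[:, None], scores, iou_thres)
    return boxes[keep], scores[keep], classes[keep]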

parser.add_argument('--max-wh', type=int, default=None, help='None for tensorrt nms, int value for onnx-runtime nms')

What's the meaning of the int value? How do I pick the correct one? If it doesn't matter, why not e.g. a boolean "--tensorrt-nms-plugin"?

Question 2:

I did not want to add extra options, because too many options are already included. If we add a boolean flag --tensorrt-nms-plugin, shall we also add a boolean flag --onnx-nms-operator?

'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic and not opt.end2end else None)

Why does opt.end2end disable all dynamic axes? Dynamic batch input is an important optimization step (e.g. deployment on Triton Inference Server with TensorRT can gain up to +25% throughput).

Question 3:

Because export.py already contains the --dynamic option, in which all output axes are dynamic, supporting dynamic batch dimensions here would require modifying a lot of code.

Beyond that, while TensorRT supports dynamic batch well, for most people deploying a static model is sufficient.

If I set the dynamic axes to what they actually should be here (the previous code was copy-pasted from a different model, I believe; it makes no sense to make the input x and y dynamic if there is no letterbox preprocessing op present):

shapes = ["batch", 1, "batch", opt.topk_all, 4, "batch", opt.topk_all, "batch", opt.topk_all]
dynamic_axes = {'images': {0: 'batch'},
                'num_dets': {0: 'batch'},
                'det_boxes': {0: 'batch'},
                'det_scores': {0: 'batch'},
                'det_classes': {0: 'batch'}}


Any reason to disable this? I will make a PR (or change my current PR #280), so I'm curious whether you know of any blockers for dynamic input in your implementation?

From export.py options:
--include-nms: export end2end onnx
--end2end: export end2end onnx

This is very ambiguous now; should the --include-nms option have been removed with this PR?

Question 4:

--include-nms has nothing to do with me. It is also useful.

Summary: the reason this PR makes --end2end conflict with the --dynamic setting is not that dynamic batch is unsupported, but that the required code changes would be larger.
Dynamic batch is feasible and can also be implemented on top of this PR.

I think as long as your PR is strong enough, yolov7 will not reject you!
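For what it's worth, a dynamic-batch export on top of an end2end wrapper could look roughly like the sketch below, reusing the dynamic_axes mapping from the earlier comment. The DummyEnd2End module is only a stand-in so the export call runs; the real detector-plus-NMS wrapper from this PR would take its place.

import torch
import torch.nn as nn


class DummyEnd2End(nn.Module):
    # Stand-in for the real detector + NMS wrapper, just to make the export call runnable.
    def forward(self, x):
        b = x.shape[0]
        return (torch.zeros(b, 1, dtype=torch.int32),     # num_dets
                torch.zeros(b, 100, 4),                   # det_boxes
                torch.zeros(b, 100),                      # det_scores
                torch.zeros(b, 100, dtype=torch.int32))   # det_classes


model = DummyEnd2End().eval()
dummy = torch.zeros(1, 3, 640, 640)

dynamic_axes = {'images': {0: 'batch'},
                'num_dets': {0: 'batch'},
                'det_boxes': {0: 'batch'},
                'det_scores': {0: 'batch'},
                'det_classes': {0: 'batch'}}

torch.onnx.export(model, dummy, 'yolov7-end2end-dynamic.onnx',
                  input_names=['images'],
                  output_names=['num_dets', 'det_boxes', 'det_scores', 'det_classes'],
                  dynamic_axes=dynamic_axes,
                  opset_version=12)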

wheemyungshin-nota pushed a commit to wheemyungshin/yolov7 that referenced this pull request on Dec 1, 2023:
Add end2end yolov7 onnx export for TensorRT8.0+ and onnxruntime(testing now) (WongKinYiu#273)

* Add end2end yolov7 onnx export for TensorRT8.0+

* Add usage in README

* Update yolo.py

* Update yolo.py

* Add tensorrt onnxruntime examples

* Add usage in README

Co-authored-by: Alexey <[email protected]>