Add nms for tensorrt8.0+ / onnxruntime / openvino (the same way as onnxruntime) #7736
Conversation
@triple-Mu thanks for the PR, this looks great! Especially like the usage example notebook. If this works for TRT can it also work for ONNX exports?
This PR exports ONNX using the default method, then adds an additional graph structure so that the network outputs match the inputs of the TRT NMS plugin, and finally adds the NMS plugin so the network can run detection end-to-end.
@triple-Mu yes I mean right here, the ONNX-only export (no TRT), i.e.:
EDIT: Since it seems like the NMS modification is done directly on the ONNX model, perhaps the PR updates are suitable as well for the export_onnx() call on the line shown above.
Got it. That means using the --nms flag (with score/IoU thresholds) may export an ONNX model that is only usable with TRT, and TRT engine building would be removed from this PR. If so, this ONNX model would not be usable with onnxruntime, openvino, and so on.
@glenn-jocher
@triple-Mu I'd like to handle your two PRs today. But I'm confused, as the original PR #6984 was limited in scope to adding trtexec support but now seems expanded. Can you please summarize the changes in each and whether they overlap anywhere? Also, what's your recommendation: should we merge one or the other or both, and if both, in which order?
@glenn-jocher
@triple-Mu ok got it! Let's close #6984 then and please add the
It is my pleasure to be able to help. I have the following questions:
@triple-Mu I think the two topics are separate:
@glenn-jocher All right!
@glenn-jocher (torch) ubuntu@y9000p:~/work/yolov5$ python export.py --weights yolov5s.pt --include engine --trtexec
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, trtexec=True, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
YOLOv5 🚀 v6.1-224-gba552fe Python-3.8.13 torch-1.11.0+cu115 CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)
[05/19/2022-22:43:30] [W] --workspace flag has been deprecated by --memPoolSize flag.
Cuda failure: no CUDA-capable device is detected
Aborted (core dumped)
Traceback (most recent call last):
File "export.py", line 646, in <module>
main(opt)
File "export.py", line 641, in main
run(**vars(opt))
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "export.py", line 561, in run
f[1] = export_engine(model, im, file, train, half, simplify, workspace, verbose, trtexec)
File "export.py", line 258, in export_engine
subprocess.check_output(cmd, shell=True)
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --workspace=4096' returned non-zero exit status 134.
@glenn-jocher
I recently updated this branch again.
Would greatly appreciate this feature being rolled into the production version. Exporting object detection models to something like ONNX with NMS will allow many people to use lightweight frameworks on edge devices or things like AWS Lambda. Torch is a lot of overhead for just implementing NMS.
Edit: I've tested the trtNMS branch to export the model using these arguments: python export.py --weights mymodel.pt --include onnx --nms --conf-thres 0.4. When I run inference using onnxruntime, I am getting different results than with detect.py. It seems like the conf_thres on the ONNX model has some lower bound of ~0.7; there are no predictions below that. The actual confidence values for each detection do not quite match either.
Edit2: It appears the output is being limited to 100 response values. I tried modifying "max_output_boxes" to be 1000 but it still only returns 100 detections per image.
Edit3: I needed to modify --topk-per-class and --topk-all to be 100. This yielded more than 100 results. Detections and confidence with onnxruntime don't exactly match but we're in the ballpark.
Hi, @triple-Mu! Thanks for your amazing work on adding NMS! @wolfpack12 has mentioned that the outputs of models exported with this PR do not exactly match the original outputs. Thank you!
New PR for "ultralytics#7736" Remove not use Format onnxruntime and tensorrt onnx outputs fix unified outputs
I re-updated the code of this PR, please try again.
For tensorrt nms export:
python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify
For onnxruntime nms export:
python3 export.py --weights yolov5s.pt --include onnx --nms ort --iou 0.65 --conf 0.001 --topk-all 300 --simplify
For openvino nms export:
python3 export.py --weights yolov5s.pt --include openvino --nms ovo --iou 0.65 --conf 0.001 --topk-all 300 --simplify
In order to export the model supported by the corresponding backend, you need to specify --nms trt/ort/ovo to export onnx or xml.
In addition, you can export models with dynamic shapes. You can add --dynamic batch:
python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify --dynamic batch
If you want to export the original yolov5 onnx model with dynamic shape, the command is:
python3 export.py --weights yolov5s.pt --include onnx --simplify --dynamic
You don't need to pass arguments to --dynamic.
If you want to export the original yolov5 tflite model with nms, the command is:
python3 export.py --weights yolov5s.pt --include tflite --nms
You don't need to pass arguments to --nms.
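For anyone trying the ort export, a minimal onnxruntime inference sketch may help. It assumes a 640x640 input and the four-output layout (detection count, boxes, scores, classes) discussed in this thread; the file name and output order are assumptions, so check session.get_outputs() against your own export:

```python
# Hedged sketch: run an NMS-enabled YOLOv5 ONNX model with onnxruntime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov5s.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Placeholder for a letterboxed, normalized image in NCHW float32 layout.
im = np.zeros((1, 3, 640, 640), dtype=np.float32)

# Assumed output order: detection count, boxes, scores, class ids.
num_dets, boxes, scores, classes = session.run(None, {input_name: im})

n = int(num_dets[0][0])  # valid detections for the first (and only) image
for box, score, cls in zip(boxes[0][:n], scores[0][:n], classes[0][:n]):
    print(int(cls), float(score), box.tolist())
```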
I'll test in the new year. Just curious, how is this implementation different from yolort?
The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things: 1. export.py fails on models where the --nms argument is used on export (see error message below)
2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:
Question 1: It should be caused by your use of the
Question 2: To avoid an empty output when no object is detected in the image (for example, on randomly generated noise), I added a dummy result with class -1, a box, and score 0 for this case in postprocessing. This prevents the network output from being empty. You can use the numeric value of the first output to do a secondary filter on the boxes and scores. It's easy, please refer to my submitted notebook.
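A hedged sketch of that secondary filter, assuming the four-output layout (num_dets, boxes, scores, classes) and that the padded class -1 / score 0 rows sit after the valid ones:

```python
# Hedged sketch: drop padded rows using the detection count in the first output.
import numpy as np

def filter_detections(num_dets, boxes, scores, classes, batch_index=0):
    """Keep only the first num_dets rows for one image; padded class=-1/score=0 rows come after them."""
    n = int(num_dets[batch_index][0])
    return boxes[batch_index][:n], scores[batch_index][:n], classes[batch_index][:n]
```

A further score threshold can then be applied to the kept rows if needed.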
Sorry I had a typo. The error in Question 1 is when detect.py is used. It attempts to run the non_max_suppression function on the custom ONNX model where NMS is part of the graph. Here's the run command:
Here's more granular output of the error:
For Question 2, the notebook is a great addition. Stepping through the process of exporting the model and then running inference using onnxruntime will be very helpful to others. I suspect the issue I'm having is the conversion of the image to a tensor. I'm trying to execute this within an AWS Lambda Function (this was not trivial to do). The way I was converting the image is different from your method:
It seems that you feed an input tensor with shape 512x640. |
@triple-Mu Unfortunately that isn't the issue. I can send it a 640x640 image and the results still don't match. I suspect the issue is the use of letterbox (Still need to confirm). In your example notebook, you import letterbox from YOLOv5 which requires cv2 to be imported. If I want to run this in AWS Lambda, I don't want to import cv2 or torch since it would exceed the 250MB limit. So I'd need to implement using numpy or base python. Will provide results when I dig more into this. |
Maybe you can save the input tensor to your local PC as an .npy file.
I added the letterboxing function below. It helps increase the accuracy but it's still slightly off.
I call it in my Lambda function using this:
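A minimal sketch of such a PIL/NumPy-only letterbox, assuming 114-gray padding and a 640x640 target (illustrative names and an example call, not the commenter's exact code):

```python
# Hedged sketch: a cv2/torch-free letterbox using only PIL and NumPy.
import numpy as np
from PIL import Image

def letterbox_pil(img, new_shape=(640, 640), color=(114, 114, 114)):
    w0, h0 = img.size
    r = min(new_shape[1] / w0, new_shape[0] / h0)          # scale ratio
    new_w, new_h = int(round(w0 * r)), int(round(h0 * r))
    resized = img.resize((new_w, new_h), Image.BICUBIC)    # BICUBIC matched cv2 most closely in this thread
    canvas = Image.new("RGB", (new_shape[1], new_shape[0]), color)
    dw, dh = (new_shape[1] - new_w) // 2, (new_shape[0] - new_h) // 2
    canvas.paste(resized, (dw, dh))
    x = np.asarray(canvas, dtype=np.float32) / 255.0        # HWC, RGB, 0-1
    x = np.expand_dims(x.transpose(2, 0, 1), 0)             # 1x3x640x640, NCHW
    return x, r, (dw, dh)

# Example call (illustrative):
# tensor, ratio, pad = letterbox_pil(Image.open("image.jpg").convert("RGB"))
```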
EDIT: I'm increasingly confident this is a resizing/letterbox issue. I've played around with changing the padding color from (114, 114, 114) to (0, 0, 0) and (255, 255, 255). This actually affects the number of calls the model makes! In addition, the scaling method matters. In the code above, the Image.BICUBIC method is used for interpolation on the scaling. In YOLOv5, the letterbox function uses the cv2 code below:
When I change the interpolation to Image.BILINEAR or Image.NEAREST, it makes a significant impact on the number of calls. I think this is the root cause of the problem and I am doubtful I will ever match the output. The takeaway is that these models are extremely sensitive to very small changes: scaling method, background color and input size have an unpredictable impact on model performance.
EDIT2: For anyone that is morbidly curious, the difference in interpolation between PIL and CV2 is discussed ad nauseam here: python-pillow/Pillow#2718. I found that Image.BICUBIC had the closest results to the cv2.resize method used in YOLOv5. I tried Image.BILINEAR since, you know, it should be equivalent to cv2.INTER_LINEAR. But it wasn't!
This commentary goes beyond the scope of this issue (exporting NMS for onnxruntime). I believe the branch that @triple-Mu created accomplishes this. The only thing I see that needs to be wrapped up is ensuring NMS-enabled ONNX models can use detect.py in YOLOv5 without throwing an error.
👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap. We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved. For additional resources and information, please see the links below:
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐ |
This may be the easiest way to register the EfficientNMS plugin in ONNX and build a TensorRT engine.
I was inspired by this issue: #6430
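To make the approach concrete, here is a hedged onnx_graphsurgeon sketch of appending an EfficientNMS_TRT node to an exported graph. The tensor names "boxes"/"scores", shapes, and attribute values are illustrative assumptions (mirroring the thresholds used in this thread), not the exact code in this PR:

```python
# Hedged sketch: append an EfficientNMS_TRT node to an exported YOLOv5 ONNX graph.
# Assumes the graph already exposes decoded "boxes" [N, num_boxes, 4] and
# "scores" [N, num_boxes, num_classes] tensors; names and attrs are illustrative.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("yolov5s.onnx"))
tensors = graph.tensors()
boxes, scores = tensors["boxes"], tensors["scores"]  # assumed intermediate tensors

batch, topk = 1, 300
# Output tensors in the layout the EfficientNMS_TRT plugin produces.
num_dets = gs.Variable("num_dets", dtype=np.int32, shape=[batch, 1])
det_boxes = gs.Variable("det_boxes", dtype=np.float32, shape=[batch, topk, 4])
det_scores = gs.Variable("det_scores", dtype=np.float32, shape=[batch, topk])
det_classes = gs.Variable("det_classes", dtype=np.int32, shape=[batch, topk])

graph.layer(
    op="EfficientNMS_TRT",
    name="nms",
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes],
    attrs={
        "score_threshold": 0.001,
        "iou_threshold": 0.65,
        "max_output_boxes": topk,
        "background_class": -1,   # YOLOv5 has no background class
        "score_activation": 0,    # scores already passed through sigmoid
        "box_coding": 1,          # assumption: 1 = [cx, cy, w, h]; use 0 if boxes are corner-encoded
    },
)
graph.outputs = [num_dets, det_boxes, det_scores, det_classes]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov5s-nms.onnx")
```

TensorRT then recognizes the op type by name and substitutes the plugin when the engine is built.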
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
WARNING ⚠️ this PR is very large, the summary may not cover all changes.
🌟 Summary
Ultralytics introduces advanced NMS (Non-Maximum Suppression) export capabilities for ONNX models in YOLOv5.
📊 Key Changes
- export_onnx_with_nms added to handle ONNX export with integrated NMS.
- onnxruntime-nms-export.ipynb added for demonstration purposes.
🎯 Purpose & Impact