Add nms for tensorrt8.0+ / onnxruntime / openvino (the same way as onnxruntime) #7736
Conversation
@triple-Mu thanks for the PR, this looks great! Especially like the usage example notebook. If this works for TRT can it also work for ONNX exports?
This PR exports ONNX using the default method, then adds an additional graph structure so that the network outputs match the inputs of the TRT NMS plugin, and finally adds the NMS plugin so the network can run detection end-to-end.
@triple-Mu yes I mean right here, the ONNX-only export (no TRT), i.e.:
EDIT: Since it seems like the NMS modification is done directly on the ONNX model, perhaps the PR updates are suitable as well for the export_onnx() call on the line shown above.
Got it. That means using the --nms flag (with score/IoU thresholds) may export an ONNX model that is only usable with TRT, and TRT engine building would be removed from this PR. If so, this ONNX model would not be usable with onnxruntime, openvino, and so on.
@glenn-jocher
@triple-Mu I'd like to handle your two PRs today. But I'm confused, as the original PR #6984 was limited in scope to adding trtexec support but now seems expanded. Can you please summarize the changes in each and whether they overlap anywhere? Also, what's your recommendation: should we merge one or the other or both, and if both, in which order?
@glenn-jocher
@triple-Mu ok got it! Let's close #6984 then and please add the
It is my pleasure to be able to help. I have the following questions:
@triple-Mu I think the two topics are separate:
@glenn-jocher All right!
@glenn-jocher (torch) ubuntu@y9000p:~/work/yolov5$ python export.py --weights yolov5s.pt --include engine --trtexec
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, trtexec=True, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
YOLOv5 🚀 v6.1-224-gba552fe Python-3.8.13 torch-1.11.0+cu115 CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)
[05/19/2022-22:43:30] [W] --workspace flag has been deprecated by --memPoolSize flag.
Cuda failure: no CUDA-capable device is detected
Aborted (core dumped)
Traceback (most recent call last):
File "export.py", line 646, in <module>
main(opt)
File "export.py", line 641, in main
run(**vars(opt))
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "export.py", line 561, in run
f[1] = export_engine(model, im, file, train, half, simplify, workspace, verbose, trtexec)
File "export.py", line 258, in export_engine
subprocess.check_output(cmd, shell=True)
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --workspace=4096' returned non-zero exit status 134.
@glenn-jocher
I recently updated this branch again.
Would greatly appreciate this feature being rolled into the production version. Exporting object detection models to something like ONNX with NMS will allow many people to use lightweight frameworks on edge devices or things like AWS Lambda. Torch is a lot of overhead for just implementing NMS.
Edit: I've tested the trtNMS branch to export the model using these arguments: python export.py --weights mymodel.pt --include onnx --nms --conf-thres 0.4. When I run inference using onnxruntime, I am getting different results than with detect.py. It seems like the conf_thres on the ONNX model has some lower bound of ~0.7; there are no predictions below that. The actual confidence values for each detection do not quite match either.
Edit2: It appears the output is being limited to 100 response values. I tried modifying "max_output_boxes" to be 1000 but it still only returns 100 detections per image.
Edit3: I needed to modify --topk-per-class and --topk-all to be 100. This yielded more than 100 results. Detections and confidence with onnxruntime don't exactly match but we're in the ballpark.
Hi, @triple-Mu! Thanks for your amazing work on adding NMS! @wolfpack12 has mentioned that the outputs of models exported with this PR do not exactly match the original outputs. Thank you!
New PR for "ultralytics#7736" Remove not use Format onnxruntime and tensorrt onnx outputs fix unified outputs
I re-updated the code of this PR, please try again.
For tensorrt nms export:
python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify
For onnxruntime nms export:
python3 export.py --weights yolov5s.pt --include onnx --nms ort --iou 0.65 --conf 0.001 --topk-all 300 --simplify
For openvino nms export:
python3 export.py --weights yolov5s.pt --include openvino --nms ovo --iou 0.65 --conf 0.001 --topk-all 300 --simplify
In order to export the model supported by the corresponding backend, you need to specify --nms trt/ort/ovo to export onnx or xml.
In addition, you can export models with dynamic shapes. You can add --dynamic batch:
python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify --dynamic batch
If you want to export the original yolov5 onnx model with dynamic shape, the command is:
python3 export.py --weights yolov5s.pt --include onnx --simplify --dynamic
You don't need to pass arguments to --dynamic.
If you want to export the original yolov5 tflite model with nms, the command is:
python3 export.py --weights yolov5s.pt --include tflite --nms
You don't need to pass arguments to --nms.
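For anyone trying the ort export, a minimal onnxruntime inference sketch may help. It assumes a 640x640 input and the four-output layout (detection count, boxes, scores, classes) discussed in this thread; the file name and output order are assumptions, so check session.get_outputs() against your own export:

```python
# Hedged sketch: run an NMS-enabled YOLOv5 ONNX model with onnxruntime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov5s.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Placeholder for a letterboxed, normalized image in NCHW float32 layout.
im = np.zeros((1, 3, 640, 640), dtype=np.float32)

# Assumed output order: detection count, boxes, scores, class ids.
num_dets, boxes, scores, classes = session.run(None, {input_name: im})

n = int(num_dets[0][0])  # valid detections for the first (and only) image
for box, score, cls in zip(boxes[0][:n], scores[0][:n], classes[0][:n]):
    print(int(cls), float(score), box.tolist())
```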
I'll test in the new year. Just curious, how is this implementation different from yolort?
The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things: 1. export.py fails on models where the --nms argument is used on export (see error message below)
2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:
Question 1: It should be caused by your use of the
Question 2: To avoid an empty output when no object is detected in the image (for example, on randomly generated noise), I added a dummy result with class -1, a box, and score 0 for this case in postprocessing. This prevents the network output from being empty. You can use the numeric value of the first output to do a secondary filter on the boxes and scores. It's easy, please refer to my submitted notebook.
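A hedged sketch of that secondary filter, assuming the four-output layout (num_dets, boxes, scores, classes) and that the padded class -1 / score 0 rows sit after the valid ones:

```python
# Hedged sketch: drop padded rows using the detection count in the first output.
import numpy as np

def filter_detections(num_dets, boxes, scores, classes, batch_index=0):
    """Keep only the first num_dets rows for one image; padded class=-1/score=0 rows come after them."""
    n = int(num_dets[batch_index][0])
    return boxes[batch_index][:n], scores[batch_index][:n], classes[batch_index][:n]
```

A further score threshold can then be applied to the kept rows if needed.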
Sorry I had a typo. The error in Question 1 is when detect.py is used. It attempts to run the non_max_suppression function on the custom ONNX model where NMS is part of the graph. Here's the run command:
Here's more granular output of the error:
For Question 2, the notebook is a great addition. Stepping through the process of exporting the model and then running inference using onnxruntime will be very helpful to others. I suspect the issue I'm having is the conversion of the image to a tensor. I'm trying to execute this within an AWS Lambda Function (this was not trivial to do). The way I was converting the image is different from your method:
It seems that you feed an input tensor with shape 512x640. |
@triple-Mu Unfortunately that isn't the issue. I can send it a 640x640 image and the results still don't match. I suspect the issue is the use of letterbox (Still need to confirm). In your example notebook, you import letterbox from YOLOv5 which requires cv2 to be imported. If I want to run this in AWS Lambda, I don't want to import cv2 or torch since it would exceed the 250MB limit. So I'd need to implement using numpy or base python. Will provide results when I dig more into this. |
Maybe you can save the input tensor to your local PC as an .npy file.
I added the letterboxing function below. It helps increase the accuracy but it's still slightly off.
I call it in my Lambda function using this:
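A minimal sketch of such a PIL/NumPy-only letterbox, assuming 114-gray padding and a 640x640 target (illustrative names and an example call, not the commenter's exact code):

```python
# Hedged sketch: a cv2/torch-free letterbox using only PIL and NumPy.
import numpy as np
from PIL import Image

def letterbox_pil(img, new_shape=(640, 640), color=(114, 114, 114)):
    w0, h0 = img.size
    r = min(new_shape[1] / w0, new_shape[0] / h0)          # scale ratio
    new_w, new_h = int(round(w0 * r)), int(round(h0 * r))
    resized = img.resize((new_w, new_h), Image.BICUBIC)    # BICUBIC matched cv2 most closely in this thread
    canvas = Image.new("RGB", (new_shape[1], new_shape[0]), color)
    dw, dh = (new_shape[1] - new_w) // 2, (new_shape[0] - new_h) // 2
    canvas.paste(resized, (dw, dh))
    x = np.asarray(canvas, dtype=np.float32) / 255.0        # HWC, RGB, 0-1
    x = np.expand_dims(x.transpose(2, 0, 1), 0)             # 1x3x640x640, NCHW
    return x, r, (dw, dh)

# Example call (illustrative):
# tensor, ratio, pad = letterbox_pil(Image.open("image.jpg").convert("RGB"))
```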
EDIT: I'm increasingly confident this is a resizing/letterbox issue. I've played around with changing the padding color from (114, 114, 114) to (0, 0, 0) and (255, 255, 255). This actually affects the number of calls the model makes! In addition, the scaling method matters. In the code above, the Image.BICUBIC method is used for interpolation on the scaling. In YOLOv5, the letterbox function uses the cv2 code below:
When I change the interpolation to Image.BILINEAR or Image.NEAREST, it makes a significant impact on the number of calls. I think this is the root cause of the problem and I am doubtful I will ever match the output. The takeaway is that these models are extremely sensitive to very small changes: scaling method, background color and input size have an unpredictable impact on model performance.
EDIT2: For anyone that is morbidly curious, the difference in interpolation between PIL and CV2 is discussed ad nauseam here: python-pillow/Pillow#2718. I found that Image.BICUBIC had the closest results to the cv2.resize method used in YOLOv5. I tried Image.BILINEAR since, you know, it should be equivalent to cv2.INTER_LINEAR. But it wasn't!
This commentary goes beyond the scope of this issue (exporting NMS for onnxruntime). I believe the branch that @triple-Mu created accomplishes this. The only thing I see that needs to be wrapped up is ensuring NMS-enabled ONNX models can use detect.py in YOLOv5 without throwing an error.
👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap. We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved. For additional resources and information, please see the links below:
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐ |
This may be the easiest way to register the EfficientNMS plugin in ONNX and build a TensorRT engine.
I was inspired by this issue: #6430
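To make the approach concrete, here is a hedged onnx_graphsurgeon sketch of appending an EfficientNMS_TRT node to an exported graph. The tensor names "boxes"/"scores", shapes, and attribute values are illustrative assumptions (mirroring the thresholds used in this thread), not the exact code in this PR:

```python
# Hedged sketch: append an EfficientNMS_TRT node to an exported YOLOv5 ONNX graph.
# Assumes the graph already exposes decoded "boxes" [N, num_boxes, 4] and
# "scores" [N, num_boxes, num_classes] tensors; names and attrs are illustrative.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("yolov5s.onnx"))
tensors = graph.tensors()
boxes, scores = tensors["boxes"], tensors["scores"]  # assumed intermediate tensors

batch, topk = 1, 300
# Output tensors in the layout the EfficientNMS_TRT plugin produces.
num_dets = gs.Variable("num_dets", dtype=np.int32, shape=[batch, 1])
det_boxes = gs.Variable("det_boxes", dtype=np.float32, shape=[batch, topk, 4])
det_scores = gs.Variable("det_scores", dtype=np.float32, shape=[batch, topk])
det_classes = gs.Variable("det_classes", dtype=np.int32, shape=[batch, topk])

graph.layer(
    op="EfficientNMS_TRT",
    name="nms",
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes],
    attrs={
        "score_threshold": 0.001,
        "iou_threshold": 0.65,
        "max_output_boxes": topk,
        "background_class": -1,   # YOLOv5 has no background class
        "score_activation": 0,    # scores already passed through sigmoid
        "box_coding": 1,          # assumption: 1 = [cx, cy, w, h]; use 0 if boxes are corner-encoded
    },
)
graph.outputs = [num_dets, det_boxes, det_scores, det_classes]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov5s-nms.onnx")
```

TensorRT then recognizes the op type by name and substitutes the plugin when the engine is built.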
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
WARNING ⚠️ this PR is very large, the summary may not cover all changes.
🌟 Summary
Ultralytics introduces advanced NMS (Non-Maximum Suppression) export capabilities for ONNX models in YOLOv5.
📊 Key Changes
- export_onnx_with_nms added to handle ONNX export with integrated NMS.
- onnxruntime-nms-export.ipynb added for demonstration purposes.
🎯 Purpose & Impact