
Add nms for tensorrt8.0+ / onnxruntime / openvino(the same way as onnxruntime) #7736

Closed
wants to merge 3 commits

Conversation

triple-Mu
Contributor

@triple-Mu triple-Mu commented May 9, 2022

Maybe this is the easiest way to register the EfficientNMS plugin in ONNX and build a TensorRT engine.
I was inspired by this issue: #6430

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

WARNING ⚠️ this PR is very large, summary may not cover all changes.

🌟 Summary

Ultralytics introduces advanced NMS (Non-Maximum Suppression) export capabilities for ONNX models in YOLOv5.

📊 Key Changes

  • New function export_onnx_with_nms added to handle ONNX export with integrated NMS.
  • Support for dynamic axes configurations during ONNX export.
  • Metadata and shape information adjustments for ONNX models to improve compatibility with TensorRT.
  • New Jupyter notebook onnxruntime-nms-export.ipynb for demonstration purposes.

🎯 Purpose & Impact

  • Enables users to export YOLOv5 models with NMS directly to ONNX, simplifying deployment in inference engines that support ONNX format.
  • Enhances cross-platform model deployment potential, particularly beneficial for environments where post-processing outside the model is complex or suboptimal.
  • The new export functionality with dynamic axes and simplification could lead to performance optimizations in applications using TensorRT and similar acceleration frameworks.

@glenn-jocher
Member

@triple-Mu thanks for the PR, this looks great! Especially like the usage example notebook.

If this works for TRT can it also work for ONNX exports?

@triple-Mu
Contributor Author

@triple-Mu thanks for the PR, this looks great! Especially like the usage example notebook.

If this works for TRT can it also work for ONNX exports?

This PR exports ONNX with the default method, adds an extra graph structure so that the network outputs match the inputs of the TRT NMS plugin, and finally adds the NMS plugin so the network can run detection end-to-end.
I'm not sure what you mean by ONNX exports? The original ONNX alone cannot achieve end-to-end detection.

@glenn-jocher
Member

glenn-jocher commented May 9, 2022

@triple-Mu yes I mean right here, the ONNX-only export (no TRT), i.e.:

python export.py --include onnx --nms

[Screenshot: Screen Shot 2022-05-09 at 3.49.56 PM]

EDIT: Since it seems like the NMS modification is done directly on the ONNX model, perhaps the PR updates are suitable as well for the export_onnx() call on the line shown above.

@triple-Mu
Contributor Author

@triple-Mu yes I mean right here, the ONNX-only export (no TRT), i.e.:

python export.py --include onnx --nms
[Screenshot: Screen Shot 2022-05-09 at 3.49.56 PM]

EDIT: Since it seems like the NMS modification is done directly on the ONNX model, perhaps the PR updates are suitable as well for the export_onnx() call on the line shown above.

Got it. That means using the --nms flag (with score/IoU thresholds) would export an ONNX model usable only for TRT, and TRT engine building would be removed from this PR. If so, that ONNX will not be available for onnxruntime, openvino, and so on.

@triple-Mu
Contributor Author

triple-Mu commented May 9, 2022

@glenn-jocher
This new PR modifies the ONNX export method and adds a check for the nms flag. The exported ONNX has been tested, and the engine can be exported directly with trtexec. All test code can be seen in the notebook.

@glenn-jocher glenn-jocher changed the title from "Add TensorRT EfficientNMS plugin regiseter" to "Add TensorRT EfficientNMS plugin register" May 10, 2022
@glenn-jocher
Member

@triple-Mu I'd like to handle your two PRs today. But I'm confused as the original PR #6984 was limited in scope to adding trtexec support but now seems expanded. Can you please summarize the changes in each and if they overlap anywhere? Also what's your recommendation, should we merge 1 or the other or both, and if both in which order?

@glenn-jocher glenn-jocher added the TODO High priority items label May 19, 2022
@glenn-jocher glenn-jocher mentioned this pull request May 19, 2022
@triple-Mu
Contributor Author

@triple-Mu I'd like to handle your two PRs today. But I'm confused as the original PR #6984 was limited in scope to adding trtexec support but now seems expanded. Can you please summarize the changes in each and if they overlap anywhere? Also what's your recommendation, should we merge 1 or the other or both, and if both in which order?

@glenn-jocher
Thank you for your reply! PR #6984 is just a simple attempt: trtexec can directly convert the ONNX exported by #7736 into an engine, as shown in my notebook. Since the ONNX exported by #7736 cannot be used together with detect.py, I suggest closing #6984 and adding documentation for exporting with trtexec to #7736.

@glenn-jocher
Member

@triple-Mu ok got it! Let's close #6984 then and please add the python export.py --include engine --trtexec flag capability to #7736 for trtexec engine exports. Can you do that?

@triple-Mu
Contributor Author

triple-Mu commented May 19, 2022

@triple-Mu ok got it! Let's close #6984 then and please add the python export.py --include engine --trtexec flag capability to #7736 for trtexec engine exports. Can you do that?

It is my pleasure to be able to help you. I have the following questions:

  1. If we use python export.py --include engine --trtexec, does it mean that the export_onnx changes in #7736 need to be removed (reverting to the original version of this PR), with the modified ONNX handled in export_engine instead?

  2. If the current export_onnx function is retained, does it mean I need to call export_onnx and add an "export_engine_with_trtexec" function when executing this command?

@glenn-jocher
Member

glenn-jocher commented May 19, 2022

@triple-Mu I think the two topics are separate:

  1. --trtexec: I think the original trtexec PR was limited in scope to simply adding a --trtexec flag to export.py which ran export via the trtexec command instead of the tensorrt pip package (nothing changed about the exported TensorRT models). python export.py --include engine --trtexec export appeared to work maybe 2x faster than the default (i.e. maybe 2 minutes instead of 4 minutes to export), which could be helpful to users exporting many models.

  2. NMS pipelining. This has been a topic across a variety of formats, i.e. CoreML, ONNX and TensorRT, where users are looking to deploy without the PyTorch dependency. This PR appears to implement this well for TensorRT so no additional changes should be needed here.

@triple-Mu
Contributor Author

triple-Mu commented May 19, 2022

@triple-Mu I think the two topics are separate:

  1. --trtexec: I think the original trtexec PR was limited in scope to simply adding a --trtexec flag to export.py which ran export via the trtexec command instead of the tensorrt pip package (nothing changed about the exported TensorRT models). python export.py --include engine --trtexec export appeared to work maybe 2x faster than the default (i.e. maybe 2 minutes instead of 4 minutes to export), which could be helpful to users exporting many models.
  2. NMS pipelining. This has been a topic across a variety of formats, i.e. CoreML, ONNX and TensorRT, where users are looking to deploy without the PyTorch dependency. This PR appears to implement this well for TensorRT so no additional changes should be needed here.

@glenn-jocher All right!
However, after registering NMS, the ONNX cannot be converted to an engine normally using python-tensorrt, because the call trt.init_libnvinfer_plugins(trt_logger, namespace="") is needed to load the plugin namespace.
In addition, when the PyTorch model is loaded in the main process, it may be affected by problems such as the CUDA stream.
Exporting with trtexec may require opening a new process.
That is what I am testing now.
In addition, I would like to ask whether you have a social account I could connect with you on?
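
For reference, here is a minimal sketch of how plugin registration might look when building the engine with the TensorRT Python API (TensorRT 8.x assumed; the file names and workspace size are illustrative, not taken from this PR):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
# Register the built-in plugins (including EfficientNMS_TRT) before parsing the ONNX graph
trt.init_libnvinfer_plugins(logger, namespace="")

builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "yolov5s.onnx" is a placeholder path for the NMS-enabled ONNX exported by this PR
if not parser.parse_from_file("yolov5s.onnx"):
    for i in range(parser.num_errors):
        print(parser.get_error(i))
    raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GiB (TensorRT >= 8.4)

engine_bytes = builder.build_serialized_network(network, config)
with open("yolov5s.engine", "wb") as f:
    f.write(engine_bytes)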

@triple-Mu
Contributor Author

triple-Mu commented May 19, 2022

@glenn-jocher
I'm not sure why I can't export with the following command --python export.py --weights yolov5s.pt --include engine --trtexec . after adding the above.
If I run this command alone subprocess.check_output(cmd,shell=True) , it executes correctly under the new python file.
So I suspect that it has something to do with pytorch model loading. Is there a conflict between main processes?
Log is as shown:

(torch) ubuntu@y9000p:~/work/yolov5$ python export.py --weights yolov5s.pt --include engine --trtexec
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, trtexec=True, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
YOLOv5 🚀 v6.1-224-gba552fe Python-3.8.13 torch-1.11.0+cu115 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)
[05/19/2022-22:43:30] [W] --workspace flag has been deprecated by --memPoolSize flag.
Cuda failure: no CUDA-capable device is detected
Aborted (core dumped)
Traceback (most recent call last):
  File "export.py", line 646, in <module>
    main(opt)
  File "export.py", line 641, in main
    run(**vars(opt))
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "export.py", line 561, in run
    f[1] = export_engine(model, im, file, train, half, simplify, workspace, verbose, trtexec)
  File "export.py", line 258, in export_engine
    subprocess.check_output(cmd, shell=True)
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/ubuntu/miniconda3/envs/torch/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --workspace=4096' returned non-zero exit status 134.

@triple-Mu triple-Mu changed the title from "Add TensorRT EfficientNMS plugin register" to "Add nms for tensorrt8.0+ / onnxruntime / openvino(the same way as onnxruntime)" May 22, 2022
@triple-Mu
Contributor Author

@glenn-jocher
Good news!
I recently tried adding NMS to other inference backends like onnxruntime and openvino, and the results were astounding!
Just modifying part of the ONNX graph achieves a very good result. It is worth mentioning that although the --export-type ort flag can be turned on to export the graph with NMS, some post-processing operations are still required.
I did not place all of the post-processing in the graph, which might cause the network to output too many tensors.
You can see all the results in the notebooks.

@triple-Mu
Contributor Author

I recently reworked this branch again.
Could you please review this PR?

errx pushed a commit to errx/yolov5 that referenced this pull request Nov 23, 2022
@wolfpack12

wolfpack12 commented Dec 21, 2022

Would greatly appreciate this feature being rolled into the production version. Exporting object detection models to something like ONNX with NMS will allow many people to use lightweight frameworks on edge devices or things like AWS Lambda. Torch is a lot of overhead just for implementing NMS.

Edit: I've tested the trtNMS branch to export the model using these arguments:

python export.py --weights mymodel.pt --include onnx --nms --conf-thres 0.4

When I run inference using onnxruntime, I get different results than I do with detect.py. It seems like the conf_thres on the ONNX model has some lower bound of ~0.7. There are no predictions below that. The actual confidence values for each detection do not quite match either.

Edit2: It appears the output is being limited to 100 detections. I tried modifying the "max_output_boxes" to be 1000 but it still only returns 100 detections per image.

Edit3: I needed to modify the --top-k-per-class and --top-k-all to be 100. This yielded more than 100 results. Detections and confidence with onnxruntime don't exactly match but we're in the ballpark.

triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Dec 26, 2022
@pokidyshev

Hi, @triple-Mu! Thanks for your amazing work on adding NMS!

@wolfpack12 has mentioned that the outputs of models exported with this PR do not exactly match the original outputs of the .pt model. Have you confirmed that models exported with this PR work properly and have the same or close outputs to the original? I'm especially interested in the TensorRT version.

Thank you!

triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Dec 27, 2022
triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Dec 27, 2022
@triple-Mu
Contributor Author

Would greatly appreciate this feature being rolled into the production version. Exporting object detection models to something like ONNX with NMS will allow many people to use lightweight frameworks on edge devices or things like AWS Lambda. Torch is a lot of overhead just for implementing NMS.

Edit: I've tested the trtNMS branch to export the model using these arguments:

python export.py --weights mymodel.pt --include onnx --nms --conf-thres 0.4

When I run inference using onnxruntime, I get different results than I do with detect.py. It seems like the conf_thres on the ONNX model has some lower bound of ~0.7. There are no predictions below that. The actual confidence values for each detection do not quite match either.

Edit2: It appears the output is being limited to 100 detections. I tried modifying the "max_output_boxes" to be 1000 but it still only returns 100 detections per image.

Edit3: I needed to modify the --top-k-per-class and --top-k-all to be 100. This yielded more than 100 results. Detections and confidence with onnxruntime don't exactly match but we're in the ballpark.

I have updated the code of this PR again; please try it once more.
Usage:
For tensorrt nms export:

python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify

For onnxruntime nms export:

python3 export.py --weights yolov5s.pt --include onnx --nms ort --iou 0.65 --conf 0.001 --topk-all 300 --simplify

For openvino nms export:

python3 export.py --weights yolov5s.pt --include openvino --nms ovo --iou 0.65 --conf 0.001 --topk-all 300 --simplify

To export a model supported by the corresponding backend, you need to specify --nms trt/ort/ovo when exporting the onnx or xml file.
Of course, an onnx file is always generated.

In addition, you can export models with dynamic shapes. You can add --dynamic batch or --dynamic all to export dynamic-batch or dynamic-axes onnx first.
An example onnx-for-TensorRT export cmd is:

python3 export.py --weights yolov5s.pt --include onnx --nms trt --iou 0.65 --conf 0.001 --topk-all 300 --simplify --dynamic batch

If you want to export the original yolov5 onnx model with dynamic shape, the cmd is:

python3 export.py --weights yolov5s.pt --include onnx --simplify --dynamic

You don't need to pass arguments to --dynamic.

If you want to export the original yolov5 tflite model with nms, the cmd is:

python3 export.py --weights yolov5s.pt --include tflite --nms

You don't need to pass arguments to --nms.
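
As a rough illustration (not part of this PR), consuming the NMS-enabled ONNX with onnxruntime might look like the sketch below. The output ordering (number of detections, boxes, scores, classes) is an assumption based on this thread and should be checked against session.get_outputs() for the actual exported graph:

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("yolov5s.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Placeholder input: a letterboxed float32 NCHW tensor matching the export size
image = np.random.rand(1, 3, 640, 640).astype(np.float32)

# Assumed output layout: num_dets, boxes, scores, classes
num_dets, boxes, scores, classes = session.run(None, {input_name: image})
n = int(np.asarray(num_dets).reshape(-1)[0])
for box, score, cls in zip(boxes[0][:n], scores[0][:n], classes[0][:n]):
    print(int(cls), float(score), box)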

triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Dec 27, 2022
triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Dec 27, 2022
@wolfpack12

I’ll test in the new year. Just curious, how is this implementation different from yolort?

triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Dec 28, 2022
@wolfpack12

The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things:

1. export.py fails on models where the --nms argument is used on export (see error message below)

  nc = prediction.shape[2] - nm - 5  # number of classes
  IndexError: tuple index out of range

2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:

ort_session = onnxruntime.InferenceSession(model, providers = ['CPUExecutionProvider'])
ort_inputs = {ort_session.get_inputs()[0].name: image}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs

triple-Mu added a commit to triple-Mu/yolov5 that referenced this pull request Jan 4, 2023
@triple-Mu
Contributor Author

The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things:

1. export.py fails on models where the --nms argument is used on export (see error message below)

  nc = prediction.shape[2] - nm - 5  # number of classes
  IndexError: tuple index out of range

2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:

ort_session = onnxruntime.InferenceSession(model, providers = ['CPUExecutionProvider'])
ort_inputs = {ort_session.get_inputs()[0].name: image}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs

Question 1: This should be caused by your use of the non_max_suppression function. It shouldn't happen when export.py is executed; can you provide the run command?

Question 2: To avoid the case where no object is detected in the image (for example, randomly generated noise), the post-processing adds a placeholder result with class -1, a box, and a score of 0. This prevents the network output from being empty. You can use the numeric value of the first output to do a secondary filter on the boxes and scores. It's easy; please refer to my submitted notebook.
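
A minimal sketch of this secondary filtering (output layout assumed as above; the padded placeholder has score 0 and class -1):

import numpy as np

def filter_padding(num_dets, boxes, scores, classes):
    # Keep only the first num_dets entries for the image, then drop the padded
    # placeholder (score == 0, class == -1) added when nothing is detected.
    n = int(np.asarray(num_dets).reshape(-1)[0])
    boxes, scores, classes = boxes[0][:n], scores[0][:n], classes[0][:n]
    keep = (scores > 0) & (classes >= 0)
    return boxes[keep], scores[keep], classes[keep]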

@wolfpack12

The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things:
1. export.py fails on models where the --nms argument is used on export (see error message below)

  nc = prediction.shape[2] - nm - 5  # number of classes
  IndexError: tuple index out of range

2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:

ort_session = onnxruntime.InferenceSession(model, providers = ['CPUExecutionProvider'])
ort_inputs = {ort_session.get_inputs()[0].name: image}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs

Question 1: This should be caused by your use of the non_max_suppression function. It shouldn't happen when export.py is executed; can you provide the run command?

Question 2: To avoid the case where no object is detected in the image (for example, randomly generated noise), the post-processing adds a placeholder result with class -1, a box, and a score of 0. This prevents the network output from being empty. You can use the numeric value of the first output to do a secondary filter on the boxes and scores. It's easy; please refer to my submitted notebook.

Sorry I had a typo. The error in Question 1 is when detect.py is used. It attempts to run the non_max_suppression function on the custom ONNX model where NMS is part of the graph.

Here's the run command:

python detect.py --weights weights/model1.onnx --source image1.tif --conf-thres 0.4 --imgsz 512 640 --save-txt --iou-thres 0.45

Here's more granular output of the error:

Loading weights/model1.onnx for ONNX Runtime inference...
Traceback (most recent call last):
  File "/home/user/onnxexportyolov5/yolov5/detect.py", line 261, in <module>
    main(opt)
  File "/home/user/onnxexportyolov5/yolov5/detect.py", line 256, in main
    run(**vars(opt))
  File "/home/user/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/onnxexportyolov5/yolov5/detect.py", line 132, in run
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
  File "/home/user/onnxexportyolov5/yolov5/utils/general.py", line 912, in non_max_suppression
    nc = prediction.shape[2] - nm - 5  # number of classes
IndexError: tuple index out of range

For Question 2, the notebook is a great addition. Stepping through the process of exporting the model and then inferencing using onnxruntime will be very helpful to others. I suspect the issue I'm having is the conversion of the image to a tensor. I'm trying to execute this within an AWS Lambda Function (this was not trivial to do). The way I was converting the image is different than your method:

imageStream = io.BytesIO(binary_content[0])
imageFile = Image.open(imageStream).convert('RGB').resize((512, 640))
imageFile_Array = np.asarray(imageFile).astype('float32') / 255.0
imageFile_Array = imageFile_Array[None]
imageFile_Array = np.transpose(imageFile_Array, [0, 3, 1, 2])

@triple-Mu
Contributor Author

The update is very close. The detections are off by only a couple (out of ~200 objects). While I drill into the root cause, I noticed a few things:

1. export.py fails on models where the --nms argument is used on export (see error message below)

  nc = prediction.shape[2] - nm - 5  # number of classes
  IndexError: tuple index out of range

2. The output of the inference using onnxruntime includes an object with 0 probability and -1 class. I don't recall seeing this before. Here's how I was inferencing:

ort_session = onnxruntime.InferenceSession(model, providers = ['CPUExecutionProvider'])
ort_inputs = {ort_session.get_inputs()[0].name: image}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs

Question 1: This should be caused by your use of the non_max_suppression function. It shouldn't happen when export.py is executed; can you provide the run command?

Question 2: To avoid the case where no object is detected in the image (for example, randomly generated noise), the post-processing adds a placeholder result with class -1, a box, and a score of 0. This prevents the network output from being empty. You can use the numeric value of the first output to do a secondary filter on the boxes and scores. It's easy; please refer to my submitted notebook.

Sorry I had a typo. The error in Question 1 is when detect.py is used. It attempts to run the non_max_suppression function on the custom ONNX model where NMS is part of the graph.

Here's the run command:

python detect.py --weights weights/model1.onnx --source image1.tif --conf-thres 0.4 --imgsz 512 640 --save-txt --iou-thres 0.45

Here's more granular output of the error:

Loading weights/model1.onnx for ONNX Runtime inference...
Traceback (most recent call last):
  File "/home/user/onnxexportyolov5/yolov5/detect.py", line 261, in <module>
    main(opt)
  File "/home/user/onnxexportyolov5/yolov5/detect.py", line 256, in main
    run(**vars(opt))
  File "/home/user/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/onnxexportyolov5/yolov5/detect.py", line 132, in run
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
  File "/home/user/onnxexportyolov5/yolov5/utils/general.py", line 912, in non_max_suppression
    nc = prediction.shape[2] - nm - 5  # number of classes
IndexError: tuple index out of range

For Question 2, the notebook is a great addition. Stepping through the process of exporting the model and then inferencing using onnxruntime will be very helpful to others. I suspect the issue I'm having is the conversion of the image to a tensor. I'm trying to execute this within an AWS Lambda Function (this was not trivial to do). The way I was converting the image is different than your method:

imageStream = io.BytesIO(binary_content[0])
imageFile = Image.open(imageStream).convert('RGB').resize((512, 640))
imageFile_Array = np.asarray(imageFile).astype('float32') / 255.0
imageFile_Array = imageFile_Array[None]
imageFile_Array = np.transpose(imageFile_Array, [0, 3, 1, 2])

It seems that you are feeding an input tensor with shape 512x640.
Because we export the ONNX with shape 640x640, it won't work if you feed a tensor with the wrong shape.

@wolfpack12

@triple-Mu Unfortunately that isn't the issue. I can send it a 640x640 image and the results still don't match. I suspect the issue is the use of letterbox (Still need to confirm). In your example notebook, you import letterbox from YOLOv5 which requires cv2 to be imported. If I want to run this in AWS Lambda, I don't want to import cv2 or torch since it would exceed the 250MB limit. So I'd need to implement using numpy or base python. Will provide results when I dig more into this.

@triple-Mu
Contributor Author

@triple-Mu Unfortunately that isn't the issue. I can send it a 640x640 image and the results still don't match. I suspect the issue is the use of letterbox (Still need to confirm). In your example notebook, you import letterbox from YOLOv5 which requires cv2 to be imported. If I want to run this in AWS Lambda, I don't want to import cv2 or torch since it would exceed the 250MB limit. So I'd need to implement using numpy or base python. Will provide results when I dig more into this.

Maybe you can save the input tensor to your local PC as an .npy file.
Besides, I suggest using np.ascontiguousarray after transposing an ndarray.
I'm not sure about the code you provided; it seems too simple.
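
For example, a small illustrative snippet for saving the preprocessed tensor so it can be compared offline (the array here is a placeholder for your letterboxed image):

import numpy as np

# Placeholder: the letterboxed image as an HxWx3 float32 array in [0, 1]
image_hwc = np.zeros((640, 640, 3), dtype=np.float32)

tensor = np.transpose(image_hwc[None], (0, 3, 1, 2))     # NHWC -> NCHW
tensor = np.ascontiguousarray(tensor, dtype=np.float32)  # contiguous memory after transpose
np.save("input_tensor.npy", tensor)                      # save for offline comparison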

@wolfpack12

wolfpack12 commented Jan 4, 2023

I added the letterboxing function below. It helps increase the accuracy, but it's still slightly off.

from PIL import Image

def letterbox_image(image, size):
    # Resize to fit inside `size` while keeping aspect ratio, then pad with gray (114) like YOLOv5's letterbox
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (114,114,114))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))  # center the resized image on the padded canvas
    return new_image

I call it in my Lambda function using this:

imageFile = letterbox_image(Image.open(imageStream), (640, 640) )
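
Combining the function above with the array conversion shown earlier, the end-to-end preprocessing might look roughly like this (a sketch assuming a 640x640 export size and RGB input; not a verified drop-in for YOLOv5's cv2-based letterbox):

import io
import numpy as np
from PIL import Image

def preprocess(image_bytes, size=(640, 640)):
    # Letterbox with PIL, then convert to a contiguous NCHW float32 tensor in [0, 1]
    image = letterbox_image(Image.open(io.BytesIO(image_bytes)).convert('RGB'), size)
    array = np.asarray(image, dtype=np.float32) / 255.0   # HWC, [0, 1]
    array = np.transpose(array[None], (0, 3, 1, 2))       # NHWC -> NCHW
    return np.ascontiguousarray(array)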

EDIT: I'm increasingly confident this is a resizing/letterbox issue. I've played around with changing the padding color from (114, 114, 114) to (0, 0, 0) and (255, 255, 255). This actually affects the number of calls the model makes!

In addition, the scaling method matters. In the code above, the Image.BICUBIC method is used for interpolation on the scaling. In YOLOv5, the letterbox function uses the cv2 code below:

im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

When I change the interpolation to Image.BILINEAR or Image.NEAREST, it makes a significant impact on the number of calls. I think this is the root cause of the problem and I am doubtful I will ever match the output.

The takeaway is that these models are extremely sensitive to very small changes. Scaling method, background color and input size have an unpredictable impact on the model performance.

EDIT2: For anyone that is morbidly curious, the difference in interpolation between PIL and CV2 is discussed ad nauseam here: python-pillow/Pillow#2718

I found that Image.BICUBIC had the closest results to the cv2.resize method used in YOLOv5. I tried Image.BILINEAR since, you know, it should be equivalent to cv2.INTER_LINEAR. But it wasn't!

This commentary goes beyond the scope of this issue (exporting NMS for onnxruntime). I believe the branch that @triple-Mu created accomplishes this. The only thing I see that needs to be wrapped up is ensuring NMS-enabled ONNX models can use the detect.py function in YOLOv5 without throwing an error.

@github-actions
Contributor

github-actions bot commented Oct 3, 2023

👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap.

We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved.

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Oct 3, 2023
@github-actions github-actions bot closed this Nov 3, 2023