
Batch Detection #7683

Closed · 1 of 2 tasks
rafcy opened this issue May 3, 2022 · 5 comments
Labels
bug (Something isn't working), Stale (stale and scheduled for closing soon)

Comments

rafcy commented May 3, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

No response

Bug

Hello everyone,
I am experimenting with tiling: I take an image, split it into patches, and batch-detect objects on those patches, but this adds much more delay instead of speeding things up. I don't know what I am doing wrong, but nothing I get comes close to the batched inference times mentioned in the documentation. I get about 20 FPS for single-image inference at 1080p (using a custom-trained YOLOv5s model), and when I split the image into 15 patches I get 5 FPS.
Things I tried:

  • using PyTorch Hub, both with 'ultralytics/yolov5' and with a local repo
  • using the code in detect.py of YOLOv5
  • stacking the images into a tensor, instead of a tuple, before passing them to model(imgs); this returns a tensor of size [batch, 16128, 9] instead of pandas results, and to get per-image results I then have to call the NMS function on that tensor, which adds a huge delay.

All of these give the same FPS, and yes, I am running on a GPU (RTX 2070). A rough sketch of the tiling approach follows this list.
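Roughly, the approach looks like this (a simplified sketch, not my exact code; the 512-pixel grid and file name are illustrative):

import cv2
import torch

# Load the model from PyTorch Hub (AutoShape handles pre/post-processing and NMS)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Read a frame and split it into fixed-size patches
img = cv2.cvtColor(cv2.imread('frame.jpg'), cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
patch = 512  # illustrative patch size
tiles = [img[y:y + patch, x:x + patch]
         for y in range(0, h, patch)
         for x in range(0, w, patch)]

# Passing a list keeps the AutoShape path: the tiles are letterboxed,
# run as one batch, and returned with per-tile pandas results
results = model(tiles)
results.print()
dfs = results.pandas().xyxy  # one DataFrame of detections per tile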

Another example I've tried is splitting a 4K image into 60 patches of 512 x 512 and detecting them with the PyTorch Hub example as a tuple. These are the performance numbers I get:
Speed: 5.5ms pre-process, 56.4ms inference, 0.8ms NMS per image at shape (60, 3, 640, 640)
but the call actually took 3.7 seconds to run. So the reported speeds are misleading: they are per-image figures, which is not what I need, since I wanted the total batched inference time.

Please help me figure out whether I am doing something wrong, or whether these results are normal and I should stop looking for a solution to my issue.
Thank you in advance.

Environment

I am using a custom-made Docker image that includes:

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
rafcy added the bug label on May 3, 2022
glenn-jocher (Member) commented May 3, 2022

@rafcy 👋 Hello! Thanks for asking about inference speed issues. PyTorch Hub speeds will vary by hardware, software, model, inference settings, etc. Our default example in Colab with a V100 looks like this:

[screenshot: default PyTorch Hub example speeds on a Colab V100]

YOLOv5 🚀 can be run on CPU (i.e. --device cpu, slow) or GPU if available (i.e. --device 0, faster). You can determine your inference device by viewing the YOLOv5 console output:

detect.py inference

python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/

[screenshot: detect.py console output showing the inference device]

YOLOv5 PyTorch Hub inference

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
# Speed: 631.5ms pre-process, 19.2ms inference, 1.6ms NMS per image at shape (2, 3, 640, 640)

Increase Speeds

If you would like to increase your inference speed, some options are (a short sketch combining a few of them follows this list):

  • Use batched inference with YOLOv5 PyTorch Hub
  • Reduce --img-size, i.e. 1280 -> 640 -> 320
  • Reduce model size, i.e. YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s -> YOLOv5n
  • Use half precision FP16 inference with python detect.py --half and python val.py --half
  • Use a faster GPU, i.e.: P100 -> V100 -> A100
  • Export to ONNX or OpenVINO for up to 3x CPU speedup (CPU Benchmarks)
  • Export to TensorRT for up to 5x GPU speedup (GPU Benchmarks)
  • Use free GPU backends with up to 16GB of CUDA memory: Colab, Kaggle
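For example (a minimal sketch; yolov5n and size=320 are illustrative choices, not requirements):

import torch

# Smaller model variant (yolov5n) trades accuracy for speed
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')
model.conf = 0.25  # confidence threshold

imgs = ['https://ultralytics.com/images/zidane.jpg',
        'https://ultralytics.com/images/bus.jpg']  # batch of images

# Reduced inference size: the AutoShape wrapper accepts a size argument
results = model(imgs, size=320)
results.print()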

Good luck 🍀 and let us know if you have any other questions!

rafcy (Author) commented May 4, 2022

Hello @glenn-jocher, and thank you for the response. As far as I understand from running some tests in the Colab, batch detection does not help speed up the detection process in any way, right?

glenn-jocher (Member) commented May 4, 2022

@rafcy see https://community.ultralytics.com/t/yolov5-study-batch-size-vs-speed

rafcy (Author) commented May 4, 2022

I have seen that; I also ran my own test in Colab to check the results. The inference time does decrease a bit as the batch size increases, but the overall delay gets much bigger.
You can see my Colab test here:
https://colab.research.google.com/drive/1jmm0U_T1RNKQ3VSYKpKEulSZuR5-42-V?usp=sharing

1 image time: 0.0473322868347168
Speed: 18.2ms pre-process, 26.4ms inference, 2.1ms NMS per image at shape (1, 3, 384, 640)
4 images time: 0.17291879653930664
Speed: 17.2ms pre-process, 23.8ms inference, 1.8ms NMS per image at shape (4, 3, 384, 640)
8 images time: 0.37774181365966797
Speed: 17.1ms pre-process, 27.3ms inference, 2.5ms NMS per image at shape (8, 3, 384, 640)
16 images time: 0.5562183856964111
Speed: 17.7ms pre-process, 14.7ms inference, 2.0ms NMS per image at shape (16, 3, 384, 640)
32 images time: 1.035698413848877
Speed: 17.0ms pre-process, 13.1ms inference, 2.0ms NMS per image at shape (32, 3, 384, 640)
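The timing loop in my Colab is essentially the following (a simplified reconstruction, assuming a CUDA device; the file name is illustrative):

import time
import numpy as np
import torch
from PIL import Image

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
img = np.array(Image.open('zidane.jpg'))  # load once as an RGB array

for n in (1, 4, 8, 16, 32):
    batch = [img] * n
    torch.cuda.synchronize()  # finish pending GPU work before timing
    t0 = time.time()
    results = model(batch)
    torch.cuda.synchronize()  # wait for inference to complete
    print(f'{n} images time: {time.time() - t0}')
    results.print()  # per-image speed summary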

I know I may sound annoying with my queries, but I am just trying to figure out whether YOLOv5's batch detection works for my application.
For instance, your batch-size comparison basically states that, for the YOLOv5s model, batch size 8 takes almost the same overall duration as batch size 1, but my comparison below shows this is in fact not true. Am I doing something wrong, or am I misunderstanding something?

Batch Size | YOLOv5s (relative total time)
---------- | -----------------------------
1          | 1.0
8          | 7.0
Thank you in advance.

github-actions bot commented Jun 4, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

github-actions bot added the Stale label on Jun 4, 2022