YOLOv3-tiny in Darknet vs OpenCV DNN: large objects are missed #8146
Comments
Maybe the same issue as opencv/opencv#17205?
I think this has to do with how NMS is applied by default in OpenCV. @stephanecharette, are you setting nms in your cfg file? If not, can you try setting it (start with a low value and vary it to see whether the large cars get detected), then run your video again to confirm whether this is the source of the issue?
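To make the suggestion above concrete, here is a minimal sketch of greedy NMS showing how the threshold interacts with overlapping boxes. This is illustrative pure Python, not the Darknet or OpenCV implementation; box format is assumed to be (x, y, w, h) in pixels.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(detections, threshold):
    """detections: list of (score, box) tuples.

    Greedily keep the highest-scoring box, then drop any remaining
    box whose IoU with an already-kept box exceeds the threshold.
    """
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, k) <= threshold for _, k in kept):
            kept.append((score, box))
    return kept
```

A lower threshold suppresses more aggressively: two heavily overlapping boxes (IoU ≈ 0.68) survive together at threshold 0.7 but collapse to one at threshold 0.45. That sensitivity is why varying the nms value is a reasonable first experiment here.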
I tried to explicitly set it. For example, see the car at the very left side of this frame:
@AlexeyAB Do you have any insight into why the same network would fail to detect large objects when running via OpenCV DNN vs Darknet?
@stephanecharette Did you try using cv.resize() to resize src_img to the network size and then applying OpenCV DNN? About different resizing approaches: #232 (comment)
I used resize() to make sure the input image matches the exact network dimensions, stretching the image and ignoring the aspect ratio just like Darknet does. |
See the 1st image at the top of this issue. The images are 1280x720, while the network is 640x352. So the images have an aspect ratio of 1.78 while the network's is about 1.82. Not exactly the same, but as close as I could get. Also note the 1st image at the top of this issue shows the car in the very center of the image is "missed" by OpenCV; it isn't at the edge of the image.
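The aspect-ratio arithmetic in the comment above can be checked in a few lines (dimensions taken from the comment; variable names are illustrative):

```python
# Source frames vs network input size, as stated in the comment.
img_w, img_h = 1280, 720   # dash-cam frames
net_w, net_h = 640, 352    # network dimensions

img_aspect = img_w / img_h   # ~1.78
net_aspect = net_w / net_h   # ~1.82

# Stretching straight to the network size (no letterboxing, as
# Darknet does) scales each axis independently, so the small
# aspect-ratio mismatch translates into slightly different
# horizontal and vertical scale factors.
scale_x = net_w / img_w      # 0.5
scale_y = net_h / img_h      # ~0.489
```

With a mismatch this small the distortion is on the order of 2%, which supports the comment's conclusion that resizing is unlikely to be the cause of entire large objects being missed.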
It would be great if you can check which of these models produce different results in Darknet and OpenCV: yolov4.weights, yolov4-csp-x-swish.weights and yolov4-tiny.weights (https://github.com/AlexeyAB/darknet#pre-trained-models). Initially, when I added YOLOv2 to OpenCV, I added tests to check that it produces identical results in both OpenCV and Darknet: opencv/opencv#9705. Now OpenCV >= 4.5.4 supports Scaled-YOLOv4 (opencv/opencv#18975, opencv/opencv#20671, opencv/opencv#20818) and all of these models: https://github.com/AlexeyAB/darknet#pre-trained-models
No, unfortunately this is not the solution for me. I added |
I was trying to use YOLO in OpenCV 4.5.0 but ran into similar problems. Looking deeper into the Darknet code, starting from yolo_v2_class.cpp, it looks like detection results are taken from all YOLO layers, not only from the last layer, and then put through a single NMS. That could explain some missing detections. The code in yolo_v2_class.cpp is also slightly different from the code Darknet seems to use during training, e.g. an NMS threshold of 0.4 instead of 0.45, and it does not consider the nms_kind layer parameter. It seems to be difficult to evaluate a YOLO net exactly right in an application.
Glad to see I'm not the only person running into this problem. It has also been discussed several times on the Darknet/YOLO Discord, but so far I don't know of anyone who knows how to solve it.
@stephanecharette If you mean the general problem (correct default parameters, which way of image resizing to use, which kind of NMS, etc.), I also don't know how to establish a reference implementation that can be used by Darknet, DarkHelp, and everyone else. If you just ask how to get all YOLO layer outputs from OpenCV, this did work for me:
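(The code snippet posted here did not survive extraction. As a hedged sketch of the same idea, not the original code: OpenCV DNN can return the output of every unconnected output layer, i.e. both YOLO layers of a tiny network, via `net.forward(net.getUnconnectedOutLayersNames())`. The helper below decodes such outputs from plain NumPy arrays; the cv2 wiring is shown only in comments because it needs a real .cfg/.weights pair, and the file names there are placeholders.)

```python
import numpy as np

def gather_detections(layer_outputs, conf_threshold=0.25):
    """Flatten the raw output of EVERY YOLO layer into one list.

    Each element of layer_outputs is an (N, 5 + num_classes) array
    with rows [cx, cy, w, h, objectness, class scores...] in
    relative 0..1 coordinates, the layout OpenCV DNN returns for
    Darknet models.
    """
    boxes, scores, class_ids = [], [], []
    for out in layer_outputs:          # one array per YOLO layer
        for row in out:
            class_scores = row[5:]
            class_id = int(np.argmax(class_scores))
            confidence = float(row[4] * class_scores[class_id])
            if confidence >= conf_threshold:
                cx, cy, w, h = row[:4]
                # convert center-based box to top-left-based box
                boxes.append([cx - w / 2, cy - h / 2, w, h])
                scores.append(confidence)
                class_ids.append(class_id)
    return boxes, scores, class_ids

# Hypothetical wiring with OpenCV (file names are placeholders):
#   net = cv2.dnn.readNetFromDarknet("tiny.cfg", "tiny.weights")
#   blob = cv2.dnn.blobFromImage(frame, 1/255.0, (640, 352),
#                                swapRB=True, crop=False)
#   net.setInput(blob)
#   outs = net.forward(net.getUnconnectedOutLayersNames())  # ALL YOLO layers
#   boxes, scores, ids = gather_detections(outs)
#   keep = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, 0.45)
```

The important part is passing every name from getUnconnectedOutLayersNames() to forward(): a tiny network has two YOLO layers, and reading only the last one drops the detections (typically the large-object scale) produced by the other.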
No, that's not what I mean. See the top of this post for the problem. This has been discussed on the Discord server many times. Everyone who tries to use YOLOv4-tiny with OpenCV's DNN module is stuck with the exact same issue where larger objects are not detected.
Stéphane, I carefully read what this issue is about. Sorry for digressing into probabilities and other problems. My main point is: all OpenCV examples I find read output only from the last network layer, which is a YOLO layer. The tiny network I use has another YOLO layer in the middle. Darknet reads output from both YOLO layers. So when using OpenCV's DNN you have to do that too.
Oh... I'm using Let me look into that. Would be nice to finally understand what is going on and have this fixed! |
Did you read the code I posted above? It did improve results for me. |
I didn't understand it; I'm attempting to figure it out now. This is what I'm working with: https://github.com/stephanecharette/DarkHelp/blob/master/src-lib/DarkHelpNN.cpp#L1044
@AxelWS Thank you so much! Got it working. That was the key point: take the output from all the YOLO layers instead of just the last one. I'll update DarkHelp with these changes later tonight.
I trained a YOLOv3-tiny network using several dash-cam datasets. I then ran this network in the following 4 scenarios:
Other than the obvious timing differences, the results are nearly 100% identical, with one exception: when using OpenCV DNN, both CPU and CUDA, large objects seem to be missed.
Here is an example frame grab. The two on the left are Darknet, the two on the right are OpenCV DNN:
Source: https://www.youtube.com/watch?v=fFYV2uPt-XI
You can see that even small objects like the traffic lights are detected correctly, but the large vehicles in the foreground are missed. Does anyone know why large objects might be missed when using OpenCV DNN?
Network was trained using these options: