Feature-request: YOLOv4-tiny (detector) #59
Comments
Hi @AlexeyAB :)
Hi @AlexeyAB,
@mive93 Thanks for the YOLOv4-tiny implementation. I've tested it on a Jetson Nano with JetPack 4.4, TensorRT v7.1, and 416 input size.
For FP32, profile results:
For FP16, profile results:
Hi @JasonDoingGreat, thanks :)
If needed I can test it on the Xavier or TX2.
@mive93 Thanks! Yes, please test it on AGX or NX with max_N.
Here it is.
@AlexeyAB @ceccocats @mive93, you have single-handedly destroyed the reputation of Google, Facebook and NVIDIA. This is extraordinary.
@mive93 Hi,
Hi @AlexeyAB,
Is there any accuracy degradation when you convert Darknet weights to tkDNN format? What about accuracy loss when inferring in FP16 or INT8 mode? Is there any way to fine-tune the models in FP16 or INT8 mode, or perform quantization-aware training beforehand? Thanks
Hi @mmaaz60,
For the FP16 mode the drop from full precision is negligible, while the drop from full precision to INT8 is heavy. I hope I covered all your doubts.
Closing for now,
@mive93 Hi,
Hi @AlexeyAB, sorry for the late reply. Anyway, if you need some tests I am available to do some; I also have a Xavier NX now ;)
Hi @mive93 |
Hi @MohammadKassemZein |
@mive93 Nice! Thank you.
@mive93 Below are the results on Xavier NX.
Nice @MohammadKassemZein :) How did you collect those results?
@mive93 I used your framework (tkDNN) on the Jetson NX.
Yeah, I guessed :) Sorry, I was vague. Have you activated jetson_clocks?
I was using MODE 15W 2CORE (which I guess gives the highest clocking for GPU and CPU).
Hi again @mive93 |
Hi @MohammadKassemZein |
Feature-request: YOLOv4-tiny (detector)
Many other features from Darknet were added previously. Only 1 feature is still required: support for `groups=` and `group_id=` in the `[route]` layer.
So if the input is WxHxC, it divides the input into 2 groups of WxHx(C/2) each (there are 2 groups: 0 and 1), and loads the 2nd group (group_id=1) of size WxHx(C/2).
If there are many layers specified in the `layers=` parameter, then this is done for each of the input layers specified in `layers=`, and the results are then concatenated across channels.
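For reference, a hedged sketch of how this looks in a Darknet cfg. The first block matches the grouped-route form used in yolov4-tiny.cfg; the second block, with multiple input layers, is only an illustration of the "many layers" case described above and is not copied from any shipped cfg:

```
# Split the previous layer's WxHxC output into 2 channel groups
# and forward only the second half (group_id=1) -> output WxHx(C/2).
[route]
layers=-1
groups=2
group_id=1

# Illustrative only: with several input layers, the same split is
# applied to each listed layer, and the selected halves are then
# concatenated across channels.
[route]
layers=-1,-3
groups=2
group_id=1
```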