Best way to use YOLO on Jetson Xavier with max FPS #5386

Open · Kmarconi opened this issue Apr 28, 2020 · 28 comments

@Kmarconi

Hi! First, thanks for the continuous updates you are making to your repo, it's amazing. I'm working on a project in which I would like to detect only one class of object, but at high speed (at least 60 FPS). I just tested your yolov4 files and the pruned yolov3 weights, and I'm stuck at 5 FPS on my Xavier, whereas if I remember well I was around 20 FPS with yolov3. I know that yolov4 is heavier than yolov3, but I was hoping the pruned version of yolov3 would run at a higher FPS; it did not, so I think I did something wrong.

To compile darknet, I've set the flags GPU, CUDNN, CUDNN_HALF and OPENCV to 1. I also uncommented the ARCH line for the Xavier. Do I need to do anything else?
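For reference, a sketch of the corresponding Makefile settings in AlexeyAB/darknet (the ARCH line is the Xavier entry that ships commented out; exact wording may differ between repo versions):

```makefile
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1

# Jetson AGX Xavier (Volta, compute capability 7.2)
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
```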

For now, I'm able to run some object detection algorithms at 150 FPS (ssd-inception) on the Xavier, but I really would like to use YOLO because of its accuracy. I know that I need to use TensorRT and quantization so that the weights use FP16 or INT8 instead of FP32, and I know how to do it with TensorFlow, but with darknet I'm kind of lost. Can you give me some help?

PS: I know that DeepStream supports YOLO natively, but I would like to build a Python or C++ object-detection app, and I'm not sure it is possible to "import" the DeepStream pipeline into a Python app and get the detected objects from it.

Best regards. Sorry for this long message, but I'm passionate about YOLO ^^

@DocF commented Apr 28, 2020

In my view, for detection of a single class, as long as the targets are not dense small objects, yolov3-tiny is enough.

@AlexeyAB (Owner)

> I know that DeepStream supports YOLO natively, but I would like to do a Python or C++ object-detection app

Isn't Deepstream already a C++ app? https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps and https://github.com/NVIDIA-AI-IOT/deepstream_4.x_apps

> I know that I need to use TensorRT and quantization so that the weights use FP16 or INT8 and not FP32

Yes, you can try INT8 quantization with TensorRT + DeepStream.
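For illustration, a minimal sketch of the relevant [property] keys in a DeepStream nvinfer config file; the key names follow the nvinfer documentation, while the file names are placeholders for your own model files:

```ini
[property]
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=1
int8-calib-file=calib.table
custom-network-config=yolov4.cfg
model-file=yolov4.weights
batch-size=1
```

INT8 also needs a calibration table generated from representative images; without a valid table, the engine builder will typically fall back to a higher precision.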

> I would like to be able to detect only one class of object, but at high speed (at least 60 FPS).

Very approximately, for yolov4.cfg (see the cfg sketch after this list):

  • width=416 height=416 in cfg - 9 FPS
  • width=320 height=320 in cfg - 13 FPS
  • width=320 height=320 in cfg INT8-TensorRT - 25 FPS
  • width=320 height=320 in cfg INT8-TensorRT batch=32 - 50 FPS
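The resolution change above is just the [net] section of yolov4.cfg; both values must stay multiples of 32:

```ini
[net]
width=320
height=320
```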

I made a repo with INT8 quantization for Yolov2/v3, but it doesn't support Yolov4: https://github.com/AlexeyAB/yolo2_light

So it may be better for you to use yolov3-tiny-prn.cfg.

[chart: modern_gpus]

@marcoslucianops commented Apr 28, 2020

> PS: I know that DeepStream supports YOLO natively, but I would like to build a Python or C++ object-detection app, and I'm not sure it is possible to "import" the DeepStream pipeline into a Python app and get the detected objects from it.

You can get metadata from DeepStream in Python and C. For C, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit this.

You need to manipulate NvDsObjectMeta, NvDsFrameMeta and NvOSD_RectParams to get the label, position, etc. of the bboxes.

In the C deepstream-app application, your code needs to go in the analytics_done_buf_prob function. In the C/Python deepstream-test applications, it needs to go in the tiler_src_pad_buffer_probe function.

Example using C: https://www.youtube.com/watch?v=eFv4P1oj9pA
Example using Python: https://www.youtube.com/watch?v=n3uYS550PDo
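As an illustration of the probe @marcoslucianops describes, here is a minimal sketch in Python. It assumes the pyds bindings from NVIDIA's deepstream_python_apps and follows the iteration pattern of their sample apps; error handling is omitted:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import pyds

def tiler_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # DeepStream attaches batch-level metadata to the Gst buffer.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            rect = obj_meta.rect_params  # NvOSD_RectParams: left, top, width, height
            print(obj_meta.obj_label, rect.left, rect.top, rect.width, rect.height)
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```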

Python is slightly slower than C (by ~2 FPS on a Jetson Nano).

@Kmarconi (Author)

Hi, first thanks for your three quick replies! Since I would like my model to detect objects that can be large in the foreground but also small in the very background of the image, I'm not sure yolov3-tiny is a viable option for me. Correct me if I'm wrong, but I know that yolov3 analyzes the image at three different scales, which is a good feature for my purpose. However, that is done with 106 convolutional layers, and I don't know if the few layers of yolov3-tiny are enough to detect one object at both a large and a small scale. I will take a look at your links, @marcoslucianops, thanks! :) And thanks for your answer too, @AlexeyAB :)

@AlexeyAB (Owner)

@Kmarconi @marcoslucianops You can run Yolov4 on TensorRT using tkDNN at 32 FPS (FP16) / 17 FPS (FP32) with batch=1 on the AGX Xavier: #5354 (comment)

With batch=4, FPS will be higher.

@Kmarconi (Author)

Thanks! Will give it a try!

@marcoslucianops

@AlexeyAB, I will compare tkDNN and DeepStream. Thanks!

@Kmarconi (Author)

To keep you updated: I'm currently at around 34 FPS with yolov4 on the Xavier with tkDNN.

@AlexeyAB (Owner)

@Kmarconi What batch size, network resolution, and floating-point precision (32/16) do you use?

@Kmarconi (Author)

I'm using a batch size of 4 and FP16 mode, and I haven't touched the network resolution yet, so it's the default yolov4 one.

@AlexeyAB (Owner)

So you get 34 FPS on the Jetson Xavier using yolov4.cfg with width=608 height=608, batch_size=4 and FP16, via tkDNN+TensorRT?

@Kmarconi (Author)

Sorry for the late response; I'm working in France, so I'm not awake at the same hours as you ^^ I'm using width=416 height=416, batch=4 and FP16 with tkDNN+TRT to get 34 FPS on the Xavier, yes! :) I know it is probably too hard or too time-consuming, but it would be amazing to one day see an easy integration of TensorRT into the darknet project for every GPU architecture that supports it. I will continue testing tkDNN today and will keep posting results.

@marcoslucianops

@AlexeyAB, DeepStream is faster than tkDNN. tkDNN reports a 45.381 ms inference time, but the displayed video looks like 10-15 FPS on the Jetson Nano. I think it's due to OpenCV.

@AlexeyAB (Owner) commented May 1, 2020

@mive93 Hi, can you comment on this?

@mive93 commented May 1, 2020

Hi @marcoslucianops,
how are you using tkDNN? Have you enabled FP16 inference? Have you enabled preprocessing on the GPU? We have never tested tkDNN on a Jetson Nano, so I have no data on that. However, yes, you are right, OpenCV could be a problem for performance.

Hi @Kmarconi

> To keep you updated: I'm currently at around 34 FPS with yolov4 on the Xavier with tkDNN.

How did you obtain this number? I think you are doing something wrong; those are the FPS with batch = 1.

@marcoslucianops commented May 1, 2020

> Have you enabled FP16 inference?

I compared DeepStream FP32 vs tkDNN FP32.

> Have you enabled preprocessing on GPU?

Yes

[screenshot]

I think the problem (the delay) is in OpenCV, when drawing the bboxes and calling imshow.

[screenshot]

@AlexeyAB (Owner) commented May 2, 2020

@mive93

I think, for tkDNN:

  • it shouldn't show all frames on the screen, so the CPU thread that draws detections on the screen should work asynchronously and show only the latest frame;

  • if file output is implemented, it should still write all frames to the output.avi file.

@Kmarconi (Author) commented May 4, 2020

Hi @mive93,

Yeah, I just saw that I was mistaken about the batch_size. I hadn't seen:

> The test will still run with a batch of 1, but the created tensorRT can manage the desired batch size.

So even if I export the batch_size variable to 4, for example, I will run my inference with only a batch size of 1? Then how can I use the full potential of my TRT engine?

PS: 160 FPS with mobilenet on the Xavier, wow ^^

@mive93 commented May 11, 2020

@AlexeyAB @marcoslucianops yeah, it's due to OpenCV. And @AlexeyAB, you are right, we should add a flag to disable the graphics. However, tkDNN is meant to be a library, so the demo is just an example; it's not how you would actually use it. Of course, when I use it in other projects, the graphics part is handled by other tasks. But maybe I could add a demo like that.

@Kmarconi thanks :)
Right now batches can only be used to check the FPS (using the rt_inference test). But this week I'm planning to allow using them in a demo, so that anyone can really test with more batches. It was a WIP.

@AlexeyAB (Owner)

@mive93 You can just add this kind of code to the bbox_drawing(), wait_key_cv() and show() functions, so that they are called no more than 100 times per second in the demo:

darknet/src/demo.c (lines 295 to 296 in 0c7305c):

```c
const int each_frame = max_val_cmp(1, avg_fps / 100);
if (global_frame_counter % each_frame == 0) show_image_mat(show_img, "Demo");
```

@harsco-jfernandez commented May 30, 2020

How are you all getting the Xavier to work at 34 FPS? I'm only able to get 24 FPS!

I've set the following, and my model is 320x320, not 416x416 like yours:

```
TKDNN_BATCHSIZE=4
TKDNN_MODE=FP16
```

What else do I need?
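For context, those are tkDNN settings read from the environment, so they only take effect when the TensorRT engine (the .rt file) is regenerated; assuming the workflow from the tkDNN README (binary and file names are its examples, adjust to your setup), something like:

```sh
export TKDNN_MODE=FP16      # build the engine in half precision
export TKDNN_BATCHSIZE=4    # maximum batch size baked into the engine
rm yolo4_fp16.rt            # remove the old engine so it gets rebuilt
./test_yolo4                # recreate the engine and run the test
```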

```
yolo4_fp16.rt
New NetworkRT (TensorRT v6.01)
Float16 support: 1
Int8 support: 1
DLAs: 2
create execution context
Input/outputs numbers: 4
input idex = 0 -> output index = 3
Data dim: 1 3 320 320 1
Data dim: 1 33 10 10 1
RtBuffer 0 dim: Data dim: 1 3 320 320 1
RtBuffer 1 dim: Data dim: 1 33 40 40 1
RtBuffer 2 dim: Data dim: 1 33 20 20 1
RtBuffer 3 dim: Data dim: 1 33 10 10 1
===== TENSORRT detection ====
Time: 0.725123 ms
Data dim: 1 3 320 320 1
Time: 19.7376 ms
Data dim: 1 33 10 10 1
Time: 0.585052 ms

===== TENSORRT detection ====
Time: 0.71021 ms
Data dim: 1 3 320 320 1
Time: 19.7166 ms
Data dim: 1 33 10 10 1
Time: 0.396787 ms

===== TENSORRT detection ====
Time: 0.676224 ms
Data dim: 1 3 320 320 1
Time: 19.7656 ms
Data dim: 1 33 10 10 1
Time: 0.360881 ms

===== TENSORRT detection ====
Time: 0.758276 ms
Data dim: 1 3 320 320 1
Time: 19.7501 ms
Data dim: 1 33 10 10 1
Time: 0.458837 ms
```

@Kmarconi (Author)

Are you in MAXN mode, and did you run sudo /usr/bin/jetson_clocks?
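For reference, on a Jetson these are typically set as follows (MAXN is power model 0 on the AGX Xavier):

```sh
sudo nvpmodel -m 0            # select the MAXN power mode
sudo /usr/bin/jetson_clocks   # lock clocks at their maximum
```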

@mive93 commented May 30, 2020

Hi @harsco-jfernandez,

first, @Kmarconi is right.
How did you create the .rt file?
Which command did you run to print those results?
How did you use the batches? Right now batches are supported in the demo only on the eval branch.

@harsco-jfernandez commented May 30, 2020

Thank you, fellows!

Your questions are as good as answers. I made some assumptions. It is running at 40 FPS now.

@AlexeyAB (Owner)

@harsco-jfernandez 40 FPS is a good speed for Yolov4 on Jetson Xavier AGX.

@harsco-jfernandez commented May 30, 2020

@AlexeyAB It is excellent! I love it!

I'm now trying INT8 inference. My camera is capable of 100 FPS.

@rafcy commented Jun 10, 2020

Has anyone tested the performance on a Jetson Xavier NX instead of the AGX? (It's almost half the price of the AGX.)

@mive93 commented Jun 11, 2020

Hi @rafcy,
Not yet, I'm waiting for the board to be shipped.
But soonish I will do some tests on the Nano.
