Best way to use YOLO on Jetson Xavier with max FPS #5386
In my view, for detecting a single class, as long as the objects are not dense and small, yolov3-tiny is enough.
Isn't DeepStream a C++ app? See https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps and https://github.com/NVIDIA-AI-IOT/deepstream_4.x_apps
Yes, you can try INT8 quantization with TensorRT + DeepStream.
Very approximately, for yolov4.cfg
I made a repo with INT8 quantization for Yolov2/v3, but it doesn't support Yolov4: https://github.com/AlexeyAB/yolo2_light. So it may be better for you to use TensorRT + DeepStream.
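For context, INT8 selection in DeepStream is done in the Gst-nvinfer configuration file. A minimal sketch, assuming a YOLO parser is already set up; the file names below are hypothetical placeholders:

```ini
[property]
# 0=FP32, 1=INT8, 2=FP16
network-mode=1
# calibration cache produced during INT8 calibration (hypothetical name)
int8-calib-file=calib.table
# serialized TensorRT engine (hypothetical name)
model-engine-file=model_b1_gpu0_int8.engine
batch-size=1
```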
You can get metadata from DeepStream in Python and C. For C, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit this. You need to manipulate NvDsObjectMeta, NvDsFrameMeta, and NvOSD_RectParams to get the label, position, etc. of the bboxes. In the C deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C/Python deepstream-test application, your code needs to be in the tiler_src_pad_buffer_probe function. Example using C: https://www.youtube.com/watch?v=eFv4P1oj9pA. Python is slightly slower than C (on Jetson Nano, ~2 FPS).
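A minimal sketch of such a probe in Python, assuming the pyds bindings from NVIDIA's deepstream_python_apps are installed; it only prints the label and bbox of each detection:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import pyds

def tiler_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Batch metadata is attached to the Gst buffer by the DeepStream plugins.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            rect = obj_meta.rect_params  # NvOSD_RectParams: bbox position/size
            print(obj_meta.obj_label, rect.left, rect.top, rect.width, rect.height)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```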
Hi, first, thanks for your three quick replies! Since I would like my model to detect objects that could be big in the foreground but also small in the far background of the image, I'm not sure yolov3-tiny is a viable option for me. Correct me if I'm wrong, but I know that YOLOv3 analyzes the image at three different scales, which is a good feature for my purpose. However, it does so with 106 convolutional layers, and I don't know if the few layers of yolov3-tiny would be enough to detect one object at both a large and a small scale. Will take a look at your links @marcoslucianops, thanks! :) And thanks for your answer too, @AlexeyAB :)
@Kmarconi @marcoslucianops You can run Yolov4 on TensorRT using tkDNN at 32 FPS (FP16) / 17 FPS (FP32) with batch=1 on AGX Xavier: #5354 (comment). With batch=4, FPS will be higher.
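For anyone following along, the tkDNN workflow is roughly the following; a sketch based on the tkDNN README, and exact binary names can differ between versions:

```bash
# precision and batch size are baked into the engine at export time
export TKDNN_MODE=FP16
export TKDNN_BATCHSIZE=4
./test_yolo4                                  # exports yolo4_fp16.rt
./demo yolo4_fp16.rt ../demo/yolo_test.mp4 y  # run inference on a test video
```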
Thanks! Will give it a try!
@AlexeyAB, I will compare tkDNN and DeepStream. Thanks!
To keep you updated, I'm currently at around 34 FPS with yolov4 on the Xavier with tkDNN.
@Kmarconi What batch size, network resolution, and floating-point precision (32/16) do you use?
I'm using a batch_size of 4 and FP16 mode, and I haven't touched the network resolution for the moment, so it's the default yolov4 one.
So you get 34 FPS on Jetson Xavier using yolov4.cfg with width=608 height=608 batch_size=4 fp16 on tkDNN+TensorRT?
Sorry for the late response; I'm working in France, so I'm not awake at the same hours as you are ^^ I'm using width=416 height=416 batch=4 and fp16 with tkDNN+TRT to get 34 FPS on the Xavier, yes! :) I know it is probably too hard or too time-consuming, but it would be amazing to one day see an easy integration of TensorRT into the darknet project for every GPU architecture that supports it. Will continue to test tkDNN today and will keep posting results.
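For reference, the network resolution mentioned here is set in the [net] section of darknet's yolov4.cfg; both dimensions must be multiples of 32. A sketch of the relevant lines, with batch/subdivisions set for inference:

```
[net]
batch=1
subdivisions=1
width=416
height=416
```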
@AlexeyAB, DeepStream is faster than tkDNN. tkDNN reports 45.381 ms inference time, but the displayed video looks like 10-15 FPS on Jetson Nano. I think it's due to OpenCV.
@mive93 Hi, can you comment on this?
Hi @marcoslucianops, hi @Kmarconi.
How did you obtain this number? I think you are doing something wrong; those are the FPS with batch = 1.
I think, for tkDNN:
Hi @mive93, yeah, I just saw that I was mistaken about the batch_size. Haven't seen
So even if I export the batch_size variable as 4, for example, I will do my inference with only a batch size of 1? Then how can I use the full potential of my TRT engine? PS: 160 FPS with MobileNet on the Xavier, wow ^^
@AlexeyAB @marcoslucianops yeah, it's due to OpenCV. And @AlexeyAB, you are right, we should add a flag to disable the graphics. However, it is meant to be a library, so the demo is just an example; it's not how you would actually use it. Of course, when I use it in other projects, the graphics part is handled by other tasks. But maybe I could add a demo like that. @Kmarconi thanks :)
@mive93 You can just add such a part of code, as in lines 295 to 296 in 0c7305c
How are you all getting the Xavier to work at 34 FPS? I'm only able to get 24 FPS! I've set the following, and my model is 320x320, not 416x416 like yours:
TKDNN_BATCHSIZE=4
What else do I need?
yolo4_fp16.rt
===== TENSORRT detection ====
===== TENSORRT detection ====
===== TENSORRT detection ====
Are you in MAXN mode, and did you use sudo /usr/bin/jetson_clocks?
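For completeness, on a stock JetPack install those two settings are:

```bash
sudo nvpmodel -m 0            # mode 0 = MAXN power mode on AGX Xavier
sudo /usr/bin/jetson_clocks   # lock CPU/GPU/EMC clocks at their maximum
```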
Hi @harsco-jfernandez, first of all, @Kmarconi is right.
Thank you, fellows! Your questions are as good as answers. I made some assumptions. It is running at 40 FPS now. |
@harsco-jfernandez 40 FPS is a good speed for Yolov4 on Jetson Xavier AGX. |
@AlexeyAB It is excellent! I love it! I'm now trying INT8 inference. My camera is capable of 100 FPS.
Has anyone tested the performance on a Jetson Xavier NX instead of the AGX? (It's almost half the price of the AGX.)
Hi @rafcy, |
Hi! First, thanks for the continuous updates you are making to your repo, it's amazing. I'm working on a project in which I would like to detect only one class of object, but at high speed (at least 60 FPS). I just tested your yolov4 files and the pruned yolov3 weights, and I'm stuck at 5 FPS on my Xavier, whereas, if I remember well, I was around 20 FPS with yolov3. I know that yolov4 is heavier than yolov3, but I was hoping that the pruned version of yolov3 would be faster in terms of FPS; it was not, so I think I did something wrong.
To compile darknet, I've set the flags GPU, CUDNN, CUDNN_HALF, and OPENCV to 1. I also uncommented the ARCH line for the Xavier. Do I need to do something else?
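For reference, the relevant lines in darknet's Makefile look roughly like this; a sketch, so check the Makefile comments for your JetPack/CUDA version:

```makefile
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1

# Jetson AGX Xavier (Volta, compute capability 7.2)
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
```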
For now, I'm able to run some object detection algorithms at a speed of 150 FPS (ssd-inception) on the Xavier, but I would really like to use YOLO because of its accuracy. I know that I need to use TensorRT and quantization so that the weights use FP16 or INT8 rather than FP32, and I know how to do it with TensorFlow, but with darknet I'm kinda lost. Can you give me some help?
PS: I know that DeepStream supports YOLO natively, but I would like to build a Python or C++ object-detection app, and I'm not sure it is possible to "import" the DeepStream pipeline into a Python app and get the detected objects from it.
Best regards. Sorry for this long message, but I'm passionate about YOLO ^^