A benchmark for reference such as speed, memory, accuracy and so on #272

liuzhuang1024 · 2022-01-11T09:26:06Z

📚 The doc issue

I could not find a benchmark in the documentation.

Suggest a potential alternative/fix

A detailed test report covering speed, mAP, memory usage, and so on.

zhiqwang · 2022-01-25T05:31:01Z

Hi @lzmisscc,

Thank you for raising this ticket. A benchmark is part of our long-term plan; our current focus is on improving ease of use.

For now, I'd like to share our gains with TensorRT: we also got some improvement across the whole pipeline by embedding the post-processing into the graph. (This uses the EfficientNMS_TRT plugin from #288, tested on a GeForce GTX 1080 Ti with TensorRT 8.2 and yolov5n6 at 512x640.)

[I] === Performance summary w/o NMS plugin ===
[I] Throughput: 383.298 qps
[I] Latency: min = 3.66479 ms, max = 5.41199 ms, mean = 4.00543 ms, median = 3.99316 ms, percentile(99%) = 4.23831 ms
[I] End-to-End Host Latency: min = 3.76599 ms, max = 6.45874 ms, mean = 5.08597 ms, median = 5.07544 ms, percentile(99%) = 5.50839 ms
[I] Enqueue Time: min = 0.743408 ms, max = 5.27966 ms, mean = 0.940805 ms, median = 0.924805 ms, percentile(99%) = 1.37329 ms
[I] H2D Latency: min = 0.502045 ms, max = 0.62674 ms, mean = 0.538255 ms, median = 0.537354 ms, percentile(99%) = 0.582153 ms
[I] GPU Compute Time: min = 2.23233 ms, max = 3.92395 ms, mean = 2.59913 ms, median = 2.58661 ms, percentile(99%) = 2.8201 ms
[I] D2H Latency: min = 0.851807 ms, max = 0.900421 ms, mean = 0.868048 ms, median = 0.867676 ms, percentile(99%) = 0.889191 ms
[I] Total Host Walltime: 3.0081 s
[I] Total GPU Compute Time: 2.99679 s
[I] Explanations of the performance metrics are printed in the verbose logs.
[I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=yolov5n6-no-nms.onnx --workspace=8096

[I] === Performance summary w/ EfficientNMS_TRT ===
[I] Throughput: 389.234 qps
[I] Latency: min = 2.81482 ms, max = 9.77234 ms, mean = 3.1062 ms, median = 3.07642 ms, percentile(99%) = 3.33548 ms
[I] End-to-End Host Latency: min = 2.82202 ms, max = 11.6749 ms, mean = 4.939 ms, median = 4.95587 ms, percentile(99%) = 5.45207 ms
[I] Enqueue Time: min = 0.999878 ms, max = 11.3833 ms, mean = 1.28942 ms, median = 1.18579 ms, percentile(99%) = 4.53088 ms
[I] H2D Latency: min = 0.488159 ms, max = 0.633881 ms, mean = 0.546754 ms, median = 0.546631 ms, percentile(99%) = 0.570557 ms
[I] GPU Compute Time: min = 2.30298 ms, max = 9.21094 ms, mean = 2.54921 ms, median = 2.51904 ms, percentile(99%) = 2.78528 ms
[I] D2H Latency: min = 0.00610352 ms, max = 0.302734 ms, mean = 0.0102295 ms, median = 0.00976562 ms, percentile(99%) = 0.0151367 ms
[I] Total Host Walltime: 3.00591 s
[I] Total GPU Compute Time: 2.98258 s
[I] Explanations of the performance metrics are printed in the verbose logs.
[I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=yolov5n6-efficient-nms.onnx --workspace=8096

The first summary above is the result without NMS, and the second is the result with EfficientNMS_TRT embedded in the TensorRT graph. As you can see, the inference time is actually lower with NMS in the graph: mean latency drops from 4.00543 ms to 3.1062 ms. Our guess is that much less data has to be copied from the device back to the host after NMS (the mean D2H latency falls from 0.868048 ms to 0.0102295 ms).
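To make the comparison easier to check, here is a minimal sketch that pulls the mean values out of the two summaries and prints the per-stage difference. It assumes the log lines keep the `name: ... mean = X ms` layout shown above; `mean_times` is a hypothetical helper written for this illustration, not part of trtexec or this repository.

```python
import re

# Lines copied from the two trtexec summaries in this thread.
LOG_WITHOUT_NMS = """\
[I] Latency: min = 3.66479 ms, max = 5.41199 ms, mean = 4.00543 ms, median = 3.99316 ms, percentile(99%) = 4.23831 ms
[I] H2D Latency: min = 0.502045 ms, max = 0.62674 ms, mean = 0.538255 ms, median = 0.537354 ms, percentile(99%) = 0.582153 ms
[I] GPU Compute Time: min = 2.23233 ms, max = 3.92395 ms, mean = 2.59913 ms, median = 2.58661 ms, percentile(99%) = 2.8201 ms
[I] D2H Latency: min = 0.851807 ms, max = 0.900421 ms, mean = 0.868048 ms, median = 0.867676 ms, percentile(99%) = 0.889191 ms
"""

LOG_WITH_NMS = """\
[I] Latency: min = 2.81482 ms, max = 9.77234 ms, mean = 3.1062 ms, median = 3.07642 ms, percentile(99%) = 3.33548 ms
[I] H2D Latency: min = 0.488159 ms, max = 0.633881 ms, mean = 0.546754 ms, median = 0.546631 ms, percentile(99%) = 0.570557 ms
[I] GPU Compute Time: min = 2.30298 ms, max = 9.21094 ms, mean = 2.54921 ms, median = 2.51904 ms, percentile(99%) = 2.78528 ms
[I] D2H Latency: min = 0.00610352 ms, max = 0.302734 ms, mean = 0.0102295 ms, median = 0.00976562 ms, percentile(99%) = 0.0151367 ms
"""

# Matches "[I] <metric name>: ... mean = <value> ms" on a single line.
LINE_RE = re.compile(
    r"^\[I\] (?P<name>[^:\n]+): .*?mean = (?P<mean>[0-9.]+) ms",
    re.MULTILINE,
)


def mean_times(log):
    """Return {metric name: mean time in ms} for every line that reports a mean."""
    return {m["name"]: float(m["mean"]) for m in LINE_RE.finditer(log)}


without_nms = mean_times(LOG_WITHOUT_NMS)
with_nms = mean_times(LOG_WITH_NMS)

for name, before in without_nms.items():
    after = with_nms[name]
    print(f"{name}: {before:.4f} ms -> {after:.4f} ms ({after - before:+.4f} ms)")

# In both runs the mean Latency equals H2D + GPU compute + D2H up to rounding.
parts = ("H2D Latency", "GPU Compute Time", "D2H Latency")
sum_without = sum(without_nms[p] for p in parts)
sum_with = sum(with_nms[p] for p in parts)
```

With these numbers, the D2H stage accounts for about 0.86 ms of the roughly 0.90 ms drop in mean latency, while mean GPU compute time changes by only about 0.05 ms.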

zhiqwang added the enhancement (New feature or request) label on Jan 11, 2022
