Use `EfficientNMS_TRT` plugin when exporting TensorRT #288

zhiqwang · 2022-01-24T18:22:19Z

The previous example works ok after this change.

import os
import torch

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

assert torch.cuda.is_available()
device = torch.device('cuda')

from yolort.utils import get_image_from_url, read_image_to_tensor
from yolort.v5 import letterbox, attempt_download
from yolort.runtime import PredictorTRT
from yolort.runtime.trt_helper import EngineBuilder
from yolort.runtime.yolo_graphsurgeon import YOLOGraphSurgeon

# Define some parameters
img_size = 640
stride = 64
score_thresh = 0.35
iou_thresh = 0.45
detections_per_img = 100
half = False

# yolov5n6.pt is downloaded from 'https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5n6.pt'
model_path = "yolov5n6.pt"

checkpoint_path = attempt_download(model_path)
onnx_path = "yolov5n6.onnx"
engine_path = "yolov5n6.engine"

img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/bus.jpg"
# img_source = "https://huggingface.co/spaces/zhiqwang/assets/resolve/main/zidane.jpg"
img_raw = get_image_from_url(img_source)

# Preprocessing
image = letterbox(img_raw, new_shape=(img_size, img_size), stride=stride)[0]
image = read_image_to_tensor(image)
image = image[None]
image = image.to(device)
image = image.contiguous()

# Export to an ONNX model
yolo_gs = YOLOGraphSurgeon(checkpoint_path, input_sample=image, version="r6.0", enable_dynamic=False)
# Embed the `EfficientNMS_TRT` at the end of `LogitsDecoder`.
yolo_gs.register_nms(score_thresh=score_thresh, nms_thresh=iou_thresh, detections_per_img=detections_per_img)

yolo_gs.save(onnx_path)

# Build TensorRT Engine
engine_builder = EngineBuilder()
engine_builder.create_network(onnx_path)
engine_builder.create_engine(engine_path, precision="fp32")

# Inference on TensorRT
engine = PredictorTRT(engine_path, device)
engine.warmup(img_size=image.shape, half=half)

# Run inference
detections = engine.run_on_image(image)
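
For readers unfamiliar with what `register_nms` embeds in the graph, here is a minimal pure-Python sketch of greedy non-maximum suppression, showing how `score_thresh`, `iou_thresh` and `detections_per_img` interact. It is a class-agnostic simplification written for illustration, not the plugin's actual implementation; the helper names `iou` and `greedy_nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_thresh=0.35, iou_thresh=0.45, detections_per_img=100):
    """Return indices of the boxes kept, ordered from highest score down."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if len(keep) == detections_per_img:
                break
    return keep


# Toy data (illustrative only): two overlapping boxes, one separate box,
# and one box below the score threshold.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = greedy_nms(boxes, scores)
print(kept)
```

Running this selection inside the TensorRT engine means the detections come back already filtered, so no separate post-processing pass is needed on the host.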

Known cons

TensorRT has to be upgraded to 8.2 or later to use the `EfficientNMS_TRT` plugin. There also seems to be a bug in the plugin's float16 mode (see NVIDIA/TensorRT#1758 (comment)), which was fixed in version 8.2.4.
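
One way to guard against both version issues is to check the installed TensorRT version before choosing a precision. The sketch below is only an illustration: `parse_version` and `choose_precision` are hypothetical helpers, not part of yolort, and the `"fp16"` precision string is an assumption (only `"fp32"` appears in the example above).

```python
import re

MIN_PLUGIN_VERSION = (8, 2)        # EfficientNMS_TRT requires TensorRT 8.2
MIN_FP16_SAFE_VERSION = (8, 2, 4)  # float16 bug reported as fixed in 8.2.4


def parse_version(version):
    """Turn a version string such as '8.2.4.2' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def choose_precision(trt_version, want_half):
    """Pick a precision string, falling back to fp32 on affected versions."""
    version = parse_version(trt_version)
    if version < MIN_PLUGIN_VERSION:
        raise RuntimeError(
            f"TensorRT {trt_version} is too old for EfficientNMS_TRT; 8.2 is required"
        )
    if want_half and version < MIN_FP16_SAFE_VERSION:
        # Avoid the float16 issue tracked in NVIDIA/TensorRT#1758.
        return "fp32"
    return "fp16" if want_half else "fp32"


# Usage, assuming the TensorRT Python bindings are installed:
#   import tensorrt as trt
#   precision = choose_precision(trt.__version__, want_half=half)
#   engine_builder.create_engine(engine_path, precision=precision)
```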

CLAassistant · 2022-01-24T18:25:25Z

All committers have signed the CLA.

codecov · 2022-01-24T18:32:09Z

Codecov Report

Merging #288 (ce13ca5) into main (d2db932) will not change coverage.
The diff coverage is n/a.

❗ Current head ce13ca5 differs from pull request most recent head a3f80fb. Consider uploading reports for the commit a3f80fb to get more accurate results.

@@           Coverage Diff           @@
##             main     #288   +/-   ##
=======================================
  Coverage   94.01%   94.01%           
=======================================
  Files          11       11           
  Lines         718      718           
=======================================
  Hits          675      675           
  Misses         43       43

Flag	Coverage Δ
unittests	`94.01% <ø> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

zhiqwang added 5 commits January 25, 2022 02:13

Add device argument in YOLOGraphSurgeon

821c3c0

Update links

92e078e

Add run_wo_postprocessing in PredictorTRT

466b186

Use EfficientNMS_TRT plugin instead

7b015ed

Separate out module LogitsDecoder

4a0c599

zhiqwang added the enhancement (New feature or request) and code quality (Code format and unit tests) labels Jan 24, 2022

zhiqwang added 2 commits January 25, 2022 02:24

Set enable_dynamic as False by default

1a5125b

Update TensorRT inference notebook

88a4c22

zhiqwang force-pushed the EfficientNMS_TRT_PLUGIN branch from 41f1e02 to a1f9059 January 24, 2022 18:26

Apply pre-commit

c72dd6a

zhiqwang force-pushed the EfficientNMS_TRT_PLUGIN branch from ce13ca5 to c72dd6a January 24, 2022 18:28

Minor fixes

a3f80fb

zhiqwang merged commit 51f9d41 into main Jan 24, 2022

zhiqwang deleted the EfficientNMS_TRT_PLUGIN branch January 24, 2022 18:59

zhiqwang mentioned this pull request Jan 25, 2022

A benchmark for reference such as speed, memory, accuracy and so on #272

Open

zhiqwang added the deployment (Inference acceleration for production) label Jan 25, 2022
