diff --git a/notebooks/onnx-graphsurgeon-inference-tensorrt.ipynb b/notebooks/onnx-graphsurgeon-inference-tensorrt.ipynb
index d99ab6fb4..f93752a9d 100644
--- a/notebooks/onnx-graphsurgeon-inference-tensorrt.ipynb
+++ b/notebooks/onnx-graphsurgeon-inference-tensorrt.ipynb
@@ -5,7 +5,7 @@
    "id": "ab5ff80c",
    "metadata": {},
    "source": [
-    "# TensorRT Python Inference for yolort"
+    "# Deploying yolort on TensorRT"
    ]
   },
   {
@@ -13,9 +13,9 @@
    "id": "ed6e21c1",
    "metadata": {},
    "source": [
-    "Unlike other TensorRT examples that deal with yolov5, we embed the whole post-processing into the Graph with `onnx-graghsurgeon`. We gain a lot with this whole pipeline. The ablation experiment results are below. The first one is the result without running `EfficientNMS_TRT `, and the second one is the result with `EfficientNMS_TRT` embedded. As you can see, the inference time is even reduced, we guess it is because the data copied to the device will be much less after doing `EfficientNMS_TRT`. (The mean Latency of D2H is reduced from `0.868048 ms` to `0.0102295 ms`, running on Nivdia Geforce GTX 1080ti, using TensorRT 8.2 with yolov5n6 and scaling images to `512x640`.)\n",
+    "Unlike other TensorRT examples that deal with yolov5, we embed the whole post-processing into the graph with `onnx-graphsurgeon`, and this whole-pipeline approach brings a measurable gain. The ablation results are below: the first run is without `EfficientNMS_TRT`, and the second has `EfficientNMS_TRT` embedded. As you can see, the inference time actually drops; we attribute this to the much smaller amount of data copied back from the device once `EfficientNMS_TRT` has filtered the predictions. (The mean D2H latency falls from `0.868048 ms` to `0.0102295 ms` on an NVIDIA GeForce GTX 1080 Ti, using TensorRT 8.2 with yolov5n6 and images scaled to `512x640`.)\n",
     "\n",
-    "And `onnx-graphsurgeon` is easy to install, you can just use the prebuilt wheels\n",
+    "And `onnx-graphsurgeon` is easy to install; you can just use NVIDIA's prebuilt wheels:\n",
     "\n",
     "```\n",
     "python3 -m pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com\n",
@@ -24,7 +24,7 @@
     "The detailed results:\n",
     "\n",
     "```\n",
-    "[I] === Performance summary w/o NMS plugin ===\n",
+    "[I] === Performance summary w/o EfficientNMS_TRT plugin ===\n",
     "[I] Throughput: 383.298 qps\n",
     "[I] Latency: min = 3.66479 ms, max = 5.41199 ms, mean = 4.00543 ms, median = 3.99316 ms, percentile(99%) = 4.23831 ms\n",
     "[I] End-to-End Host Latency: min = 3.76599 ms, max = 6.45874 ms, mean = 5.08597 ms, median = 5.07544 ms, percentile(99%) = 5.50839 ms\n",
@@ -40,7 +40,7 @@
     "```\n",
     "\n",
     "```\n",
-    "[I] === Performance summary w/ NMS plugin ===\n",
+    "[I] === Performance summary w/ EfficientNMS_TRT plugin ===\n",
     "[I] Throughput: 389.234 qps\n",
     "[I] Latency: min = 2.81482 ms, max = 9.77234 ms, mean = 3.1062 ms, median = 3.07642 ms, percentile(99%) = 3.33548 ms\n",
     "[I] End-to-End Host Latency: min = 2.82202 ms, max = 11.6749 ms, mean = 4.939 ms, median = 4.95587 ms, percentile(99%) = 5.45207 ms\n",
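
For reference, a minimal sketch of the embedding step this notebook describes: appending an `EfficientNMS_TRT` node to an exported ONNX graph with `onnx-graphsurgeon`. The file paths, the assumption that the exported graph ends in a boxes/scores output pair, and the threshold values are all illustrative, not taken from the notebook; the plugin's attributes and output signature follow the TensorRT EfficientNMS plugin.

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Load an exported model (hypothetical path).
graph = gs.import_onnx(onnx.load("yolov5n6.onnx"))

# Assumption: the exported graph ends with decoded boxes and per-class
# scores, in that order. Adjust to the actual outputs of your export.
boxes, scores = graph.outputs  # (batch, num_boxes, 4) and (batch, num_boxes, num_classes)

# Declare the four outputs the EfficientNMS_TRT plugin produces.
batch_size, max_det = 1, 100
num_dets = gs.Variable("num_dets", dtype=np.int32, shape=(batch_size, 1))
det_boxes = gs.Variable("det_boxes", dtype=np.float32, shape=(batch_size, max_det, 4))
det_scores = gs.Variable("det_scores", dtype=np.float32, shape=(batch_size, max_det))
det_classes = gs.Variable("det_classes", dtype=np.int32, shape=(batch_size, max_det))

# Append the plugin node; TensorRT resolves it by op name at build time.
graph.layer(
    op="EfficientNMS_TRT",
    name="EfficientNMS",
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes],
    attrs={
        "plugin_version": "1",
        "background_class": -1,   # no dedicated background class
        "max_output_boxes": max_det,
        "score_threshold": 0.25,  # illustrative value
        "iou_threshold": 0.45,    # illustrative value
        "score_activation": 0,    # scores are already probabilities
        "box_coding": 0,          # boxes given as [x1, y1, x2, y2]
    },
)

# Replace the raw predictions with the NMS outputs and save.
graph.outputs = [num_dets, det_boxes, det_scores, det_classes]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov5n6_nms.onnx")
```

An engine built from such a model only copies the small fixed-size NMS outputs back to the host, which is the D2H saving the performance summaries above illustrate.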