This repository contains code to optimize PyTorch image models with ONNX Runtime and TensorRT, for up to 8x faster inference. Read the full blog post here.

Installation

Create and activate a conda environment:

conda create -n supercharge_timm_tensorrt python=3.11
conda activate supercharge_timm_tensorrt

Install required packages:

pip install timm
pip install onnx
pip install onnxruntime-gpu==1.19.2
pip install cupy-cuda12x
pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0

Install CUDA dependencies:

conda install -c nvidia cuda=12.2.2 cuda-tools=12.2.2 cuda-toolkit=12.2.2 cuda-version=12.2 cuda-command-line-tools=12.2.2 cuda-compiler=12.2.2 cuda-runtime=12.2.2

Install cuDNN:

conda install cudnn==9.2.1.18

Set up library paths so the CUDA and TensorRT shared libraries are found at runtime. Run these with the environment activated, so that $CONDA_PREFIX points to it:

export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib/python3.11/site-packages/tensorrt_libs:$LD_LIBRARY_PATH"
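
A missing entry in LD_LIBRARY_PATH usually only shows up later as a failed library load, so it can help to check the variable up front. The sketch below is not part of the repository; the helper names are hypothetical, and it assumes the same two directories as the export lines above (the environment's lib directory and the tensorrt_libs package directory under Python 3.11).

```python
import os
import sys
from pathlib import PurePosixPath


def expected_lib_dirs(env_prefix: str, python_version: str = "3.11") -> list[str]:
    """Return the two directories the export lines add to LD_LIBRARY_PATH."""
    root = PurePosixPath(env_prefix)
    return [
        str(root / "lib"),
        str(root / "lib" / f"python{python_version}" / "site-packages" / "tensorrt_libs"),
    ]


def missing_from_ld_library_path(env_prefix: str, ld_library_path: str) -> list[str]:
    """List expected directories that are absent from a colon-separated path string."""
    present = {entry.rstrip("/") for entry in ld_library_path.split(":") if entry}
    return [d for d in expected_lib_dirs(env_prefix) if d not in present]


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; fall back to the interpreter prefix.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    missing = missing_from_ld_library_path(prefix, os.environ.get("LD_LIBRARY_PATH", ""))
    print("missing from LD_LIBRARY_PATH:", missing or "none")
```

Running it inside the activated environment prints any directory that still needs to be exported.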

Running the code

The following scripts correspond to the steps in the blog post.

Load timm model and run inference:

python 00_load_and_infer.py

Read more here

PyTorch latency benchmark:

python 01_pytorch_latency_benchmark.py

Read more here
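
For readers who want to see the general shape of a latency benchmark, here is a minimal sketch. It is not the content of 01_pytorch_latency_benchmark.py; the function name, the warm-up count, and the number of runs are assumptions chosen for illustration. It times any zero-argument callable, so it runs without a GPU.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 10, runs: int = 50) -> dict:
    """Time fn() over several runs and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (caches, lazy initialization)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(timings_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(timings_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call such as `lambda: model(batch)`.
    print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

When the callable runs a model on a GPU, it should wait for the device to finish (for example with torch.cuda.synchronize()) before returning, because CUDA kernels are launched asynchronously and the timer would otherwise stop too early.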

Convert model to ONNX:

python 02_convert_to_onnx.py

Read more here

ONNX Runtime CPU inference:

python 03_onnx_cpu_inference.py

Read more here

ONNX Runtime CUDA inference:

python 04_onnx_cuda_inference.py

Read more here

ONNX Runtime TensorRT inference:

python 05_onnx_trt_inference.py

Read more here
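
The CPU, CUDA, and TensorRT steps differ mainly in which execution providers are passed to ONNX Runtime. The sketch below shows one way to build that provider list in priority order; it is an illustration rather than the code in the scripts above. The helper name and the cache directory are hypothetical, and whether the repository enables engine caching is an assumption. The provider names and the two TensorRT options are standard ONNX Runtime identifiers.

```python
def build_providers(use_tensorrt: bool = True, use_cuda: bool = True,
                    cache_dir: str = "./trt_cache") -> list:
    """Return ONNX Runtime providers in priority order, ending with the CPU fallback."""
    providers = []
    if use_tensorrt:
        # A provider can be given as a (name, options) tuple.
        providers.append((
            "TensorrtExecutionProvider",
            {
                "trt_engine_cache_enable": True,     # reuse built engines across runs
                "trt_engine_cache_path": cache_dir,  # hypothetical cache location
            },
        ))
    if use_cuda:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


if __name__ == "__main__":
    requested = build_providers()
    print("requested:", requested)
    try:
        import onnxruntime as ort  # optional; only needed to query the install
    except ImportError:
        ort = None
    if ort is not None:
        print("available:", ort.get_available_providers())
```

The resulting list can be passed as the providers argument of onnxruntime.InferenceSession. ONNX Runtime tries the providers in order, so operators that TensorRT cannot handle fall through to CUDA and then to the CPU.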

Export preprocessing to ONNX:

python 06_export_preprocessing_onnx.py

Read more here
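
To make concrete what a preprocessing graph typically has to compute, the following sketch applies the usual scaling, per-channel normalization, and HWC to CHW reordering in plain Python. It is an assumption that the exported preprocessing follows this recipe; the mean and standard deviation are the common ImageNet defaults, and the actual values for a given model come from its pretrained configuration. The function names are hypothetical.

```python
# Common ImageNet normalization constants (assumed; check the model's config).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then normalize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


def preprocess(image_hwc):
    """Normalize an H x W x 3 nested list and reorder it to 3 x H x W."""
    normalized = [[normalize_pixel(px) for px in row] for row in image_hwc]
    return [[[px[c] for px in row] for row in normalized] for c in range(3)]


if __name__ == "__main__":
    tiny_image = [[(255, 255, 255), (0, 0, 0)]]  # 1 x 2 image: white, black
    chw = preprocess(tiny_image)
    print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
```

Moving these operations into the ONNX graph lets them run on the same execution provider as the model instead of on the CPU in Python.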

Merge preprocessing and model ONNX:

python 07_onnx_compose_merge.py

Read more here

Run inference on merged model:

python 08_inference_merged_model.py

Read more here

Run inference on video:

python 09_video_inference.py sample.mp4 output.mp4 --live

