- For low-to-medium accuracies, the most efficient models are LeViT-256 and LeViT-384
- For higher accuracies (> 83.5%), TResNet-L and LeViT-768* provide the best trade-off among the models tested
- Other transformer models (Swin, TnT and ViT) provide an inferior speed-accuracy trade-off compared to modern CNNs. Mobile-oriented models also do not excel on GPU inference
- Note that several modern architectures titled "Small" (ConvNeXt-S, Swin-S, TnT-S) are in fact quite resource-intensive - their inference speed is more than four times slower than that of a plain ResNet50 model
- On CPU, mobile-oriented models (OFA, MobileNet, EfficientNet) provide the best speed-accuracy trade-off
- LeViT models, which excelled on GPU inference, are not as efficient for CPU inference.
- Other transformer models (Swin, TnT and ViT) again provide an inferior speed-accuracy trade-off.
On GPU, throughput was measured with the TensorRT inference engine, FP16 precision, and a batch size of 256. On CPU, throughput was measured using Intel's OpenVINO Inference Engine with FP16 precision, a batch size of 1, and 16 streams (equal to the number of CPU cores). All measurements were taken after the models were optimized for inference via batch-norm fusion, which significantly accelerates models that rely on batch-norm, such as ResNet50, TResNet and LeViT. Note that the LeViT-768* model is not part of the original LeViT paper; it is a model we defined in order to test the LeViT design in a higher-accuracy regime.
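As a minimal sketch of the batch-norm fusion step (assuming PyTorch models; this is an illustrative helper, not the exact optimization script used for the benchmark), a Conv2d followed by BatchNorm2d can be folded into a single convolution before export:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding Conv2d (inference only).

    The fused convolution computes conv + BN in a single op:
        w' = w * gamma / sqrt(var + eps)
        b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, dilation=conv.dilation,
        groups=conv.groups, bias=True,
    )
    # Per-channel scale factor from the BN statistics
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Hypothetical usage: fuse the stem conv/BN pair of a ResNet-style model,
# then replace the BN with an identity so the forward pass is unchanged.
# model.conv1 = fuse_conv_bn(model.conv1, model.bn1)
# model.bn1 = nn.Identity()
```

Because the fused weights reproduce the original conv+BN output exactly (up to floating-point error), accuracy is unaffected while one normalization op per block is removed from the inference graph.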