The OpenVINO™ toolkit provides a set of public pre-trained models that you can use for learning and demo purposes or for developing deep learning software. The most recent version is available in the repository on GitHub. The table Public Pre-Trained Models Device Support summarizes the devices supported by each model.
You can download models and convert them into Inference Engine format (*.xml + *.bin) using the OpenVINO™ Model Downloader and other automation tools.
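As a rough sketch of that workflow, the snippet below builds the command lines for one model from the tables on this page. It assumes the `omz_downloader` and `omz_converter` entry points (shipped with the `openvino-dev` Python package in recent releases) are installed and on `PATH`; older releases expose the same tools as scripts instead. The helper names `build_omz_commands` and `fetch_model` and the `models` output directory are hypothetical and exist only for illustration.

```python
import shutil
import subprocess


def build_omz_commands(model_name, output_dir="models"):
    """Return the download and convert command lines for one OMZ model name."""
    download = ["omz_downloader", "--name", model_name, "--output_dir", output_dir]
    convert = [
        "omz_converter", "--name", model_name,
        "--download_dir", output_dir, "--output_dir", output_dir,
    ]
    return [download, convert]


def fetch_model(model_name, output_dir="models"):
    """Run both tools in order (assumes they are installed and on PATH)."""
    for command in build_omz_commands(model_name, output_dir):
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} was not found on PATH")
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is downloaded.
    for command in build_omz_commands("alexnet"):
        print(" ".join(command))
```

Pass any value from the OMZ Model Name column (for example `alexnet` or `densenet-121-tf`) as `model_name`. Calling `fetch_model` actually runs the tools and needs network access.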
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
AlexNet | Caffe* | alexnet | 56.598%/79.812% | 1.5 | 60.965 |
AntiSpoofNet | PyTorch* | anti-spoof-mn3 | 3.81% | 0.15 | 3.02 |
CaffeNet | Caffe* | caffenet | 56.714%/79.916% | 1.5 | 60.965 |
DenseNet 121 | Caffe* TensorFlow* | densenet-121 densenet-121-tf | 74.42%/92.136% 74.46%/92.13% | 5.723~5.7287 | 7.971 |
DLA 34 | PyTorch* | dla-34 | 74.64%/92.06% | 6.1368 | 15.7344 |
EfficientNet B0 | TensorFlow* PyTorch* | efficientnet-b0 efficientnet-b0-pytorch | 75.70%/92.76% 76.91%/93.21% | 0.819 | 5.268 |
EfficientNet V2 B0 | PyTorch* | efficientnet-v2-b0 | 78.36%/94.02% | 1.4641 | 7.1094 |
EfficientNet V2 Small | PyTorch* | efficientnet-v2-s | 84.29%/97.26% | 16.9406 | 21.3816 |
HBONet 1.0 | PyTorch* | hbonet-1.0 | 73.1%/91.0% | 0.6208 | 4.5443 |
HBONet 0.25 | PyTorch* | hbonet-0.25 | 57.3%/79.8% | 0.0758 | 1.9299 |
Inception (GoogleNet) V1 | Caffe* TensorFlow* | googlenet-v1 googlenet-v1-tf | 68.928%/89.144% 69.814%/89.6% | 3.016~3.266 | 6.619~6.999 |
Inception (GoogleNet) V2 | Caffe* TensorFlow* | googlenet-v2 googlenet-v2-tf | 72.024%/90.844% 74.084%/91.798% | 4.058 | 11.185 |
Inception (GoogleNet) V3 | TensorFlow* PyTorch* | googlenet-v3 googlenet-v3-pytorch | 77.904%/93.808% 77.69%/93.7% | 11.469 | 23.817 |
Inception (GoogleNet) V4 | TensorFlow* | googlenet-v4-tf | 80.204%/95.21% | 24.584 | 42.648 |
Inception-ResNet V2 | TensorFlow* | inception-resnet-v2-tf | 80.14%/95.10% | 22.227 | 30.223 |
MixNet L | TensorFlow* | mixnet-l | 78.30%/93.91% | 0.565 | 7.3 |
MobileNet V1 0.25 128 | Caffe* | mobilenet-v1-0.25-128 | 40.54%/65% | 0.028 | 0.468 |
MobileNet V1 1.0 224 | Caffe* TensorFlow* | mobilenet-v1-1.0-224 mobilenet-v1-1.0-224-tf | 69.496%/89.224% 71.03%/89.94% | 1.148 | 4.221 |
MobileNet V2 1.0 224 | Caffe* TensorFlow* PyTorch* | mobilenet-v2 mobilenet-v2-1.0-224 mobilenet-v2-pytorch | 71.218%/90.178% 71.85%/90.69% 71.81%/90.396% | 0.615~0.876 | 3.489 |
MobileNet V2 1.4 224 | TensorFlow* | mobilenet-v2-1.4-224 | 74.09%/91.97% | 1.183 | 6.087 |
MobileNet V3 Small 1.0 | TensorFlow* | mobilenet-v3-small-1.0-224-tf | 67.36%/87.44% | 0.1168 | 2.537 |
MobileNet V3 Large 1.0 | TensorFlow* | mobilenet-v3-large-1.0-224-tf | 75.30%/92.62% | 0.4450 | 5.4721 |
NFNet F0 | PyTorch* | nfnet-f0 | 83.34%/96.56% | 24.8053 | 71.4444 |
RegNetX-3.2GF | PyTorch* | regnetx-3.2gf | 78.17%/94.08% | 6.3893 | 15.2653 |
ResNet 26, alpha=0.25 | MXNet* | octave-resnet-26-0.25 | 76.076%/92.584% | 3.768 | 15.99 |
open-closed-eye-0001 | PyTorch* | open-closed-eye-0001 | 95.84% | 0.0014 | 0.0113 |
RepVGG A0 | PyTorch* | repvgg-a0 | 72.40%/90.49% | 2.7286 | 8.3094 |
RepVGG B1 | PyTorch* | repvgg-b1 | 78.37%/94.09% | 23.6472 | 51.8295 |
RepVGG B3 | PyTorch* | repvgg-b3 | 80.50%/95.25% | 52.4407 | 110.9609 |
ResNeSt 50 | PyTorch* | resnest-50-pytorch | 81.11%/95.36% | 10.8148 | 27.4493 |
ResNet 18 | PyTorch* | resnet-18-pytorch | 69.754%/89.088% | 3.637 | 11.68 |
ResNet 34 | PyTorch* | resnet-34-pytorch | 73.30%/91.42% | 7.3409 | 21.7892 |
ResNet 50 | PyTorch* TensorFlow* | resnet-50-pytorch resnet-50-tf | 75.168%/92.212% 76.38%/93.188% 76.17%/92.98% | 6.996~8.216 | 25.53 |
ReXNet V1 x1.0 | PyTorch* | rexnet-v1-x1.0 | 77.86%/93.87% | 0.8325 | 4.7779 |
SE-Inception | Caffe* | se-inception | 75.996%/92.964% | 4.091 | 11.922 |
SE-ResNet 50 | Caffe* | se-resnet-50 | 77.596%/93.85% | 7.775 | 28.061 |
SE-ResNeXt 50 | Caffe* | se-resnext-50 | 78.968%/94.63% | 8.533 | 27.526 |
Shufflenet V2 x0.5 | Caffe* | shufflenet-v2-x0.5 | 58.80%/81.13% | 0.08465 | 1.363 |
Shufflenet V2 x1.0 | PyTorch* | shufflenet-v2-x1.0 | 69.36%/88.32% | 0.2957 | 2.2705 |
SqueezeNet v1.0 | Caffe* | squeezenet1.0 | 57.684%/80.38% | 1.737 | 1.248 |
SqueezeNet v1.1 | Caffe* | squeezenet1.1 | 58.382%/81% | 0.785 | 1.236 |
Swin Transformer Tiny, window size=7 | PyTorch* | swin-tiny-patch4-window7-224 | 81.38%/95.51% | 9.0280 | 28.8173 |
T2T-ViT, transformer layers number=14 | PyTorch* | t2t-vit-14 | 81.44%/95.66% | 9.5451 | 21.5498 |
VGG 16 | Caffe* | vgg16 | 70.968%/89.878% | 30.974 | 138.358 |
VGG 19 | Caffe* | vgg19 | 71.062%/89.832% | 39.3 | 143.667 |
Semantic segmentation is an extension of the object detection problem. Instead of returning bounding boxes, semantic segmentation models return a "painted" version of the input image, where the "color" of each pixel represents a certain class. These networks are much bigger than the corresponding object detection networks, but they provide better (pixel-level) localization of objects and can detect areas with complex shapes.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
DeepLab V3 | TensorFlow* | deeplabv3 | 68.41% | 11.469 | 23.819 |
DRN-D-38 | PyTorch* | drn-d-38 | 71.31% | 1768.3276 | 25.9939 |
HRNet V2 C1 Segmentation | PyTorch* | hrnet-v2-c1-segmentation | 77.69% | 81.993 | 66.4768 |
Fastseg MobileV3Large LR-ASPP, F=128 | PyTorch* | fastseg-large | 72.67% | 140.9611 | 3.2 |
Fastseg MobileV3Small LR-ASPP, F=128 | PyTorch* | fastseg-small | 67.15% | 69.2204 | 1.1 |
PSPNet R-50-D8 | PyTorch* | pspnet-pytorch | 70.6% | 357.1719 | 46.5827 |
OCRNet HRNet_w48 | Paddle* | ocrnet-hrnet-w48-paddle | 82.15% | 324.66 | 70.47 |
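To make the "painted" output described for the semantic segmentation models concrete, here is a minimal sketch of turning per-class scores into a color image. It assumes the network output is laid out as `class_scores[class][row][column]`; some of the models listed here may instead return class indices directly, so treat this layout, the `paint_segmentation` name, and the palette as illustrative assumptions.

```python
def paint_segmentation(class_scores, palette):
    """Color each pixel by its highest-scoring class.

    class_scores[c][y][x] is the score of class c at pixel (y, x);
    palette[c] is the RGB color chosen for class c.
    """
    num_classes = len(class_scores)
    height = len(class_scores[0])
    width = len(class_scores[0][0])
    painted = []
    for y in range(height):
        row = []
        for x in range(width):
            # Per-pixel argmax over the class axis.
            best = max(range(num_classes), key=lambda c: class_scores[c][y][x])
            row.append(palette[best])
        painted.append(row)
    return painted


# Two classes on a 1x2 image: background wins on the left, object on the right.
scores = [[[0.9, 0.2]], [[0.1, 0.8]]]
colors = [(0, 0, 0), (255, 0, 0)]
print(paint_segmentation(scores, colors))  # [[(0, 0, 0), (255, 0, 0)]]
```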
Instance segmentation is an extension of the object detection and semantic segmentation problems. Instead of predicting a bounding box around each object instance, an instance segmentation model outputs pixel-wise masks for all instances.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
Mask R-CNN Inception ResNet V2 | TensorFlow* | mask_rcnn_inception_resnet_v2_atrous_coco | 39.86%/35.36% | 675.314 | 92.368 |
Mask R-CNN ResNet 50 | TensorFlow* | mask_rcnn_resnet50_atrous_coco | 29.75%/27.46% | 294.738 | 50.222 |
YOLACT ResNet 50 FPN | PyTorch* | yolact-resnet50-fpn-pytorch | 28.0%/30.69% | 118.575 | 36.829 |
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
Brain Tumor Segmentation | MXNet* | brain-tumor-segmentation-0001 | 92.4003% | 409.996 | 38.192 |
Brain Tumor Segmentation 2 | PyTorch* | brain-tumor-segmentation-0002 | 91.4826% | 300.801 | 4.51 |
Several detection models can be used to detect the most common object types, such as faces, people, and vehicles. Most of the networks are SSD-based and provide reasonable accuracy/performance trade-offs.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
CTPN | TensorFlow* | ctpn | 73.67% | 55.813 | 17.237 |
CenterNet (CTDET with DLAV0) 512x512 | ONNX* | ctdet_coco_dlav0_512 | 44.2756% | 62.211 | 17.911 |
DETR-ResNet50 | PyTorch* | detr-resnet50 | 39.27%/42.36% | 174.4708 | 41.3293 |
EfficientDet-D0 | TensorFlow* | efficientdet-d0-tf | 31.95% | 2.54 | 3.9 |
EfficientDet-D1 | TensorFlow* | efficientdet-d1-tf | 37.54% | 6.1 | 6.6 |
FaceBoxes | PyTorch* | faceboxes-pytorch | 83.565% | 1.8975 | 1.0059 |
Face Detection Retail | Caffe* | face-detection-retail-0044 | 83.00% | 1.067 | 0.588 |
Faster R-CNN with Inception-ResNet v2 | TensorFlow* | faster_rcnn_inception_resnet_v2_atrous_coco | 40.69% | 30.687 | 13.307 |
Faster R-CNN with ResNet 50 | TensorFlow* | faster_rcnn_resnet50_coco | 31.09% | 57.203 | 29.162 |
MobileFace Detection V1 | MXNet* | mobilefacedet-v1-mxnet | 78.7488% | 3.5456 | 7.6828 |
Mobilenet-yolo-v4-syg | Keras* | mobilenet-yolo-v4-syg | 84.44% | 65.981 | 61.922 |
MTCNN | Caffe* | mtcnn: mtcnn-p mtcnn-r mtcnn-o | 48.1308%/62.2625% | 3.3715 0.0031 0.0263 | 0.0066 0.1002 0.3890 |
Pelee | Caffe* | pelee-coco | 21.9761% | 1.290 | 5.98 |
RetinaFace with ResNet 50 | PyTorch* | retinaface-resnet50-pytorch | 91.78% | 88.8627 | 27.2646 |
RetinaNet with Resnet 50 | TensorFlow* | retinanet-tf | 33.15% | 238.9469 | 64.9706 |
R-FCN with Resnet-101 | TensorFlow* | rfcn-resnet101-coco-tf | 28.40%/45.02% | 53.462 | 171.85 |
SSD 300 | Caffe* | ssd300 | 87.09% | 62.815 | 26.285 |
SSD 512 | Caffe* | ssd512 | 91.07% | 180.611 | 27.189 |
SSD with MobileNet | Caffe* TensorFlow* | mobilenet-ssd ssd_mobilenet_v1_coco | 67.00% 23.32% | 2.316~2.494 | 5.783~6.807 |
SSD with MobileNet FPN | TensorFlow* | ssd_mobilenet_v1_fpn_coco | 35.5453% | 123.309 | 36.188 |
SSD lite with MobileNet V2 | TensorFlow* | ssdlite_mobilenet_v2 | 24.2946% | 1.525 | 4.475 |
SSD with ResNet 34 1200x1200 | PyTorch* | ssd-resnet34-1200-onnx | 20.7198%/39.2752% | 433.411 | 20.058 |
Ultra Lightweight Face Detection RFB 320 | PyTorch* | ultra-lightweight-face-detection-rfb-320 | 84.78% | 0.2106 | 0.3004 |
Ultra Lightweight Face Detection slim 320 | PyTorch* | ultra-lightweight-face-detection-slim-320 | 83.32% | 0.1724 | 0.2844 |
Vehicle License Plate Detection Barrier | TensorFlow* | vehicle-license-plate-detection-barrier-0123 | 99.52% | 0.271 | 0.547 |
YOLO v1 Tiny | TensorFlow.js* | yolo-v1-tiny-tf | 54.79% | 6.9883 | 15.8587 |
YOLO v2 Tiny | Keras* | yolo-v2-tiny-tf | 27.3443%/29.1184% | 5.4236 | 11.2295 |
YOLO v2 | Keras* | yolo-v2-tf | 53.1453%/56.483% | 63.0301 | 50.9526 |
YOLO v3 | Keras* ONNX* | yolo-v3-tf yolo-v3-onnx | 62.2759%/67.7221% 48.30%/47.07% | 65.9843~65.998 | 61.9221~61.930 |
YOLO v3 Tiny | Keras* ONNX* | yolo-v3-tiny-tf yolo-v3-tiny-onnx | 35.9%/39.7% 17.07%/13.64% | 5.582 | 8.848~8.8509 |
YOLO v4 | Keras* | yolo-v4-tf | 71.23%/77.40%/50.26% | 129.5567 | 64.33 |
YOLO v4 Tiny | Keras* | yolo-v4-tiny-tf | | 6.9289 | 6.0535 |
YOLOF | PyTorch* | yolof | 60.69%/66.23%/43.63% | 175.37942 | 48.228 |
YOLOX Tiny | PyTorch* | yolox-tiny | 47.85%/52.56%/31.82% | 6.4813 | 5.0472 |
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
FaceNet | TensorFlow* | facenet-20180408-102900 | 99.14% | 2.846 | 23.469 |
LResNet100E-IR,ArcFace@ms1m-refine-v2 | MXNet* | face-recognition-resnet100-arcface-onnx | 99.68% | 24.2115 | 65.1320 |
SphereFace | Caffe* | Sphereface | 98.8321% | 3.504 | 22.671 |
The human pose estimation task is to predict a pose, that is, a body skeleton consisting of keypoints and the connections between them, for every person in an input image or video. Keypoints are body joints such as ears, eyes, nose, shoulders, and knees. There are two major groups of methods: top-down and bottom-up. Top-down methods detect persons in a given frame, crop or rescale the detections, and then run a pose estimation network on every detection. These methods are very accurate. Bottom-up methods find all keypoints in a given frame and then group them by person instance; they are faster than top-down methods because the network runs only once.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
human-pose-estimation-3d-0001 | PyTorch* | human-pose-estimation-3d-0001 | 100.44437mm | 18.998 | 5.074 |
single-human-pose-estimation-0001 | PyTorch* | single-human-pose-estimation-0001 | 69.0491% | 60.125 | 33.165 |
higher-hrnet-w32-human-pose-estimation | PyTorch* | higher-hrnet-w32-human-pose-estimation | 64.64% | 92.8364 | 28.6180 |
The task of monocular depth estimation is to predict a depth (or inverse depth) map based on a single input image. Since this task is ambiguous in the general setting, the resulting depth maps are often defined only up to an unknown scaling factor.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
midasnet | PyTorch* | midasnet | 0.07071 | 207.25144 | 104.081 |
FCRN ResNet50-Upproj | TensorFlow* | fcrn-dp-nyu-depth-v2-tf | 0.573 | 63.5421 | 34.5255 |
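Because monocular depth maps are often defined only up to an unknown scaling factor, a prediction usually has to be rescaled before it can be compared with reference depth. The sketch below shows one common way to do this, a least-squares fit of a single scale factor; it is an illustration only and is not necessarily how the accuracy values in the table above were computed. The function name `align_scale` is hypothetical.

```python
def align_scale(predicted, reference):
    """Return the scale s that minimizes sum((s * p - r) ** 2).

    Setting the derivative with respect to s to zero gives
    s = sum(p * r) / sum(p * p).
    """
    numerator = sum(p * r for p, r in zip(predicted, reference))
    denominator = sum(p * p for p in predicted)
    if denominator == 0:
        raise ValueError("predicted depths are all zero; scale is undefined")
    return numerator / denominator


# The prediction has the right shape but is too small by a factor of 2.5.
predicted = [1.0, 2.0, 4.0]
reference = [2.5, 5.0, 10.0]
scale = align_scale(predicted, reference)
aligned = [scale * p for p in predicted]
print(scale, aligned)
```

Models that predict inverse depth are often aligned with both a scale and a shift; the single-factor version is the simplest case.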
The image inpainting task is to estimate suitable pixel information to fill holes in images.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
GMCNN Inpainting | TensorFlow* | gmcnn-places2-tf | 33.47dB | 691.1589 | 12.7773 |
Hybrid-CS-Model-MRI | TensorFlow* | hybrid-cs-model-mri | 34.27dB | 146.6037 | 11.3313 |
The style transfer task is to transfer the style of one image to another.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
fast-neural-style-mosaic-onnx | ONNX* | fast-neural-style-mosaic-onnx | 12.04dB | 15.518 | 1.679 |
The task of action recognition is to predict the action being performed in a short video clip (a tensor formed by stacking frames sampled from the input video).
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
RGB-I3D, pretrained on ImageNet* | TensorFlow* | i3d-rgb-tf | 65.96%/86.01% | 278.9815 | 12.6900 |
common-sign-language-0001 | PyTorch* | common-sign-language-0001 | 93.58% | 4.2269 | 4.1128 |
The colorization task is to predict the colors of a scene from a grayscale image.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
colorization-v2 | PyTorch* | colorization-v2 | 26.99dB | 83.6045 | 32.2360 |
colorization-siggraph | PyTorch* | colorization-siggraph | 27.73dB | 150.5441 | 34.0511 |
The task of sound classification is to predict what sounds are in an audio fragment.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
ACLNet | PyTorch* | aclnet | 86%/92% | 1.4 | 2.7 |
ACLNet-int8 | PyTorch* | aclnet-int8 | 87%/93% | 1.41 | 2.71 |
The task of speech recognition is to recognize and translate spoken language into text.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
DeepSpeech V0.6.1 | TensorFlow* | mozilla-deepspeech-0.6.1 | 7.55% | 0.0472 | 47.2 |
DeepSpeech V0.8.2 | TensorFlow* | mozilla-deepspeech-0.8.2 | 6.13% | 0.0472 | 47.2 |
QuartzNet | PyTorch* | quartznet-15x5-en | 3.86% | 2.4195 | 18.8857 |
Wav2Vec 2.0 Base | PyTorch* | wav2vec2-base | 3.39% | 26.843 | 94.3965 |
The task of image translation is to generate an output image based on an exemplar.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
CoCosNet | PyTorch* | cocosnet | 12.93dB | 1080.7032 | 167.9141 |
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
license-plate-recognition-barrier-0007 | TensorFlow* | license-plate-recognition-barrier-0007 | 98% | 0.347 | 1.435 |
The task of place recognition is to quickly and accurately recognize the location of a given query photograph.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
NetVLAD | TensorFlow* | netvlad-tf | 82.0321% | 36.6374 | 149.0021 |
The task of image deblurring is to restore a sharp image from a blurred one.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
DeblurGAN-v2 | PyTorch* | deblurgan-v2 | 28.25dB | 80.8919 | 2.1083 |
The task of restoring images degraded by JPEG compression.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
FBCNN | PyTorch* | fbcnn | 34.34dB | 1420.78235 | 71.922 |
Salient object detection is a task based on a visual attention mechanism, in which algorithms aim to find the objects or regions in a scene or image that attract more attention than their surroundings.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
F3Net | PyTorch* | f3net | 84.21% | 31.2883 | 25.2791 |
Text prediction is a task to predict the next word, given all of the previous words within some text.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
GPT-2 | PyTorch* | gpt-2 | 29.00% | 293.0489 | 175.6203 |
Scene text recognition is the task of recognizing text in a given image. Researchers compete to create algorithms that can recognize text of different shapes, fonts, and backgrounds. The reported metric is collected over the alphanumeric subset of ICDAR13 (1015 images) in case-insensitive mode.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
Resnet-FC | PyTorch* | text-recognition-resnet-fc | 90.94% | 40.3704 | 177.9668 |
ViTSTR Small patch=16, size=224 | PyTorch* | vitstr-small-patch16-224 | 90.34% | 9.1544 | 21.5061 |
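As an illustration of what "alphanumeric, case-insensitive" scoring can look like for the text recognition models above, the sketch below computes word-level accuracy after normalizing both strings. The normalization shown here (lowercasing and dropping non-alphanumeric characters) is an assumption made for the example; the actual benchmark may instead exclude samples whose labels contain other characters. The function names are hypothetical.

```python
def normalize(text):
    """Lowercase the text and keep only ASCII letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isascii() and ch.isalnum())


def word_accuracy(predictions, ground_truth):
    """Fraction of samples whose normalized prediction matches the label."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("at least one sample is required")
    correct = sum(
        normalize(p) == normalize(g) for p, g in zip(predictions, ground_truth)
    )
    return correct / len(ground_truth)


predictions = ["Hello", "w0rld", "OpenVINO"]
labels = ["HELLO", "world", "openvino"]
# "Hello" and "OpenVINO" match after normalization; "w0rld" does not.
print(word_accuracy(predictions, labels))
```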
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
ForwardTacotron | PyTorch* | forward-tacotron: forward-tacotron-duration-prediction forward-tacotron-regression | | 6.66 4.91 | 13.81 3.05 |
WaveRNN | PyTorch* | wavernn: wavernn-upsampler wavernn-rnn | | 0.37 0.06 | 0.4 3.83 |
Named entity recognition (NER) is the task of tagging entities in text with their corresponding type.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
bert-base-NER | PyTorch* | bert-base-ner | 94.45% | 22.3874 | 107.4319 |
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
vehicle-reid-0001 | PyTorch* | vehicle-reid-0001 | 96.31%/85.15% | 2.643 | 2.183 |
[*] Other names and brands may be claimed as the property of others.