Overview of OpenVINO™ Toolkit Public Pre-Trained Models

OpenVINO™ toolkit provides a set of public pre-trained models that you can use for learning and demo purposes or for developing deep learning software. The most recent version is available in the repository on GitHub. The table Public Pre-Trained Models Device Support summarizes the devices supported by each model.

You can download the models and convert them into the Inference Engine format (*.xml + *.bin) using the OpenVINO™ Model Downloader and other automation tools.
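The following is a minimal sketch of that two-step workflow. It assumes the `openvino-dev` Python package is installed, so that the `omz_downloader` and `omz_converter` command-line tools are on `PATH` (older releases shipped them as `downloader.py` and `converter.py`). The helper names are hypothetical. By default the script only prints the commands, and it executes them only when `run=True` is passed.

```python
import shlex
import subprocess
from typing import List


def downloader_cmd(model: str, output_dir: str) -> List[str]:
    # Download the original framework files (Caffe*, TensorFlow*, PyTorch*, ...) for one model.
    return ["omz_downloader", "--name", model, "--output_dir", output_dir]


def converter_cmd(model: str, download_dir: str, output_dir: str,
                  precision: str = "FP16") -> List[str]:
    # Convert the downloaded files into the *.xml + *.bin format.
    return ["omz_converter", "--name", model,
            "--download_dir", download_dir,
            "--output_dir", output_dir,
            "--precisions", precision]


def fetch_and_convert(model: str, workdir: str = "models",
                      run: bool = False) -> List[List[str]]:
    # Hypothetical helper: print (and optionally execute) both steps in order.
    cmds = [downloader_cmd(model, workdir), converter_cmd(model, workdir, workdir)]
    for cmd in cmds:
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # The value in the "OMZ Model Name" column is what goes after --name.
    fetch_and_convert("alexnet")
```

The value passed to `--name` is the entry from the OMZ Model Name column of the tables in this document.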

Classification

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
AlexNet	Caffe*	alexnet	56.598%/79.812%	1.5	60.965
AntiSpoofNet	PyTorch*	anti-spoof-mn3	3.81%	0.15	3.02
CaffeNet	Caffe*	caffenet	56.714%/79.916%	1.5	60.965
DenseNet 121	Caffe* TensorFlow*	densenet-121 densenet-121-tf	74.42%/92.136% 74.46%/92.13%	5.723~5.7287	7.971
DLA 34	PyTorch*	dla-34	74.64%/92.06%	6.1368	15.7344
EfficientNet B0	TensorFlow* PyTorch*	efficientnet-b0 efficientnet-b0-pytorch	75.70%/92.76% 76.91%/93.21%	0.819	5.268
EfficientNet V2 B0	PyTorch*	efficientnet-v2-b0	78.36%/94.02%	1.4641	7.1094
EfficientNet V2 Small	PyTorch*	efficientnet-v2-s	84.29%/97.26%	16.9406	21.3816
HBONet 1.0	PyTorch*	hbonet-1.0	73.1%/91.0%	0.6208	4.5443
HBONet 0.25	PyTorch*	hbonet-0.25	57.3%/79.8%	0.0758	1.9299
Inception (GoogleNet) V1	Caffe* TensorFlow*	googlenet-v1 googlenet-v1-tf	68.928%/89.144% 69.814%/89.6%	3.016~3.266	6.619~6.999
Inception (GoogleNet) V2	Caffe* TensorFlow*	googlenet-v2 googlenet-v2-tf	72.024%/90.844% 74.084%/91.798%	4.058	11.185
Inception (GoogleNet) V3	TensorFlow* PyTorch*	googlenet-v3 googlenet-v3-pytorch	77.904%/93.808% 77.69%/93.7%	11.469	23.817
Inception (GoogleNet) V4	TensorFlow*	googlenet-v4-tf	80.204%/95.21%	24.584	42.648
Inception-ResNet V2	TensorFlow*	inception-resnet-v2-tf	80.14%/95.10%	22.227	30.223
MixNet L	TensorFlow*	mixnet-l	78.30%/93.91%	0.565	7.3
MobileNet V1 0.25 128	Caffe*	mobilenet-v1-0.25-128	40.54%/65%	0.028	0.468
MobileNet V1 1.0 224	Caffe* TensorFlow*	mobilenet-v1-1.0-224 mobilenet-v1-1.0-224-tf	69.496%/89.224% 71.03%/89.94%	1.148	4.221
MobileNet V2 1.0 224	Caffe* TensorFlow* PyTorch*	mobilenet-v2 mobilenet-v2-1.0-224 mobilenet-v2-pytorch	71.218%/90.178% 71.85%/90.69% 71.81%/90.396%	0.615~0.876	3.489
MobileNet V2 1.4 224	TensorFlow*	mobilenet-v2-1.4-224	74.09%/91.97%	1.183	6.087
MobileNet V3 Small 1.0	TensorFlow*	mobilenet-v3-small-1.0-224-tf	67.36%/87.44%	0.1168	2.537
MobileNet V3 Large 1.0	TensorFlow*	mobilenet-v3-large-1.0-224-tf	75.30%/92.62%	0.4450	5.4721
NFNet F0	PyTorch*	nfnet-f0	83.34%/96.56%	24.8053	71.4444
RegNetX-3.2GF	PyTorch*	regnetx-3.2gf	78.17%/94.08%	6.3893	15.2653
ResNet 26, alpha=0.25	MXNet*	octave-resnet-26-0.25	76.076%/92.584%	3.768	15.99
open-closed-eye-0001	PyTorch*	open-closed-eye-0001	95.84%	0.0014	0.0113
RepVGG A0	PyTorch*	repvgg-a0	72.40%/90.49%	2.7286	8.3094
RepVGG B1	PyTorch*	repvgg-b1	78.37%/94.09%	23.6472	51.8295
RepVGG B3	PyTorch*	repvgg-b3	80.50%/95.25%	52.4407	110.9609
ResNeSt 50	PyTorch*	resnest-50-pytorch	81.11%/95.36%	10.8148	27.4493
ResNet 18	PyTorch*	resnet-18-pytorch	69.754%/89.088%	3.637	11.68
ResNet 34	PyTorch*	resnet-34-pytorch	73.30%/91.42%	7.3409	21.7892
ResNet 50	PyTorch* TensorFlow*	resnet-50-pytorch resnet-50-tf	75.168%/92.212% 76.38%/93.188% 76.17%/92.98%	6.996~8.216	25.53
ReXNet V1 x1.0	PyTorch*	rexnet-v1-x1.0	77.86%/93.87%	0.8325	4.7779
SE-Inception	Caffe*	se-inception	75.996%/92.964%	4.091	11.922
SE-ResNet 50	Caffe*	se-resnet-50	77.596%/93.85%	7.775	28.061
SE-ResNeXt 50	Caffe*	se-resnext-50	78.968%/94.63%	8.533	27.526
Shufflenet V2 x0.5	Caffe*	shufflenet-v2-x0.5	58.80%/81.13%	0.08465	1.363
Shufflenet V2 x1.0	PyTorch*	shufflenet-v2-x1.0	69.36%/88.32%	0.2957	2.2705
SqueezeNet v1.0	Caffe*	squeezenet1.0	57.684%/80.38%	1.737	1.248
SqueezeNet v1.1	Caffe*	squeezenet1.1	58.382%/81%	0.785	1.236
Swin Transformer Tiny, window size=7	PyTorch*	swin-tiny-patch4-window7-224	81.38%/95.51%	9.0280	28.8173
T2T-ViT, transformer layers number=14	PyTorch*	t2t-vit-14	81.44%/95.66%	9.5451	21.5498
VGG 16	Caffe*	vgg16	70.968%/89.878%	30.974	138.358
VGG 19	Caffe*	vgg19	71.062%/89.832%	39.3	143.667

Segmentation

Semantic segmentation is an extension of the object detection problem. Instead of returning bounding boxes, semantic segmentation models return a "painted" version of the input image, where the "color" of each pixel represents a certain class. These networks are much bigger than the corresponding object detection networks, but they provide better (pixel-level) localization of objects and can detect areas with complex shapes.
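A small sketch can illustrate the "painted" output. For illustration only, it assumes that a model returns one score map per class, laid out as `scores[class][y][x]`. Real output layouts differ between models, and some models return the class map directly. Each pixel takes the class with the highest score, and a palette then turns class IDs into colors.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def scores_to_label_map(scores: List[List[List[float]]]) -> List[List[int]]:
    # scores[c][y][x] is the score of class c at pixel (y, x); keep the best class per pixel.
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def paint(label_map: List[List[int]], palette: Dict[int, Color]) -> List[List[Color]]:
    # Replace every class ID with its display color.
    return [[palette[label] for label in row] for row in label_map]


# Two classes (0 = background, 1 = object) on a 2x2 image.
scores = [
    [[0.9, 0.2], [0.1, 0.6]],  # class 0
    [[0.1, 0.8], [0.9, 0.4]],  # class 1
]
labels = scores_to_label_map(scores)
print(labels)  # [[0, 1], [1, 0]]
print(paint(labels, {0: (0, 0, 0), 1: (255, 0, 0)}))
```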

Semantic Segmentation

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
DeepLab V3	TensorFlow*	deeplabv3	68.41%	11.469	23.819
DRN-D-38	PyTorch*	drn-d-38	71.31%	1768.3276	25.9939
HRNet V2 C1 Segmentation	PyTorch*	hrnet-v2-c1-segmentation	77.69%	81.993	66.4768
Fastseg MobileV3Large LR-ASPP, F=128	PyTorch*	fastseg-large	72.67%	140.9611	3.2
Fastseg MobileV3Small LR-ASPP, F=128	PyTorch*	fastseg-small	67.15%	69.2204	1.1
PSPNet R-50-D8	PyTorch*	pspnet-pytorch	70.6%	357.1719	46.5827
OCRNet HRNet_w48	Paddle*	ocrnet-hrnet-w48-paddle	82.15%	324.66	70.47

Instance Segmentation

Instance segmentation is an extension of the object detection and semantic segmentation problems. Instead of predicting a bounding box around each object instance, an instance segmentation model outputs pixel-wise masks for all instances.
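Each instance gets its own mask, so predicted masks are usually compared with ground-truth masks using intersection over union (IoU). Mask IoU is the overlap measure underlying mask-based average precision. Below is a minimal sketch that assumes binary masks of equal size, given as nested lists of 0/1 values. The mask names are illustrative.

```python
from typing import List

Mask = List[List[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    # Count pixels set in both masks (intersection) and in either mask (union).
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += int(bool(pa) and bool(pb))
            union += int(bool(pa) or bool(pb))
    return inter / union if union else 0.0


# Two instances of the same class keep separate masks.
person_1 = [[1, 1, 0], [1, 1, 0]]
person_2 = [[0, 0, 1], [0, 0, 1]]
prediction = [[1, 1, 0], [1, 0, 0]]

print(mask_iou(prediction, person_1))  # 0.75
print(mask_iou(prediction, person_2))  # 0.0
```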

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
Mask R-CNN Inception ResNet V2	TensorFlow*	mask_rcnn_inception_resnet_v2_atrous_coco	39.86%/35.36%	675.314	92.368
Mask R-CNN ResNet 50	TensorFlow*	mask_rcnn_resnet50_atrous_coco	29.75%/27.46%	294.738	50.222
YOLACT ResNet 50 FPN	PyTorch*	yolact-resnet50-fpn-pytorch	28.0%/30.69%	118.575	36.829

3D Semantic Segmentation

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
Brain Tumor Segmentation	MXNet*	brain-tumor-segmentation-0001	92.4003%	409.996	38.192
Brain Tumor Segmentation 2	PyTorch*	brain-tumor-segmentation-0002	91.4826%	300.801	4.51

Object Detection

Several detection models can be used to detect a set of the most popular objects, for example, faces, people, and vehicles. Most of the networks are SSD-based and provide reasonable accuracy/performance trade-offs.
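As an illustration of post-processing, many SSD-style models converted to the Inference Engine format end in a DetectionOutput layer. Its output rows have the form `[image_id, label, confidence, x_min, y_min, x_max, y_max]`, with coordinates normalized to [0, 1] and an `image_id` of -1 marking the end of the valid rows. Treat this layout as an assumption and check it against each model's documentation, since other families such as YOLO use different outputs. The sketch filters rows by confidence and scales the boxes to pixels.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    label: int
    confidence: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def decode_ssd(rows: Sequence[Sequence[float]], width: int, height: int,
               threshold: float = 0.5) -> List[Detection]:
    detections = []
    for image_id, label, conf, x0, y0, x1, y1 in rows:
        if image_id < 0:  # assumed end-of-detections marker
            break
        if conf < threshold:  # drop low-confidence candidates
            continue
        box = (round(x0 * width), round(y0 * height),
               round(x1 * width), round(y1 * height))
        detections.append(Detection(int(label), float(conf), box))
    return detections


rows = [
    [0, 15, 0.92, 0.10, 0.20, 0.50, 0.90],
    [0, 7, 0.30, 0.00, 0.00, 0.20, 0.20],
    [-1, 0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
print(decode_ssd(rows, width=640, height=480))
# [Detection(label=15, confidence=0.92, box=(64, 96, 320, 432))]
```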

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
CTPN	TensorFlow*	ctpn	73.67%	55.813	17.237
CenterNet (CTDET with DLAV0) 512x512	ONNX*	ctdet_coco_dlav0_512	44.2756%	62.211	17.911
DETR-ResNet50	PyTorch*	detr-resnet50	39.27%/42.36%	174.4708	41.3293
EfficientDet-D0	TensorFlow*	efficientdet-d0-tf	31.95%	2.54	3.9
EfficientDet-D1	TensorFlow*	efficientdet-d1-tf	37.54%	6.1	6.6
FaceBoxes	PyTorch*	faceboxes-pytorch	83.565%	1.8975	1.0059
Face Detection Retail	Caffe*	face-detection-retail-0044	83.00%	1.067	0.588
Faster R-CNN with Inception-ResNet v2	TensorFlow*	faster_rcnn_inception_resnet_v2_atrous_coco	40.69%	30.687	13.307
Faster R-CNN with ResNet 50	TensorFlow*	faster_rcnn_resnet50_coco	31.09%	57.203	29.162
MobileFace Detection V1	MXNet*	mobilefacedet-v1-mxnet	78.7488%	3.5456	7.6828
Mobilenet-yolo-v4-syg	Keras*	mobilenet-yolo-v4-syg	84.44%	65.981	61.922
MTCNN	Caffe*	mtcnn: mtcnn-p mtcnn-r mtcnn-o	48.1308%/62.2625%	3.3715 0.0031 0.0263	0.0066 0.1002 0.3890
Pelee	Caffe*	pelee-coco	21.9761%	1.290	5.98
RetinaFace with ResNet 50	PyTorch*	retinaface-resnet50-pytorch	91.78%	88.8627	27.2646
RetinaNet with Resnet 50	TensorFlow*	retinanet-tf	33.15%	238.9469	64.9706
R-FCN with Resnet-101	TensorFlow*	rfcn-resnet101-coco-tf	28.40%/45.02%	53.462	171.85
SSD 300	Caffe*	ssd300	87.09%	62.815	26.285
SSD 512	Caffe*	ssd512	91.07%	180.611	27.189
SSD with MobileNet	Caffe* TensorFlow*	mobilenet-ssd ssd_mobilenet_v1_coco	67.00% 23.32%	2.316~2.494	5.783~6.807
SSD with MobileNet FPN	TensorFlow*	ssd_mobilenet_v1_fpn_coco	35.5453%	123.309	36.188
SSD lite with MobileNet V2	TensorFlow*	ssdlite_mobilenet_v2	24.2946%	1.525	4.475
SSD with ResNet 34 1200x1200	PyTorch*	ssd-resnet34-1200-onnx	20.7198%/39.2752%	433.411	20.058
Ultra Lightweight Face Detection RFB 320	PyTorch*	ultra-lightweight-face-detection-rfb-320	84.78%	0.2106	0.3004
Ultra Lightweight Face Detection slim 320	PyTorch*	ultra-lightweight-face-detection-slim-320	83.32%	0.1724	0.2844
Vehicle License Plate Detection Barrier	TensorFlow*	vehicle-license-plate-detection-barrier-0123	99.52%	0.271	0.547
YOLO v1 Tiny	TensorFlow.js*	yolo-v1-tiny-tf	54.79%	6.9883	15.8587
YOLO v2 Tiny	Keras*	yolo-v2-tiny-tf	27.3443%/29.1184%	5.4236	11.2295
YOLO v2	Keras*	yolo-v2-tf	53.1453%/56.483%	63.0301	50.9526
YOLO v3	Keras* ONNX*	yolo-v3-tf yolo-v3-onnx	62.2759%/67.7221% 48.30%/47.07%	65.9843~65.998	61.9221~61.930
YOLO v3 Tiny	Keras* ONNX*	yolo-v3-tiny-tf yolo-v3-tiny-onnx	35.9%/39.7% 17.07%/13.64%	5.582	8.848~8.8509
YOLO v4	Keras*	yolo-v4-tf	71.23%/77.40%/50.26%	129.5567	64.33
YOLO v4 Tiny	Keras*	yolo-v4-tiny-tf		6.9289	6.0535
YOLOF	PyTorch*	yolof	60.69%/66.23%/43.63%	175.37942	48.228
YOLOX Tiny	PyTorch*	yolox-tiny	47.85%/52.56%/31.82%	6.4813	5.0472

Face Recognition

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
FaceNet	TensorFlow*	facenet-20180408-102900	99.14%	2.846	23.469
LResNet100E-IR,ArcFace@ms1m-refine-v2	MXNet*	face-recognition-resnet100-arcface-onnx	99.68%	24.2115	65.1320
SphereFace	Caffe*	Sphereface	98.8321%	3.504	22.671

Human Pose Estimation

The human pose estimation task is to predict a pose (a body skeleton consisting of keypoints and the connections between them) for every person in an input image or video. Keypoints are body joints, for example, ears, eyes, nose, shoulders, and knees. There are two major groups of such methods: top-down and bottom-up. Top-down methods first detect persons in a given frame, crop or rescale the detections, and then run a pose estimation network on every detection; these methods are very accurate. Bottom-up methods find all keypoints in a given frame and then group them by person instance; they are faster because the network runs only once.
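Many pose estimation networks output one heatmap per keypoint, and the peak of each heatmap marks the most likely joint location. The sketch below assumes that representation and a single person per input, as in the per-detection step of a top-down pipeline. Bottom-up models also need a grouping step (for example, part affinity fields or associative embeddings), which is not shown. Function and parameter names are illustrative.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]
Keypoint = Tuple[float, float, float]  # (x, y, score) in image pixels


def heatmaps_to_keypoints(heatmaps: List[Heatmap], image_w: int, image_h: int,
                          min_score: float = 0.1) -> List[Optional[Keypoint]]:
    keypoints: List[Optional[Keypoint]] = []
    for hm in heatmaps:
        h, w = len(hm), len(hm[0])
        # Location of the strongest response in this keypoint's heatmap.
        score, y, x = max((hm[r][c], r, c) for r in range(h) for c in range(w))
        if score < min_score:
            keypoints.append(None)  # keypoint treated as not visible
            continue
        # Map the heatmap cell center back to image pixel coordinates.
        keypoints.append(((x + 0.5) * image_w / w, (y + 0.5) * image_h / h, score))
    return keypoints


hm_nose = [[0.0, 0.1], [0.8, 0.2]]
hm_hidden = [[0.0, 0.0], [0.0, 0.0]]
print(heatmaps_to_keypoints([hm_nose, hm_hidden], image_w=100, image_h=100))
# [(25.0, 75.0, 0.8), None]
```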

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
human-pose-estimation-3d-0001	PyTorch*	human-pose-estimation-3d-0001	100.44437mm	18.998	5.074
single-human-pose-estimation-0001	PyTorch*	single-human-pose-estimation-0001	69.0491%	60.125	33.165
higher-hrnet-w32-human-pose-estimation	PyTorch*	higher-hrnet-w32-human-pose-estimation	64.64%	92.8364	28.6180

Monocular Depth Estimation

The task of monocular depth estimation is to predict a depth (or inverse depth) map from a single input image. Because this task is, in the general setting, ambiguous, the resulting depth maps are often defined only up to an unknown scaling factor.
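Predictions are defined only up to scale, so evaluations commonly rescale a prediction before comparing it with the ground truth. The sketch fits the scale by least squares, s = Σ p·g / Σ p², and then computes the mean absolute relative error. This is an illustrative protocol, not necessarily the one behind the numbers in the table below; some methods also fit a shift, for example. The same code applies to whichever quantity (depth or inverse depth) the model predicts.

```python
from typing import Sequence


def fit_scale(pred: Sequence[float], gt: Sequence[float]) -> float:
    # Least-squares s minimizing sum((s * p - g) ** 2): s = sum(p * g) / sum(p * p).
    num = sum(p * g for p, g in zip(pred, gt))
    den = sum(p * p for p in pred)
    if den == 0:
        raise ValueError("prediction is all zeros")
    return num / den


def abs_rel_after_scaling(pred: Sequence[float], gt: Sequence[float]) -> float:
    # Mean absolute relative error once the unknown scale has been removed.
    s = fit_scale(pred, gt)
    return sum(abs(s * p - g) / g for p, g in zip(pred, gt)) / len(gt)


pred = [1.0, 2.0, 4.0]   # relative depths from a model
gt = [2.0, 4.0, 8.0]     # ground truth for the same pixels
print(fit_scale(pred, gt))              # 2.0
print(abs_rel_after_scaling(pred, gt))  # 0.0
```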

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
midasnet	PyTorch*	midasnet	0.07071	207.25144	104.081
FCRN ResNet50-Upproj	TensorFlow*	fcrn-dp-nyu-depth-v2-tf	0.573	63.5421	34.5255

Image Inpainting

The image inpainting task is to estimate suitable pixel information to fill holes in images.
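A common final step in inpainting pipelines keeps the known pixels and takes only the hole region from the network output. The sketch assumes a single-channel image and a binary mask in which 1 marks missing pixels. This document does not say whether any particular model above already performs this compositing internally.

```python
from typing import List

Image = List[List[float]]


def composite(image: Image, prediction: Image, hole_mask: List[List[int]]) -> Image:
    # hole_mask is 1 where pixels are missing: take the network's value there,
    # keep the original pixel everywhere else.
    return [
        [p if m else o for o, p, m in zip(orow, prow, mrow)]
        for orow, prow, mrow in zip(image, prediction, hole_mask)
    ]


image = [[10, 20], [30, 0]]        # bottom-right pixel is a hole
prediction = [[11, 19], [29, 40]]  # network output for the whole frame
hole_mask = [[0, 0], [0, 1]]
print(composite(image, prediction, hole_mask))  # [[10, 20], [30, 40]]
```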

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
GMCNN Inpainting	TensorFlow*	gmcnn-places2-tf	33.47dB	691.1589	12.7773
Hybrid-CS-Model-MRI	TensorFlow*	hybrid-cs-model-mri	34.27dB	146.6037	11.3313

Style Transfer

The style transfer task is to transfer the style of one image to another.

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
fast-neural-style-mosaic-onnx	ONNX*	fast-neural-style-mosaic-onnx	12.04dB	15.518	1.679

Action Recognition

The task of action recognition is to predict the action being performed in a short video clip (a tensor formed by stacking frames sampled from the input video).
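One simple way to form such a clip is uniform sampling. Split the video into equal segments and take the center frame of each one. The clip length, sampling strategy, and tensor layout are model-specific, so the function names and values below are illustrative assumptions.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_indices(num_frames: int, clip_len: int) -> List[int]:
    # Split the video into clip_len equal segments and take the frame at each segment's center.
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    return [min(num_frames - 1, int((i + 0.5) * num_frames / clip_len))
            for i in range(clip_len)]


def make_clip(frames: Sequence[Frame], clip_len: int) -> List[Frame]:
    # The stacked frames are what would be arranged into the model's input tensor.
    return [frames[i] for i in sample_indices(len(frames), clip_len)]


print(sample_indices(10, 4))  # [1, 3, 6, 8]
print(sample_indices(2, 4))   # [0, 0, 1, 1]  (short videos repeat frames)
```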

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
RGB-I3D, pretrained on ImageNet*	TensorFlow*	i3d-rgb-tf	65.96%/86.01%	278.9815	12.6900
common-sign-language-0001	PyTorch*	common-sign-language-0001	93.58%	4.2269	4.1128

Colorization

The colorization task is to predict the colors of a scene from a grayscale image.

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
colorization-v2	PyTorch*	colorization-v2	26.99dB	83.6045	32.2360
colorization-siggraph	PyTorch*	colorization-siggraph	27.73dB	150.5441	34.0511

Sound Classification

The task of sound classification is to predict what sounds are in an audio fragment.

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
ACLNet	PyTorch*	aclnet	86%/92%	1.4	2.7
ACLNet-int8	PyTorch*	aclnet-int8	87%/93%	1.41	2.71

Speech Recognition

The task of speech recognition is to recognize and translate spoken language into text.
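Speech models such as DeepSpeech, QuartzNet, and Wav2Vec 2.0 are generally trained with Connectionist Temporal Classification (CTC). They emit a probability for every symbol of an alphabet, plus a special "blank", at each time step. The simplest way to turn such output into text is greedy decoding: take the best symbol per step, collapse repeats, and drop blanks. The sketch assumes that output form. The blank's index and the alphabet vary per model, and production decoders often use beam search, optionally with a language model.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str,
                      blank: int = 0) -> str:
    # frame_probs[t][k] is the score of symbol k at time step t.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev = None
    for k in best:
        # Collapse repeated symbols, then drop the blank symbol.
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


def one_hot(k: int, n: int = 4) -> List[float]:
    return [1.0 if i == k else 0.0 for i in range(n)]


ALPHABET = "-cat"  # index 0 is a placeholder for the blank
print(ctc_greedy_decode([one_hot(k) for k in [1, 1, 0, 2, 3, 3, 0]], ALPHABET))  # cat
print(ctc_greedy_decode([one_hot(k) for k in [3, 0, 3]], ALPHABET))              # tt
```

The second call shows why the blank exists: it separates two genuine repetitions of the same letter.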

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
DeepSpeech V0.6.1	TensorFlow*	mozilla-deepspeech-0.6.1	7.55%	0.0472	47.2
DeepSpeech V0.8.2	TensorFlow*	mozilla-deepspeech-0.8.2	6.13%	0.0472	47.2
QuartzNet	PyTorch*	quartznet-15x5-en	3.86%	2.4195	18.8857
Wav2Vec 2.0 Base	PyTorch*	wav2vec2-base	3.39%	26.843	94.3965

Image Translation

The task of image translation is to generate an output image based on an exemplar.

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
CoCosNet	PyTorch*	cocosnet	12.93dB	1080.7032	167.9141

Optical Character Recognition

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
license-plate-recognition-barrier-0007	TensorFlow*	license-plate-recognition-barrier-0007	98%	0.347	1.435

Place Recognition

The task of place recognition is to quickly and accurately recognize the location of a given query photograph.
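NetVLAD produces a global descriptor vector for each image. Recognition then reduces to finding the database images whose descriptors are closest to the query's. Below is a brute-force sketch with toy 3-dimensional descriptors. Real descriptors are much longer, and large databases typically use an approximate nearest-neighbor index. The place names and vectors are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero descriptor")
    return [x / norm for x in v]


def rank_places(query: Sequence[float], database: Dict[str, Sequence[float]],
                top_k: int = 1) -> List[Tuple[str, float]]:
    # Cosine similarity between the query descriptor and every database descriptor.
    q = l2_normalize(query)
    scored = []
    for name, desc in database.items():
        d = l2_normalize(desc)
        scored.append((name, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


database = {
    "station": [1.0, 0.0, 0.0],
    "harbour": [0.0, 1.0, 0.0],
    "park": [0.7, 0.7, 0.0],
}
print(rank_places([0.9, 0.1, 0.0], database, top_k=2))
```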

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
NetVLAD	TensorFlow*	netvlad-tf	82.0321%	36.6374	149.0021

Deblurring

The task of image deblurring is to recover a sharp image from a blurred one.
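The restoration tables in this document (inpainting, colorization, deblurring, JPEG artifacts removal) report accuracy in dB. Peak signal-to-noise ratio (PSNR) is the usual dB-valued metric for image restoration. Assuming that is what these numbers measure, here is a minimal sketch for a single-channel image with 8-bit values.

```python
import math
from typing import List

Image = List[List[float]]


def psnr(reference: Image, restored: Image, max_value: float = 255.0) -> float:
    # PSNR = 10 * log10(MAX^2 / MSE), in decibels; higher means closer to the reference.
    sq_errors = [(r - x) ** 2
                 for ref_row, out_row in zip(reference, restored)
                 for r, x in zip(ref_row, out_row)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


sharp = [[100, 100], [100, 100]]
deblurred = [[110, 90], [100, 100]]
print(round(psnr(sharp, deblurred), 2))  # 31.14
```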

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
DeblurGAN-v2	PyTorch*	deblurgan-v2	28.25dB	80.8919	2.1083

JPEG artifacts removal

The task of JPEG artifacts removal is to restore images degraded by JPEG compression.

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
FBCNN	PyTorch*	fbcnn	34.34dB	1420.78235	71.922

Salient object detection

Salient object detection is a task based on a visual attention mechanism, in which algorithms aim to find the objects or regions of a scene or image that attract more attention than the surrounding areas.

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
F3Net	PyTorch*	f3net	84.21%	31.2883	25.2791

Text Prediction

Text prediction is a task to predict the next word, given all of the previous words within some text.
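Text is generated from a next-word predictor autoregressively. The model scores every candidate continuation, the chosen one is appended, and the process repeats. The sketch uses greedy selection with a hypothetical `toy_model` bigram table standing in for the network. GPT-2 itself works on byte-level BPE subword tokens rather than whole words, and practical generation often samples instead of always taking the maximum.

```python
from typing import Callable, Dict, List

NextWordScores = Callable[[List[str]], Dict[str, float]]


def generate_greedy(prompt: List[str], score_next: NextWordScores,
                    max_new: int, stop: str = "<eos>") -> List[str]:
    words = list(prompt)
    for _ in range(max_new):
        scores = score_next(words)          # scores for every candidate next word
        best = max(scores, key=scores.get)  # greedy choice
        if best == stop:
            break
        words.append(best)
    return words


# Hypothetical stand-in for the network: next-word scores from the last word only.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "<eos>": 0.1},
    "sat": {"<eos>": 1.0},
}


def toy_model(words: List[str]) -> Dict[str, float]:
    return BIGRAMS.get(words[-1], {"<eos>": 1.0})


print(generate_greedy(["the"], toy_model, max_new=5))  # ['the', 'cat', 'sat']
```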

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
GPT-2	PyTorch*	gpt-2	29.00%	293.0489	175.6203

Text Recognition

Scene text recognition is the task of recognizing text in a given image. Researchers compete to create algorithms that can recognize text of different shapes, fonts, and backgrounds. See details about the datasets here. The reported metric is collected over the alphanumeric subset of ICDAR13 (1015 images) in case-insensitive mode.
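In case-insensitive mode, a prediction counts as correct when it matches the label after both are lowercased. The sketch below also strips non-alphanumeric characters, which is a common normalization but an assumption here. The exact protocol behind the reported numbers may differ, and the function names are illustrative.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    # Lowercase and keep only ASCII letters and digits.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions: Sequence[str], ground_truth: Sequence[str]) -> float:
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must have the same length")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)


preds = ["Hello", "W0rld", "OpenVINO!"]
labels = ["hello", "world", "openvino"]
print(word_accuracy(preds, labels))  # 2 of 3 words match
```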

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
Resnet-FC	PyTorch*	text-recognition-resnet-fc	90.94%	40.3704	177.9668
ViTSTR Small patch=16, size=224	PyTorch*	vitstr-small-patch16-224	90.34%	9.1544	21.5061

Text to Speech

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
ForwardTacotron	PyTorch*	forward-tacotron: forward-tacotron-duration-prediction forward-tacotron-regression		6.66 4.91	13.81 3.05
WaveRNN	PyTorch*	wavernn: wavernn-upsampler wavernn-rnn		0.37 0.06	0.4 3.83

Named Entity Recognition

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type.
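NER models commonly tag each token with a BIO label: `B-TYPE` begins an entity, `I-TYPE` continues it, and `O` marks tokens outside any entity. The sketch below assumes such tags (for example `B-PER`, `I-LOC`) and word-level tokens, and groups them into entity spans. Check the model's own label list. A BERT-based model also emits WordPiece subtokens, which would need to be merged first.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            flush()  # a new entity starts (a stray I- tag is treated as a start)
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the current entity
        else:
            flush()  # "O" tag: outside any entity
    flush()
    return entities


tokens = ["Angela", "Merkel", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_entities(tokens, tags))  # [('PER', 'Angela Merkel'), ('LOC', 'Paris')]
```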

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
bert-base-NER	PyTorch*	bert-base-ner	94.45%	22.3874	107.4319

Vehicle Reidentification

Model Name	Implementation	OMZ Model Name	Accuracy	GFlops	mParams
vehicle-reid-0001	PyTorch*	vehicle-reid-0001	96.31%/85.15%	2.643	2.183

Legal Information

[*] Other names and brands may be claimed as the property of others.
