A human gesture recognition model for the Jester dataset recognition scenario (gesture-level recognition). The model uses an S3D framework with a MobileNet V3 backbone. Please refer to the Jester dataset specification to see the list of gestures that are recognized by this model.
The model accepts a stack of frames (8 frames) sampled with a constant frame rate (15 FPS) and produces a prediction on the input clip.
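As an illustration of the expected input clip, the following sketch (not part of the model distribution) shows one way to sample 8 frames at roughly 15 FPS from a video with OpenCV; the video path, the FPS fallback, and the sampling policy are assumptions made for this example.

```python
import cv2

def sample_clip(video_path, clip_len=8, target_fps=15):
    """Collect `clip_len` consecutive frames resampled to roughly `target_fps` (assumed policy)."""
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps  # fall back if the source FPS is unknown
    step = max(int(round(src_fps / target_fps)), 1)     # keep every `step`-th frame
    frames, idx = [], 0
    while len(frames) < clip_len:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # BGR frame as returned by OpenCV
        idx += 1
    cap.release()
    return frames  # list of H x W x 3 BGR frames
```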
| Metric                             | Value    |
|------------------------------------|----------|
| Top-1 accuracy (continuous Jester) | 93.58%   |
| GFlops                             | 4.2269   |
| MParams                            | 4.1128   |
| Source framework                   | PyTorch* |
Batch of images of the shape `1, 3, 8, 224, 224` in the `B, C, T, H, W` format, where:

- `B` - batch size
- `C` - channel
- `T` - sequence length
- `H` - height
- `W` - width

Channel order is `RGB`.
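As a minimal preprocessing sketch, the sampled frames above can be turned into a blob of this shape as follows; any model-specific normalization (mean/scale) is omitted because it is not listed in this section.

```python
import cv2
import numpy as np

def frames_to_blob(frames, size=224):
    """Convert 8 BGR frames into a `1, 3, 8, 224, 224` RGB blob in B, C, T, H, W layout."""
    clip = []
    for frame in frames:
        frame = cv2.resize(frame, (size, size))
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # the model expects RGB channel order
        clip.append(frame)
    clip = np.stack(clip)                       # T, H, W, C
    clip = clip.transpose(3, 0, 1, 2)           # C, T, H, W
    return clip[np.newaxis].astype(np.float32)  # B, C, T, H, W
```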
The model outputs a tensor with the shape `B, 27`; each row is a vector of logits over the 27 Jester gesture classes.
Blob of the shape `1, 27` in the `B, C` format, where:

- `B` - batch size
- `C` - predicted logits size
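For illustration, a small post-processing sketch that turns the 27 logits into a class index and confidence; mapping the index to a gesture name requires the Jester label list, which is not reproduced here.

```python
import numpy as np

def decode_prediction(logits):
    """Apply softmax to the `1, 27` logits blob and return (class_id, probability)."""
    logits = np.asarray(logits).reshape(-1)  # flatten B, C into a 27-element vector
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    class_id = int(probs.argmax())
    return class_id, float(probs[class_id])
```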
You can download models and, if necessary, convert them into the Inference Engine format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:
omz_downloader --name <model_name>
An example of using the Model Converter:
omz_converter --name <model_name>
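As a rough usage sketch, assuming the model has been downloaded and converted as above and that the OpenVINO Python runtime is installed, inference on a preprocessed clip could look like this; the IR path, the device name, and the dummy input are placeholders.

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")         # placeholder path to the converted IR
compiled = core.compile_model(model, "CPU")  # or another available device

# Replace the dummy blob with a real preprocessed clip in B, C, T, H, W layout.
blob = np.zeros((1, 3, 8, 224, 224), dtype=np.float32)
logits = compiled([blob])[compiled.output(0)]  # blob of shape 1, 27
```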
[*] Other names and brands may be claimed as the property of others.
The original model is distributed under the Apache License 2.0.