Live Action Recognition with OpenVINO™

Human action recognition identifies actions performed over time in a video. The list of actions in this notebook is extensive (400 in total) and covers Person Actions (for example, drawing, drinking, laughing), Person-Person Actions (for example, hugging, shaking hands), and Person-Object Actions (for example, opening a present, mowing the lawn, playing "instrument"). You can find several parent-child groupings in the list of labels, such as braiding hair and brushing hair, salsa dancing and robot dancing, or playing violin and playing guitar. For more information about the labels and the dataset, see the "The Kinetics Human Action Video Dataset" research paper.

Binder
Binder is a free service where the webcam will not work and video performance will be poor. For the best performance, install the notebooks locally.

Notebook Contents

This notebook demonstrates live human action recognition with OpenVINO, using the Action Recognition Models from Open Model Zoo, specifically the Encoder and Decoder from action-recognition-0001. Together, the two models form a sequence-to-sequence ("seq2seq")1 system that identifies human activities from the Kinetics-400 dataset. The models use the Video Transformer approach with a ResNet34 encoder2. The notebook shows how to build this encoder-decoder pipeline.
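A minimal sketch of that pipeline is shown below, assuming the action-recognition-0001 encoder and decoder IR files have already been downloaded from Open Model Zoo. The file paths, the input shapes, and the 16-frame window are assumptions based on the model description, not code taken from the notebook:

```python
import numpy as np
import openvino as ov

core = ov.Core()

# Hypothetical local paths to the IR files downloaded from Open Model Zoo.
encoder = core.compile_model("model/action-recognition-0001-encoder.xml", "CPU")
decoder = core.compile_model("model/action-recognition-0001-decoder.xml", "CPU")

def infer_action(frames):
    """frames: a list of preprocessed frame tensors, e.g. 16 arrays of
    assumed shape (1, 3, 224, 224); check the model docs for exact shapes."""
    # The encoder turns each frame into an embedding vector.
    embeddings = [encoder(f)[encoder.output(0)].reshape(1, -1) for f in frames]
    # The decoder consumes the stacked sequence of embeddings and returns
    # logits over the 400 Kinetics action classes.
    sequence = np.stack(embeddings, axis=1)  # assumed shape: (1, 16, 512)
    logits = decoder(sequence)[decoder.output(0)]
    probs = np.exp(logits - logits.max())    # softmax for a confidence score
    probs /= probs.sum()
    return int(probs.argmax()), float(probs.max())
```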

The final part of this notebook shows live inference results from a webcam. Alternatively, you can upload a video file.

NOTE: To use the webcam, you must run this Jupyter notebook on a computer with a webcam. If you run on a server, the webcam will not work. However, you can still do inference on a video in the final step.
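For illustration, here is a hedged sketch of switching between a webcam and a video file as the frame source using OpenCV; the `use_webcam` flag and the `sample.mp4` filename are hypothetical, not part of the notebook:

```python
import cv2

use_webcam = True  # set to False on a server, where no webcam is available
source = 0 if use_webcam else "sample.mp4"

cap = cv2.VideoCapture(source)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # ... preprocess `frame` and feed it to the encoder/decoder here ...
    cv2.imshow("Action Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press "q" to quit
        break
cap.release()
cv2.destroyAllWindows()
```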

1 seq2seq: Deep learning models that map an input sequence to an output sequence. In this case, the input is a sequence of video frames and the output is a sequence of actions. A "seq2seq" system is composed of an encoder and a decoder: the encoder captures the "context" of the inputs, which the decoder then analyzes to produce the recognized human action and its confidence.

2 Video Transformer and ResNet34.

For more information about the pre-trained models, refer to the Intel and public model documentation. All of the models are included in the Open Model Zoo.

Installation Instructions

If you have not installed all required dependencies, follow the Installation Guide.

See Also