Awesome Action Recognition

A curated list of resources on action recognition and related areas (e.g. object recognition, pose estimation), inspired by awesome-computer-vision.

Action Recognition and Video Understanding

Summary posts

Deep Learning for Videos: A 2018 Guide to Action Recognition - A summary of the major landmark action recognition research papers up to 2018.

Action Recognition Datasets

Video Dataset Overview from Antoine Miech
HACS
Moments in Time, paper
AVA, paper, [INRIA web] for missing videos
Kinetics, paper, download toolkit
OOPS - A dataset of unintentional action, paper
COIN - a large-scale dataset for comprehensive instructional video analysis, paper
YouTube-8M, technical report
YouTube-BB, technical report
DALY - Daily Action Localization in YouTube videos. Note: a weakly supervised action detection dataset. Annotations consist of the start and end time of each action, with one bounding box per action per video.
20BN-JESTER, 20BN-SOMETHING-SOMETHING
ActivityNet. Note: a download script and evaluation code are provided here.
Charades
Charades-Ego, paper - A dataset of aligned first-person and third-person videos.
EPIC-Kitchens, paper - First-person videos recorded in kitchens. Note: download scripts and a Python library are provided here.
Sports-1M - Large scale action recognition dataset.
THUMOS14. Note: it overlaps with the UCF-101 dataset.
THUMOS15. Note: it overlaps with the UCF-101 dataset.
HOLLYWOOD2: Spatio-Temporal annotations
UCF-101, annotations provided by THUMOS-14, a corrupted annotation list, UCF-101 corrected annotations, and a different version of the annotations. Some pre-computed spatiotemporal action detection results are also available.
UCF-50
UCF-Sports. Note: the train/test split link on the official website is broken. Instead, you can download it from here.
HMDB
J-HMDB
LIRIS-HARL
KTH
MSR Action. Note: it overlaps with the KTH dataset.
Sports Videos in the Wild
NTU RGB+D
Mixamo Mocap Dataset
UWA3D Multiview Activity II Dataset
Northwestern-UCLA Dataset
SYSU 3D Human-Object Interaction Dataset
MEVA (Multiview Extended Video with Activities) Dataset

Video Annotation

Efficiently scaling up crowdsourced video annotation - C. Vondrick et al., IJCV2013. [code]
The Design and Implementation of ViPER - D. Mihalcik and D. Doermann, Technical report.
VoTT: Visual Object Tagging Tool. A modern app for annotating objects in videos and images. It supports an end-to-end machine learning pipeline covering annotation, export, and import of assets, and it can run as a native app or in the browser.
VIA: VGG Image Annotator. A simple, standalone manual annotation web app for image, audio, and video. It runs in the web browser and does not require any installation or setup.

Object Recognition

Object Detection

Deformable Convolutional Networks - J. Dai et al., ICCV2017. [official code]
Detectron - Open-source object detection framework from Facebook AI Research. Includes Mask R-CNN, FPN, etc. Caffe2 implementation.
Mask R-CNN - K. He et al., [Detectron], [TensorFlow + Keras], [MXNet], [TensorFlow], [PyTorch] - State-of-the-art object detection/instance segmentation algorithm.
Faster R-CNN - S. Ren et al., NIPS2015. [official MatCaffe code], [PyCaffe], [TensorFlow], [Another TF implementation], [Keras] - State-of-the-art object detector.
YOLO - J. Redmon et al., CVPR2016. [official code], [TensorFlow] - Fast object detector.
YOLO9000 - J. Redmon and A. Farhadi, CVPR2017. [official code] - State-of-the-art object detector which can detect 9000 objects in realtime.
SSD - W. Liu et al., ECCV2016. [official PyCaffe code], [TensorFlow], [Keras] - State-of-the-art object detector with realtime processing speed.
RetinaNet - Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár, Facebook AI Research (FAIR), ICCV2017. [Keras] - State-of-the-art object detector with realtime processing speed.

Video Object Detection

Detect to Track and Track to Detect - C. Feichtenhofer et al., ICCV2017. [code], [project web]
Flow-Guided Feature Aggregation for Video Object Detection (FGFA) - X. Zhu et al., ICCV2017. [code]

Pose Estimation

AlphaPose - PyTorch-based realtime and accurate pose estimation and tracking tool from SJTU.
Detect-and-Track: Efficient Pose Estimation in Videos - R. Girdhar et al., arXiv2017.
OpenPose Library - Caffe-based realtime pose estimation library from CMU.
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields - Z. Cao et al., CVPR2017. [code] depends on the [caffe RT pose] - Earlier version of OpenPose from CMU.
DensePose [code] - Dense human pose estimation in the wild, implemented in the Detectron framework.
MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network - M. Kocabas et al., ECCV2018. [code]

Competitions

ActEV (Activities in Extended Video) - Activity detection in security camera videos. Runs through 2021. Hosted by NIST.

Licenses
