Awesome Action Recognition
Mert Caglar edited this page Apr 8, 2020 · 1 revision
A curated list of resources on action recognition and related areas (e.g. object recognition, pose estimation), inspired by awesome-computer-vision.
- Deep Learning for Videos: A 2018 Guide to Action Recognition - Summary of major landmark action recognition research papers till 2018
- Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition - J. Choi et al., NeurIPS2019. [project web] [code] [arXiv]
- SlowFast Networks for Video Recognition - C. Feichtenhofer et al., ICCV2019. [code]
- Large-scale weakly-supervised pre-training for video action recognition - D. Ghadiyaram et al., arXiv2019.
- Video Classification with Channel-Separated Convolutional Networks - D. Tran et al., arXiv2019.
- DistInit: Learning Video Representations without a Single Labeled Video - R. Girdhar et al., arXiv2019.
- SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition - B. Korbar et al., arXiv2019.
- Video Action Transformer Network - R. Girdhar et al., CVPR2019. [project web]
- Learning Correspondence from the Cycle-consistency of Time - X. Wang et al., CVPR2019. [code] [project web]
- Representation Flow for Action Recognition - AJ. Piergiovanni and M. S. Ryoo, CVPR2019.
- Collaborative Spatiotemporal Feature Learning for Video Action Recognition - C. Li et al., CVPR2019.
- Learning Video Representations from Correspondence Proposals - X. Liu et al., CVPR2019.
- Timeception for Complex Action Recognition - N. Hussein et al., CVPR2019.
- The Visual Centrifuge: Model-Free Layered Video Representations - J.-B. Alayrac et al., CVPR2019.
- Long-Term Feature Banks for Detailed Video Understanding - C.-Y. Wu et al., CVPR2019. [code]
- Temporal Relational Reasoning in Videos - B. Zhou et al., ECCV2018. [code] [project web]
- Action Recognition Zoo - Code for popular action recognition models, written in PyTorch and verified on the Something-Something dataset.
- Videos as Space-Time Region Graphs - X. Wang and A. Gupta, ECCV2018.
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? - K. Hara et al., CVPR2019. [code]
- A Closer Look at Spatiotemporal Convolutions for Action Recognition - D. Tran et al., CVPR2018. [code] [PyTorch]
- Attend and Interact: Higher-Order Object Interactions for Video Understanding - CY. Ma et al., CVPR 2018.
- Non-Local Neural Networks - X. Wang et al., CVPR2018. [code]
- Rethinking Spatiotemporal Feature Learning For Video Understanding - S. Xie et al., arXiv2017.
- ConvNet Architecture Search for Spatiotemporal Feature Learning - D. Tran et al, arXiv2017. Note: Aka Res3D. [code]: In the repository, C3D-v1.1 is the Res3D implementation.
- Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks - Z. Qiu et al, ICCV2017. [code]
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset - J. Carreira et al, CVPR2017. [code], [PyTorch code], [another PyTorch code]
- Learning Spatiotemporal Features with 3D Convolutional Networks - D. Tran et al, ICCV2015. [the official Caffe code] [project web] Note: Aka C3D. [Python Wrapper] Note: The official Caffe code does not support a Python wrapper. [TensorFlow], [TensorFlow + Keras], [Another TensorFlow Implementation], [Keras C3D Project web]: [Keras code], [Pretrained weights].
- Deep Temporal Linear Encoding Networks - A. Diba et al, CVPR2017.
- Temporal Convolutional Networks: A Unified Approach to Action Segmentation and Detection - C. Lea et al, CVPR 2017. [code]
- Long-term Temporal Convolutions - G. Varol et al, TPAMI2017. [project web] [code]
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition - L. Wang et al, arXiv 2016. [code]
- Convolutional Two-Stream Network Fusion for Video Action Recognition - C. Feichtenhofer et al, CVPR2016. [code]
- Two-Stream Convolutional Networks for Action Recognition in Videos - K. Simonyan and A. Zisserman, NIPS2014.
- [3D ResNet PyTorch]
- [PyTorch Video Research]
- [M-PACT: Michigan Platform for Activity Classification in Tensorflow]
- [Inflated models on PyTorch]
- [I3D models transferred from TensorFlow to PyTorch]
- [A Two Stream Baseline on Kinetics dataset]
- [MMAction]
- [PySlowFast]
- Neural Graph Matching Networks for Fewshot 3D Action Recognition - M. Guo et al., ECCV2018.
- Temporal 3D ConvNets using Temporal Transition Layer - A. Diba et al., CVPRW2018.
- Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification - A. Diba et al., arXiv2017.
- Attentional Pooling for Action Recognition - R. Girdhar and D. Ramanan, NIPS2017. [code]
- Fully Context-Aware Video Prediction - Byeon et al, arXiv2017.
- Hidden Two-Stream Convolutional Networks for Action Recognition - Y. Zhu et al, arXiv2017. [code]
- Dynamic Image Networks for Action Recognition - H. Bilen et al, CVPR2016. [code] [project web]
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description - J. Donahue et al, CVPR2015. [code] [project web]
- Describing Videos by Exploiting Temporal Structure - L. Yao et al, ICCV2015. [code] Note: From the same group as the RCN paper "Delving Deeper into Convolutional Networks for Learning Video Representations".
- Two-Stream SR-CNNs for Action Recognition in Videos - L. Wang et al, BMVC2016.
- Real-time Action Recognition with Enhanced Motion Vector CNNs - B. Zhang et al, CVPR2016. [code]
- Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors - L. Wang et al, CVPR2015. [code]
- Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition - M. Li et al., CVPR2019.
- An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition - C. Si et al., CVPR2019.
- View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition - P. Zhang et al., TPAMI2019.
- Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition - S. Yan et al., AAAI2018. [code]
- Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition - Y. Tang et al., CVPR2018.
- Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation - C. Li et al., IJCAI2018.
- Part-based Graph Convolutional Network for Action Recognition - K. Thakkar et al., BMVC2018.
- Rethinking the Faster R-CNN Architecture for Temporal Action Localization - Yu-Wei Chao et al., CVPR2018
- Weakly Supervised Action Localization by Sparse Temporal Pooling Network - Phuc Nguyen et al., CVPR 2018
- Temporal Deformable Residual Networks for Action Segmentation in Videos - P. Lei and S. Todorovic, CVPR2018.
- End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos - Shyamal Buch et al., BMVC 2017 [code]
- Cascaded Boundary Regression for Temporal Action Detection - Jiyang Gao et al., BMVC 2017 [code]
- Temporal Tessellation: A Unified Approach for Video Analysis - Kaufman et al., ICCV2017. [code]
- Temporal Action Detection with Structured Segment Networks - Y. Zhao et al., ICCV2017. [code] [project web]
- Temporal Context Network for Activity Localization in Videos - X. Dai et al., ICCV2017.
- Detecting the Moment of Completion: Temporal Models for Localising Action Completion - F. Heidarivincheh et al., arXiv2017.
- CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos - Z. Shou et al, CVPR2017. [code]
- SST: Single-Stream Temporal Action Proposals - S. Buch et al, CVPR2017. [code]
- R-C3D: Region Convolutional 3D Network for Temporal Activity Detection - H. Xu et al, arXiv2017. [code] [project web] [PyTorch]
- DAPs: Deep Action Proposals for Action Understanding - V. Escorcia et al, ECCV2016. [code] [raw data]
- Online Action Detection using Joint Classification-Regression Recurrent Neural Networks - Y. Li et al, ECCV2016. Note: RGB-D Action Detection
- Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs - Z. Shou et al, CVPR2016. [code] Note: Aka S-CNN.
- Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos - F. Heilbron et al, CVPR2016. [code] Note: Depends on C3D, aka SparseProp.
- Actionness Estimation Using Hybrid Fully Convolutional Networks - L. Wang et al, CVPR2016. [code] Note: The code is not a complete version. It only contains a demo, not training. [project web]
- Learning Activity Progression in LSTMs for Activity Detection and Early Detection - S. Ma et al, CVPR2016.
- End-to-end Learning of Action Detection from Frame Glimpses in Videos - S. Yeung et al, CVPR2016. [code] [project web] Note: This method uses reinforcement learning
- Fast Action Proposals for Human Action Detection and Search - G. Yu and J. Yuan, CVPR2015. Note: code for FAP is NOT available online. Note: Aka FAP.
- Bag-of-fragments: Selecting and encoding video fragments for event detection and recounting - P. Mettes et al, ICMR2015.
- Action localization in videos through context walk - K. Soomro et al, ICCV2015.
- A Better Baseline for AVA - R. Girdhar et al., ActivityNet Workshop, CVPR2018.
- Real-Time End-to-End Action Detection with Two-Stream Networks - A. El-Nouby and G. Taylor, arXiv2018.
- Human Action Localization with Sparse Spatial Supervision - P. Weinzaepfel et al., arXiv2017.
- Unsupervised Action Discovery and Localization in Videos - K. Soomro and M. Shah, ICCV2017.
- Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions - P. Mettes and C. G. M. Snoek, ICCV2017.
- Action Tubelet Detector for Spatio-Temporal Action Localization - V. Kalogeiton et al, ICCV2017. [code] [project web]
- Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos - R. Hou et al, ICCV2017. [project web]
- Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection - M. Zolfaghari et al, ICCV2017. [project web]
- TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal - H. Zhu et al., ICCV2017.
- Online Real time Multiple Spatiotemporal Action Localisation and Prediction - G. Singh et al, ICCV2017. [code]
- AMTnet: Action-Micro-Tube regression by end-to-end trainable deep architecture - S. Saha et al, ICCV2017.
- Am I Done? Predicting Action Progress in Videos - F. Becattini et al, BMVC2017.
- Generic Tubelet Proposals for Action Localization - J. He et al, arXiv2017.
- Incremental Tube Construction for Human Action Detection - H. S. Behl et al, arXiv2017.
- Multi-region two-stream R-CNN for action detection - X. Peng and C. Schmid, ECCV2016. [code]
- Spot On: Action Localization from Pointly-Supervised Proposals - P. Mettes et al, ECCV2016.
- Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos - S. Saha et al, BMVC2016. [code] [project web]
- Learning to track for spatio-temporal action localization - P. Weinzaepfel et al, ICCV2015.
- Action detection by implicit intentional motion clustering - W. Chen and J. Corso, ICCV2015.
- Finding Action Tubes - G. Gkioxari and J. Malik, CVPR2015. [code] [project web]
- APT: Action localization proposals from dense trajectories - J. Gemert et al, BMVC2015. [code]
- Spatio-Temporal Object Detection Proposals - D. Oneata et al, ECCV2014. [code] [project web]
- Action localization with tubelets from motion - M. Jain et al, CVPR2014.
- Spatiotemporal deformable part models for action detection - Y. Tian et al, CVPR2013. [code]
- Actor and Observer: Joint Modeling of First and Third-Person Videos - G. Sigurdsson et al., CVPR2018. [code]
- What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment - P. Parmar and B. T. Morris, CVPR2019.
- PathTrack: Fast Trajectory Annotation with Path Supervision - S. Manen et al., ICCV2017.
- CortexNet: a Generic Network Family for Robust Visual Temporal Representations - A. Canziani and E. Culurciello, arXiv2017. [code] [project web]
- Slicing Convolutional Neural Network for Crowd Video Understanding - J. Shao et al, CVPR2016. [code]
- Two-Stream (RGB and Flow) pretrained model weights
- Video Dataset Overview from Antoine Miech
- HACS
- Moments in Time, paper
- AVA, paper, [INRIA web] for missing videos
- Kinetics, paper, download toolkit
- OOPS - A dataset of unintentional action, paper
- COIN - a large-scale dataset for comprehensive instructional video analysis, paper
- YouTube-8M, technical report
- YouTube-BB, technical report
- DALY: Daily Action Localization in YouTube videos. Note: Weakly supervised action detection dataset. Annotations consist of the start and end time of each action, with one bounding box per action per video.
- 20BN-JESTER, 20BN-SOMETHING-SOMETHING
- ActivityNet Note: They provide a download script and evaluation code here.
- Charades
- Charades-Ego, paper - First person and third person video aligned dataset
- EPIC-Kitchens, paper - First person videos recorded in kitchens. Note: They provide download scripts and a Python library here.
- Sports-1M - Large scale action recognition dataset.
- THUMOS14 Note: It overlaps with UCF-101 dataset.
- THUMOS15 Note: It overlaps with UCF-101 dataset.
- HOLLYWOOD2: Spatio-Temporal annotations
- UCF-101, annotations provided by THUMOS-14, a corrupted annotation list, UCF-101 corrected annotations, and a different version of the annotations. There are also some pre-computed spatiotemporal action detection results.
- UCF-50.
- UCF-Sports, note: the train/test split link in the official website is broken. Instead, you can download it from here.
- HMDB
- J-HMDB
- LIRIS-HARL
- KTH
- MSR Action Note: It overlaps with KTH dataset.
- Sports Videos in the Wild
- NTU RGB+D
- Mixamo Mocap Dataset
- UWA3D Multiview Activity II Dataset
- Northwestern-UCLA Dataset
- SYSU 3D Human-Object Interaction Dataset
- MEVA (Multiview Extended Video with Activities) Dataset
- Efficiently scaling up crowdsourced video annotation - C. Vondrick et al., IJCV2013. [code]
- The Design and Implementation of ViPER - D. Mihalcik and D. Doermann, Technical report.
- VoTT: Visual Object Tagging Tool. Modern app to annotate objects in videos and images. It facilitates the development of an end-to-end machine learning pipeline encompassing the annotation, export, and import of assets. It can run as a native app or via the web.
- VIA: VGG Image Annotator. Simple and standalone manual annotation web-app for image, audio and video. It runs in the web browser and does not require any installation or setup.
- Deformable Convolutional Networks - J. Dai et al., ICCV2017. [official code]
- Detectron - Open Source Object Detection Framework from Facebook AI Research. Includes Mask R-CNN, FPN, etc. Caffe2 implementation.
- Mask R-CNN - K. He et al, [Detectron], [TensorFlow + Keras], [MXNet], [TensorFlow], [PyTorch] - State-of-the-art object detection/instance segmentation algorithm.
- Faster R-CNN - S. Ren et al, NIPS2015. [official MatCaffe code], [PyCaffe], [TensorFlow], [Another TF implementation] [Keras] - State-of-the-art object detector.
- YOLO - J. Redmon et al, CVPR2016. [official code], [TensorFlow] - Fast object detector.
- YOLO9000 - J. Redmon and A. Farhadi, CVPR2017. [official code] - State-of-the-art object detector which can detect 9000 objects in realtime.
- SSD - W. Liu et al, ECCV2016. [official PyCaffe code], [TensorFlow], [Keras] - State-of-the-art object detector with realtime processing speed.
- RetinaNet - Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár, Facebook AI Research (FAIR), ICCV2017. [Keras] - State-of-the-art object detector with realtime processing speed.
- [Detect to Track and Track to Detect] - C. Feichtenhofer et al., ICCV2017. [code], [project web]
- [Flow-Guided Feature Aggregation for Video Object Detection] - X. Zhu et al., ICCV2017. [code], aka FGFA
- AlphaPose - PyTorch based realtime and accurate pose estimation and tracking tool from SJTU.
- Detect-and-Track: Efficient Pose Estimation in Videos - R. Girdhar et al., arXiv2017.
- OpenPose Library - Caffe based realtime pose estimation library from CMU.
- Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields - Z. Cao et al, CVPR2017. [code] depends on the [caffe RT pose] - Earlier version of OpenPose from CMU.
- DensePose [code] - Dense human pose estimation in the wild implemented in the Detectron framework.
- MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network - M. Kocabas et al, ECCV2018. [code]
- ActEV (Activities in Extended Video) - Activity detection in security camera videos. Runs through 2021. Hosted by NIST.
License