Awesome Human Action Recognition Repository

awesome-human-action-recognition

A list of the most popular methods for human action recognition.

arXiv Papers
Journal Papers
Conference Papers
- 2019: ICCV
- 2018: CVPR, ECCV, NIPS, Others
- 2017: CVPR, ICCV, Others
- 2016: CVPR, ECCV, Others
- 2015: CVPR, ICCV, Others
- 2014: CVPR, Others & Before
Directions:
- Traditional Machine Learning Methods
- Deep Learning Methods
Datasets
Current Accuracy on Main Datasets
Workshops
Challenges
Other Related Papers

arXiv Papers

[arXiv:1808.07507] Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization. [PDF]

Unaiza Ahsan, Rishi Madhok

[arXiv:1711.04161] End-to-end Video-level Representation Learning for Action Recognition. [PDF][code]

Jiagang Zhu, Wei Zou, Zheng Zhu

Journal Papers

[2017, IEEE TPAMI] Long-Term Temporal Convolutions for Action Recognition [PDF]

Gul Varol, Ivan Laptev, and Cordelia Schmid, Fellow, IEEE

Review works

Human Action Recognition and Prediction: A Survey [PDF]

Yu Kong, Member, IEEE, and Yun Fu, Senior Member, IEEE

Conference Papers

2019 ICCV

Graph Convolutional Networks for Temporal Action Localization. Authors: Chuang Gan et al.

Action Recognition with Spatial-Temporal Discriminative Filter Banks. Authors: Yuanjun Xiong et al.

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures. Authors: Google Brain

Note: neural architecture search for video understanding; brute force works wonders.

DynamoNet: Dynamic Action and Motion Network. Authors: Ali Diba, Luc Van Gool

Reasoning About Human-Object Interactions Through Dual Attention Networks. Authors: Bolei Zhou

Learning Temporal Action Proposals with Fewer Labels. Authors: Fei-Fei Li's group at Stanford, Juan Carlos Niebles

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. Authors: Dima Damen et al.

SlowFast Networks for Video Recognition (paper link: https://arxiv.org/abs/1812.03982). By Kaiming He, FAIR

Video Classification with Channel-Separated Convolutional Networks (paper link: https://arxiv.org/abs/1904.02811). By Du Tran, FAIR

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition, oral (paper link: https://arxiv.org/abs/1904.04289). By Du Tran, FAIR

DistInit: Learning Video Representations without a Single Labeled Video (paper link: https://arxiv.org/abs/1901.09244). By Du Tran, FAIR. Note: a very simple idea.

TSM: Temporal Shift Module for Efficient Video Understanding. Authors: Ji Lin, Chuang Gan, Song Han. Paper link: https://arxiv.org/abs/1811.08383 GitHub link: https://github.com/mit-han-lab/temporal-shift-module Note: this feels rather like a fixed convolution kernel with a mask?

BMN: Boundary-Matching Network for Temporal Action Proposal Generation (paper link: https://arxiv.org/abs/1907.09702). Explanation by the author, Tianwei Lin: "[ICCV 2019][Temporal Action Proposal] Boundary-Matching Network Explained" (original post: https://zhuanlan.zhihu.com/p/75444151)

Weakly Supervised Energy-Based Learning for Action Segmentation, oral. Link: https://github.com/JunLi-Galios/CDFL

Pose-aware Dynamic Attention for Human Object Interaction Detection. Link: https://github.com/bobwan1995/PMFNet

What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention. Project link: https://iplab.dmi.unict.it/rulstm/ Paper link: https://arxiv.org/pdf/1905.09035.pdf GitHub: https://github.com/fpv-iplab/rulstm

Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings. Paper link: https://arxiv.org/abs/1908.03477 Project link: https://mwray.github.io/FGAR/

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. Authors: Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic. Paper link: https://arxiv.org/abs/1906.03327 Project and code link: https://github.com/antoine77340/howto100m

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation. Authors: Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Woo, Ruxin Chen, Jian Zheng. Paper link: https://arxiv.org/abs/1907.12743 GitHub link: https://github.com/cmhungsteve/TA3N

STM: SpatioTemporal and Motion Encoding for Action Recognition, from ZJU and SenseTime Group Limited. Paper link: https://arxiv.org/abs/1908.02486

2018 ECCV

[2018,ECCV] Temporal Relational Reasoning in Videos [PDF] [code]

Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba

[2018,ECCV] Modality Distillation with Multiple Stream Networks for Action Recognition [PDF]

[2018,ECCV] Graph Distillation for Action Detection with Privileged Modalities [PDF]

Stanford University, Google Inc.

Note: the two papers above are similar and open up a new research direction.

[2018,ECCV] Spatio-Temporal Channel Correlation Networks for Action Classification [PDF]

Note (open question): why can't a 3D network learn the relation between spatial and temporal features?

[2018,ECCV] Learning Human-Object Interactions by Graph Parsing Neural Networks [PDF] [code]

Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

[2018,ECCV] Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification [PDF]

Yang Du, Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li, and Weiming Hu

[2018,ECCV] Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization [PDF]

Humam Alwassel, Fabian Caba Heilbron, and Bernard Ghanem

[2018,ECCV] Action Anticipation with RBF Kernelized Feature Mapping RNN [PDF]

Yuge Shi, Basura Fernando, Richard Hartley

[2018,ECCV] Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning [PDF]

Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan

[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset [PDF]

Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri

[2018,ECCV] End-to-End Joint Semantic Segmentation of Actors and Actions in Video [PDF]

2018 CVPR

[2018,CVPR] Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition [PDF]

Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

[2018,CVPR] Appearance-and-Relation Networks for Video Classification [PDF] [code]

L. Wang, W. Li, W. Li, and L. Van Gool

2018 NIPS

[2018,NIPS] Trajectory Convolution for Action Recognition[PDF] [code]

Yue Zhao, Yuanjun Xiong

2018 Others

2017 ICCV

2017 CVPR

[2017,CVPR] AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]

Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma

[2017,CVPR] On the Integration of Optical Flow and Action Recognition [PDF]

Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black

2017 Others

2016 CVPR

[2016,CVPR] Convolutional Two-Stream Network Fusion for Video Action Recognition [PDF]

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman

[2016,CVPR] A Key Volume Mining Deep Framework for Action Recognition [PDF]

Wangjiang Zhu, Jie Hu, Gang Sun, Xudong Cao, Yu Qiao

2016 ECCV

[2016,ECCV] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [PDF]

Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool

2016 ICCV

2016 Others

2015 CVPR

[2015,CVPR] Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [PDF]

Limin Wang, Yu Qiao, Xiaoou Tang

2015 ECCV

2015 ICCV

[2015,ICCV] Learning Spatiotemporal Features with 3D Convolutional Networks [PDF]

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri

2015 Others

2014 CVPR

[2014,CVPR] Large-Scale Video Classification with Convolutional Neural Networks [PDF]

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei

2014 ECCV

2014 ICCV

2014 Others

[2014,NIPS] Two-Stream Convolutional Networks for Action Recognition in Videos [PDF]

Karen Simonyan, Andrew Zisserman

Directions

Traditional Machine Learning Methods

This list focuses mainly on deep learning (DL) methods, which are listed below.

Deep Learning Methods

2D convolutional networks

[2017,CVPR] AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]

Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma

3D convolutional networks

[2014, IEEE TPAMI] 3D Convolutional Neural Networks for Human Action Recognition

Shuiwang Ji, Wei Xu, Ming Yang, Kai Yu

[2017, IEEE TPAMI] Long-Term Temporal Convolutions for Action Recognition [PDF]

Gul Varol, Ivan Laptev, and Cordelia Schmid, Fellow, IEEE

LSTM networks

multistream networks

[2014,NIPS] Two-Stream Convolutional Networks for Action Recognition in Videos [PDF]

[2016,ECCV] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [PDF]

[2018,ECCV] Temporal Relational Reasoning in Videos [PDF] [code]

[2016,CVPR] A Key Volume Mining Deep Framework for Action Recognition [PDF]

new feature

[2018,CVPR] Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition [PDF]

Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

[2015,CVPR] Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [PDF]

[2017,CVPR] On the Integration of Optical Flow and Action Recognition [PDF]

Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black

explaining deep representations

[arXiv:1712.08416] What have we learned from deep representations for action recognition?

Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes, Andrew Zisserman

semantic

[arXiv:1802] Structured Label Inference for Visual Understanding

Nelson Nauata, Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao, and Greg Mori

datasets

[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset [PDF]

Datasets

Year: publication date
Videos: number of video clips
Views: number of viewing angles
Actions: number of action classes
Subjects: number of people appearing in the videos
Modality: RGB or RGB-D
Env: Controlled (C) or Uncontrolled (U)

Dataset papers, 2017 [PDF]

Video benchmarks: a review, 2018 [PDF]

Video datasets online [HTML]

Computer vision datasets online [HTML]

Dataset	Year	Videos	Views	Actions	Subjects	Modality	Env (C/U)	Related Paper
KTH	2004	599	1	6	25	RGB	C	Recognizing human actions: A local SVM approach, IEEE ICPR 2004 [PDF]
HMDB51	2011	7000	-	51	-	RGB	U	HMDB: A large video database for human motion recognition, ICCV 2011 [PDF]
UCF101	2012	13320	-	101	-	RGB	U	UCF101: A dataset of 101 human action classes from videos in the wild, 2012, CRCV-TR-12-01 [PDF]
