Skip to content

Awesome Human Action Recognition Repository

Mert Caglar edited this page Apr 8, 2020 · 1 revision

awesome-human-action-recognition

list the most popular methods about human action recognition

Table of Contents

arxiv Papers

[arXiv:1808.07507] Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization. [PDF]

Unaiza Ahsan,Rishi Madhok

[arXiv:1711.04161] End-to-end Video-level Representation Learning for Action Recognition. [PDF][code]

Jiagang Zhu, Wei Zou, Zheng Zhu

Journal Papers

[2017 IEEE Access:TPAMI] Long-Term Temporal Convolutions for Action Recognition [PDF]

Gul Varol , Ivan Laptev, and Cordelia Schmid, Fellow, IEEE

Review works

Human Action Recognition and Prediction: A Survey [PDF]

Yu Kong, Member, IEEE, and Yun Fu, Senior Member, IEEE

Conference Papers

2019 ICCV

Graph Convolutional Networks for Temporal Action Localization 作者:Chuang Gan 等

Action recognition with spatial-temporal discriminative filter banks 作者:Yuanjun Xiong 等

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures 作者:Google Brain

neural architecture search for video understanding——大力出奇迹

DynamoNet: Dynamic Action and Motion Network 作者:Ali Diba Luc Van Gool

Reasoning About Human-Object Interactions Through Dual Attention Networks 作者:Bolei Zhou

Learning Temporal Action Proposals with Fewer Labels 作者:Stanford Feifei组 Juan Carlos Niebles

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition 作者:Dima Damen 等

SlowFast Networks for Video Recognition (文章链接:https://arxiv.org/abs/1812.03982) kaiming 大神 from FAIR

Video Classification with Channel-Separated Convolutional Networks (文章链接:https://arxiv.org/abs/1904.02811) Du Tran 大神 from FAIR

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition. oral (文章链接:https://arxiv.org/abs/1904.04289) Du Tran 大神 from FAIR

DistInit: Learning Video Representations without a Single Labeled Video. (文章链接:https://arxiv.org/abs/1901.09244) Du Tran 大神 from FAIR 很简单的思路

TSM: Temporal Shift Module for Efficient Video Understanding 作者:Ji Lin, Chuang Gan, Song Han 论文链接:https://arxiv.org/abs/1811.08383 Github链接:https://github.com/mit-han-lab/temporal-shift-module emmm感觉吧,就像是搞了个带Mask的固定卷积核?

BMN: Boundary-Matching Network for Temporal Action Proposal Generation (文章链接:https://arxiv.org/abs/1907.09702) 来自作者大大解读:林天威:[ICCV 2019][时序动作提名] 边界匹配网络详解 (原文链接:https://zhuanlan.zhihu.com/p/75444151)

Weakly Supervised Energy-Based Learning for Action Segmentation.oral 文章链接:https://github.com/JunLi-Galios/CDFL

Pose-aware Dynamic Attention for Human Object Interaction Detection 文章链接:https://github.com/bobwan1995/PMFNet

What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention 项目链接:https://iplab.dmi.unict.it/rulstm/ 论文链接:https://arxiv.org/pdf/1905.09035.pdf GitHub:https://github.com/fpv-iplab/rulstm

Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings 论文链接:https://arxiv.org/abs/1908.03477 项目链接:https://mwray.github.io/FGAR/

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips 作者:Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic 论文链接:https://arxiv.org/abs/1906.03327 项目链接:https://github.com/antoine77340/howto100m code(链接:https://github.com/antoine77340/howto100m)

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation 作者:Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Woo, Ruxin Chen, Jian Zheng 论文链接:https://arxiv.org/abs/1907.12743 Github链接:https://github.com/cmhungsteve/TA3N

STM- SpatioTemporal and Motion Encoding for Action Recognition from ZJU && SenseTime Group Limited 论文链接:https://arxiv.org/abs/1908.02486

2018 ECCV

[2018,ECCV] Temporal Relational Reasoning in Videos [PDF] [code]
[2018,ECCV] Modality Distillation with Multiple Stream Networks for Action Recognition [PDF]

Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba

[2018,ECCV] Graph Distillation for Action Detection with Privileged Modalities [PDF]

Stanford University 2 Google Inc.

above two papers, they are similar, which belong to a new hole
[2018,ECCV] Spatio-Temporal Channel Correlation Networks for Action Classification [PDF]
note: qustion:3D network cannot learn the relation between spacial and temporal .why?
[2018,ECCV] Learning Human-Object Interactions by Graph Parsing Neural Networks [PDF] [code]

Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

[2018,ECCV] Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification[PDF]

Yang Du,Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li and Weiming Hu

[2018,ECCV] Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization[PDF]

Humam Alwassel, Fabian Caba Heilbron, and Bernard Ghanem

[2018,ECCV] Action Anticipation with RBF Kernelized Feature Mapping RNN [PDF]

Yuge Shi, Basura Fernando, Richard Hartley

[2018,ECCV] Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning[PDF]

Chenyang Si, Ya Jing, Wei Wang, Liang Wang, Tieniu Tan

[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset

Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri

[2018,ECCV] End-to-End Joint Semantic Segmentation of Actors and Actions in Video [PDF]
[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset [PDF]

Jamie Ray1, Heng Wang1, Du Tran1 Yufei Wang1 ,etc

2018 CVPR

[2018,CVPR] Optical Flow Guided Feature: A Fast and Robust Motion Representation for

Video Action Recognition [PDF] Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

[2018,CVPR] Appearance-and-Relation Networks for Video Classification [PDF] [code]

L. Wang, W. Li, W. Li, and L. Van Gool

2018 NIPS

[2018,NIPS] Trajectory Convolution for Action Recognition[PDF] [code]

Yue Zhao, Yuanjun,Xiong

2018 Others

2017 ICCV

2017 CVPR

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]

Amlan Kar, Nishant Rai, Karan Sikka,Gaurav Sharma

[2017,CVPR] On the Integration of Optical Flow and Action Recognition [PDF]

Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black

2017 Others

2016 CVPR

[2016,CVPR] Convolutional Two-Stream Network Fusion for Video Action Recognition[PDF]

Christoph Feichtenhofer,Axel Pinz,Andrew Zisserman

[2016,CVPR] A Key Volume Mining Deep Framework for Action Recognition[PDF]

Wangjiang Zhu,Jie Hu,Gang Sun,Xudong Cao,Yu Qiao

2016 ECCV

[2016,ECCV] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [PDF]

Limin Wang,Yuanjun XiongZhe WangYu QiaoDahua LinXiaoou TangLuc Van Gool

2016 ICCV

2016 Others

2015 CVPR

[2015,CVPR] Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [PDF]

Limin Wang, Yu Qiao, Xiaoou Tang

2015 ECCV

2015 ICCV

[2015,ICCV] Learning Spatiotemporal Features with 3D Convolutional Networks [PDF]

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri

2015 Others

2014 CVPR

[2014,CVPR] Large-Scale Video Classification with Convolutional Neural Networks [PDF]

A Karpathy , G Toderici , S Shetty , T Leung , R Sukthankar,L. Fei-Fei

2014 ECCV

2014 ICCV

2014 Others

[2014,NIPS] Two-Stream Convolutional Networks for Action Recognition in Videos[PDF]

Karen Simonyan, Andrew Zisserman

Two-Stream Convolutional Networks for Action Recognition in Videos

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]

Karen Simonyan, Andrew Zisserman

Directions

Traditional Machine Learning Methods

Here we pay more attention on DL methods as follows.

Deep Learning Methods

2D convolutional netwoks

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos [PDF]

Amlan Kar, Nishant Rai, Karan Sikka,Gaurav Sharma

3D convolutional networks

[2014,IEEE Acess:TPAMI] 3D Convolutional Neural Networks for Human Action Recognition

Shuiwang Ji ,Wei Xu,Ming Yang ,Kai Yu

[2017 IEEE Access:TPAMI] Long-Term Temporal Convolutions for Action Recognition [PDF]

Gul Varol , Ivan Laptev, and Cordelia Schmid, Fellow, IEEE

LSTM networks

multistream networks

[2014,NIPS] Two-Stream Convolutional Networks for Action Recognition in Videos[PDF]
[2016,ECCV] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [PDF]
[2017,ICCV] Temporal Relational Reasoning in Videos [PDF] [code]
[2016,CVPR] A Key Volume Mining Deep Framework for Action Recognition[PDF]

new feature

[2018,CVPR] Optical Flow Guided Feature: A Fast and Robust Motion Representation for

Video Action Recognition [PDF] _Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

[2015,CVPR] Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [PDF]
[2017,CVPR] On the Integration of Optical Flow and Action Recognition [PDF]

Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black

explanation deep representation

[arXiv:1712.08416] What have we learned from deep representations for action recognition?

Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black

semantic

[arXiv:1802] Structured Label Inference for Visual Understanding Nelson Nauata, Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao and Greg Mori

datasets

[2018,ECCV] Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset [PDF]

Datasets

  • Year: publish date
  • Videos: amount of flips
  • Views: amount of view angles
  • Actions: amount of action class
  • Subjects: people in Videos
  • Modility: RGB or RGB-D
  • Env: Controlled(C) or Uncontrolled(U)
dataset papers 2017 [PDF]
2018 video benchmarks: a review[PDF]
video datasets online(html)[HTML]
compute vision datasets online[HTML]
Dataset Year Videos Views Actions Subjects Modility Env(C\U) Related Paper
KTH 2004 599 1 6 25 RGB C Recognizing human actions: A local svm approach, IEEE ICPR 2004 [PDF]
HMDB51 2011 7000 - 51 - RGB U LHmdb: A large video database for human motion recognition, ICCV 2011 [PDF]
UCF101 2012 13320 - 101 - RGB U Ucf101: A dataset of 101 human action classes from videos in the wild, 2012,cRCV-TR-12-01 [PDF]

Current Accuracy on Main Datasets

workshops

challeges

other related works

Clone this wiki locally