
IKEM

This is the implementation of the paper:

Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision [Paper]

Authors: Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen.

Implementation guidance

Please refer to CrosSCLR for environment requirements and installation.

Data Preparation

We added frame numbers to the raw data for each action, so please reprocess the dataset using our code.

# Generate raw data
$ python tools/ntu_gendata.py --data_path <your path> --ignored_sample_path <your path> --out_folder <your path>

# Preprocess
$ python feeder/preprocess_ntu.py --dataset_path <your path> --out_folder <your path>
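
For orientation, the fundamental modalities mentioned in the abstract are conventionally derived from the joint stream. Below is a minimal sketch, assuming ST-GCN-style joint arrays of shape (C, T, V, M) and the common NTU RGB+D 25-joint bone pairs; the repo's actual feeder code may differ:

# Minimal sketch of how bone and motion streams are commonly derived
# from joints. Assumes joint data of shape (C, T, V, M): channels,
# frames, joints, persons. Not necessarily this repo's exact feeder.
import numpy as np

# Standard NTU RGB+D (child, parent) bone pairs, 1-indexed; joint 21
# (spine) is treated as the root, so its bone stays zero.
NTU_PAIRS = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6),
             (8, 7), (9, 21), (10, 9), (11, 10), (12, 11), (13, 1),
             (14, 13), (15, 14), (16, 15), (17, 1), (18, 17), (19, 18),
             (20, 19), (22, 23), (23, 8), (24, 25), (25, 12)]

def joint_to_bone(joint):
    """Bone stream: vector from the parent joint to the child joint."""
    bone = np.zeros_like(joint)
    for child, parent in NTU_PAIRS:
        bone[:, :, child - 1] = joint[:, :, child - 1] - joint[:, :, parent - 1]
    return bone

def to_motion(stream):
    """Motion stream: temporal difference between consecutive frames."""
    motion = np.zeros_like(stream)
    motion[:, :-1] = stream[:, 1:] - stream[:, :-1]
    return motion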

Model training and evaluation

Please modify the corresponding .yaml file to select the training and test splits (i.e., the evaluation protocol) you want to use.
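
If you want to check which split files a config currently points at before editing it, here is a minimal sketch with PyYAML; only the file name below is taken from the commands in this section, and the config's internal structure is whatever the repo ships:

# Minimal sketch: print a config's top-level entries to see which keys
# reference the training/test split files for your evaluation protocol.
import yaml

path = "config/crossclr_6views/crossclr_6views_xview.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(key, ":", value)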

# Pre-training of six-modality model
$ python main.py pretrain_crossclr_6views --config config/crossclr_6views/crossclr_6views_xview.yaml

# Evaluation of six-modality model
$ python main.py linear_evaluation --config config/crossclr_6views/le_crossclr_6views_xview.yaml

# Pre-training of the three-modality student model (six-modality model as the teacher)
$ python main.py pretrain_student_6views --config config/crossclr_6views/ts_crossclr_6views.yaml

# Evaluation of the three-modality student model
$ python main.py linear_evaluation --config config/crossclr_6views/le_ts_crossclr_6views.yaml
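
For reference, linear evaluation conventionally freezes the pre-trained encoder and trains only a linear classifier on its features. Below is a minimal PyTorch sketch under that assumption; encoder, feat_dim, and loader are placeholders, not this repo's actual objects:

# Minimal sketch of the linear-evaluation protocol: the representation
# stays fixed and only a linear head is trained on top of it.
import torch
import torch.nn as nn

def linear_evaluation(encoder, feat_dim, num_classes, loader,
                      epochs=100, device="cuda"):
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False  # freeze the pre-trained encoder

    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)  # frozen features
            loss = criterion(classifier(z), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier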

Abstract

Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most existing works are based on skeleton data and use a multi-modality setup. These works overlook the performance differences among modalities, which leads to the propagation of erroneous knowledge between modalities, and they use only three fundamental modalities, i.e., joints, bones, and motions, leaving additional modalities unexplored.
In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) which alleviates the propagation of erroneous knowledge between low-performance modalities. Then, we further propose three new modalities to enrich the complementary information between modalities. Finally, to maintain efficiency when introducing new modalities, we propose relational cross-modality knowledge distillation: a novel teacher-student framework that distills knowledge from the secondary modalities into the mandatory modalities while considering the relationships constrained by anchors, positives, and negatives. The experimental results demonstrate the effectiveness of our approach, unlocking the efficient use of skeleton-based multi-modality data.
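
To make the last idea concrete, one common way to realize a relational distillation objective is to have the student match the teacher's similarity distribution over a shared set of positives and negatives. Below is a minimal PyTorch sketch of that idea; the temperatures, memory banks, and exact loss in the paper may differ:

# Minimal sketch of a relational distillation loss: KL divergence
# between the teacher's and student's anchor-to-bank similarity
# distributions. An illustration, not the paper's implementation.
import torch
import torch.nn.functional as F

def relational_kd_loss(z_student, z_teacher, bank_student, bank_teacher,
                       tau=0.07):
    """z_*:    (N, D) L2-normalized anchor embeddings
       bank_*: (K, D) L2-normalized positive/negative embeddings"""
    sim_t = z_teacher @ bank_teacher.t() / tau   # (N, K) teacher relations
    sim_s = z_student @ bank_student.t() / tau   # (N, K) student relations
    p_t = F.softmax(sim_t, dim=1)                # teacher's relational target
    log_p_s = F.log_softmax(sim_s, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")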

Method

[Figure: framework overview]
An overview of our pre-training model (red dashed box) and of the teacher-student model (blue dashed box). Module (a) is the knowledge exchange module in CrosSCLR, module (b) is our proposed IKEM, and module (c) is the knowledge distillation module of our teacher-student framework. All modules in the figure use the update of the joint-modality encoder as an example.
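
As background for module (a): in CrosSCLR, knowledge exchange means that nearest neighbors mined in one modality's embedding space are reused as extra positives in another modality's contrastive loss. Below is a minimal sketch of that baseline mechanism; IKEM (module (b)) refines this exchange, and the code does not implement IKEM itself:

# Minimal sketch of CrosSCLR-style cross-modal positive mining,
# assuming L2-normalized embeddings and memory banks.
import torch

def cross_modal_positives(z_bone, bank_bone, topk=1):
    """Pick neighbor indices in bone space to serve as extra positives
    for the joint modality's contrastive loss."""
    sim = z_bone @ bank_bone.t()          # (N, K) cosine similarities
    return sim.topk(topk, dim=1).indices  # reused as joint-view positives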

Acknowledgement

This repository is built upon CrosSCLR.
