PyTorch implementation of "Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition" in ACM Multimedia 2021.
- Python >= 3.6
- PyTorch >= 1.2.0
- PyYAML, tqdm, tensorboardX
Disk usage warning: after preprocessing, the total sizes of datasets are around 38GB, 77GB, 63GB for NTU RGB+D 60, NTU RGB+D 120, and Kinetics 400, respectively. The raw/intermediate sizes may be larger.
There are 3 datasets to download:
- NTU RGB+D 60 Skeleton
- NTU RGB+D 120 Skeleton
- Kinetics 400 Skeleton
- Request dataset here: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp
- Download the skeleton-only datasets:
  - `nturgbd_skeletons_s001_to_s017.zip` (NTU RGB+D 60)
  - `nturgbd_skeletons_s018_to_s032.zip` (NTU RGB+D 120, on top of NTU RGB+D 60)
  - Total size should be 5.8GB + 4.5GB.
- Download missing skeletons lookup files from the authors' GitHub repo:
  - NTU RGB+D 60 Missing Skeletons: `wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD_samples_with_missing_skeletons.txt`
  - NTU RGB+D 120 Missing Skeletons: `wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD120_samples_with_missing_skeletons.txt`
  - Remember to remove the first few lines of text in these files!
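The header lines of the missing-skeleton files can also be stripped programmatically. The sketch below assumes that every valid entry is a sample name of the form `S001C002P005R002A002` (the NTU RGB+D naming scheme) on its own line, and it discards everything else. The helper names `keep_sample_names` and `clean_file` are hypothetical and not part of this repo.

```python
import re

# Assumption: after a short text header, each line holds exactly one sample
# name such as S001C002P005R002A002.
SAMPLE_NAME = re.compile(r"^S\d{3}C\d{3}P\d{3}R\d{3}A\d{3}$")


def keep_sample_names(lines):
    """Return only the lines that are valid sample names, without whitespace."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if SAMPLE_NAME.match(s)]


def clean_file(path):
    """Rewrite `path` in place so it contains sample names only."""
    with open(path, "r", encoding="utf-8") as f:
        names = keep_sample_names(f)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(names) + "\n")
    return len(names)
```

For example, `clean_file("NTU_RGBD_samples_with_missing_skeletons.txt")` would rewrite the file and return the number of sample names kept. Inspect the result before running the preprocessing scripts.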
- Download the Kinetics skeleton dataset from the ST-GCN repo: https://github.com/yysijie/st-gcn/blob/master/OLD_README.md#kinetics-skeleton
- This might be useful if you want to `wget` the dataset from Google Drive
Put downloaded data into the following directory structure:
- data/
  - kinetics_raw/
    - kinetics_train/
      ...
    - kinetics_val/
      ...
    - kinetics_train_label.json
    - kinetics_val_label.json
  - nturgbd_raw/
    - nturgb+d_skeletons/ # from `nturgbd_skeletons_s001_to_s017.zip`
      ...
    - nturgb+d_skeletons120/ # from `nturgbd_skeletons_s018_to_s032.zip`
      ...
    - NTU_RGBD_samples_with_missing_skeletons.txt
    - NTU_RGBD120_samples_with_missing_skeletons.txt
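Before preprocessing, it can help to check that the layout is in place. The sketch below is a hypothetical helper, not part of this repo. It assumes the nesting `data/kinetics_raw/...` and `data/nturgbd_raw/...`, and that the validation label file is named `kinetics_val_label.json`.

```python
from pathlib import Path

# Entries from the directory structure listed above, relative to data/.
# The nesting and the spelling of kinetics_val_label.json are assumptions.
EXPECTED = [
    "kinetics_raw/kinetics_train",
    "kinetics_raw/kinetics_val",
    "kinetics_raw/kinetics_train_label.json",
    "kinetics_raw/kinetics_val_label.json",
    "nturgbd_raw/nturgb+d_skeletons",
    "nturgbd_raw/nturgb+d_skeletons120",
    "nturgbd_raw/NTU_RGBD_samples_with_missing_skeletons.txt",
    "nturgbd_raw/NTU_RGBD120_samples_with_missing_skeletons.txt",
]


def missing_entries(data_root="data"):
    """Return the expected paths that do not exist under `data_root`."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    for entry in missing_entries():
        print("missing:", entry)
```

If you only use a subset of the datasets, the corresponding entries can be removed from `EXPECTED`.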
- NTU RGB+D
cd data_gen
python3 ntu_gendata.py
python3 ntu120_gendata.py
- Kinetics
python3 kinetics_gendata.py
- Generate the bone data with:
python gen_bone_data.py --dataset ntu
python gen_bone_data.py --dataset ntu120
python gen_bone_data.py --dataset kinetics
- Generate the motion data with:
python gen_motion_data.py --dataset ntu
python gen_motion_data.py --dataset ntu120
python gen_motion_data.py --dataset kinetics
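For orientation, the sketch below shows what "bone" and "motion" streams usually mean in skeleton-based action recognition: a bone is the vector from a parent joint to its child, and motion is the frame-to-frame difference. This reflects a common convention and is an assumption, not a description of what `gen_bone_data.py` or `gen_motion_data.py` actually do. The toy 3-joint skeleton and the `PARENT` table are hypothetical.

```python
# A sequence is a list of frames; each frame is a list of (x, y, z) joints.
# Hypothetical toy skeleton: joint 0 is the root, 1 hangs off 0, 2 hangs off 1.
PARENT = {0: 0, 1: 0, 2: 1}


def sub(a, b):
    """Component-wise difference a - b of two coordinate tuples."""
    return tuple(x - y for x, y in zip(a, b))


def to_bones(seq):
    """Bone of joint j = joint j minus its parent (the root bone is zero)."""
    return [
        [sub(frame[j], frame[PARENT[j]]) for j in range(len(frame))]
        for frame in seq
    ]


def to_motion(seq):
    """Motion at frame t = frame t+1 minus frame t; the last frame is zeros."""
    out = []
    for t in range(len(seq)):
        if t + 1 < len(seq):
            out.append([sub(b, a) for a, b in zip(seq[t], seq[t + 1])])
        else:
            out.append([(0.0, 0.0, 0.0)] * len(seq[t]))
    return out
```

The real datasets use larger skeletons with dataset-specific joint pairs, so treat this only as an illustration of the idea.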
To be released soon (so many files)
- The general training template command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_dualhead.py --config config/ntu-xsub/train_joint.yaml \
--work-dir work_dir/ntu-xsub/train_joint \
--base-lr 0.05 --device 0 1 2 3 \
--step 40 60 80 \
--batch-size 64 --forward-batch-size 64 --test-batch-size 64 \
--num-epoch 300 \
--eval-interval 1 --save-interval 1
The model is evaluated every `--eval-interval` iteration and saved every `--save-interval` iteration.
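To make the `--base-lr` and `--step` flags concrete, the sketch below computes a step-decay schedule for the values in the template command. It assumes the learning rate is multiplied by 0.1 at each epoch listed in `--step`, which is a common choice but is not confirmed by this README. The names `lr_at_epoch` and `gamma` are hypothetical.

```python
def lr_at_epoch(epoch, base_lr=0.05, steps=(40, 60, 80), gamma=0.1):
    """Learning rate at a 0-indexed epoch under step decay.

    Assumption: the rate is scaled by `gamma` once for every milestone in
    `steps` that has already been reached.
    """
    num_decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    for e in (0, 39, 40, 60, 80):
        print(e, lr_at_epoch(e))
```

Check the optimizer setup in the training script for the decay factor actually used.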
- Template for multi-stream fusion:
python ensemble.py \
--dataset <dataset to ensemble, e.g. ntu120/xsub> \
--joint-dir <work_dir of your test command for joint model> \
--bone-dir <work_dir of your test command for bone model>
Details are to be released.
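Until those details are available, the following is only a generic sketch of score-level fusion as commonly used for multi-stream skeleton models: per-class scores from the joint and bone models are summed (optionally weighted) and the highest-scoring class is taken. The function names, the `alpha` weight, and the in-memory score format are assumptions, not the interface of `ensemble.py`.

```python
def fuse_scores(joint_scores, bone_scores, alpha=1.0):
    """Predict one class per sample from joint + alpha * bone scores.

    Each argument maps a sample name to a list of per-class scores.
    """
    preds = {}
    for name, js in joint_scores.items():
        bs = bone_scores[name]
        fused = [j + alpha * b for j, b in zip(js, bs)]
        # Index of the largest fused score is the predicted class.
        preds[name] = max(range(len(fused)), key=fused.__getitem__)
    return preds


def accuracy(preds, labels):
    """Fraction of samples whose fused prediction matches the label."""
    return sum(preds[n] == labels[n] for n in preds) / len(preds)
```

Here `alpha` controls how much the bone stream contributes relative to the joint stream; `alpha=1.0` gives a plain sum.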
- Use the corresponding config files from `./config` to train/test different datasets
- Resume training from checkpoint
python3 main.py
... # Same params as before
--start-epoch <0 indexed epoch>
--weights <weights in work_dir>
--checkpoint <checkpoint in work_dir>
- Default hyper-parameters are stored in the config files; you can tune them & add extra training techniques to boost performance
- ...
This repo is based on
Thanks to the original authors for their work!
Please cite this work if you find it useful:
@inproceedings{chen2021dualhead,
title = {Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-Based Action Recognition},
author = {Chen, Tailin and Zhou, Desen and Wang, Jian and Wang, Shidong and Guan, Yu and He, Xuming and Ding, Errui},
booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
pages = {4334--4342},
year = {2021},
}