This repository contains the code necessary to replicate the experiments outlined in the paper "SWL-LSE: A Dataset of Spanish Sign Language Health Signs with an ISLR Baseline Method."
The repository is organized into two main folders:
- Mediapipe_keypoints: Contains all the scripts and models used to extract feature vectors from videos and build the datasets for training and evaluation.
- Msg3d: Contains the original MS-G3D model from GitHub, with some modifications.
We employ three datasets:
- SWL-LSE: A newly created dataset, available on Zenodo.
- WLASL: Available on GitHub.
- ASL-Citizen: Available from Microsoft Research.
This process involves multiple steps: the MediaPipe model is applied, the data is normalized, and the feature vectors are generated to build the final datasets. The pipeline is described below.
```python
# generate_mediapipe.py arguments
parser.add_argument('--folder_input_videos', required=True, type=str)
parser.add_argument('--pose_hands', action='store_true')
parser.add_argument('--holistic', action='store_true')
parser.add_argument('--holistic_legacy', action='store_true')
parser.add_argument('--folder_output_mediapipe', required=True, type=str)
```
```python
# generate_arr_keypoints.py arguments
parser.add_argument('--pose_hands', action='store_true')
parser.add_argument('--holistic', action='store_true')
parser.add_argument('--holistic_legacy', action='store_true')
parser.add_argument('--folder_input_mediapipe', default='', type=str)
parser.add_argument('--folder_output_kps', required=True, type=str)
parser.add_argument('--world', action='store_true')
```
The default keypoint array consists of 61 keypoints:
- Pose keypoints: 19
- Hand keypoints (left and right): 21 each
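As a minimal sketch, the 61-keypoint layout above can be assembled like this (the array names and the per-keypoint channels shown here are illustrative, not taken from the repo's code):

```python
import numpy as np

# Default layout: 19 pose keypoints, then 21 left-hand and 21 right-hand keypoints.
N_POSE, N_HAND = 19, 21

pose = np.zeros((N_POSE, 3))        # e.g. x, y, z per keypoint (illustrative)
left_hand = np.zeros((N_HAND, 3))
right_hand = np.zeros((N_HAND, 3))

keypoints = np.concatenate([pose, left_hand, right_hand], axis=0)
print(keypoints.shape)  # (61, 3)
```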
```python
# generate_features.py arguments
parser.add_argument('--folder_in_kps', required=True, type=str)
parser.add_argument('--folder_out_features', required=True, type=str)
parser.add_argument('--type_kps', required=False, default='C4_xyzc', type=str)
parser.add_argument('--offset', action='store_true')
parser.add_argument('--normalize', action='store_true')
parser.add_argument('--noFramesLimit', action='store_true')
parser.add_argument('--jump_reset', action='store_false')
```
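The `--type_kps` codes appear to encode the per-keypoint channels (e.g. `C4_xyzc` as x, y, z, confidence and `C3_xyc` as x, y, confidence). Under that assumption — which is an inference from the flag names, not confirmed by the repo — the resulting feature shapes would be:

```python
# Assumed mapping from --type_kps codes to per-keypoint channels.
# This decoding is inferred from the flag names, not confirmed by the repo.
TYPE_KPS_CHANNELS = {
    "C4_xyzc": ["x", "y", "z", "c"],
    "C3_xyc": ["x", "y", "c"],
}

def feature_shape(type_kps, n_frames, n_keypoints=61):
    """Hypothetical (channels, frames, keypoints) shape for a feature array."""
    channels = TYPE_KPS_CHANNELS[type_kps]
    return (len(channels), n_frames, n_keypoints)

print(feature_shape("C4_xyzc", 100))  # (4, 100, 61)
```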
```python
# generate_dataset.py arguments
parser.add_argument('--folder_npy', required=True, type=str)
parser.add_argument('--folder_labels', required=True, type=str)
parser.add_argument('--folder_out', required=True, type=str)
```
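Conceptually, this step pairs each feature `.npy` file with its row in the label CSVs. A rough sketch of that pairing could look like the following (the column names `video_id` and `label` and the file naming are assumptions, not the repo's actual format):

```python
import csv
import os
import numpy as np

def build_split(folder_npy, labels_csv):
    """Hypothetical pairing of feature .npy files with label rows."""
    samples, labels = [], []
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed naming: one .npy feature file per video id.
            npy_path = os.path.join(folder_npy, row["video_id"] + ".npy")
            if os.path.exists(npy_path):
                samples.append(np.load(npy_path))
                labels.append(row["label"])
    return samples, labels
```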
A step-by-step guide to generate a dataset using the Holistic model, producing both normalized and non-normalized datasets. The input data is expected to be organized as follows:
- Folder `ANNOTATIONS` with the files `train_labels.csv`, `val_labels.csv`, and `test_labels.csv`
- Folder `VIDEOS` with the `.mp4` files
```shell
# BBDD_PATH is the root folder of your data
PATH_VIDEOS=$BBDD_PATH/VIDEOS
PATH_MEDIAPIPE=$BBDD_PATH/MEDIAPIPE
PATH_KEYPOINTS_HL=$BBDD_PATH/KEYPOINTS/HL
PATH_FEATURES_HL_NORM=$BBDD_PATH/FEATURES/NORM/HL
PATH_FEATURES_HL_NO_NORM=$BBDD_PATH/FEATURES/NO_NORM/HL
PATH_DATASET_HL_NORM=$BBDD_PATH/DATASET/NORM/HL
PATH_DATASET_HL_NO_NORM=$BBDD_PATH/DATASET/NO_NORM/HL
PATH_ANNOTATIONS=$BBDD_PATH/ANNOTATIONS
```
Step 1: Apply the MediaPipe model to the videos

```shell
python generate_mediapipe.py --pose_hands --holistic_legacy --folder_input_videos $PATH_VIDEOS --folder_output_mediapipe $PATH_MEDIAPIPE
```
Step 2: Extract the keypoint arrays

```shell
python generate_arr_keypoints.py --holistic_legacy --folder_input_mediapipe $PATH_MEDIAPIPE --folder_output_kps $PATH_KEYPOINTS_HL
```
Step 3.1: Extract features (with normalization; angles are computed either using the z-dimension or discarding it)

```shell
# Using the z-dimension (C4_xyzc)
python generate_features.py --type_kps C4_xyzc --offset --normalize --folder_in_kps $PATH_KEYPOINTS_HL --folder_out_features $PATH_FEATURES_HL_NORM
# Discarding the z-dimension (C3_xyc)
python generate_features.py --type_kps C3_xyc --offset --normalize --folder_in_kps $PATH_KEYPOINTS_HL --folder_out_features $PATH_FEATURES_HL_NORM
```
Step 4: Build the final dataset

```shell
python generate_dataset.py --folder_npy $PATH_FEATURES_HL_NORM --folder_labels $PATH_ANNOTATIONS --folder_out $PATH_DATASET_HL_NORM
```
```
BBDD_PATH
├── ANNOTATIONS
│   ├── train_labels.csv
│   ├── val_labels.csv
│   └── test_labels.csv
├── VIDEOS
│   └── *.mp4
├── MEDIAPIPE
│   └── *.pkl
├── KEYPOINTS
│   └── *.npy
├── FEATURES
│   └── *.npy
└── DATASET
    ├── *.npy
    └── *.pkl
```
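One way to pre-create the layout above before running the pipeline is a small helper like this (here `BBDD_PATH` is set to a relative folder purely for illustration):

```shell
# Hypothetical helper: pre-create the expected BBDD_PATH layout.
BBDD_PATH=BBDD_PATH
mkdir -p "$BBDD_PATH/ANNOTATIONS" "$BBDD_PATH/VIDEOS" "$BBDD_PATH/MEDIAPIPE"
mkdir -p "$BBDD_PATH/KEYPOINTS/HL"
mkdir -p "$BBDD_PATH/FEATURES/NORM/HL" "$BBDD_PATH/FEATURES/NO_NORM/HL"
mkdir -p "$BBDD_PATH/DATASET/NORM/HL" "$BBDD_PATH/DATASET/NO_NORM/HL"
```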
```python
# Msg3d main.py arguments
parser.add_argument('--work-dir', type=str, required=True, help='the work folder for storing results')
parser.add_argument('--dataset', type=str, required=True, help='dataset used')
parser.add_argument('--stream', type=str, required=True, help='stream used')
parser.add_argument('--num-classes', type=int, required=True, help='number of classes')
parser.add_argument('--config', default='/home/bdd/LSE_Lex40_uvigo/dataconfig/nturgbd-cross-view/test_bone.yaml', help='path to the configuration file')
parser.add_argument('--phase', default='train', help='must be train or test')
parser.add_argument('--seed', type=int, default=random.randrange(200), help='random seed')
parser.add_argument('--weights', default=None, help='the weights for network initialization')
parser.add_argument('--ignore-weights', type=str, default=[], nargs='+', help='the name of weights which will be ignored in the initialization')
parser.add_argument('--base-lr', type=float, default=0.01, help='initial learning rate')
parser.add_argument('--step', type=int, default=[20, 40, 60], nargs='+', help='the epochs where the optimizer reduces the learning rate')
parser.add_argument('--device', type=int, default=0, nargs='+', help='the indexes of GPUs for training or testing')
parser.add_argument('--optimizer', default='SGD', help='type of optimizer')
parser.add_argument('--nesterov', type=str2bool, default=False, help='use Nesterov momentum or not')
parser.add_argument('--batch-size', type=int, default=32, help='training batch size')
parser.add_argument('--test-batch-size', type=int, default=256, help='test batch size')
parser.add_argument('--forward-batch-size', type=int, default=16, help='batch size during the forward pass; must be a factor of --batch-size')
parser.add_argument('--num-epoch', type=int, default=80, help='epoch at which training stops')
parser.add_argument('--weight-decay', type=float, default=0.0005, help='weight decay for optimizer')
parser.add_argument('--use-tta', action='store_true', help='activate TTA; if deactivated, only the first element in the config file is used')
parser.add_argument('--tta', default=[[False, 1]], help='TTA configuration')
parser.add_argument('--lr-scheduler', default='MultiStepLR', help='type of LR scheduler')
parser.add_argument('--gamma', type=float, default=0.1, help='gamma parameter of MultiStepLR')
parser.add_argument('--factor', type=float, default=0.1, help='factor parameter of ReduceLROnPlateau')
parser.add_argument('--patience', type=int, default=10, help='patience parameter of ReduceLROnPlateau')
parser.add_argument('--cooldown', type=int, default=0, help='cooldown parameter of ReduceLROnPlateau')
parser.add_argument('--tmax', type=int, default=0, help='T_max parameter of CosineAnnealingLR')
parser.add_argument('--eta-min', type=float, default=0.0001, help='eta_min parameter of CosineAnnealingLR')
parser.add_argument('--epoch-warn', type=int, default=0, help='number of epochs without scheduler steps')
parser.add_argument('--early-stopping', type=int, default=0, help='stop training if no improvement in X epochs')
parser.add_argument('--use-train-normalization', type=str, default=None, help='use normalized data and provide the folder where this data is located')
```
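The constraint that `--forward-batch-size` must be a factor of `--batch-size` suggests gradient accumulation: each batch is split into smaller forward chunks and gradients are accumulated before a single optimizer step. A minimal, illustrative sketch (not the repo's actual training loop):

```python
import torch

def train_step(model, optimizer, loss_fn, data, labels, forward_batch_size):
    """Emulate a large batch with smaller forward passes (illustrative sketch)."""
    batch_size = data.shape[0]
    assert batch_size % forward_batch_size == 0, "must be a factor of --batch-size"
    optimizer.zero_grad()
    for i in range(0, batch_size, forward_batch_size):
        out = model(data[i:i + forward_batch_size])
        loss = loss_fn(out, labels[i:i + forward_batch_size])
        # Scale each chunk so the accumulated gradients match a full-batch pass.
        (loss * forward_batch_size / batch_size).backward()
    optimizer.step()
```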
Training:

```shell
STREAM=joints_C4_xyzc
DATASET=/path/to/ASL_Citizen/DATASET/NORM/HP
DEVICE=5
NUM_CLASSES=300
SEED=42
ESTUDIO=E0
CONFIG=config/TRAIN_CUSTOM/train.yaml
EXPERIMENT=TRAIN_ASL_CITIZEN_HP/IMAGE_05/$ESTUDIO/$SEED/$STREAM-T1
nohup python main.py --work-dir work_dir/$EXPERIMENT --config $CONFIG --dataset $DATASET --stream $STREAM --num-classes $NUM_CLASSES --device $DEVICE --batch-size 32 --forward-batch-size 32 --test-batch-size 32 --nesterov true --weight-decay 0.0005 --base-lr 0.1 --seed $SEED --use-deterministic --num-worker 50 --early-stopping 30 --step 250 --num-epoch 250 --optimizer 'SGD' --lr-scheduler ReduceLROnPlateau --factor 0.5 --patience 10 --cooldown 0 &
```
Validation:

```shell
STREAM=joints_C4_xyzc
DATASET=/path/to/ASL_Citizen/DATASET/NORM/HP
DEVICE=5
NUM_CLASSES=300
SEED=42
ESTUDIO=E0
EXPERIMENT=TRAIN_ASL_CITIZEN_HP/IMAGE_05/$ESTUDIO/$SEED/$STREAM-T1
WEIGHT=work_dir/TRAIN_ASL_CITIZEN_HP/IMAGE_05/E0/42/joints_C3_xyc-T11/weights/weights-110.pt
CONFIG=config/TRAIN_CUSTOM/val.yaml
python main_GTM.py --work-dir eval/$EXPERIMENT --config $CONFIG --weights $WEIGHT --device $DEVICE --test-batch-size 50 --seed $SEED --stream $STREAM --dataset $DATASET --num-classes $NUM_CLASSES
```
Test:

```shell
STREAM=joints_C4_xyzc
DATASET=/path/to/ASL_Citizen/DATASET/NORM/HP
DEVICE=5
NUM_CLASSES=300
SEED=42
ESTUDIO=E0
EXPERIMENT=TRAIN_ASL_CITIZEN_HP/IMAGE_05/$ESTUDIO/$SEED/$STREAM-T1
WEIGHT=work_dir/TRAIN_ASL_CITIZEN_HP/IMAGE_05/E0/42/joints_C3_xyc-T11/weights/weights-110.pt
CONFIG=config/TRAIN_CUSTOM/test.yaml
python main_GTM.py --work-dir eval/$EXPERIMENT --config $CONFIG --weights $WEIGHT --device $DEVICE --test-batch-size 50 --seed $SEED --stream $STREAM --dataset $DATASET --num-classes $NUM_CLASSES
```
This repo is based on:

- [MS-G3D](https://github.com/kenziyuliu/MS-G3D)
```
@inproceedings{liu2020disentangling,
  title={Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={143--152},
  year={2020}
}
```