Multimodal Distillation for Egocentric Action Recognition

This repository contains the implementation of the paper Multimodal Distillation for Egocentric Action Recognition, published at ICCV 2023.

Teaser figure

Reproducing the virtual environment

To reproduce the virtual environment, install PyTorch and then the following packages:

pip install accelerate tqdm h5py yacs timm einops

Downloading the pre-trained Swin-T model

Create a directory ./data/pretrained-backbones/ and download Swin-T from here:

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_tiny_patch244_window877_kinetics400_1k.pth -O ./data/pretrained-backbones/swin_tiny_patch244_window877_kinetics400_1k.pth
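
As a quick, optional sanity check (not part of the original instructions), you can verify that the downloaded checkpoint loads with PyTorch; the snippet below only inspects the top-level keys and makes no assumptions about the checkpoint's internal layout:

import torch

# Load the checkpoint onto the CPU; the path matches the wget destination above.
ckpt = torch.load(
    "./data/pretrained-backbones/swin_tiny_patch244_window877_kinetics400_1k.pth",
    map_location="cpu",
)
# Most such checkpoints are dictionaries; print the top-level keys if so.
print(type(ckpt), list(ckpt.keys()) if isinstance(ckpt, dict) else None)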

Preparing Epic-Kitchens and Something-Something

We store all data (video frames, optical flow frames, audio, etc.) in an efficient HDF5 file, where each video is a dataset within the HDF5 file and the n-th element of that dataset contains the bytes of the video's n-th frame. Since these files are large, drop us an email and we can give you access to them.

Once we send you the datasets, place them inside ./data/, i.e., ./data/something-something/ and ./data/EPIC-KITCHENS. Feel free to store the data wherever you see fit; just do not forget to update the config.yaml files with the appropriate locations. In this README.md, we assume that all data is placed inside ./data/ and all experiments inside ./experiments/.
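
For illustration, here is a minimal sketch of how such an HDF5 file could be read with h5py; the file path and the use of Pillow for decoding the frame bytes are assumptions made for the example, not identifiers from this codebase:

import io

import h5py
from PIL import Image  # Pillow is assumed here purely for decoding the frame bytes

# Placeholder path; substitute the HDF5 file we send you.
with h5py.File("./data/EPIC-KITCHENS/rgb_frames.hdf5", "r") as f:
    video_key = next(iter(f.keys()))   # each video is its own dataset
    frame_bytes = f[video_key][0]      # the n-th element holds the bytes of the n-th frame
    frame = Image.open(io.BytesIO(bytes(frame_bytes)))  # decode the encoded frame (e.g. JPEG)
    print(video_key, len(f[video_key]), frame.size)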

Inference on Epic-Kitchens

  1. Download our Epic-Kitchens distilled model from here, and place it in ./experiments/.
  2. Run inference as indicated below:
python src/inference.py --experiment_path "experiments/epic-kitchens-swint-distill-flow-audio" --opts DATASET_TYPE "video"

Inference on Something-Something & Something-Else

  1. Download our Something-Else distilled model from here or the Something-Something distilled model from here, and place it in ./experiments/.
  2. Run inference as indicated below:
python src/inference.py --experiment_path "experiments/something-swint-distill-layout-flow" --opts DATASET_TYPE "video"

for Something-Something, and

python src/inference.py --experiment_path "experiments/something-else-swint-distill-layout-flow" --opts DATASET_TYPE "video"

for Something-Else.

Distilling from Multimodal Teachers

To reproduce the experiments (i.e., with identical hyperparameters, where only the random seed varies), run:

python src/patient_distill.py --config "experiments/something-else-swint-distill-layout-flow/config.yaml" --opts EXPERIMENT_PATH "experiments/reproducing-the-something-else-experiment"

Note that this assumes access to the datasets for all modalities (video, optical flow, audio, object detections), as well as to the individual (unimodal) models which constitute the multimodal ensemble teacher.

Model ZOO for unimodal models

TODOs

  • Release Something-Something pretrained teachers for each modality.
  • Test the codebase.
  • Structure the Model ZOO part of the codebase.

Citation

If you find our code useful for your own research, please use the following BibTeX entry:

@inproceedings{radevski2023multimodal,
  title={Multimodal Distillation for Egocentric Action Recognition},
  author={Radevski, Gorjan and Grujicic, Dusan and Blaschko, Matthew and Moens, Marie-Francine and Tuytelaars, Tinne},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={5213--5224},
  year={2023}
}
