PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)

zinengtang/DeCEMBERT

DECEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

Implementation of the NAACL 2021 paper DECEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization, by *Zineng Tang, *Jie Lei, and Mohit Bansal.

Setup

# Create python environment (optional)
conda create -n decembert python=3.7

# Install python dependencies
pip install -r requirements.txt

Mixed-precision training is recommended to speed up training. It relies on NVIDIA Apex, which can be installed as follows:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Running

To launch pre-training, run:

bash scripts/pretrain.sh 0,1,2,3

Video Features Extraction Code

The feature extraction scripts are provided in the feature_extractor folder.

We extract 2D video features with ResNet-152. GitHub link: torchvision

We extract 3D video features with 3D-ResNeXt. GitHub link: 3D-ResNeXt

Dense Captions Extraction Code

Following the implementation of dense captioning aided pre-training, we pre-extract dense captions with the following code.

Original GitHub link: Dense Captioning with Joint Inference and Visual Context (PyTorch reproduction)

Important: adjust the frame-rate sampling in the code to suit the type of video being processed.
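As a rough illustration of what adjusting the frame-rate sampling can involve, the sketch below picks which frame indices to keep when a video is resampled to a lower rate. It is not taken from this repository or from the dense captioning code: the function name `sample_frame_indices` and its parameters (`num_frames`, `native_fps`, `target_fps`) are hypothetical, and the appropriate target rate for each video type is left to the reader.

```python
def sample_frame_indices(num_frames, native_fps, target_fps):
    """Return the indices of the frames to keep when resampling a video.

    Hypothetical helper for illustration only; it is not part of this repo.
    num_frames: total number of decoded frames in the video
    native_fps: frame rate at which the video was recorded
    target_fps: frame rate wanted for caption/feature extraction
    """
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames, native_fps and target_fps must be positive")

    # Never upsample: if the target rate is at least the native rate, keep every frame.
    if target_fps >= native_fps:
        return list(range(num_frames))

    # Number of frames that the video duration yields at the target rate.
    num_samples = max(1, int(num_frames * target_fps / native_fps))

    # Take evenly spaced frames, clamped to the last valid index.
    stride = native_fps / target_fps
    return [min(num_frames - 1, int(i * stride)) for i in range(num_samples)]


if __name__ == "__main__":
    # A 10-second clip recorded at 30 fps, sampled at 1 frame per second.
    print(sample_frame_indices(300, 30.0, 1.0))
```

A slowly changing video, such as a cooking demonstration, may tolerate a lower `target_fps` than a fast-moving one; the right value depends on the data and is an assumption to validate.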

Dataset Links

Pre-training Dataset

HowTo100M

Downstream Dataset

MSR-VTT

MSRVTT-QA

YouCook2

(TODO: add downstream tasks)

Reference

@inproceedings{tang2021decembert,
  title={DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization},
  author={Tang, Zineng and Lei, Jie and Bansal, Mohit},
  booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages={2415--2426},
  year={2021}
}

Acknowledgement

Parts of the code are built on Hugging Face Transformers, Facebook Faiss, and TVCaption.
