Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network [CVPR2023]

Zhicheng Zhang, Lijuan Wang, and Jufeng Yang

This is the official implementation of our CVPR 2023 paper.

News

  • Add comments
  • Restructure code

Publication

Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
[Paper] [PDF] [Video] [Demo]

Abstract

Automatically predicting the emotions of user-generated videos (UGVs) has received increasing interest recently. However, existing methods mainly focus on a few key visual frames, which may limit their capacity to encode the context that depicts the intended emotions. To tackle this, we propose a cross-modal temporal erasing network that locates not only keyframes but also context and audio-related information in a weakly supervised manner. Specifically, we first leverage the intra- and inter-modal relationships among different segments to accurately select keyframes. Then, we iteratively erase keyframes to encourage the model to concentrate on the contexts that include complementary information. Extensive experiments on three challenging benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art approaches.
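
The select-then-erase loop described in the abstract can be illustrated with a small toy sketch. This is not the code in this repository: the function names `select_keyframes` and `iterative_erasing`, the parameters `k` and `num_rounds`, and the fixed per-segment scores are all assumptions made for illustration. In the actual model the segment scores would come from the network and would presumably be recomputed after each erasing step.

```python
from typing import List, Sequence


def select_keyframes(scores: Sequence[float], erased: Sequence[bool], k: int) -> List[int]:
    """Return the indices of the k highest-scoring segments not yet erased."""
    candidates = [i for i, gone in enumerate(erased) if not gone]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return sorted(candidates[:k])


def iterative_erasing(scores: Sequence[float], num_rounds: int = 2, k: int = 1) -> List[List[int]]:
    """Toy erasing loop (hypothetical helper, not the authors' implementation).

    Each round picks the top-k remaining segments as keyframes and then
    masks them out, so the next round has to look at the remaining context.
    Assumption: scores stay fixed across rounds, which keeps the sketch
    self-contained but is a simplification.
    """
    erased = [False] * len(scores)
    rounds: List[List[int]] = []
    for _ in range(num_rounds):
        picked = select_keyframes(scores, erased, k)
        if not picked:  # nothing left to erase
            break
        rounds.append(picked)
        for i in picked:
            erased[i] = True
    return rounds


if __name__ == "__main__":
    # Five video segments with made-up importance scores.
    segment_scores = [0.1, 0.9, 0.3, 0.7, 0.2]
    print(iterative_erasing(segment_scores, num_rounds=2, k=1))
```

With the made-up scores above, the first round selects segment 1 and the second round, with segment 1 erased, falls back to segment 3. This mirrors the idea of pushing attention from keyframes toward complementary context.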

Running

To train and evaluate the model, run the script below.

Training details such as the number of epochs and the batch size can be adjusted in opts.py.

$ bash run.sh

The experiments use three datasets: Ekman-6, VideoEmotion-8, and CAER.

References

We referred to the VAANet repository when writing the code.

Citation

If you find this repo useful in your project or research, please consider citing the relevant publication.

BibTeX Citation

@InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Zhicheng and Wang, Lijuan and Yang, Jufeng},
    title     = {Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {18888-18897}
}