This repository contains the code for our baselines, namely HCRN, ClipBERT, and All-in-one, migrated from their original implementations to fit our data structure.
See dataset for details.
See the HCRN, ClipBERT, and All-in-one folders for details.
The above baseline models are trained on the train set and evaluated on the val, test-dev, and test sets.
model | val set | test-tiny | test-dev | test | weights |
---|---|---|---|---|---|
HCRN | 41.69 | 41.57 | 41.18 | 41.13 | ckpt |
ClipBERT | 44.34 | 43.90 | 44.00 | 43.91 | ckpt |
All-in-one | 45.44 | 44.27 | 44.57 | 44.53 | ckpt |
This project is licensed under the Apache License 2.0.
If you use ANetQA in your research, please cite our paper:
@inproceedings{yu2023anetqa,
title={ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos},
author={Yu, Zhou and Zheng, Lixiang and Zhao, Zhou and Wu, Fei and Fan, Jianping and Ren, Kui and Yu, Jun},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}