Code for the CVPR 2023 paper "Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval"
This sample implementation is largely based on the Just Ask codebase, so please refer to it for environment setup. We will release the full codebase soon after CVPR.
After setting up, the files under How2QA can be used to reproduce the results in Table 1 for the Text+Text variant with the following command:
python main_videoqa.py --checkpoint_dir=LOCATION_OF_EXPERIMENT --dataset=how2qa --lr=0.00005 --mlm_prob 0. --qmax_words 120 --baseline to --lm all-mpnet-base-v2 --epochs 30
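In the Text+Text variant, the video channel is replaced by subtitle text, so the question and subtitles can both be embedded with a pretrained text encoder (here `all-mpnet-base-v2`, a 768-dimensional sentence-transformers model) and matched against candidate answers by cosine similarity. A minimal sketch of that scoring step, under stated assumptions: the additive fusion of question and subtitle embeddings and the random placeholder vectors are illustrative, not the paper's exact method.

```python
import numpy as np

def score_answers(question_emb, subtitle_emb, answer_embs):
    """Rank candidate answers by cosine similarity to a fused text query.

    question_emb, subtitle_emb: 1-D embeddings of the question and subtitles.
    answer_embs: 2-D array, one embedding per candidate answer.
    """
    # Simple additive fusion of the two text channels (an assumption here;
    # the actual model may fuse them differently).
    query = question_emb + subtitle_emb
    query = query / np.linalg.norm(query)
    answers = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    # Dot product of unit vectors = cosine similarity, one score per answer.
    return answers @ query

# Placeholder 768-dim vectors standing in for all-mpnet-base-v2 embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=768)
s = rng.normal(size=768)
cands = rng.normal(size=(4, 768))  # e.g. 4 multiple-choice answers

scores = score_answers(q, s, cands)
pred = int(np.argmax(scores))  # index of the highest-scoring answer
```

In the real pipeline, `main_videoqa.py` obtains the embeddings from the language model named by `--lm` instead of random vectors.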