# Upgradable Multimodal Intelligence

Code for the CVPR 2023 paper [Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval](https://arxiv.org/abs/2206.02082).

This sample implementation is largely based on the codebase of [Just Ask](https://antoyang.github.io/just-ask.html), so please refer to it for setting up the environment. We will release the full codebase soon after CVPR.

## Usage

After setting up the environment, the files under `How2QA` can be used to obtain the results in Table 1 for the Text+Text variant with the following command:

```bash
python main_videoqa.py --checkpoint_dir=LOCATION_OF_EXPERIMENT --dataset=how2qa --lr=0.00005 --mlm_prob 0. --qmax_words 120 --baseline to --lm all-mpnet-base-v2 --epochs 30
```
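
For context, the `all-mpnet-base-v2` value passed to `--lm` names a Sentence-Transformers model. A minimal sketch of loading that encoder on its own (assuming the `sentence-transformers` package is installed; this is illustrative and not part of this repo's API):

```python
# Illustrative sketch only: load the all-mpnet-base-v2 text encoder
# via the sentence-transformers package.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-mpnet-base-v2")

# Encode a batch of questions into fixed-size embeddings.
embeddings = encoder.encode(["What is the person doing in the video?"])
print(embeddings.shape)  # (1, 768)
```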