This is the PyTorch implementation of the paper Efficient Multimodal Multitask Model Selector.
We propose an efficient multi-task model selector (EMMS), which transforms the diverse label formats of different downstream tasks, such as categories, texts, and bounding boxes, into a unified noisy label embedding. Extensive experiments on 5 downstream tasks across 24 datasets show that EMMS is fast, effective, and generic.
Follow the guide below to get started.
- Download the downstream datasets to `./data/*`.
Extract features of the target data using pretrained models, together with the different labels of the target data. Image classification and image captioning use different pipelines.

- Image classification with CNN and ViT models:

  ```shell
  python forward_feature_CNN.py
  python forward_feature_ViT.py
  ```

- Image captioning:

  ```shell
  python forward_feature_caption.py
  ```
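The extracted labels are later turned into unified label embeddings (the F-labels used in the evaluation step). As a rough illustration of that idea only, not the repository's actual pipeline, here is a minimal NumPy sketch in which categorical labels become one-hot matrices and captions become fixed-dimension embeddings; the random projection stands in for the pretrained text encoder EMMS would use:

```python
import numpy as np

def onehot_flabel(labels, num_classes):
    """One-hot encode categorical labels into an (n, num_classes) embedding."""
    y = np.zeros((len(labels), num_classes))
    y[np.arange(len(labels)), labels] = 1.0
    return y

def text_flabel(captions, dim=64, seed=0):
    """Stand-in for a text-encoder F-label: a fixed random projection of
    bag-of-words counts. In EMMS this embedding would come from a pretrained
    language model; this stub only illustrates the shape of the output."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for c in captions for w in c.lower().split()})
    proj = rng.standard_normal((len(vocab), dim)) / np.sqrt(dim)
    counts = np.zeros((len(captions), len(vocab)))
    for i, c in enumerate(captions):
        for w in c.lower().split():
            counts[i, vocab.index(w)] += 1
    return counts @ proj

# Both label formats become (n_samples, embedding_dim) matrices,
# so downstream scoring can treat them uniformly.
Y_cls = onehot_flabel([0, 2, 1], num_classes=3)
Y_txt = text_flabel(["a dog runs", "a cat sleeps", "a dog sleeps"])
```

The point is that heterogeneous supervision ends up as same-shaped matrices, which is what lets a single selector handle classification and captioning alike.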
Compute transferability scores using EMMS, and assess its effectiveness using the model features and F-labels:

- Image classification:

  ```shell
  python evaluate_metric_cls_cpu_CNN.py
  python evaluate_metric_cls_cpu_ViT.py
  ```

- Image captioning:

  ```shell
  python evaluate_metric_caption_cpu.py
  ```
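At a high level, the EMMS score measures how well a linear map carries the model's features onto a weighted combination of F-labels, fitted by alternating minimization. The following is a simplified NumPy sketch of that idea, not the paper's exact algorithm (which the scripts above implement); the simplex projection here is a deliberately naive clip-and-renormalize:

```python
import numpy as np

def emms_score(X, Ys, n_iter=10):
    """Simplified EMMS-style transferability score: alternately fit a linear
    map T from features X (n, d) to a convex combination of label embeddings
    Ys (list of (n, c) arrays) and re-weight the embeddings; return the
    negative mean squared residual (higher = more transferable)."""
    K = len(Ys)
    w = np.full(K, 1.0 / K)                       # simplex weights over F-labels
    stacked = np.stack(Ys)                        # (K, n, c)
    for _ in range(n_iter):
        Y = np.tensordot(w, stacked, axes=1)      # weighted target (n, c)
        T, *_ = np.linalg.lstsq(X, Y, rcond=None) # solve T: X @ T ~ Y
        pred = X @ T
        # re-fit w by least squares on the flattened embeddings, then
        # project back onto the probability simplex (clip + renormalize)
        A = stacked.reshape(K, -1).T              # (n*c, K)
        w, *_ = np.linalg.lstsq(A, pred.ravel(), rcond=None)
        w = np.clip(w, 1e-8, None)
        w = w / w.sum()
    Y = np.tensordot(w, stacked, axes=1)
    return -np.mean((X @ T - Y) ** 2)
```

Features that linearly predict the label embeddings yield a residual near zero (score near 0), while unrelated features yield a more negative score, which is what makes the value usable for ranking pretrained models.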
To evaluate other baselines such as LogME, replace EMMS by passing the desired method via the metric parameter.
For any questions, email the new owner at [email protected]