To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models

📰 Paper

Figure: Radar charts comparing performance at 224, 336, and 448 input resolutions across coarse-grained perception, fine-grained perception, and reasoning tasks on MMBench. Each task group includes four sub-tasks: Image Quality, Image Scene, Image Style, and Image Topic for coarse-grained perception; Action Recognition, Celebrity Recognition, Object Localization, and OCR for fine-grained perception; and Function Reasoning, Identity Reasoning, Social Relation, and Structuralized Image-Text Understanding for reasoning.

Evaluations

We use the LLaVA evaluation scripts and lmms-eval to evaluate benchmark performance for MLLMs, then collect the results to compute scores for the fine-grained perception, coarse-grained perception, and reasoning tasks.

Preliminary

We found slight differences between the LLaVA scripts and lmms-eval in the evaluation process. Our evaluation uses:

  • LLaVA Official Scripts: For TextVQA, MMBench, MME, and VQAv2
  • LMMS-eval: For SeedBench, RefCOCO, RefCOCO+, RefCOCOg, GQA, ScienceQA, POPE, and VizWiz VQA

1. Install the lmms-eval Environment

cd <your_project_path>
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval
pip install -e .

Then modify the dataset_path in lmms-eval/lmms_eval/tasks.

For example, dataset_path in lmms-eval/lmms_eval/tasks/pope/pope.yaml; the other benchmarks are configured in the same way.
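
If you prefer to script this change, here is a minimal sketch that rewrites the dataset_path line of a task config. It is only illustrative: "/data/pope" is a hypothetical local path, and editing the YAML file by hand works just as well.

# Illustrative only: rewrite the dataset_path line in a task config.
# "/data/pope" is a placeholder path; adjust it to your local dataset copy.
import re
import pathlib

cfg_file = pathlib.Path("lmms-eval/lmms_eval/tasks/pope/pope.yaml")
text = cfg_file.read_text()
# Assumes the file contains a top-level "dataset_path:" line, as referenced above.
text = re.sub(r"(?m)^dataset_path:.*$", "dataset_path: /data/pope", text)
cfg_file.write_text(text)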

./eval_pipeline.sh <your_model_path> <dataset_name>
# e.g.
./eval_pipeline.sh <your_model_path> mme

2. Evaluation Pipeline

  1. LLaVA official evaluation scripts: for MME, MMBench, TextVQA, and VQAv2 we follow the evaluation pipeline of haotian-liu/LLaVA. For convenience, we simply merged the relevant scripts from haotian-liu/LLaVA into this repository. Note: MMBench results need to be submitted to the official website to obtain the final scores.
./eval_pipeline.sh <your_model_path> mme
  2. lmms-eval: for SeedBench, RefCOCO, RefCOCO+, RefCOCOg, GQA, ScienceQA, and VizWiz VQA, we follow the evaluation pipeline of lmms-eval.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="<your_model_path>" \
    --tasks seedbench,refcoco,refcoco+,refcocog,gqa,scienceqa,vizwiz_vqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5 \
    --output_path ./logs/

Example

We focus on evaluating three task groups: coarse-grained perception, fine-grained perception, and reasoning. These are assessed through sub-tasks in the MME, MMBench, and SeedBench benchmarks.

For the MME benchmark, run the following command to get the MME sub-task results:

./eval_pipeline.sh <your_model_path> mme

For MMBench, run the following command to get the MMBench sub-task results:

./eval_pipeline.sh <your_model_path> mmbench

Note that the MMBench results need to be submitted to the official website to obtain the final scores.

For the SeedBench benchmark, run the following command to get the SeedBench sub-task results:

python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="<your_model_path>" \
    --tasks seedbench \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_seedbench \
    --output_path ./logs/

You can then find the SeedBench sub-task results in the ./logs/ directory. Next, run the following command to get the final results:

python3 seedbench_subtasks_analysis.py --json_path EVAL/eval_results/<to_your_results_file>
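
For orientation, here is a minimal, illustrative sketch of the kind of per-sub-task aggregation such an analysis performs. It is not the repository's seedbench_subtasks_analysis.py; the record keys "question_type" and "correct" are hypothetical placeholders, so adapt them to whatever fields your results file actually contains.

# Illustrative sketch only, not the repository's seedbench_subtasks_analysis.py.
# Assumes a JSON file containing a list of per-question records with
# hypothetical keys "question_type" (sub-task label) and "correct" (bool).
import argparse
import json
from collections import defaultdict

parser = argparse.ArgumentParser()
parser.add_argument("--json_path", required=True)
args = parser.parse_args()

with open(args.json_path) as f:
    records = json.load(f)

totals, hits = defaultdict(int), defaultdict(int)
for rec in records:
    subtask = rec["question_type"]        # hypothetical field name
    totals[subtask] += 1
    hits[subtask] += int(rec["correct"])  # hypothetical field name

for subtask in sorted(totals):
    print(f"{subtask}: {hits[subtask] / totals[subtask]:.4f} ({totals[subtask]} questions)")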

After obtaining the subtask results from MME, MMBench, and SeedBench, you can calculate the final results for coarse perception, fine-grained perception, and reasoning capabilities according to the reclassification protocol detailed in our paper.
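
As a minimal sketch of this last step for the MMBench portion, the snippet below groups per-sub-task scores into the three categories using the grouping listed in the figure caption above and averages within each group. The unweighted mean and the scores dictionary are illustrative assumptions; consult the paper for the exact reclassification protocol and for how the MME and SeedBench sub-tasks are folded in.

# Minimal sketch: fold MMBench per-sub-task scores into the three capability groups.
# The grouping follows the sub-tasks listed in the figure caption; the unweighted
# mean is an assumption -- see the paper for the exact reclassification protocol.
GROUPS = {
    "coarse_perception": ["Image Quality", "Image Scene", "Image Style", "Image Topic"],
    "fine_grained_perception": ["Action Recognition", "Celebrity Recognition",
                                "Object Localization", "OCR"],
    "reasoning": ["Function Reasoning", "Identity Reasoning", "Social Relation",
                  "Structuralized Image-Text Understanding"],
}

def reclassify(scores: dict) -> dict:
    """scores maps sub-task name -> accuracy; returns one averaged score per group."""
    return {
        group: sum(scores[name] for name in names) / len(names)
        for group, names in GROUPS.items()
    }

# Placeholder usage with made-up numbers:
# print(reclassify({"Image Quality": 0.50, "Image Scene": 0.90, ...}))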

For evaluating the remaining benchmarks (RefCOCO, RefCOCO+, RefCOCOg, GQA, ScienceQA, VizWiz), please follow the standard evaluation procedures using the LLaVA evaluation scripts and the lmms-eval toolkit. All evaluation results will be stored in the following directory structure:

EVAL/eval_results/
├── mme/          # MME benchmark results
├── mmbench/      # MMBench benchmark results
└── seedbench/    # SeedBench benchmark results

About

[EMNLP 2024 Main] Official implementation of the paper "To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models" (by Junyan Lin).
