hanqiu-hq/LongHalQA

LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models

Installation

Please follow the lmms-eval installation instructions to install lmms-eval.

If you have already installed lmms-eval, copy the task folder "./lmms_eval/tasks/longhallqa" from this repository into the corresponding location of your project ("./lmms_eval/tasks/").
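The copy step can be scripted roughly as below. This is a minimal sketch that assumes you have local checkouts of both LongHalQA and lmms-eval; the function name install_longhalqa_task and its two path arguments are hypothetical, while the source folder name is taken verbatim from the instruction above.

```shell
# Hypothetical helper: copy the LongHalQA task folder into an existing
# lmms-eval checkout. Both arguments are assumed local paths.
install_longhalqa_task() {
  local src_repo="$1"   # root of the LongHalQA checkout (assumed)
  local dst_repo="$2"   # root of your lmms-eval checkout (assumed)
  # Folder name as given in this README.
  local src="$src_repo/lmms_eval/tasks/longhallqa"
  local dst="$dst_repo/lmms_eval/tasks"

  # Fail early if the source folder is missing.
  if [ ! -d "$src" ]; then
    echo "task folder not found: $src" >&2
    return 1
  fi
  # Create the destination tasks/ folder if needed, then copy recursively.
  mkdir -p "$dst" && cp -r "$src" "$dst/"
}
```

For example, install_longhalqa_task ~/LongHalQA ~/lmms-eval (both paths are placeholders) would place the folder at ~/lmms-eval/lmms_eval/tasks/longhallqa.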

Run Evaluation

# Run evaluation for a specific MLLM and task
python3 -m accelerate.commands.launch \
    --num_processes=1 \
    -m lmms_eval \
    --model /MODEL/NAME \
    --model_args pretrained=/PRETRAIN/CHECKPOINTS/PARAMETERS \
    --tasks longhalqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix /SAVE/SUFFIX \
    --output_path ./logs/

# Example: evaluate LLaVA-1.5-7b on the Hallucination Completion task of LongHalQA
python3 -m accelerate.commands.launch \
    --num_processes=1 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks longhalqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v15_7b_lhqa_completion \
    --output_path ./logs/

The MLLMs evaluated in LongHalQA are listed below, together with the value to pass to --model and the value to pass as pretrained= in --model_args:

MLLM           model         model_args (pretrained=)
MiniCPM-V-2    minicpm_v     "openbmb/MiniCPM-V-2"
Qwen2-VL-2B    qwen2_vl      "Qwen/Qwen2-VL-2B-Instruct"
Fuyu           fuyu          "adept/fuyu-8b"
LLaVA-1.5-7b   llava         "liuhaotian/llava-v1.5-7b"
LLaVA-1.5-13b  llava         "liuhaotian/llava-v1.5-13b"
LLaVA-1.6-7b   llava         "liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct"
Qwen-VL-Chat   qwen_vl_chat  "Qwen/Qwen-VL-Chat"
LLaVA-1.6-34b  llava         "liuhaotian/llava-v1.6-34b,conv_template=mistral_direct"
Qwen2-VL-72B   qwen2_vl      "Qwen/Qwen2-VL-72B-Instruct"
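To evaluate several of the models above without retyping the command, one option is a small helper that prints the launch command for a given model. The short keys (e.g. llava15_7b) and the function name lhqa_cmd are hypothetical; the --model and pretrained= values are copied from the table, and the remaining flags mirror the example command. The helper only prints the command, so you can inspect it before running anything.

```shell
# Hypothetical helper: print (not run) the lmms-eval launch command for one
# of the models in the table. Keys are made up for this sketch; the model
# and pretrained values come from the table. Task defaults to "longhalqa".
lhqa_cmd() {
  local key="$1"
  local task="${2:-longhalqa}"
  local model pretrained
  case "$key" in
    minicpm_v2)   model=minicpm_v;    pretrained="openbmb/MiniCPM-V-2" ;;
    qwen2_vl_2b)  model=qwen2_vl;     pretrained="Qwen/Qwen2-VL-2B-Instruct" ;;
    fuyu)         model=fuyu;         pretrained="adept/fuyu-8b" ;;
    llava15_7b)   model=llava;        pretrained="liuhaotian/llava-v1.5-7b" ;;
    llava15_13b)  model=llava;        pretrained="liuhaotian/llava-v1.5-13b" ;;
    llava16_7b)   model=llava;        pretrained="liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct" ;;
    qwen_vl_chat) model=qwen_vl_chat; pretrained="Qwen/Qwen-VL-Chat" ;;
    llava16_34b)  model=llava;        pretrained="liuhaotian/llava-v1.6-34b,conv_template=mistral_direct" ;;
    qwen2_vl_72b) model=qwen2_vl;     pretrained="Qwen/Qwen2-VL-72B-Instruct" ;;
    *) echo "unknown model key: $key" >&2; return 1 ;;
  esac
  # Same flags as the example command; the log suffix combines key and task.
  echo "python3 -m accelerate.commands.launch --num_processes=1 -m lmms_eval" \
    "--model $model --model_args pretrained=$pretrained" \
    "--tasks $task --batch_size 1 --log_samples" \
    "--log_samples_suffix ${key}_${task} --output_path ./logs/"
}
```

For example, eval "$(lhqa_cmd llava15_7b)" would launch LLaVA-1.5-7b on the longhalqa task; an optional second argument substitutes a different task name.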

The sub-tasks evaluated in LongHalQA are grouped as follows:

Hallucination Discrimination:
  lhqa_discrim_object_binary
  lhqa_discrim_description_binary
  lhqa_discrim_conversation_binary
  lhqa_discrim_description_choice
  lhqa_discrim_conversation_choice

Hallucination Completion:
  lhqa_complete_description
  lhqa_complete_conversation
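To run only one group of sub-tasks, the names above can be joined into a comma-separated list, the form lmms-eval's --tasks flag accepts for multiple tasks. This sketch assumes each listed identifier is registered as a standalone lmms-eval task; the function name lhqa_subtasks and the category labels discrimination and completion are hypothetical.

```shell
# Hypothetical helper: print the sub-task names of one LongHalQA category
# as a comma-separated list suitable for --tasks.
# Assumes every listed name is a registered lmms-eval task.
lhqa_subtasks() {
  case "$1" in
    discrimination)
      set -- lhqa_discrim_object_binary lhqa_discrim_description_binary \
             lhqa_discrim_conversation_binary lhqa_discrim_description_choice \
             lhqa_discrim_conversation_choice ;;
    completion)
      set -- lhqa_complete_description lhqa_complete_conversation ;;
    *)
      echo "unknown category: $1 (use discrimination or completion)" >&2
      return 1 ;;
  esac
  # "$*" joins the positional parameters with the first character of IFS.
  local IFS=,
  echo "$*"
}
```

For example, replacing --tasks longhalqa with --tasks "$(lhqa_subtasks completion)" in the launch command would restrict the run to the two completion sub-tasks.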

Acknowledgement

We forked and modified lmms-eval to implement LongHalQA. Many thanks to the lmms-eval team for this excellent project.