LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models

Installation

Please follow the Lmms-Eval installation instructions to install Lmms-Eval.

If you have already installed Lmms-Eval, copy the task directory "./lmms_eval/tasks/longhallqa" from this repository into the same location in your project ("./lmms_eval/tasks/").
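
The copy step can be scripted. The following is a minimal sketch, assuming both repositories are checked out locally; the function name `copy_longhalqa_task` and its two arguments are hypothetical and not part of this repository.

```shell
# Hypothetical helper: copy the LongHalQA task directory into an existing
# lmms-eval checkout. Both arguments are placeholders chosen for this sketch.
copy_longhalqa_task() {
    src_repo="$1"    # root of this repository
    dest_repo="$2"   # root of your existing lmms-eval project
    task_dir="lmms_eval/tasks/longhallqa"

    # Fail early if the source task directory is missing.
    if [ ! -d "$src_repo/$task_dir" ]; then
        echo "task directory not found: $src_repo/$task_dir" >&2
        return 1
    fi

    # Create the destination tasks folder if needed, then copy recursively.
    mkdir -p "$dest_repo/lmms_eval/tasks"
    cp -r "$src_repo/$task_dir" "$dest_repo/lmms_eval/tasks/"
}
```

For example, `copy_longhalqa_task . /path/to/your/lmms-eval` would be run from the root of this repository.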

Run Evaluation

# Run evaluation for a specific MLLM and task
python3 -m accelerate.commands.launch \
    --num_processes=1 \
    -m lmms_eval \
    --model /MODEL/NAME \
    --model_args pretrained=/PRETRAIN/CHECKPOINTS/PARAMETERS \
    --tasks longhalqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix /SAVE/SUFFIX \
    --output_path ./logs/

# Example: run LLaVA-1.5-7b on the Hallucination Completion task of LongHalQA
python3 -m accelerate.commands.launch \
    --num_processes=1 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks longhalqa \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v15_7b_lhqa_completion \
    --output_path ./logs/

The MLLMs evaluated in LongHalQA include:

MLLM	model	model_args (pretrained=)
MiniCPM-V-2	minicpm_v	"openbmb/MiniCPM-V-2"
Qwen2-VL-2B	qwen2_vl	"Qwen/Qwen2-VL-2B-Instruct"
Fuyu	fuyu	"adept/fuyu-8b"
LLaVA-1.5-7b	llava	"liuhaotian/llava-v1.5-7b"
LLaVA-1.5-13b	llava	"liuhaotian/llava-v1.5-13b"
LLaVA-1.6-7b	llava	"liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct"
Qwen-VL-Chat	qwen_vl_chat	"Qwen/Qwen-VL-Chat"
LLaVA-1.6-34b	llava	"liuhaotian/llava-v1.6-34b,conv_template=mistral_direct"
Qwen2-VL-72B	qwen2_vl	"Qwen/Qwen2-VL-72B-Instruct"
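
To evaluate several of these models in turn, the table can be turned into a loop. The sketch below only prints the commands (a dry run) so they can be inspected before anything is launched. The function name `print_eval_commands` and the scheme for deriving `--log_samples_suffix` from the checkpoint name are assumptions made for illustration.

```shell
# Dry-run sketch: print (do not execute) one evaluation command per model.
# The "model|pretrained" pairs are copied from the table above.
print_eval_commands() {
    while IFS='|' read -r model pretrained; do
        [ -z "$model" ] && continue
        # Derive a filesystem-friendly suffix from the checkpoint name,
        # dropping any extra arguments after the first comma (illustrative only).
        suffix=$(printf '%s' "${pretrained%%,*}" | tr '/.-' '___')
        printf 'python3 -m accelerate.commands.launch --num_processes=1 -m lmms_eval --model %s --model_args pretrained="%s" --tasks longhalqa --batch_size 1 --log_samples --log_samples_suffix %s --output_path ./logs/\n' \
            "$model" "$pretrained" "$suffix"
    done <<'EOF'
minicpm_v|openbmb/MiniCPM-V-2
qwen2_vl|Qwen/Qwen2-VL-2B-Instruct
fuyu|adept/fuyu-8b
llava|liuhaotian/llava-v1.5-7b
llava|liuhaotian/llava-v1.5-13b
llava|liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct
qwen_vl_chat|Qwen/Qwen-VL-Chat
llava|liuhaotian/llava-v1.6-34b,conv_template=mistral_direct
qwen2_vl|Qwen/Qwen2-VL-72B-Instruct
EOF
}

print_eval_commands
```

Each printed line can be pasted into a terminal, or the whole output piped to `sh` once the commands look right.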

The sub-items evaluated in LongHalQA are:

Hallucination Discrimination	Hallucination Completion
lhqa_discrim_object_binary	lhqa_complete_description
lhqa_discrim_description_binary	lhqa_complete_conversation
lhqa_discrim_conversation_binary
lhqa_discrim_description_choice
lhqa_discrim_conversation_choice
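
A single group of sub-items can be evaluated by passing a comma-separated list to `--tasks`. The sketch below assumes each sub-item name in the table is registered as its own task, which has not been verified against the task configs here. The helper names `join_tasks` and `print_group_command` are hypothetical, and the command is printed rather than executed.

```shell
# Sub-item names copied from the table above, grouped by column.
LHQA_DISCRIM="lhqa_discrim_object_binary lhqa_discrim_description_binary lhqa_discrim_conversation_binary lhqa_discrim_description_choice lhqa_discrim_conversation_choice"
LHQA_COMPLETE="lhqa_complete_description lhqa_complete_conversation"

# Join a space-separated list with commas for use with --tasks.
join_tasks() {
    printf '%s' "$1" | tr ' ' ','
}

# Print (do not run) a LLaVA-1.5-7b command for one group of sub-items.
# Assumption: sub-item names are accepted directly by --tasks.
print_group_command() {
    printf 'python3 -m accelerate.commands.launch --num_processes=1 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks %s --batch_size 1 --log_samples --log_samples_suffix %s --output_path ./logs/\n' \
        "$(join_tasks "$1")" "$2"
}

print_group_command "$LHQA_COMPLETE" llava_v15_7b_lhqa_completion
```

Replacing `$LHQA_COMPLETE` with `$LHQA_DISCRIM` gives the corresponding command for the discrimination sub-items.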

Acknowledgement

This repository is a modified fork of lmms-eval that adds the LongHalQA tasks. Thanks to the authors of this wonderful project.
