This repository contains code for adversarial training of several multimodal language models for the paper *Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models*.
To set up the required environment for each model, use the corresponding YAML file from the `envs` directory. Each file specifies the dependencies needed for a particular model:

- `envs/env_blip.yml`: Environment setup for the InstructBLIP model.
- `envs/env_llava.yml`: Environment setup for the LLaVA model.
- `envs/env_llava_next.yml`: Environment setup for the LLaVA-Next model.
- `envs/env_llava_ov.yml`: Environment setup for the LLaVA-OneVision model.
- `envs/env_qwen.yml`: Environment setup for the Qwen model.
To create a conda environment using any of these files, run the following command (replace `<environment_file>` with the desired YAML file):
conda env create -f envs/<environment_file>
For example, to create an environment for the LLaVA model:
conda env create -f envs/env_llava.yml
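Once the environment is created, activate it before running anything else. The environment name comes from the `name:` field of the YAML file; `llava` below is only an assumed example, so check the file for the actual name:

```bash
# The environment name is defined by the "name:" field in the YAML file;
# "llava" here is an assumption, replace it with the actual name.
conda activate llava
```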
To prepare the required datasets, download them from the following sources:
-
- Download the benchmark from the above link.
- Download the required images from here.
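Where you put the downloaded files is up to you, as long as the paths match what you later configure in the training script described below (`run_adversarial_training.sh`). A minimal sketch with assumed directory names:

```bash
# Assumed layout for illustration only; adjust the paths to whatever you
# configure in the training script.
mkdir -p data/benchmark data/images
# Place the downloaded benchmark files under data/benchmark and extract the
# image archive into data/images, for example:
# unzip <downloaded_images>.zip -d data/images   # archive name is a placeholder
```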
To prepare each of the multimodal large language models (MLLMs), please refer to the respective repositories and follow their instructions:
- In addition to the official instructions, make the following modifications:
- In addition to the official instructions, modify the following:
  - Change the `image_size` from `448` to `224` at this line (a sketch of this edit follows the list).
  - Change the
- In addition, add the function defined in this file to the `LlavaMetaForCausalLM` class here.
- In addition, add the function defined in this file to the `LlavaMetaForCausalLM` class here.
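The `image_size` change above is a one-line edit in the file linked in the original instructions. A minimal sketch of that kind of edit, assuming the value is written as `image_size=448` in a Python source file (both the pattern and the path below are placeholders):

```bash
# Placeholder path and pattern: use the file and line referenced above, and
# check how the value is actually written there before applying this.
sed -i 's/image_size=448/image_size=224/' <path/to/linked/file>.py
```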
To train a model, you can use the provided bash script `run_adversarial_training.sh` to simplify the process. This script allows you to select the model and set the required parameters for training.
To run the script, simply use:
bash run_adversarial_training.sh
Make sure to modify the script to suit your requirements, such as setting the appropriate input and output directories, number of steps, and training parameters.
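The parameters are typically adjusted by editing variables inside the script. The names below are assumptions used purely to illustrate the kind of settings involved; use whatever names the script actually defines:

```bash
# Sketch of the kind of settings configured in run_adversarial_training.sh.
# All variable names here are assumptions, not the script's actual interface.
MODEL="llava"                  # which MLLM to train adversarially
DATA_DIR="data/images"         # input image directory
OUTPUT_DIR="outputs/llava"     # where checkpoints and results are written
NUM_STEPS=1000                 # number of training steps
```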
- `--use_categories`: Set this flag if your images are organized into subdirectories based on categories, as in the VHTest dataset. If the images are not categorized (e.g., the POPE or MMVP datasets), you can omit this flag.
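For reference, here is a sketch of the two image layouts the flag distinguishes (directory and file names are assumptions):

```
# With --use_categories: one subdirectory per category under the image root
images/
├── category_A/
│   ├── 0001.jpg
│   └── 0002.jpg
└── category_B/
    └── 0003.jpg

# Without --use_categories (e.g., POPE, MMVP): images directly under the root
images/
├── 0001.jpg
└── 0002.jpg
```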