VHExpansion

This repository contains code for adversarial training of several multimodal large language models for the paper "Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models".

Environment Setup

To set up the required environment for each model, use the corresponding YAML file from the envs directory. Each file specifies the dependencies needed for a particular model:

  • envs/env_blip.yml: Environment setup for the InstructBLIP model.
  • envs/env_llava.yml: Environment setup for the LLaVA model.
  • envs/env_llava_next.yml: Environment setup for the LLaVA-Next model.
  • envs/env_llava_ov.yml: Environment setup for the LLaVA-OneVision model.
  • envs/env_qwen.yml: Environment setup for the Qwen model.

To create a conda environment using any of these files, run the following command (replace <environment_file> with the desired YAML file):

conda env create -f envs/<environment_file>

For example, to create an environment for the LLaVA model:

conda env create -f envs/env_llava.yml

Dataset Preparation

To prepare the required datasets, download them from the following sources:

MLLM Preparation

To prepare each of the multimodal large language models (MLLMs), please refer to the respective repositories and follow their instructions:

  • InstructBLIP

  • LLaVA-1.5

    In addition to the official instructions, make the following modifications (see the sketch after this list for why the second change matters):

    1. Add the following line below this line:

      config.mm_vision_tower = "openai/clip-vit-large-patch14"
    2. Comment out the @torch.no_grad() decorator at this line.

  • Qwen-VL-Chat

    In addition to the official instructions, modify the following:

    1. Change the image_size from 448 to 224 at this line.
  • LLaVA-Next

    In addition, add the function defined in this file to the LlavaMetaForCausalLM class here.

  • LLaVA-OneVision

    In addition, add the function defined in this file to the LlavaMetaForCausalLM class here.
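
The snippet below is a toy illustration, not the repository's or LLaVA's actual code (ToyVisionTower and its tensor shapes are invented for this sketch). It shows why the second LLaVA-1.5 change is needed: adversarial training optimizes the input image, so gradients must flow through the vision tower, and a @torch.no_grad() decorator on its forward pass blocks them. The first change simply pins the CLIP vision encoder that gets loaded via config.mm_vision_tower.

import torch

class ToyVisionTower(torch.nn.Module):
    # Hypothetical stand-in for a vision tower; shapes are made up.
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 4)

    # @torch.no_grad()  # with this decorator active, the forward pass builds no
    #                   # computation graph and loss.backward() below would fail
    def forward(self, image):
        return self.proj(image)

tower = ToyVisionTower()
image = torch.randn(1, 16, requires_grad=True)  # the adversarial input being optimized
loss = tower(image).sum()
loss.backward()
print(image.grad is not None)  # True: gradients reach the image pixels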

Example Usage

To train a model, use the provided bash script run_adversarial_training.sh. The script lets you select the model and set the parameters required for training.

To run the script, simply use:

bash run_adversarial_training.sh

Make sure to modify the script to suit your requirements, such as setting the appropriate input and output directories, number of steps, and training parameters.

  • --use_categories: Set this flag if your images are organized into subdirectories based on categories, such as in the VHTest dataset. If images are not categorized (e.g., POPE or MMVP datasets), you can omit this flag.
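
As a rough sketch of the two directory layouts this flag distinguishes (the helper name collect_images and the file extensions are assumptions for illustration, not the repository's actual loader):

from pathlib import Path

def collect_images(root, use_categories=False):
    # Illustrative only; the repository's own data loading may differ.
    root = Path(root)
    exts = {".jpg", ".jpeg", ".png"}
    if use_categories:
        # Categorized layout (e.g. VHTest): root/<category>/<image>
        return {cat.name: sorted(p for p in cat.iterdir() if p.suffix.lower() in exts)
                for cat in sorted(root.iterdir()) if cat.is_dir()}
    # Flat layout (e.g. POPE, MMVP): root/<image>
    return sorted(p for p in root.iterdir() if p.suffix.lower() in exts)

With use_categories=True the images are grouped per category subdirectory; without it a flat list of image paths is returned.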
