🌋 INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

This repository contains the PyTorch code and model weights of INF-LLaVA, a novel MLLM designed for high-resolution image perception and reasoning.

INF-LLaVA has the following features to process high-resolution images:

Dual-perspective Cropping Module (DCM): Integrates both global and local perspectives when cropping high-resolution images into subimages. This enhances the model’s ability to capture both fine detail and wider context.
Dual-perspective Enhancement Module (DEM): An effective and efficient module that fuses the dual-perspective features into dual-enhanced features, which significantly improves performance.
Strong Performance: INF-LLaVA outperforms existing models on multiple benchmarks, demonstrating the effectiveness of our approach. Check out our Model Zoo.
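
To make the two cropping perspectives concrete, here is a minimal sketch of one plausible reading of the DCM description: local crops are contiguous tiles, while global crops take every grid-th pixel so that each subimage spans the whole image at lower resolution. This is not the repository's implementation. The function names `local_crops` and `global_crops` are hypothetical, the image is a plain list of rows so the sketch runs without dependencies, and the height and width are assumed to be divisible by `grid`.

```python
def local_crops(image, grid):
    """Split `image` (a list of pixel rows) into grid x grid contiguous tiles."""
    h, w = len(image), len(image[0])
    th, tw = h // grid, w // grid  # tile size; assumes h and w divide evenly
    return [
        [row[c * tw:(c + 1) * tw] for row in image[r * th:(r + 1) * th]]
        for r in range(grid)
        for c in range(grid)
    ]


def global_crops(image, grid):
    """Split `image` into grid x grid interleaved subimages by strided sampling.

    Each subimage keeps every grid-th pixel, so it covers the full image extent.
    """
    return [
        [row[c::grid] for row in image[r::grid]]
        for r in range(grid)
        for c in range(grid)
    ]


# 4x4 toy "image" whose pixel values are their flat indices
image = [[r * 4 + c for c in range(4)] for r in range(4)]
local = local_crops(image, 2)
glob = global_crops(image, 2)
print(local[0])  # [[0, 1], [4, 5]]   -> top-left tile only
print(glob[0])   # [[0, 2], [8, 10]]  -> samples spread over the whole image
```

Both functions return the same number of subimages at the same size, so the two sets could in principle be passed through the same vision encoder. The same strided indexing idea carries over to NumPy arrays or PyTorch tensors.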

News !!

🔥[2024-7-19] Released the INF-LLaVA checkpoints on Hugging Face.
🔥[2024-7-16] Released the INF-LLaVA code.

To-Do Lists

Release INF-LLaVA model based on Llama 3.1
Release INF-LLaVA Strong Models.
Release INF-LLaVA training code.

Install

Clone this repository and navigate to the INF-LLaVA folder

git clone https://github.com/WeihuangLin/INF-LLaVA.git
cd INF-LLaVA

Install Package

conda create -n inf-llava python=3.10 -y
conda activate inf-llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Install additional packages for training

pip install -e ".[train]"
pip install flash-attn --no-build-isolation --no-cache-dir
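
After installation, a quick check that the key modules are importable can save a failed training launch. The sketch below is an assumption about which modules matter: `torch` (this is a PyTorch codebase) and `flash_attn`, the import name of the flash-attn package. The helper name `check_packages` is hypothetical and not part of this repository.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module list is an assumption; extend it with whatever your setup requires
status = check_packages(["torch", "flash_attn"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated inf-llava conda environment. A MISSING entry means the corresponding pip step above did not complete.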

Train

Pre-train

cd INF-LLaVA
bash INF-LLava_pretrain.sh

Note: Replace data_path and image_folder in INF-LLava_pretrain.sh with the paths to your own data.

Finetune

cd INF-LLaVA
bash INF-LLava_finetune.sh

Note: Replace data_path and image_folder in INF-LLava_finetune.sh with the paths to your own data.
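
Before launching either script, it can help to confirm that the data_path and image_folder values you set actually line up. The sketch below assumes a LLaVA-style annotation file: a JSON list of samples in which an optional "image" key holds a path relative to image_folder. That format is an assumption and is not confirmed by this README. The helper name `missing_images` is hypothetical.

```python
import json
import os


def missing_images(data_path, image_folder):
    """List image paths referenced in the annotation file but absent from image_folder."""
    with open(data_path, "r", encoding="utf-8") as f:
        samples = json.load(f)  # assumed: a JSON list of dicts

    missing = []
    for sample in samples:
        rel = sample.get("image")  # assumed key; text-only samples may omit it
        if rel and not os.path.isfile(os.path.join(image_folder, rel)):
            missing.append(rel)
    return missing
```

Call it with the same two values you put in the script. An empty list means every referenced image was found.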

You can download our pretrained weights from the Model Zoo.

Evaluate

We follow lmms-eval to conduct evaluations. Please refer to lmms-eval for setup and usage. We provide the same scripts for running the tests.

Model Zoo

Version	Checkpoint
$INF-LLaVA$	🤗WeihuangLin/INF-LLaVA-sft
$INF^*-LLaVA$	🤗WeihuangLin/INF_star-LLaVA-sft

$INF^*-LLaVA$ denotes the variant trained on a more diverse dataset.
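
The checkpoints in the table can also be fetched programmatically. The sketch below uses `snapshot_download` from the `huggingface_hub` package, which is assumed to be installed. The repo IDs are copied from the table above, while the `CHECKPOINTS` mapping and the `download_checkpoint` helper are hypothetical names introduced for illustration.

```python
# Repo IDs copied from the Model Zoo table
CHECKPOINTS = {
    "INF-LLaVA": "WeihuangLin/INF-LLaVA-sft",
    "INF*-LLaVA": "WeihuangLin/INF_star-LLaVA-sft",
}


def download_checkpoint(version, local_dir=None):
    """Download one Model Zoo checkpoint and return the local directory path."""
    # Imported lazily so the mapping above is usable without huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=CHECKPOINTS[version], local_dir=local_dir)
```

For example, `download_checkpoint("INF-LLaVA")` stores the files in the default Hugging Face cache. Pass `local_dir` to place them in a directory of your choice.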

🎫 License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@misc{ma2024infllava,
      title={INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model}, 
      author={Yiwei Ma and Zhibin Wang and Xiaoshuai Sun and Weihuang Lin and Qiang Zhou and Jiayi Ji and Rongrong Ji},
      journal={arXiv preprint arXiv:2407.16198},
      year={2024}
}

🙏 Acknowledgement

We are thankful to LLaVA, lmms-eval and Llama 3 for releasing their models and code as open-source contributions.

If you run into any issues or have any questions, please feel free to open an issue.
