UMK

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

The implementation of our multimodal jailbreak code is based on the work of Visual-Adversarial-Examples-Jailbreak-Large-Language-Models . Gratitude is extended to the original authors for their valuable contributions and commitment to open source.

Basic Setup

The fundamental setup tasks (e.g., environment setup and pretrained weights preparation) can be easily accomplished by referring to the guidelines provided in the aforementioned project: Visual-Adversarial-Examples-Jailbreak-Large-Language-Models .

Attack on MiniGPT-4

After injecting toxic semantics into the adversarial image using the VAJM method, use the following multimodal attack strategy to maximize the probability of the model following the malicious instructions:

python minigpt_vlm_attack.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --n_iters 5000  --alpha 1 --save_dir vlm_unconstrained

Evaluation

We provide the test code for using off-the-shelf adversarial examples on two different datasets:

Evaluation on VAJM test set

python minigpt_test_manual_prompts_vlm.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

Evaluation on Advbench

python minigpt_test_advbench.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.idea		.idea
adversarial_images		adversarial_images
eval_configs		eval_configs
harmful_corpus		harmful_corpus
minigpt4		minigpt4
minigpt_utils		minigpt_utils
minimal_gcg		minimal_gcg
README.md		README.md
environment.yml		environment.yml
minigpt_test_advbench.py		minigpt_test_advbench.py
minigpt_test_manual_prompts_vlm.py		minigpt_test_manual_prompts_vlm.py
minigpt_vlm_attack.py		minigpt_vlm_attack.py
model.png		model.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UMK

Basic Setup

Attack on MiniGPT-4

Evaluation

Evaluation on VAJM test set

Evaluation on Advbench

About

Releases

Packages

Languages

roywang021/UMK

Folders and files

Latest commit

History

Repository files navigation

UMK

Basic Setup

Attack on MiniGPT-4

Evaluation

Evaluation on VAJM test set

Evaluation on Advbench

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages