Skip to content
/ UMK Public

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

Notifications You must be signed in to change notification settings

roywang021/UMK

Repository files navigation

UMK

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

image The implementation of our multimodal jailbreak code is based on the work of Visual-Adversarial-Examples-Jailbreak-Large-Language-Models . Gratitude is extended to the original authors for their valuable contributions and commitment to open source.

Basic Setup

The fundamental setup tasks (e.g., environment setup and pretrained weights preparation) can be easily accomplished by referring to the guidelines provided in the aforementioned project: Visual-Adversarial-Examples-Jailbreak-Large-Language-Models .

Attack on MiniGPT-4

After injecting toxic semantics into the adversarial image using the VAJM method, use the following multimodal attack strategy to maximize the probability of the model following the malicious instructions:

python minigpt_vlm_attack.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --n_iters 5000  --alpha 1 --save_dir vlm_unconstrained

Evaluation

We provide the test code for using off-the-shelf adversarial examples on two different datasets:

Evaluation on VAJM test set

python minigpt_test_manual_prompts_vlm.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

Evaluation on Advbench

python minigpt_test_advbench.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

About

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages