[Feature]: Set max_pixels using LLM.generate with Qwen2-VL for offline-inference #9545

mearcstapa-gqz · 2024-10-21T06:11:52Z

Your current environment

The output of `python collect_env.py`

PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Alibaba Cloud Linux release 3 (OpenAnolis Edition) (x86_64)
GCC version: (GCC) 10.2.1 20200825 (Alibaba 10.2.1-3.8 2.32)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.32

Python version: 3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.134-17.2.al8.x86_64-x86_64-with-glibc2.32
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10
Nvidia driver version: 555.42.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz
Stepping: 6
CPU MHz: 2899.998
BogoMIPS: 5799.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu11==2022.4.25
[pip3] nvidia-cuda-runtime-cu117==11.7.60
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.5.40
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] nvidia-pyindex==1.0.9
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0
[pip3] torchaudio==2.4.1
[pip3] torchvision==0.19.0
[pip3] transformers==4.46.0.dev0
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A (dev)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-31            0               N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

How would you like to use vllm

https://docs.vllm.ai/en/latest/models/vlm.html#offline-inference
https://docs.vllm.ai/en/latest/getting_started/examples/offline_inference_vision_language.html

How can I set `max_pixels` when using `LLM.generate`?
I tried setting `llm.llm_engine.model_config.hf_image_processor_config['max_pixels'] = max_pixels`, but it has no effect.

from PIL import Image
from vllm import LLM
import requests

llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")
# max_pixels = 224 * 224 * 3
# llm.llm_engine.model_config.hf_image_processor_config['max_pixels'] = max_pixels

prompt = (f"<|im_start|>system\nYou're a helpful assistant<|im_end|>\n"
          "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
          "{}<|im_end|>\n"
          "<|im_start|>assistant\n")

# Load the image using PIL.Image
image = Image.open(requests.get("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg", stream=True).raw)
outputs = llm.generate({
    "prompt": prompt.format("Describe this image."),
    "multi_modal_data": {
        "image": image},
})

sum(1 for token_id in outputs[0].prompt_token_ids if token_id == llm.llm_engine.model_config.hf_text_config.image_token_id)
# 3577 (too many image tokens)
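
For reference, here is a rough sketch of how the image token count relates to `max_pixels`. It is modeled on the resize rule in the Hugging Face Qwen2-VL image processor as I understand it (14-pixel patches merged 2x2, so one image token per 28x28 block); it is an approximation, not vLLM's actual code path. The function names, the default `min_pixels`/`max_pixels` values, and the 1372x2044 image size are assumptions for illustration. Note that `max_pixels` is a budget in pixels (height times width), so the channel factor in `224 * 224 * 3` above is probably not intended.

```python
import math

FACTOR = 28  # assumed: 14-pixel patches merged 2x2 -> one image token per 28x28 block


def smart_resize(height, width, min_pixels=56 * 56, max_pixels=12845056, factor=FACTOR):
    """Approximate the resize rule: snap to multiples of `factor`,
    then rescale so the area stays within [min_pixels, max_pixels]."""
    if height < factor or width < factor:
        raise ValueError("image is smaller than one merged patch")
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def count_image_tokens(height, width, min_pixels=56 * 56, max_pixels=12845056):
    """Estimate the number of image placeholder tokens for one image."""
    h_bar, w_bar = smart_resize(height, width, min_pixels, max_pixels)
    return (h_bar // FACTOR) * (w_bar // FACTOR)


if __name__ == "__main__":
    # Hypothetical image size; a 1372x2044 input reproduces the count reported above.
    print(count_image_tokens(1372, 2044))
    # With a smaller pixel budget the estimate drops accordingly.
    print(count_image_tokens(1372, 2044, max_pixels=1280 * 28 * 28))
```

Under these assumptions the token count is bounded by `max_pixels / (28 * 28)`, so a budget of `224 * 224` pixels would allow at most 64 image tokens.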

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.


DarkLight1337 · 2024-10-21T09:57:14Z

I believe this isn't implemented yet. @alex-jw-brooks do you have time to take this on?

alex-jw-brooks · 2024-10-21T21:25:25Z

Yup, I'll take a look! 😄

SinanAkkoyun · 2024-10-29T15:22:27Z

Hi, does the OAI endpoint also support min/max_pixels?

DarkLight1337 · 2024-10-29T15:26:28Z

> Hi, does the OAI endpoint also support min/max_pixels?

You can't set it per request, but you can set it server-side at startup time via the `--mm-processor-kwargs` CLI argument.
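
A minimal sketch of what that could look like, assuming a vLLM build that includes #9612 and that `--mm-processor-kwargs` takes a JSON string. The helper names (`build_mm_processor_kwargs`, `to_cli_flag`, `make_llm`) and the pixel values are hypothetical; `make_llm` shows the offline equivalent by passing the same dictionary as `mm_processor_kwargs` to `LLM`, and it is not called here because it needs vLLM and a GPU.

```python
import json
import shlex


def build_mm_processor_kwargs(min_pixels: int, max_pixels: int) -> dict:
    """Build the processor kwargs dict (hypothetical helper)."""
    if min_pixels <= 0 or max_pixels < min_pixels:
        raise ValueError("expected 0 < min_pixels <= max_pixels")
    return {"min_pixels": min_pixels, "max_pixels": max_pixels}


def to_cli_flag(kwargs: dict) -> str:
    """Render the dict as a shell-quoted JSON argument for the server CLI."""
    return "--mm-processor-kwargs " + shlex.quote(json.dumps(kwargs))


def make_llm(kwargs: dict):
    """Offline variant: pass the same dict when constructing the engine.
    Imported lazily so the rest of this sketch runs without vLLM installed."""
    from vllm import LLM

    return LLM(model="Qwen/Qwen2-VL-2B-Instruct", mm_processor_kwargs=kwargs)


if __name__ == "__main__":
    # Example budget: between 256 and 1280 blocks of 28x28 pixels per image.
    kwargs = build_mm_processor_kwargs(256 * 28 * 28, 1280 * 28 * 28)
    print("vllm serve Qwen/Qwen2-VL-2B-Instruct " + to_cli_flag(kwargs))
```

The values apply to every request served by that engine, which matches the point above that they cannot be changed per request.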

SinanAkkoyun · 2024-10-29T16:08:40Z

@DarkLight1337 Thank you!

SinanAkkoyun · 2024-10-29T16:44:17Z

@DarkLight1337
It works great, but now I can't set the mm limit higher than 1 when I want to dynamically support a mix of smaller and larger images (#9805).

mearcstapa-gqz added the usage How to use vllm label Oct 21, 2024

mearcstapa-gqz changed the title ~~[Usage]: How to set max_pixels using LLM.generate with Qwen2-VL?~~ [Usage]: How to set max_pixels using LLM.generate with Qwen2-VL for offline-inference? Oct 21, 2024

DarkLight1337 changed the title ~~[Usage]: How to set max_pixels using LLM.generate with Qwen2-VL for offline-inference?~~ [Feature]: Set max_pixels using LLM.generate with Qwen2-VL for offline-inference Oct 21, 2024

DarkLight1337 added feature request and removed usage How to use vllm labels Oct 21, 2024

alex-jw-brooks mentioned this issue Oct 23, 2024

[Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs #9612

Merged

DarkLight1337 closed this as completed in #9612 Oct 23, 2024
