How to configure vLLM gpu_memory_utilization? #636
Comments
We're working on some quality-of-life improvements to help with that: #630. Otherwise, look at the error message; it should give you the names of the parameters you can tweak to fix the RAM issue.
I'm writing from memory, so check the error message for the correct names.
Hi @Narsil, thanks for the help. I tried tweaking the numbers; even with --max-batch-prefill-tokens=1 and --max-batch-total-tokens=2 it still runs out of memory. What else can I do? For context: the Llama model works when I disable flash attention, and it also works with flash attention but without vllm, so something must be going wrong on the vllm side.
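For reference, a minimal sketch of launching TGI with smaller batch-token budgets, wrapped in a Python subprocess call for illustration. The flag names come from this thread; the values and the model id are placeholder assumptions, not recommendations.

```python
# Illustrative sketch only: flag names are from this thread; the values and
# the model id are placeholders for a 40GB A100, not verified recommendations.
import subprocess

cmd = [
    "text-generation-launcher",
    "--model-id", "Salesforce/codegen25-7b-multi",  # placeholder; use your model's hub id
    "--max-batch-prefill-tokens", "2048",  # cap on tokens processed during prefill
    "--max-batch-total-tokens", "4096",    # cap on prefill + decode tokens per batch
]
subprocess.run(cmd, check=True)
```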
Can you share a reproducible example? And the full stacktrace?
I have the same issue; the KV-cache warmup causes an OOM.
Lowering the value of gpu_memory_utilization a bit (or further if needed) resolved the problem for me.
Hi team, I am trying to run the codegen2.5 7B model on TGI with an A100 40GB, and it gives me an out-of-memory error because of vllm. I wonder if there is a way to configure gpu_memory_utilization in the code so that vllm does not reserve too much memory beforehand.
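For comparison, a minimal sketch of how this reservation is tuned in standalone vLLM, where the LLM constructor accepts a gpu_memory_utilization argument; whether TGI exposes the same knob is exactly what this issue asks. The model id, memory fraction, and context length below are assumptions chosen for illustration.

```python
# Minimal sketch, standalone vLLM (not TGI): gpu_memory_utilization controls
# the fraction of GPU memory pre-reserved for model weights plus the KV cache.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Salesforce/codegen25-7b-multi",  # assumed hub id for codegen2.5 7B
    trust_remote_code=True,                 # codegen2.5 ships a custom tokenizer
    gpu_memory_utilization=0.8,             # reserve 80% instead of the 0.9 default
    max_model_len=2048,                     # smaller KV-cache budget for a 40GB A100
)
outputs = llm.generate(["def fibonacci(n):"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```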