
Feature request: support ExLlama #296

Closed
alanxmay opened this issue Jun 28, 2023 · 15 comments

Comments

@alanxmay

ExLlama (https://github.com/turboderp/exllama)

It's currently the fastest and most memory-efficient model executor that I'm aware of.

Is there interest from the maintainers in adding support for this?

@SinanAkkoyun

How do you plan on adding batched support for ExLlama? I am very interested in your approach, as I am trying to work on that too.

@iibw

iibw commented Dec 21, 2023

ExLlamaV2 has overtaken ExLlama in quantization performance for most cases. I hope we can get it implemented in vLLM, because it is also an incredible quantization technique. Benchmarks across the major quantization techniques indicate that ExLlamaV2 is the best of them. Have there been any new developments since it was added to the roadmap?
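
For reference, a minimal sketch of what running an EXL2-quantized model with the standalone exllamav2 library looks like (class and method names follow the exllamav2 example scripts; the model path and sampling settings are placeholders, and exact signatures may differ between versions):

```python
# Minimal exllamav2 inference sketch, adapted from the project's example scripts.
# Assumes an EXL2-quantized model directory; path and settings are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Mixtral-8x7B-exl2-4.0bpw"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocate the KV cache alongside loading
model.load_autosplit(cache)                # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("EXL2 quantization works by", settings, num_tokens=128))
```

The ask in this thread is essentially for vLLM to load EXL2 weights like this while keeping its paged-attention KV cache and continuous batching.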

@SinanAkkoyun

Please, having exllamav2 with paged attention and continuous batching would be a big win for the LLM world.

@DaBossCoda

Also looking forward to exllamav2 support

@RuntimeRacer

RuntimeRacer commented Jan 1, 2024

I was hoping this would be possible too. I recently worked with the Mixtral-8x7b model: AWQ 4-bit had significant OOM / memory overhead compared to ExLlama2 in 4-bit, so I ended up running the model in 8-bit with ExLlama2, since that turned out to be the best compromise between model capability and VRAM footprint. With ExLlama2 I can run it in 8-bit on 3x3090 and use the full 32k context, but in vLLM I need 4x3090 just to be able to load it in 16-bit, and I reach OOM when I try to use the full context.

So this would definitely be an amazing addition, giving more flexibility in terms of VRAM resources.
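
For context, a rough sketch of the unquantized vLLM setup described above (the checkpoint name and memory settings are assumptions based on the comment; the point is that 16-bit weights plus a 32k-token KV cache need four 24 GB cards here, whereas 8-bit ExLlama2 fits on three):

```python
# Hypothetical vLLM configuration matching the setup described in the comment above:
# Mixtral-8x7B in 16-bit sharded across 4x RTX 3090 with a 32k context window.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed checkpoint
    dtype="float16",            # unquantized weights, hence the 4-GPU requirement
    tensor_parallel_size=4,     # shard the model across four 3090s
    max_model_len=32768,        # full 32k context; reduce this if the KV cache OOMs
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of EXL2 quantization."], params)
print(outputs[0].outputs[0].text)
```

With EXL2 support, the same model could in principle be served from fewer GPUs while keeping vLLM's continuous batching and paged KV cache.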

@theobjectivedad

+1

1 similar comment
@tolecy

tolecy commented Jan 5, 2024

+1

@chricro

chricro commented Feb 23, 2024

+1

2 similar comments
@agahEbrahimi

+1

@a-creation

+1

@rjmehta1993

Supporting exllamav2 would be the biggest release for vllm. +1

@sapountzis

+1

1 similar comment
@kulievvitaly

+1


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Oct 31, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions bot closed this as not planned (stale) Nov 30, 2024