
Remove exllamav1 loaders #5128

Merged 6 commits into dev from remove-exllamav1 on Dec 31, 2023
Conversation

@oobabooga (Owner) commented Dec 31, 2023

ExLlamav1 hasn't received a commit in 3 months and does not support Mixtral.

The downsides of ExLlamav2 relative to v1 are slightly higher VRAM usage and slightly higher perplexity for the same GPTQ model:

| | v1 | v2 | v2 (8-bit cache) |
| --- | --- | --- | --- |
| VRAM | 11295 MiB | 11653 MiB | 10133 MiB |
| 3200 tokens (prompt processing, seconds) | 1.7 | 1.5 | 1.51 |
| 512 tokens (generation, seconds) | 13.25 | 10.04 | 10.83 |
| Perplexity | 5.57350826 | 5.57457876 | 5.57457876 |

The perplexity difference is not significant and the VRAM usage can be reduced with --cache_8bit. So I see no point in keeping ExLlamav1.
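For reference, the 8-bit cache is a single launch flag. A minimal sketch, assuming a GPTQ model directory named `llama-13b-gptq` (a placeholder, not a real download) and the webui's flag names at the time:

```bash
# Sketch: launch with the ExLlamav2_HF loader and the 8-bit KV cache
# to recover most of the VRAM difference shown in the table above.
python server.py \
    --model llama-13b-gptq \
    --loader ExLlamav2_HF \
    --cache_8bit
```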

@oobabooga oobabooga merged commit 0e54a09 into dev Dec 31, 2023
@oobabooga oobabooga deleted the remove-exllamav1 branch December 31, 2023 05:08
@Ph0rk0z (Contributor) commented Jan 2, 2024

How does it compare without an Ampere card, though? Or without flash attention? I'm not using it often either, but by that standard I'm not really using QuIP, AWQ, or HQQ at all.

I think exllama 1 was also compatible with the old flash attention that ran on cards below Ampere, but I only have Pascal and Ampere, so I can't really confirm. With the holidays, nobody who used it is likely to notice and complain.

@ZanMax commented Jan 7, 2024

I tried to migrate from exllamav1 to exllamav2 with my AMD Instinct cards and got garbage output.
exllamav1 works perfectly instead.

@kelvincht commented

My GPU performs much better with exllamav1 on 13B models.

Disabling the 8-bit cache doesn't fix the performance issue.

Performance is about 10x slower on exllamav2 compared to v1.

Please bring back exllamav1 and exllamav1_hf.

@jianmomo commented

I tried exllamav2, but it doesn't feel perfect: the speed is indeed much faster, but with the same parameters the replies are much shorter.

@DmitryVN commented Jan 18, 2024

Please bring back exllamav1 and exllamav1_hf! They let you load 10.7B models completely, while exllamav2 runs out of memory on an 8 GB GPU.

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Feb 22, 2024
@koplenov commented

exllama2 sucks in some cases,

and to get the previous one back, you have to downgrade to an older version.

thanks for putting a spoke in the wheels 🥰

@koplenov commented

@oobabooga pls revert

@oobabooga (Owner, Author) commented

If you have a performance problem with exllamav2 that was not present in exllamav1, you should open an issue in the exllamav2 repository.

@koplenov commented

in my case it's not about performance

exllama and exllama2 produce different outputs

// we run 1M rows daily, so this is critical/noticeable for us

@koplenov commented

damn, you can't roll back to the previous version :?

it just won't start :/
dependency hell, versions not specified, compilation errors

try to roll back to that commit and start it from scratch yourself - you'll understand why there is such a return request
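For anyone attempting this, a rough sketch of the rollback being described (0e54a09 is the merge commit named above; the unpinned dependency versions are the part this sketch can't fix):

```bash
# Sketch: check out the tree as it was just before the removal landed on dev.
# "0e54a09~1" is the first parent of the merge commit, i.e. the pre-merge state.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
git checkout 0e54a09~1

# Fresh environment, so newer installed packages don't mask the old requirements.
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt   # this is where the unpinned versions bite
```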

@Ph0rk0z (Contributor) commented Feb 28, 2024

To be fair, it reverted fine for me. Need to check how well it works.
