Remove non-HF ExLlamaV2 loader #5431

oobabooga · 2024-02-04T04:14:48Z

Since PR #4814, there is zero speed difference between ExLlamav2 and ExLlamav2_HF, so I see no point in keeping the non-HF version. It is redundant, and it samples in a way that is not guaranteed to be consistent with HF transformers sampling.
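A toy example makes the consistency concern concrete. This is a hypothetical sketch, not code from either loader. It assumes two samplers that take the same settings but apply temperature at different points relative to top-p, and it shows that they can end up keeping different candidate tokens. The logits, `temperature`, and `top_p` values are invented for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_keep(probs, top_p):
    """Return the indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)


def temperature_first(logits, temperature, top_p):
    # Hypothetical sampler A: scale by temperature, then apply top-p to the scaled distribution.
    return top_p_keep(softmax(logits, temperature), top_p)


def temperature_last(logits, temperature, top_p):
    # Hypothetical sampler B: apply top-p to the unscaled distribution;
    # temperature would only reshape the surviving tokens afterwards.
    return top_p_keep(softmax(logits, 1.0), top_p)


logits = [2.0, 1.0, 0.5, 0.0]
print(temperature_first(logits, 0.5, 0.8))
print(temperature_last(logits, 0.5, 0.8))
```

With the same `temperature=0.5` and `top_p=0.8`, sampler A keeps only token 0 while sampler B keeps tokens 0, 1, and 2, so the two could produce different text even with identical settings.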

sgsdxzy · 2024-02-04T05:53:09Z

Won't this cause problems for #5375?

Ph0rk0z · 2024-02-04T15:30:35Z

In that case, though, we can't use exllamav2's native sampling.

aikitoria · 2024-02-06T05:23:09Z

It's not true that there is zero speed difference. The non-HF loader is around 10% faster for goliath-120b.

aikitoria · 2024-02-06T05:31:13Z

Quick bench using ooba from before this commit and the exllamav2 master branch from 5 minutes ago, on a RunPod A100 80GB.
I'm using the new exllamav2 version here because it reverts the performance degradation that happened in 0.0.12.

HF:

Output generated in 8.76 seconds (14.50 tokens/s, 127 tokens, context 1728, seed 735928511)
Output generated in 9.22 seconds (13.77 tokens/s, 127 tokens, context 1728, seed 83286885)
Output generated in 8.99 seconds (14.13 tokens/s, 127 tokens, context 1728, seed 128023280)
Output generated in 8.78 seconds (14.47 tokens/s, 127 tokens, context 1728, seed 1418661767)

Non-HF:

Output generated in 8.17 seconds (15.67 tokens/s, 128 tokens, context 1728, seed 745431605)
Output generated in 8.18 seconds (15.65 tokens/s, 128 tokens, context 1728, seed 762707583)
Output generated in 8.18 seconds (15.64 tokens/s, 128 tokens, context 1728, seed 996129951)
Output generated in 8.18 seconds (15.64 tokens/s, 128 tokens, context 1728, seed 700382800)
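The "around 10%" figure can be checked directly against the logs above. This sketch copies the eight log lines verbatim, extracts seconds, tokens/s, and token count with a regex that assumes exactly the `Output generated in ...` format shown, and reports mean throughput and mean time per token. Time per token is included because the HF runs produced 127 tokens and the non-HF runs 128.

```python
import re
from statistics import mean

# Log lines copied from the benchmark above.
HF_LOG = """\
Output generated in 8.76 seconds (14.50 tokens/s, 127 tokens, context 1728, seed 735928511)
Output generated in 9.22 seconds (13.77 tokens/s, 127 tokens, context 1728, seed 83286885)
Output generated in 8.99 seconds (14.13 tokens/s, 127 tokens, context 1728, seed 128023280)
Output generated in 8.78 seconds (14.47 tokens/s, 127 tokens, context 1728, seed 1418661767)
"""

NON_HF_LOG = """\
Output generated in 8.17 seconds (15.67 tokens/s, 128 tokens, context 1728, seed 745431605)
Output generated in 8.18 seconds (15.65 tokens/s, 128 tokens, context 1728, seed 762707583)
Output generated in 8.18 seconds (15.64 tokens/s, 128 tokens, context 1728, seed 996129951)
Output generated in 8.18 seconds (15.64 tokens/s, 128 tokens, context 1728, seed 700382800)
"""

# Captures: elapsed seconds, reported tokens/s, number of generated tokens.
PATTERN = re.compile(r"in ([\d.]+) seconds \(([\d.]+) tokens/s, (\d+) tokens")


def summarize(log):
    """Return (mean tokens/s, mean milliseconds per token) over all runs in a log."""
    runs = [(float(s), float(tps), int(n)) for s, tps, n in PATTERN.findall(log)]
    mean_tps = mean(tps for _, tps, _ in runs)
    ms_per_token = mean(1000 * s / n for s, _, n in runs)
    return mean_tps, ms_per_token


hf_tps, hf_ms = summarize(HF_LOG)
non_hf_tps, non_hf_ms = summarize(NON_HF_LOG)
speedup = (non_hf_tps / hf_tps - 1) * 100

print(f"HF:     {hf_tps:.2f} tokens/s, {hf_ms:.1f} ms/token")
print(f"Non-HF: {non_hf_tps:.2f} tokens/s, {non_hf_ms:.1f} ms/token")
print(f"Non-HF speedup: {speedup:.1f}%")
```

The means come out to roughly 14.2 tokens/s for HF and 15.65 tokens/s for non-HF, a speedup of about 10%, or roughly 6.5 ms less per generated token.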

aikitoria · 2024-02-06T05:34:12Z

"not guaranteed to be consistent with HF transformers sampling"

Why is this important, if the builtin sampling in exllamav2 works fine?

Ph0rk0z · 2024-02-06T11:35:43Z

For some things I like the HF samplers, and for others the native ones. I'd also forgotten about the extra 1 t/s. It happens in llama.cpp too: a small difference due to overhead from HF, and the non-HF loader shows the actual top speed. Keeping both also helps when troubleshooting issues with HF versus the original loader. There are plenty of reasons to keep it.


aikitoria · 2024-02-06T23:42:50Z

Thanks for restoring it!


oobabooga added 2 commits February 3, 2024 19:43

Remove non-HF ExLlamaV2 loader

cdd89f4

Handle saved settings

3cf5a6c

oobabooga merged commit cde000d into dev Feb 4, 2024

oobabooga deleted the remove-exllamav2 branch February 4, 2024 04:16

oobabooga added a commit that referenced this pull request Feb 6, 2024

Revert "Remove non-HF ExLlamaV2 loader (#5431)"

2a1063e

This reverts commit cde000d.

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Feb 22, 2024

Remove non-HF ExLlamaV2 loader (oobabooga#5431)

4dc6434

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Feb 22, 2024

Revert "Remove non-HF ExLlamaV2 loader (oobabooga#5431)"

4985c6f

This reverts commit cde000d.
