is it too much of me to ask for an MPI option like llama.cpp? #286
This may or may not be a stupid question, but what is MPI?
@turboderp It's not a stupid question if you've tried this on a Raspberry Pi cluster. Basically, MPI (Message Passing Interface) enables clustering for llama models.
Again, just my thoughts.
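To make the clustering idea concrete: below is a minimal sketch, assuming mpi4py and NumPy, of how a llama-style model's layers could be split across MPI ranks in a pipeline, with each rank passing its activations to the next. The layer count, hidden size, and `run_layer` stub are illustrative assumptions; this is not ExLlama or llama.cpp code.

```python
# Minimal sketch (not ExLlama or llama.cpp code): pipeline-splitting a
# transformer's layers across MPI ranks with mpi4py.
# Run with e.g.:  mpirun -n 4 python pipeline_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

NUM_LAYERS = 32   # hypothetical model depth; assumes it divides evenly by size
HIDDEN = 4096     # hypothetical hidden-state size

# Each rank owns a contiguous slice of the layers.
layers_per_rank = NUM_LAYERS // size
my_layers = range(rank * layers_per_rank, (rank + 1) * layers_per_rank)

def run_layer(layer_idx, hidden_state):
    """Placeholder for the real per-layer compute (attention + MLP)."""
    return hidden_state  # identity here, just to show the data flow

if rank == 0:
    # First stage starts from a stand-in for the embedding output.
    hidden = np.zeros(HIDDEN, dtype=np.float32)
else:
    # Later stages block until the previous rank sends its activations.
    hidden = np.empty(HIDDEN, dtype=np.float32)
    comm.Recv(hidden, source=rank - 1)

for i in my_layers:
    hidden = run_layer(i, hidden)

if rank < size - 1:
    comm.Send(hidden, dest=rank + 1)   # hand off to the next pipeline stage
else:
    print(f"final hidden state computed on rank {rank}")
```

As far as I know, llama.cpp's MPI mode follows roughly this shape, distributing the model's layers across hosts and forwarding activations between them, which is what lets a cluster of small devices (like Raspberry Pis) hold a model none of them could fit alone.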
I don't know, ExLlama is really focused on consumer GPUs. This would be asking for a complete rewrite so it can run on clusters of embedded devices instead, and it basically boils down to "can this project be llama.cpp instead?" So I don't really think it's realistic.

As for the name, I didn't really give it much thought. It doesn't have those connotations to me, is all I can say, I guess. Think of it as "extra", maybe? And it's not categorically the fastest way to run Llama, either. It really depends on the use case.
P.S.: v2 is impressive. You guys are doing great.
Benchmarks tend to become outdated very quickly. When ExLlama first came out there was no CUDA support in llama.cpp at all, for instance, AutoGPTQ didn't exist, and GPTQ-for-Llama was still using essentially the same kernel written for the original GPTQ paper. Since then, llama.cpp has had a huge amount of work put into it, AutoGPTQ has incorporated the ExLlama (v1) kernel, and there's also AWQ, vLLM... something called OmniQuant..?

ExLlama is definitely a fast option, and depending on what you need to do, what your hardware setup is, etc., it may be the fastest in your case. If you want to run an inference server for an online chat service, you should probably look at TGI or vLLM or something. If you want to run on Apple Silicon, llama.cpp is (I think?) the only way to go. If you have an older NVIDIA GPU (Pascal or earlier), AutoGPTQ is probably still the best option. So it all depends.
I've always been looking for the optimal (cheapest) way to run large models, and I'm kind of tired of going to extremes (because I'd need to "upgrade", and that means my other devices become "obsolete").

However, is an MPI option on the roadmap? I'd really hope to see it happen.

Thanks in advance for the great work, by the way.