Feature Requests: Use llamafile / OpenAI compatible / API? #18
Replies: 1 comment
-
I'm presently working on adding support for a new, self-developed backend: HF-Waitress. It adds support for HF-Transformers and AWQ-quantized models directly from the Hugging Face Hub, while providing on-the-fly quantization via BitsAndBytes, HQQ and Quanto. It also removes the need to manually download LLMs yourself: you simply supply the model name and it does the rest. It works out of the box with no setup necessary, and provides concurrency and streaming responses, all within a single platform-agnostic Python script that can be ported anywhere. It will soon be the default LLM loader in LARS!

As Ollama is another implementation of llama.cpp, explicit support for it is not planned at this time, though I recognize the benefits. llama.cpp will be retained in LARS as a user-electable alternative to HF-Waitress for GGUF models, primarily due to its advantage of hybrid (CPU+GPU) inferencing. You'll be able to bring in your own GGUFs the same as today.

OpenAI support is not planned at this time, as LARS remains open-source and local-deployment centric. However, code to make OpenAI work is already in the LARS codebase, so if an official engagement necessitates it, I will work on enabling it. In the meanwhile, community contributions for these features are absolutely welcome, as always!
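To illustrate the kind of loading described above (this is not the actual HF-Waitress code, just a minimal sketch of the approach using the `transformers` library): a model is pulled from the Hub by name, quantized on the fly with BitsAndBytes, and its output is streamed token by token. The model ID is a placeholder, and a CUDA GPU plus `accelerate`/`bitsandbytes` are assumed.

```python
# Sketch only: load a Hub model by name, quantize on the fly, stream the reply.
from threading import Thread

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder; any Hub model ID

# 4-bit on-the-fly quantization: no pre-quantized download needed
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # requires accelerate; places layers automatically
)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("What does RAG stand for?", return_tensors="pt").to(model.device)

# generate() blocks, so run it in a thread and consume the streamer as tokens arrive
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128),
)
thread.start()
for token_text in streamer:
    print(token_text, end="", flush=True)
thread.join()
```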
-
https://github.com/Mozilla-Ocho/llamafile
It's faster than vanilla llama.cpp on CPU.
Better yet, just add an OpenAI-compatible backend for the flexibility to use any API.
Great RAG project! Also, any plan for an API?
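For context on what an OpenAI-compatible backend buys you: llamafile (like llama.cpp's server) exposes an OpenAI-compatible endpoint, so a generic client can be pointed at it just by overriding the base URL. A rough sketch with the `openai` Python client follows; the port, API key and model name are llamafile's defaults / placeholders, not LARS code.

```python
# Sketch: talk to a local llamafile server through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llamafile's default local endpoint
    api_key="not-needed-locally",         # the local server ignores the key
)

response = client.chat.completions.create(
    model="LLaMA_CPP",  # llamafile accepts an arbitrary model name
    messages=[{"role": "user", "content": "Summarize this document in one sentence."}],
    stream=True,
)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```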