Roadmap Apr 2023 #784
-
FYI: Outside of LoRA models, Vicuna is also distributed by its creators as a diff between llama-13b and their fine-tuned weights: https://huggingface.co/lmsys/vicuna-13b-delta-v0. See the README as well.
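To make the "diff" distribution concrete: the delta checkpoint stores the element-wise difference between the fine-tuned and base weights, so recovering the fine-tuned model is just per-tensor addition. A minimal sketch of the idea (illustrative only, not the actual conversion tooling):

```cpp
// Illustrative sketch of applying a published weight delta.
// Vicuna's delta tensors hold (w_finetuned - w_base), so the original
// fine-tuned weights are recovered by element-wise addition with the
// llama-13b base weights, tensor by tensor.
#include <vector>

std::vector<float> apply_delta(const std::vector<float> & base,
                               const std::vector<float> & delta) {
    std::vector<float> out(base.size());
    for (size_t i = 0; i < base.size(); ++i) {
        out[i] = base[i] + delta[i];  // w_finetuned = w_base + w_delta
    }
    return out;
}
```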
-
GG->GOAT.
-
I read your tweet yesterday, @ggerganov, and it crossed my mind to ask whether image segmentation is also part of your roadmap. It turns out it's now high-prio. Since the first day you published whisper.cpp, I have thought it would definitely be amazing to combine ggml with an image inference model. I have a project that isn't specifically ML, and I don't have much ML experience, but image segmentation is its core functionality. If you decide to implement SAM, I think it will definitely be useful for my little project. How can I keep up with updates? Thanks.
-
Does that mean the context can be serialized to disk (e.g. after the initial prompt evaluation) and deserialized later on? If not, could you please point me to the structures responsible for storing the hidden state, so I can implement initial prompt caching? It would be useful for parameter tuning.
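For reference, a minimal sketch of what such prompt caching could look like. It assumes the llama_get_state_size / llama_copy_state_data / llama_set_state_data entry points (which cover the KV cache, RNG and logits); if your build predates them, the same idea applies to whatever structures hold the hidden state:

```cpp
// Sketch: dump the evaluated context state to disk and restore it later,
// so an expensive initial prompt only has to be evaluated once.
#include <cstdio>
#include <vector>
#include "llama.h"

bool save_state(llama_context * ctx, const char * path) {
    const size_t n_max = llama_get_state_size(ctx);  // upper bound on state size
    std::vector<uint8_t> buf(n_max);
    const size_t n = llama_copy_state_data(ctx, buf.data());

    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(buf.data(), 1, n, f) == n;
    std::fclose(f);
    return ok;
}

bool load_state(llama_context * ctx, const char * path) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    const long n = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    std::vector<uint8_t> buf(n);
    const bool ok = std::fread(buf.data(), 1, n, f) == (size_t) n;
    std::fclose(f);
    if (!ok) return false;

    llama_set_state_data(ctx, buf.data());  // restores KV cache, RNG, logits
    return true;
}
```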
-
Please turn llama.cpp into an HTTP service, as it is hard to tap into its raw power without going through the Python bindings, which are very, very slow.
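In the meantime, the idea can be prototyped with a thin wrapper around the stock main example. A rough sketch (POSIX-only and purely illustrative; the binary path, model path and flags are assumptions, and a real service would keep the model resident instead of reloading it per request):

```cpp
// Toy HTTP wrapper: pipes the POST body as a prompt into the llama.cpp
// `main` example via popen() and returns its stdout. Demo only: no proper
// HTTP parsing, no shell escaping of the prompt, one request at a time.
#include <cstdio>
#include <cstring>
#include <string>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port        = htons(8080);
    bind(srv, (sockaddr *) &addr, sizeof(addr));
    listen(srv, 4);

    for (;;) {
        int cli = accept(srv, nullptr, nullptr);
        char req[8192] = {0};
        read(cli, req, sizeof(req) - 1);

        // crude: treat everything after the header block as the prompt
        const char * body = strstr(req, "\r\n\r\n");
        std::string prompt = body ? body + 4 : "";

        // hypothetical invocation of the stock CLI example (paths/flags assumed)
        std::string cmd = "./main -m models/7B/ggml-model-q4_0.bin -p \"" +
                          prompt + "\" 2>/dev/null";
        std::string out;
        if (FILE * p = popen(cmd.c_str(), "r")) {
            char buf[4096];
            size_t n;
            while ((n = fread(buf, 1, sizeof(buf), p)) > 0) out.append(buf, n);
            pclose(p);
        }

        std::string resp = "HTTP/1.1 200 OK\r\nContent-Length: " +
                           std::to_string(out.size()) + "\r\n\r\n" + out;
        write(cli, resp.data(), resp.size());
        close(cli);
    }
}
```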
-
Hi, I found that […]. Is it not in the plan anymore? I am interested because I am trying to build an API service. Thanks.
-
I'm confused about the relationship between LangChain and the LLM. It seems to me that LangChain is just talking to the model, so it would be an easy fit for llama.cpp. The agents part could be done at a later date; I'm more interested in prepping the model with a document before doing the lookup (if I'm understanding how it works correctly).
-
High-prio

- Project: llama : add LoRA support. Add capabilities for low-rank adaptation of LLaMA models and derivatives.
- Project: ggml : improve integer quantization. Make the inference of quantized models faster and more accurate.
- Project: ggml : improve threading implementation. Better utilization of the available CPU resources via improved thread management.
- Start implementing inference of other models and extend the ggml operators. For now, I think it is best to implement basic inference examples in the ggml repo, similar to GPT-2, GPT-J and Cerebras-GPT. There is no need for dedicated repos like llama.cpp, unless a very cool new model appears (edit: I think it just appeared: SAM).
- Add llama_state to allow parallel text generation sessions with a single model. Should be done in a similar way to how it is done in whisper.cpp (see the sketch after this list).
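For the llama_state item, here is a sketch of what the split could look like, modeled on how whisper.cpp separates the shared context from per-session state. All names and signatures below are hypothetical, not the actual API:

```cpp
// Hypothetical API sketch (names are illustrative, not the real llama.h):
// one shared, read-only llama_model plus a per-session llama_state that owns
// the mutable data (KV cache, logits, RNG), mirroring whisper_state.
struct llama_model;  // weights + vocab: loaded once, shared across sessions
struct llama_state;  // KV cache, logits, RNG: one per generation session

llama_model * llama_load_model(const char * path_model);
llama_state * llama_new_state (llama_model * model, int n_ctx);
void          llama_free_state(llama_state * state);

// evaluate a batch of tokens against a specific session's state
int llama_eval_with_state(
        llama_model * model,
        llama_state * state,
        const int   * tokens,
        int           n_tokens,
        int           n_past,
        int           n_threads);

// usage: two independent generation sessions over the same weights
// llama_model * model = llama_load_model("models/7B/ggml-model-q4_0.bin");
// llama_state * s0 = llama_new_state(model, 512);
// llama_state * s1 = llama_new_state(model, 512);
// ... eval different prompts on s0/s1, each keeping its own KV cache
```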
Low-prio