p1 : LLM-based code completion engine at the edge #1
19 comments · 24 replies
-
@ggerganov awesome idea, looking forward to seeing this! Here are just some random thoughts from a similar project I was working on using the replit-code model:
-
I think the question we want to ask ourselves before doing all this is whether these models produce good code that is actually helpful. If the programmer has to spend more time fixing the generated code than writing it themselves, that sort of defeats the purpose of the completion engine. As a start, I think it would be easiest to simply copy-and-paste code snippets from the editor into these models (essentially the local context) just to see how well they can infer our intent and generate the next lines. This is simple and can be done with existing interfaces (including llama.cpp) to these LLMs. If everyone is satisfied with the performance, then we can go forward with an editor plugin and work on usability, performance, and so forth.
-
@ggerganov you should have a look at WizardCoder 15B
-
We can use LoRA models for the different languages. If the LoRA approach works well, should we even consider training on the whole code base to feed the model much more context? (This is not possible with any existing Copilot-like product.)
-
Would you consider adding a more general "Coding assistance" to the objectives? Personally I find this more useful than Copilot-like completions because you can be more explicit about your intent. I've played with Sourcegraph Cody, which has various standard recipes as well as a chat prompt interface. Some of these recipes could fall under "LLM-assisted code transformation", but you can also do e.g. "Explain selected code", which is helpful for exploring new codebases. I find the chat interface useful for asking things like "The function my_function() results in this error, how do I fix it?"
-
I was investigating this when we were doing the server refactoring. There are existing projects like fauxpilot that do this by imitating the OpenAI API. The Copilot plugin can be directed to it by some config changes. There are some things that are missing or not ideal in llama.cpp:
-
One thing I think is underexplored is language-server-guided sampling. Language servers can be queried for a list of legal symbols at different scopes/contexts. Of course, in contexts where new symbols can be introduced it's a bit problematic. A rough sketch of the idea is below.
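For illustration, here is a minimal C++ sketch of the masking idea, assuming the language server has already returned the list of in-scope symbols. `mask_illegal_tokens`, `vocab_text`, and the prefix-matching rule are hypothetical names for this sketch, not a real llama.cpp or LSP API:

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// True if `pending + tok` is still a prefix of at least one in-scope symbol.
static bool extends_legal_symbol(const std::string & pending, const std::string & tok,
                                 const std::vector<std::string> & symbols) {
    const std::string cand = pending + tok;
    for (const auto & s : symbols) {
        if (s.size() >= cand.size() && s.compare(0, cand.size(), cand) == 0) {
            return true;
        }
    }
    return false;
}

// Set the logits of all tokens that cannot extend a legal symbol to -inf,
// so sampling can only pick identifier fragments the language server allows.
void mask_illegal_tokens(std::vector<float> & logits,
                         const std::vector<std::string> & vocab_text,
                         const std::string & pending,
                         const std::vector<std::string> & symbols) {
    for (std::size_t id = 0; id < logits.size(); ++id) {
        if (!extends_legal_symbol(pending, vocab_text[id], symbols)) {
            logits[id] = -std::numeric_limits<float>::infinity();
        }
    }
}

int main() {
    const std::vector<std::string> vocab_text = {"fo", "o", "ba", "r", "qux"};
    std::vector<float> logits = {0.f, 0.f, 0.f, 0.f, 0.f};
    // with no pending identifier text, only tokens starting a legal symbol survive
    mask_illegal_tokens(logits, vocab_text, "", {"foo", "bar"});
    // logits for "o", "r", and "qux" are now -inf; "fo" and "ba" are untouched
}
```

The pending-prefix bookkeeping is the hard part in practice: the mask only makes sense while the model is mid-identifier, which is exactly the case flagged above as problematic when new symbols can be introduced.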
-
Hello, I have actually been working on this front for a while. I have a few implementations of an autocoding system on my GitHub (just simple scripts in various states of working). https://lablab.ai/event/anthropic-ai-hackathon/entropic-alignment/py-interpreter is an implementation of a similar solution using Anthropic's Claude, from my hackathon submission, which I completed to get access to the 100k context. Development of a third system has also been ongoing (I didn't do too much of the leg work on this one, but the author is one of the two others involved with the hackathon project above). I also have a community on my server composed almost entirely of AI researchers and developers, mostly focused on open-source projects, and I can very likely bring a few highly skilled developers on board as an initial seed for a dev team. https://discord.gg/n9hXaBPWxx is the invite if anyone is interested in joining or contributing to super cool projects :) I throw GPUs at people with cool projects when I can afford it :D
-
The best model for coding right now is WizardCoder-15B.
-
Microsoft's phi-1 may be open-sourced in the near future.
-
I think it's worth checking out the internals of Copilot: link
-
I think it's worth investigating beam search for LLaMA and StarCoder. For instruction-based one-shot code generation, and most kinds of problem solving with smaller models in general, generating without it results in a cascade of snowballed hallucinations. A minimal sketch of the idea is below.
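As a point of reference, here is a toy beam search in C++, assuming a hypothetical `next_token_logprobs` scorer in place of a real model evaluation; it only illustrates the control flow of keeping the `beam_width` best partial sequences instead of committing to one token at a time:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Beam {
    std::vector<int> tokens;  // token ids generated so far
    double           logprob; // cumulative log-probability
};

// Placeholder scorer: a real engine would evaluate the model on `tokens`
// and return a log-probability for every vocabulary entry.
static std::vector<double> next_token_logprobs(const std::vector<int> & tokens, int n_vocab) {
    (void) tokens;
    return std::vector<double>(n_vocab, -std::log((double) n_vocab)); // uniform dummy
}

std::vector<int> beam_search(int n_vocab, std::size_t beam_width, int n_steps) {
    std::vector<Beam> beams = { Beam{{}, 0.0} };
    for (int s = 0; s < n_steps; ++s) {
        // expand every live beam by every possible next token
        std::vector<Beam> candidates;
        for (const Beam & b : beams) {
            const std::vector<double> lp = next_token_logprobs(b.tokens, n_vocab);
            for (int t = 0; t < n_vocab; ++t) {
                Beam nb = b;
                nb.tokens.push_back(t);
                nb.logprob += lp[t];
                candidates.push_back(std::move(nb));
            }
        }
        // keep only the `beam_width` highest-scoring partial sequences
        const std::size_t keep = std::min(beam_width, candidates.size());
        std::partial_sort(candidates.begin(), candidates.begin() + keep, candidates.end(),
                          [](const Beam & a, const Beam & b) { return a.logprob > b.logprob; });
        candidates.resize(keep);
        beams = std::move(candidates);
    }
    return beams.front().tokens; // highest-scoring sequence
}

int main() {
    const std::vector<int> best = beam_search(/*n_vocab=*/16, /*beam_width=*/4, /*n_steps=*/8);
    (void) best;
}
```

The intuition for why this helps with snowballing: a single bad token choice no longer locks the model into a hallucinated path, because that branch survives only if no higher-scoring alternative exists.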
-
Seems there is a new player in town, a 3B model:
https://huggingface.co/sahil2801/replit-code-instruct-glaive
-
Hi @ggerganov, I'm the maintainer of https://github.com/TabbyML/tabby, a project aiming to create an open-source alternative to GitHub Copilot. We initially built our inference engine on https://github.com/OpenNMT/CTranslate2, as it provided the best support for CUDA + INT8 inference at the time. However, we are actively exploring options to make it suitable for on-device usage through INT8 inference and Core ML. You can find the relevant links here: Tabby itself focuses more on the engineering aspects of Copilot, such as implementing a caching strategy in the IDE extension and parsing code into Tree-sitter tags to improve suggestions. We remain neutral regarding the choice of inference engine and model, so I can envision a clear path for collaboration: integrating ggml into Tabby's Rust inference core should be straightforward, similar to what we did with CTranslate2.
-
Hey, what is the difference between p1 and the projects above? Will p1 replace them by providing faster completion through new techniques, or will it use the same technology but with better engineering for a better user experience? What is the main novelty of p1?
-
I've been looking into doing a similar project and would love to assist in any way I can. My starting point was going to be a new model fine-tuned off Llama 2 using the dataset from Code Clippy. I am new to fine-tuning LLMs, but have been doing some reading and think PEFT QLoRA training combined with quantizing the model using ggml would get us there. I can train some PoCs locally before committing to serious cloud GPU time (I've been playing with Lambda Labs cloud, which seems reasonably priced at $0.60 an hour for an A10). As for local inferencing, I've been thinking WASM might be a good fit here: using WASM one could embed llama.cpp into VS Code. A sufficiently quantized model should fit within the 4GB RAM limit imposed by WASM (referenced in #97); see the rough size estimate below. If such an implementation is sufficiently performant, the plug-in could work with minimal user config, at least to start. I believe an option to run llama.cpp via an external binary should also be supported, but something lower-touch to start could be advantageous for users just trying it out.
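To sanity-check the 4GB claim, here is a back-of-the-envelope estimate (my numbers, not from this thread), assuming roughly 0.56 bytes per parameter for a Q4-style quantization once scales are included:

```cpp
#include <cstdio>

int main() {
    const double bytes_per_param = 0.56;        // rough Q4-style figure, an assumption
    const double model_sizes_b[] = {3.0, 7.0};  // parameter counts, in billions
    for (const double b : model_sizes_b) {
        const double gib = b * 1e9 * bytes_per_param / (1024.0 * 1024.0 * 1024.0);
        std::printf("%.0fB params -> ~%.2f GiB of weights\n", b, gib);
    }
    return 0;
}
```

By this estimate a 3B model (~1.6 GiB of weights) fits comfortably under the WASM limit, while a 7B model (~3.7 GiB) is borderline once the KV cache and runtime overhead are added.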
-
I can give a hand with training/finetuning here. Although adding support for other architectures in llama.cpp may help us use high-quality off-the-shelf models out there, I think after LLaMA-2 it's quite plausible to finetune it for our exact use case, e.g., completion-based, instruction-based, or chat-based. We can also collect data from users who agree to share their usage information with us and iteratively re-finetune the model for better quality.
-
@ggerganov Concerning the p1 project:
Having a local Copilot clone at hand would make my life so much more comfortable and exciting.
-
It's been a while, but here is some progress on this project for anyone still interested: ggerganov/llama.cpp#9787
-
Intro
This project is an attempt to implement a local code completion engine utilizing large language models (LLMs).
Think of it as an open-source alternative to GitHub Copilot that runs on your device.
We will explore how feasible it is to apply existing open-source LLMs to this task. We will utilize the existing ggml technology to achieve efficient on-device inference. The implementation will be community driven, similar to the llama.cpp project. Everyone is welcome to join and collaborate in the process.
The name of the project, p1, stands for the first official project developed and supported by ggml.ai.
Primary objectives
Secondary objectives
Implementation strategy
The llama.cpp project has already demonstrated almost all the necessary ingredients for creating an efficient code completion engine. The major effort in p1 will be to find a way to combine all these ingredients into a single application that is easy to extend and integrate with other applications. The main ingredients are:
Based on the inference performance that we have achieved so far, we should aim at utilizing models in the range of 3B - 20B parameters in order to achieve generation speeds of about ~50 tok/s or more on modern hardware. Candidate models are:
ggml-code (model trained by ggml.ai for source code, TBD)
For speculative sampling, we will try to utilize small fine-tuned models for specific programming languages. These "draft" models can be on the order of a few tens of millions of parameters, and their main purpose will be to improve the inference speed of the main model. The inference of the draft models will be very fast and can easily be done in parallel on the CPU. A rough sketch of the draft/verify loop follows.
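As an illustration of the intended speculative loop, here is a minimal C++ sketch; the `Model` interface and `greedy_next` are placeholders rather than the actual ggml API, and a real implementation would verify all k draft tokens in a single batched forward pass:

```cpp
#include <vector>

struct Model {
    // Dummy stand-in: a real engine would run a full inference pass
    // here and return the argmax token for the given context.
    int greedy_next(const std::vector<int> & ctx) const {
        return ctx.empty() ? 1 : (ctx.back() * 31 + 7) % 1000;
    }
};

// One speculative step: the small draft model proposes `k` tokens cheaply,
// the main model verifies them and keeps the longest agreeing prefix.
// Returns the accepted tokens and extends `ctx` accordingly.
std::vector<int> speculative_step(const Model & draft, const Model & target,
                                  std::vector<int> & ctx, int k) {
    // 1) draft model proposes k tokens
    std::vector<int> proposal;
    std::vector<int> tmp = ctx;
    for (int i = 0; i < k; ++i) {
        const int t = draft.greedy_next(tmp);
        proposal.push_back(t);
        tmp.push_back(t);
    }
    // 2) main model verifies; in a real engine this is one batched forward
    //    pass over all k positions, which is where the speedup comes from
    std::vector<int> accepted;
    for (int i = 0; i < k; ++i) {
        const int t = target.greedy_next(ctx);
        ctx.push_back(t);
        accepted.push_back(t);
        if (t != proposal[i]) {
            break; // first disagreement: keep the main model's token and stop
        }
    }
    return accepted;
}

int main() {
    const Model draft, target;
    std::vector<int> ctx = {1, 2, 3};
    speculative_step(draft, target, ctx, /*k=*/4);
}
```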
The main programming language of p1 will be C++ as usual.
Collaboration
I'm hoping to utilize the llama.cpp momentum and receive help from existing contributors. Ideally, I wish that we manage to form a small dev team that will drive the main development of p1 and will be able to make decisions autonomously. I think the above description sets the main vision of the project, but there are still many design decisions left to be made and I don't want to be the sole person making those decisions. I wish this to be a collaborative effort. Hopefully, p1 will become a place where capable people find ways to demonstrate their skills by creating a useful tool for everyone to use. I'm sure that in the process we will find a lot of interesting problems to solve and potentially set the stage for new LLM applications in the future.
For that purpose, I will be inviting collaborators to the ggml-org organization based on their expressed interest and past contributions. The main goal will be to eventually delegate the entire p1 project to the newly formed dev team. In case we don't succeed in forming the dev team the way I imagine it, I'll still try to keep p1 alive; it will just probably take a bit longer to complete. And of course, there is also the possibility that this effort turns out to be not as easy as I imagine, in which case p1 will simply die :)
If you are new to ggml and llama.cpp, but what you read here sounds interesting, make sure to first get familiar with llama.cpp, as it is the closest thing to what I imagine p1 will become.
For now, let's brainstorm for a bit in the discussion below and see how things go :)
Related work