Lit-LLM implements an accessible API for working with LLMs.
The design principle is to introduce the thinnest possible abstraction while keeping things simple and hackable.
The first implementation focuses on lit-gpt, but adding support for others is trivial.
Current features include:
- loading/downloading/converting models by specifying a string identifier (e.g. microsoft/phi-1_5)
- preparing datasets with awareness of target models (tokenizer, etc.)
- finetuning with a single command
- chatting with context
- exposing OpenAI-compatible HTTP endpoints
Take a look at main.py for an example of finetuning and generation. The steps are as follows.
Create an instance of the model, passing the model name as an argument:
model = llm.LLM("microsoft/phi-1_5")
Start a chat and send a prompt to see how the base model behaves:
with model.chat(temperature=0.2) as chat:
    response = chat.generate(prompt="What do you think about pineapple pizza?")
Download and prepare the instruction-tuning dataset. To prepare the Alpaca dataset, call the prepare_dataset method of model:
alpaca = model.prepare_dataset("alpaca")
Once the dataset has been downloaded and prepared, you can retrieve it directly:
alpaca = model.get_dataset("alpaca")
You can also prepare the Dolly dataset:
dolly = model.prepare_dataset("dolly")
You can also bring your own CSV, in which case you can use dataset="csv":
mydataset = model.prepare_csv_dataset("mydataset", csv_path="<path_to_csv>")
In this case, you need to provide a CSV file with the following three columns: instruction, input, output, and pass its path as the csv_path argument to the function.
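For illustration, here is a minimal sketch that produces such a file with Python's standard csv module; the file name and the example row are made up, and only the three column names are taken from above.

import csv

# Columns expected by the CSV dataset: instruction, input, output.
rows = [
    {
        "instruction": "Give your opinion on the topic.",
        "input": "pineapple pizza",
        "output": "Pineapple pizza is divisive, but plenty of people enjoy it.",
    },
]

with open("mydataset.csv", "w", newline="") as f:  # example path
    writer = csv.DictWriter(f, fieldnames=["instruction", "input", "output"])
    writer.writeheader()
    writer.writerows(rows)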
You can now finetune your model on the data. Finetuning will automatically run across all available GPUs. To finetune, call the finetune method on the model and pass the dataset that you prepared previously:
finetuned = model.finetune(dataset=alpaca, max_iter=100)
You can pass a number of hyperparameters to finetune in order to control the outcome.
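As a rough sketch, a tuned run might look like the call below. Only dataset and max_iter appear in this document; the remaining keyword arguments are hypothetical examples of the kind of knobs finetune might expose, and their names may differ in the actual API.

finetuned = model.finetune(
    dataset=alpaca,
    max_iter=200,            # number of finetuning iterations (shown above)
    learning_rate=3e-4,      # hypothetical: optimizer learning rate
    micro_batch_size=4,      # hypothetical: per-device batch size
)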
You can chat with the resulting model just as before, only now creating the chat context from finetuned:
with finetuned.chat(temperature=0.2) as chat:
    response = chat.generate(prompt="What do you think about pineapple pizza?")
You can serve each model through an OpenAI-compatible API server as follows:
finetuned.serve(port=8000)
You can send a request to the server using
python client.py "What do you think about pineapple pizza?"
in a separate terminal, or equivalently make a cURL request:
curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -H "X-API-KEY: 1234567890" -d '{
"messages": [{"role": "user", "content": "What do you think about pineapple pizza?"}],
"temperature": 0.7
}'
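If you prefer Python over cURL, here is a minimal sketch using the requests library; the URL, headers, and payload simply mirror the cURL call above.

import requests

# Query the OpenAI-compatible endpoint started by finetuned.serve(port=8000).
response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    headers={"Content-Type": "application/json", "X-API-KEY": "1234567890"},
    json={
        "messages": [{"role": "user", "content": "What do you think about pineapple pizza?"}],
        "temperature": 0.7,
    },
)
print(response.json())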