OpenAI-Compatible RESTful APIs & SDK

FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for the OpenAI APIs. The FastChat server works with both the openai-python library and cURL commands.

The following OpenAI APIs are supported:

Chat Completions. (Reference: https://platform.openai.com/docs/api-reference/chat)
Completions. (Reference: https://platform.openai.com/docs/api-reference/completions)
Embeddings. (Reference: https://platform.openai.com/docs/api-reference/embeddings)

RESTful API Server

First, launch the controller:

python3 -m fastchat.serve.controller

Then, launch the model worker(s):

python3 -m fastchat.serve.model_worker --model-name 'vicuna-7b-v1.1' --model-path /path/to/vicuna/weights

Finally, launch the RESTful API server:

python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
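
The three launch steps above can also be scripted. The following is a minimal sketch and not part of FastChat: the helper names `build_commands` and `launch_all` and the fixed startup delay are assumptions made for illustration, while the command lines themselves are the ones shown above.

```python
import subprocess
import time


def build_commands(model_name, model_path, host="localhost", port=8000):
    """Return the three command lines from this section, in launch order."""
    return [
        ["python3", "-m", "fastchat.serve.controller"],
        ["python3", "-m", "fastchat.serve.model_worker",
         "--model-name", model_name, "--model-path", model_path],
        ["python3", "-m", "fastchat.serve.openai_api_server",
         "--host", host, "--port", str(port)],
    ]


def launch_all(commands, delay_seconds=10.0):
    """Start each command in order, pausing between them.

    The fixed delay is only a guess at how long each service needs to
    come up (an assumption); tune it for your machine and model size.
    """
    processes = []
    for command in commands:
        processes.append(subprocess.Popen(command))
        time.sleep(delay_seconds)
    return processes
```

Calling `launch_all(build_commands("vicuna-7b-v1.1", "/path/to/vicuna/weights"))` would start all three processes; each returned process can be stopped with its `terminate()` method.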

Now, let us test the API server.

OpenAI Official SDK

The goal of openai_api_server.py is to implement a fully OpenAI-compatible API server, so that the models can be used directly with the openai-python library.

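First, install openai-python. The example below uses the openai.Completion and openai.ChatCompletion interfaces, which were removed in openai 1.0, so pin an earlier release:

pip install --upgrade "openai<1.0"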

Then, interact with the Vicuna model:

import openai
openai.api_key = "EMPTY"  # Placeholder: API keys are not supported yet
openai.api_base = "http://localhost:8000/v1"

model = "vicuna-7b-v1.1"
prompt = "Once upon a time"

# create a completion
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)

# create a chat completion
completion = openai.ChatCompletion.create(
  model=model,
  messages=[{"role": "user", "content": "Hello! What is your name?"}]
)
# print the completion
print(completion.choices[0].message.content)

Streaming is also supported. See test_openai_sdk.py.
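
As a rough illustration of consuming a streamed chat completion, the sketch below joins the text fragments carried by each chunk. It assumes the server mirrors OpenAI's streaming chunk layout (`choices[0].delta.content`); the helper name `collect_stream_text` and the sample chunks are hypothetical and are not taken from test_openai_sdk.py.

```python
def collect_stream_text(chunks):
    """Concatenate the text of the first choice across streamed chunks.

    Each chunk is expected to look like
    {"choices": [{"delta": {"content": "..."}}]}. Deltas without a
    "content" key (such as an initial role-only delta) are skipped.
    """
    pieces = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:
            pieces.append(text)
    return "".join(pieces)


# Hypothetical chunks standing in for a live stream.
example_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}}]},
    {"choices": [{"index": 0, "delta": {"content": ", world"}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
print(collect_stream_text(example_chunks))  # prints "Hello, world"

# Against a running server, with the pre-1.0 SDK configured as above:
#   stream = openai.ChatCompletion.create(
#       model=model, messages=[{"role": "user", "content": "Hi"}], stream=True)
#   print(collect_stream_text(stream))
```

Collecting the whole reply before printing keeps the sketch short; an interactive client would more likely print each fragment as it arrives.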

cURL

cURL is another convenient tool for inspecting the API's output.

List Models:

curl http://localhost:8000/v1/models
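
The same check can be made from Python with only the standard library. This is a sketch under the assumption that the response follows OpenAI's model-list shape (a `data` array of objects with an `id` field); the function names and the sample payload are illustrative.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Return the "id" of every entry in an OpenAI-style model list."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url="http://localhost:8000/v1"):
    """Fetch /models from a running server and return the model ids."""
    with urllib.request.urlopen(f"{base_url}/models") as response:
        return extract_model_ids(json.load(response))


# Hypothetical response body, shaped like OpenAI's model list.
sample = {"object": "list", "data": [{"id": "vicuna-7b-v1.1", "object": "model"}]}
print(extract_model_ids(sample))  # prints ['vicuna-7b-v1.1']
```

With the servers from the previous section running, `list_models()` should return the names of the registered workers.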

Chat Completions:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.1",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'

Text Completions:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.1",
    "prompt": "Once upon a time",
    "max_tokens": 41,
    "temperature": 0.5
  }'
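
For comparison, the completion requests above can be built with Python's standard library instead of cURL. This is a minimal sketch: `build_json_post` and `post_json` are hypothetical helper names, and only the URL and JSON body come from the cURL commands.

```python
import json
import urllib.request


def build_json_post(url, body):
    """Build a POST request with a JSON body, like `curl -H ... -d ...`."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url, body):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_json_post(url, body)) as response:
        return json.load(response)


# Same payload as the text-completion cURL command.
request = build_json_post(
    "http://localhost:8000/v1/completions",
    {"model": "vicuna-7b-v1.1", "prompt": "Once upon a time",
     "max_tokens": 41, "temperature": 0.5},
)
print(request.get_method(), request.full_url)
```

Separating request construction from sending makes it possible to inspect the payload without a server; `post_json` performs the actual call.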

Embeddings:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.1",
    "input": "Hello world!"
  }'
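
One common use of the returned vectors is comparing two inputs. The sketch below assumes the response follows OpenAI's embeddings layout (`data[0].embedding`); the helper names and the tiny two-dimensional vectors are made up for illustration, and real embeddings have many more dimensions.

```python
import math


def first_embedding(response):
    """Return the first vector from an OpenAI-style embeddings response."""
    return response["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical responses with toy vectors.
resp_a = {"data": [{"index": 0, "embedding": [3.0, 4.0]}]}
resp_b = {"data": [{"index": 0, "embedding": [3.0, 4.0]}]}
resp_c = {"data": [{"index": 0, "embedding": [4.0, -3.0]}]}

same = cosine_similarity(first_embedding(resp_a), first_embedding(resp_b))
orthogonal = cosine_similarity(first_embedding(resp_a), first_embedding(resp_c))
print(same, orthogonal)
```

Identical vectors score close to 1 and orthogonal vectors score 0, so higher values indicate inputs the model places closer together.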

Todos

Some features to be implemented:

Support for more parameters, such as logprobs, logit_bias, user, presence_penalty, and frequency_penalty
Model details (permissions, owner, and creation time)
Edits API
Authentication and API keys
Rate-limiting settings
