True OpenAI drop-in replacement by InferenceClient #2384

Wauplin · 2024-07-10T10:09:27Z

The goal is to be able to use `InferenceClient` exactly the same way as the OpenAI client. To do so, we need to:

- rename `model` to `base_url` => added an alias for it
- rename `model_id` to `model`
- rename `token` to `api_key` => added an alias for it
- add an alias for `client.chat.completions.create`
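
A minimal sketch of how such aliases could be wired up, in plain Python with no network calls. The names `SketchClient`, `_ChatProxy`, and `_CompletionsProxy` are hypothetical and are not taken from the library's code; the sketch only illustrates how `base_url`/`api_key` can alias `model`/`token` and how `client.chat.completions.create` can forward to an existing `chat_completion` method. Rejecting a name passed together with its alias is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


class SketchClient:
    """Hypothetical stand-in for a client that accepts OpenAI-style names."""

    def __init__(
        self,
        model: Optional[str] = None,
        token: Optional[str] = None,
        *,
        base_url: Optional[str] = None,  # alias for `model`
        api_key: Optional[str] = None,  # alias for `token`
    ) -> None:
        # Assumption: passing both a name and its alias is treated as an error.
        if model is not None and base_url is not None:
            raise ValueError("Pass either `model` or `base_url`, not both.")
        if token is not None and api_key is not None:
            raise ValueError("Pass either `token` or `api_key`, not both.")
        self.model = model or base_url
        self.token = token or api_key
        # Expose the OpenAI-style attribute path `client.chat.completions.create`.
        self.chat = _ChatProxy(self)

    def chat_completion(
        self,
        messages: List[Dict[str, str]],
        *,
        model: Optional[str] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        # A real client would send an HTTP request; here we echo the resolved call.
        return {"model": model or self.model, "messages": messages, "params": kwargs}


class _CompletionsProxy:
    def __init__(self, client: SketchClient) -> None:
        # `create` is the same bound method as `chat_completion`.
        self.create = client.chat_completion


class _ChatProxy:
    def __init__(self, client: SketchClient) -> None:
        self.completions = _CompletionsProxy(client)


client = SketchClient(base_url="http://localhost:8080", api_key="my-api-key")
messages = [{"role": "user", "content": "Count to 10"}]
via_alias = client.chat.completions.create(messages=messages, max_tokens=16)
via_method = client.chat_completion(messages=messages, max_tokens=16)
```

Both call paths reach the same method, so `via_alias` and `via_method` hold identical results. Forwarding through small proxy objects keeps a single implementation behind both spellings.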

@philschmid could you have a look at it and confirm it meets your expectations? See the tests for real examples (here and here).

Sync + `stream=False`

from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    api_key="my-api-key",
)
output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=False,
    max_tokens=1024,
)
assert output.choices[0].message.content == "1, 2, 3, 4, 5, 6, 7, 8, 9, 10!"

Sync + `stream=True`

from huggingface_hub import InferenceClient

client = InferenceClient()
output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
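
When streaming, each chunk carries only a fragment of the reply in `delta.content`, and some chunks may carry no text at all (assumed here to be `None` or an empty string). A minimal sketch of joining the fragments into the full reply follows; `join_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real stream so that the snippet runs without network access.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def join_stream(chunks: Iterable) -> str:
    """Concatenate the text fragments of a chat-completion stream."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # skip chunks with no text (None or empty string)
            parts.append(content)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    # Mimics only the attribute path used above: chunk.choices[0].delta.content
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("1, 2"), fake_chunk(", 3"), fake_chunk(None)]
full_text = join_stream(fake_stream)
```

With a real client, the `output` iterator returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `fake_stream`.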

Async + `stream=False`

import asyncio

from huggingface_hub import AsyncInferenceClient


async def main():
    client = AsyncInferenceClient(
        base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
        api_key="my-api-key",
    )
    output = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count to 10"},
        ],
        stream=False,
        max_tokens=1024,
    )
    assert output.choices[0].message.content == "1, 2, 3, 4, 5, 6, 7, 8, 9, 10!"


# `await` is only valid inside a coroutine, so run the example through asyncio.
asyncio.run(main())

Async + `stream=True`

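import asyncio

from huggingface_hub import AsyncInferenceClient


async def main():
    client = AsyncInferenceClient()
    output = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count to 10"},
        ],
        stream=True,
        max_tokens=1024,
    )

    # Collect the streamed fragments as they arrive.
    chunked_text = [chunk.choices[0].delta.content async for chunk in output]


# `await` and `async for` are only valid inside a coroutine.
asyncio.run(main())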
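
The list comprehension above keeps every `delta.content` value, including any that carry no text. A minimal sketch of turning such an async stream into a single string follows, assuming empty fragments arrive as `None` or `""`. The helper `join_async_stream` and the generator `fake_async_stream` are hypothetical; the generator replaces the real stream so the snippet runs offline.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterable, AsyncIterator


async def join_async_stream(chunks: AsyncIterable) -> str:
    """Concatenate the text fragments of an async chat-completion stream."""
    parts = [chunk.choices[0].delta.content async for chunk in chunks]
    # Drop fragments with no text before joining.
    return "".join(part for part in parts if part)


async def fake_async_stream() -> AsyncIterator[SimpleNamespace]:
    # Yields objects exposing only chunk.choices[0].delta.content.
    for text in ["1, 2", ", 3", None]:
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


full_text = asyncio.run(join_async_stream(fake_async_stream()))
```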

TODO:

document this properly (see inference.md)

HuggingFaceDocBuilderDev · 2024-07-10T10:13:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

LysandreJik

Awesome, very clear docs! Thanks @Wauplin

docs/source/en/guides/inference.md

julien-c · 2024-07-10T17:23:23Z

> @philschmid could you have a look at it and confirm it meets your expectations?

from experience, a very tough test to pass usually 🤣

EDIT: PR looks cool!

Wauplin · 2024-07-11T10:08:25Z

Thanks for the reviews!

True OpenAI drop-in replacement by InferenceClient

4bc785f

Wauplin requested review from LysandreJik and philschmid July 10, 2024 10:09

boulet

7215d61

Wauplin mentioned this pull request Jul 10, 2024

Truly openai drop-in replacement for chat completion #2369

Closed

4 tasks

Wauplin added 2 commits July 10, 2024 14:02

Merge branch 'main' into 2369-true-openai-drop-in-replacement

ba00e1e

document openai compatibility

0316248

LysandreJik approved these changes Jul 10, 2024

typo

5f71531

philschmid reviewed Jul 10, 2024

docs/source/en/guides/inference.md (outdated)

use diff in code snippets

3e30e6f

philschmid approved these changes Jul 10, 2024

why using us

dc97b0c

Wauplin merged commit bcef2ea into main Jul 11, 2024
16 checks passed

Wauplin deleted the 2369-true-openai-drop-in-replacement branch July 11, 2024 10:08

Wauplin mentioned this pull request Sep 13, 2024

Fix resolve chat completion URL #2540

Merged
