Ability to execute prompts against an asyncio (non-blocking) API #507

Closed
simonw opened this issue Jun 10, 2024 · 23 comments
Labels
enhancement New feature or request

Comments

simonw commented Jun 10, 2024

Datasette in particular needs a neat way to run different models via an await ... async method. Having a way for model plugins to provide this - especially the API based plugins like OpenAI and Claude - would be really helpful and would unblock a bunch of Datasette plugins that could benefit from an LLM abstraction layer.
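
As a rough sketch, this is the kind of call pattern it would enable from async plugin code (the get_async_model() name is the one that gets settled later in this thread - purely illustrative here):

import llm

async def summarize(text: str) -> str:
    # Sketch only: an awaitable LLM call from inside async Datasette plugin code
    model = llm.get_async_model("gpt-4o-mini")
    response = model.prompt("Summarize this:\n\n" + text)
    return await response.text()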

@simonw simonw added the enhancement New feature or request label Jun 10, 2024

simonw commented Nov 6, 2024

Two options:

  1. Plugins have to specifically upgrade to offer this
  2. Works for all plugins: those that have not yet upgraded (like most of the local model plugins) run in a thread, and LLM provides an async interface to them anyway (see the thread-wrapping sketch below)

I'm going to build 1. I will treat 2. as an optional nice-to-have in the future.

I expect that any of the API-driven plugins which do not use a client library (like llm-mistral) will want a mechanism where they can define how the HTTP requests work and then some abstraction makes that work with both blocking and asyncio httpx.
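
Option 2 could presumably lean on asyncio.to_thread() to wrap a blocking plugin's prompt method. A minimal sketch of that idea (not part of the prototype; the model ID is just an example):

import asyncio
import llm

async def prompt_in_thread(model_id: str, prompt: str) -> str:
    # Run a sync-only plugin's blocking prompt in a worker thread so the
    # event loop stays free - a sketch of option 2, not actual LLM code
    model = llm.get_model(model_id)

    def blocking_call() -> str:
        return model.prompt(prompt).text()

    return await asyncio.to_thread(blocking_call)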

simonw commented Nov 6, 2024

Two options for the API design:

  1. Keep the current model classes, but add await model.aprompt(...) style methods, similar to the Django ORM's await author.books.afirst() style of API design
  2. Offer a separate async class - so model = llm.get_async_model("gpt-4o") and then async for token in model.prompt(...)

I'm torn on these two. Not sure how to decide.
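
For comparison, the call sites would look roughly like this (sketch only - neither aprompt() nor get_async_model() exists at this point):

import llm

async def option_one():
    # 1. Async methods added to the existing model class
    model = llm.get_model("gpt-4o")
    return await model.aprompt("say hi")  # hypothetical aprompt() method

async def option_two():
    # 2. A separate async model class with its own prompt() surface
    model = llm.get_async_model("gpt-4o")  # hypothetical at this point
    async for token in model.prompt("say hi"):
        print(token, end="", flush=True)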

simonw commented Nov 6, 2024

I used Claude as an electric bicycle for the mind and I've decided to go with the separate async class. I think having a separate class will make things like type hints more obvious. https://gist.github.com/simonw/f69e6a1fe21df5007ce038dfe91c62f4

simonw commented Nov 6, 2024

I'm going to add a llm prompt ... --async option purely for my own testing which switches to the async version of a model.
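
A minimal sketch of how that flag might be wired into a click command (not the actual cli.py code; the default model ID is just an example):

import asyncio
import click
import llm

@click.command()
@click.argument("prompt")
@click.option("--async", "use_async", is_flag=True, help="Use the async version of the model")
def prompt_cmd(prompt, use_async):
    # Sketch only: switch between sync and async model variants on a flag
    if use_async:
        async def run():
            model = llm.get_async_model("gpt-4o-mini")
            async for chunk in model.prompt(prompt):
                print(chunk, end="", flush=True)
        asyncio.run(run())
    else:
        model = llm.get_model("gpt-4o-mini")
        for chunk in model.prompt(prompt):
            print(chunk, end="", flush=True)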

simonw commented Nov 6, 2024

I'm going to have to do this for embedding models, too.

simonw commented Nov 6, 2024

This may be tricky:

llm/llm/cli.py

Lines 361 to 363 in fe1e097

prompt_method = model.prompt
if conversation:
    prompt_method = conversation.prompt

That's a reminder that sometimes we do model.prompt() and sometimes we do conversation.prompt() - which means we'll need an async mechanism for conversations, too - or some kind of thing where we can get an async alternative version of a conversation.

One option: instead of conversation.prompt() we go back to model.prompt() and add conversation= as a keyword argument.

Reminder though: the reason we have conversation.prompt() is that the idea was to add tool support as a feature of conversations, since running a tool function can mean a single prompt executes multiple rounds against the LLM as it calls functions and feeds their outputs back in.
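
A sketch of what the keyword-argument alternative could look like at the call site (conversation= does not exist - it is the proposal here, not the current API):

import llm

model = llm.get_model("gpt-4o-mini")
conversation = model.conversation()

# Today: prompts that share context go through conversation.prompt(...)
# Proposed: a single entry point on the model, with the conversation passed in
response = model.prompt("And how do you say that in French?", conversation=conversation)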

simonw commented Nov 6, 2024

From https://community.openai.com/t/asynchronous-use-of-the-library/479414/4 it looks like OpenAI used to have a .acreate() method and switched to an AsyncOpenAI() client instead.

simonw commented Nov 6, 2024

OpenAI async docs are here: https://github.com/openai/openai-python?tab=readme-ov-file#async-usage

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-3.5-turbo",
    )

asyncio.run(main())

simonw commented Nov 6, 2024

I got the first prototype of this working (minus logging to the database):

llm hi --async                                                 
Hello! How can I assist you today?

The --no-stream option is currently broken though:

llm hi --async --no-stream
Error: object AsyncResponse can't be used in 'await' expression
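
One way to make await response work on the --no-stream path is to give AsyncResponse an __await__() method that delegates to the same coroutine as text(). A self-contained sketch of the idea (not necessarily how the actual fix landed):

import asyncio

class AsyncResponse:
    def __init__(self, chunks):
        self._chunks = chunks
        self._text = None

    async def text(self) -> str:
        # Consume the stream once and cache the result
        if self._text is None:
            self._text = "".join([chunk async for chunk in self._chunks])
        return self._text

    def __await__(self):
        # Makes `await response` behave like `await response.text()`,
        # which is what the --no-stream code path expects
        return self.text().__await__()

async def fake_stream():
    for chunk in ["Hello", "!"]:
        yield chunk

async def main():
    response = AsyncResponse(fake_stream())
    print(await response)  # no longer "can't be used in 'await' expression"

asyncio.run(main())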

simonw commented Nov 6, 2024

Claude helped with the refactoring of Model into AsyncModel: https://gist.github.com/simonw/3554b00117b16eb65889a747e32a96ae

simonw commented Nov 6, 2024

Before I commit further to this API, let's see how this feels from Python library code.

simonw commented Nov 6, 2024

This is interesting:

>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> model.prompt("say hi in spanish")
/Users/simon/Dropbox/Development/llm/llm/models.py:390: RuntimeWarning: coroutine 'AsyncResponse.text' was never awaited
  return "<Response prompt='{}' text='{}'>".format(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
<Response prompt='say hi in spanish' text='<coroutine object AsyncResponse.text at 0x3133d84a0>'>

That's because the default __repr__() method for a response tries to use self.text(), which is now an awaitable.
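
One fix is a __repr__() that only shows the text once it has actually been awaited - a sketch of that approach (the attribute names here are assumptions, not the real internals):

class AsyncResponse:
    # Only the repr logic is sketched here - other methods are elided
    def __repr__(self):
        # Show a placeholder until text() has actually been awaited,
        # instead of repr()-ing a coroutine object
        text = "... not yet awaited ..."
        if self._done:  # assumed flag set once text() has resolved
            text = self._text_value  # assumed cache populated by text()
        return "<Response prompt='{}' text='{}'>".format(self._prompt_text, text)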

simonw added a commit that referenced this issue Nov 6, 2024

simonw commented Nov 6, 2024

Using python -m asyncio again:

>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> response = model.prompt("say hi in spanish")
>>> response
<Response prompt='say hi in spanish' text='... not yet awaited ...'>
>>> await response.text()
'¡Hola!'
>>> response
<Response prompt='say hi in spanish' text='¡Hola!'>

simonw commented Nov 6, 2024

This outputs tokens one at a time:

>>> import asyncio
>>> import llm
>>> model = llm.get_async_model("gpt-4o-mini")
>>> async for token in model.prompt("describe a good dog in french"):
...     print(token, end="", flush=True)
... 
Un bon chien est fidèle, affectueux et protecteur. Il aime passer du temps avec sa famille et est toujours prêt à jouer. Sa loyauté est sans égale, et il fait tout pour garder ses maîtres heureux. Un bon chien est aussi intelligent et facile à dresser, ce qui lui permet d'apprendre rapidement des commandes et des tours. Sa présence apporte réconfort et joie, et il sait être un excellent compagnon, toujours là dans les moments de bonheur comme dans les moments difficiles. En somme, un bon chien est un véritable ami pour la vie.

simonw commented Nov 6, 2024

This async API feels pretty good to me. I think this may be the right design.

Still to figure out:

  • How do plugins register their async versions? Might need a register_async_models() hook.
  • Get another plugin working - Anthropic or Gemini would be good
  • How does logging to the DB work? That's currently a method on Model; it needs to move out, and an async version needs to be considered.
  • How about loading FROM the database? Will that work for things like attachments too?
  • Testing for all of this
  • Maybe try getting an llm-gguf plugin to work with this? Might help confirm the API design further by showing it working with a blocking plugin (maybe via a run-in-thread hack)
  • Are there any other weird knock-on impacts of this I haven't considered yet?
  • How should embeddings work? Can I punt on that for the first release, maybe for the first alpha?

simonw commented Nov 6, 2024

Reminder: model registration currently looks like this:

@hookimpl
def register_models(register):
    register(Chat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
    register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
    register(Chat("gpt-4"), aliases=("4", "gpt4"))
    register(Chat("gpt-4-32k"), aliases=("4-32k",))
    # GPT-4 Turbo models
    register(Chat("gpt-4-1106-preview"))
    register(Chat("gpt-4-0125-preview"))
    register(Chat("gpt-4-turbo-2024-04-09"))
    register(Chat("gpt-4-turbo"), aliases=("gpt-4-turbo-preview", "4-turbo", "4t"))
    # GPT-4o
    register(Chat("gpt-4o", vision=True), aliases=("4o",))
    register(Chat("gpt-4o-mini", vision=True), aliases=("4o-mini",))
    # o1
    register(Chat("o1-preview", can_stream=False, allows_system_prompt=False))
    register(Chat("o1-mini", can_stream=False, allows_system_prompt=False))
    # The -instruct completion model
    register(
        Completion("gpt-3.5-turbo-instruct", default_max_tokens=256),
        aliases=("3.5-instruct", "chatgpt-instruct"),
    )

If I add a new separate register_async_models() hook I need to think about things like aliases - it wouldn't make sense for a sync model and an async model to have different aliases from each other.

Given that, I think I want the existing register_models() hook and register() callback function to handle both types of model.

So what could that look like? One option:

def register_models(register):
    register(Chat("gpt-3.5-turbo"), AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))

So register() takes an optional second argument.

Problem there is that I may eventually have models which are available only in their async variant.

So I could do this:

def register_models(register):
    register(model=Chat("gpt-3.5-turbo"), async_model=AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))

That would free me up to support this:

def register_models(register):
    register(async_model=AsyncChat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))

That's not terrible, it may be the way to go.

simonw commented Nov 6, 2024

I think the llm.get_model() and llm.get_async_model() methods should be able to spot when a model is available in sync but not async variants (and vice versa) and return a more useful error message than just "model not found".
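
A sketch of what that lookup could look like (the registry names and the exception class here are assumptions, not the real internals):

class UnknownModelError(KeyError):
    pass

def get_async_model(model_id, async_models, sync_models):
    # Prefer the async registry; fall back to a more specific error
    # if only a sync variant of the model is registered
    if model_id in async_models:
        return async_models[model_id]
    if model_id in sync_models:
        raise UnknownModelError(
            "Unknown async model (sync model exists): {}".format(model_id)
        )
    raise UnknownModelError("Unknown model: {}".format(model_id))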

simonw commented Nov 6, 2024

The only place that calls that plugin hook (and hence the only place that defines the register() method right now) is here:

llm/llm/__init__.py

Lines 66 to 85 in fe1e097

def get_models_with_aliases() -> List["ModelWithAliases"]:
    model_aliases = []

    # Include aliases from aliases.json
    aliases_path = user_dir() / "aliases.json"
    extra_model_aliases: Dict[str, list] = {}
    if aliases_path.exists():
        configured_aliases = json.loads(aliases_path.read_text())
        for alias, model_id in configured_aliases.items():
            extra_model_aliases.setdefault(model_id, []).append(alias)

    def register(model, aliases=None):
        alias_list = list(aliases or [])
        if model.model_id in extra_model_aliases:
            alias_list.extend(extra_model_aliases[model.model_id])
        model_aliases.append(ModelWithAliases(model, alias_list))

    pm.hook.register_models(register=register)
    return model_aliases
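
Extending that register() callback to take an optional async model might look something like this - a sketch that assumes ModelWithAliases grows an async_model field, with extra_model_aliases and model_aliases being the same locals as in the function above:

    def register(model=None, async_model=None, aliases=None):
        # Accept either or both variants; they share a single alias list
        alias_list = list(aliases or [])
        model_id = (model or async_model).model_id
        if model_id in extra_model_aliases:
            alias_list.extend(extra_model_aliases[model_id])
        model_aliases.append(ModelWithAliases(model, async_model, alias_list))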

simonw commented Nov 6, 2024

Got this working:

% llm -m claude-3.5-sonnet hi     
Hello! How can I help you today?
% llm -m claude-3.5-sonnet hi --async
Error: 'Unknown async model (sync model exists): claude-3.5-sonnet'

simonw added a commit that referenced this issue Nov 6, 2024

simonw commented Nov 6, 2024

I have a hunch the saving-to and loading-from the database bit may end up producing new plugin hooks. I'll need those for Datasette at any rate since it has its own mechanism for executing database writes.

Although I may not need plugin hooks for that if I stick to keeping the database persistence code in the CLI layer.

But... I already have a need for plugins like llm-cmd and llm-jq to implement their own database logging, which they currently skip out on entirely.

So it may be that it's not so much a plugin hook as a documented interface for blocking database logging (for use in other blocking plugins) plus a documented "roll your own" policy for async database logging.

Which may grow into plugin hooks for database persistence that can be both sync and async friendly at a later date.

@simonw simonw changed the title from "asyncio support" to "Ability to execute prompts against an asyncio (non-blocking) API" Nov 6, 2024

simonw commented Nov 6, 2024

I'm doing this in the prototype right now and I really don't like it:

class AsyncChat(AsyncModel, Chat):

That's multiple inheritance AND I'm inheriting from the non-async model class!

I'm going to refactor those to a common base class instead.
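
The rough shape of that refactor: shared logic on a common base class, with only execute() differing between the variants. A sketch only - Model and AsyncModel stand in for the real llm base classes, and the method signatures are simplified:

class Model:  # stand-in for the real sync base class
    pass

class AsyncModel:  # stand-in for the async base class
    pass

class _Shared:
    # Everything that does not care about sync vs async: model_id, options,
    # building the request body, parsing usage, and so on
    def __init__(self, model_id):
        self.model_id = model_id

    def build_messages(self, prompt):
        return [{"role": "user", "content": prompt}]

class Chat(_Shared, Model):
    def execute(self, prompt):
        messages = self.build_messages(prompt)
        ...  # blocking HTTP call goes here

class AsyncChat(_Shared, AsyncModel):
    async def execute(self, prompt):
        messages = self.build_messages(prompt)
        ...  # awaitable HTTP call goes here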

simonw commented Nov 6, 2024

Got Claude to refactor that for me: https://gist.github.com/simonw/3872e758c129917980605eed49cbce7f

simonw commented Nov 13, 2024

simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 14, 2024
simonw added a commit that referenced this issue Nov 14, 2024
…els (#613)

- #507 (comment)

* register_model is now async aware

Refs #507 (comment)

* Refactor Chat and AsyncChat to use _Shared base class

Refs #507 (comment)

* fixed function name

* Fix for infinite loop

* Applied Black

* Ran cog

* Applied Black

* Add Response.from_row() classmethod back again

It does not matter that this is a blocking call, since it is a classmethod

* Made mypy happy with llm/models.py

* mypy fixes for openai_models.py

I am unhappy with this, had to duplicate some code.

* First test for AsyncModel

* Still have not quite got this working

* Fix for not loading plugins during tests, refs #626

* audio/wav not audio/wave, refs #603

* Black and mypy and ruff all happy

* Refactor to avoid generics

* Removed obsolete response() method

* Support text = await async_mock_model.prompt("hello")

* Initial docs for llm.get_async_model() and await model.prompt()

Refs #507

* Initial async model plugin creation docs

* duration_ms ANY to pass test

* llm models --async option

Refs #613 (comment)

* Removed obsolete TypeVars

* Expanded register_models() docs for async

* await model.prompt() now returns AsyncResponse

Refs #613 (comment)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
simonw added a commit that referenced this issue Nov 14, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 14, 2024
* Tip about pytest --record-mode once

Plus mechanism for setting API key during tests with PYTEST_ANTHROPIC_API_KEY

* Async support for Claude models

Closes #25
Refs simonw/llm#507
Refs simonw/llm#613

* Depend on llm>=0.18a0, refs #25
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 14, 2024
@simonw simonw unpinned this issue Nov 17, 2024
simonw added a commit that referenced this issue Nov 17, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 17, 2024
simonw added a commit that referenced this issue Nov 18, 2024
simonw added a commit to simonw/llm-mistral that referenced this issue Nov 19, 2024
simonw added a commit that referenced this issue Nov 20, 2024
@simonw simonw closed this as completed Nov 22, 2024