
Initial plugin design #1

Open
simonw opened this issue Dec 2, 2024 · 18 comments
Labels
enhancement New feature or request

Comments


simonw commented Dec 2, 2024

The goal of this plugin is to track token usage of various LLM models when used by Datasette plugins, providing a mechanism for things like allowing only X tokens per day (for free demo apps) or letting customers of SaaS platforms purchase more token allowance.

Will use the new features from LLM 0.19:

@simonw simonw added the enhancement New feature or request label Dec 2, 2024

simonw commented Dec 2, 2024

I hacked together this prototype as a starting point:

import time

import llm
from sqlite_migrate import Migrations  # simonw's sqlite-migrate package

migration = Migrations("datasette_llm")


@migration()
def create_usage_table(db):
    db["llm_usage"].create({
        "id": int,
        "created": float,
        "model": str,
        "purpose": str,
        "actor_id": str,
        "input_tokens": int,
        "output_tokens": int,
    }, pk="id")


class WrappedModel:
    def __init__(self, model, datasette, purpose=None):
        self.model = model
        self.datasette = datasette
        self.purpose = purpose

    async def prompt(self, prompt, system=None, actor_id=None, **kwargs):
        response = self.model.prompt(prompt, system=system, **kwargs)

        async def done(response):
            # Log usage against the current actor_id and purpose
            usage = await response.usage()
            db = self.datasette.get_database("llm")
            await db.execute_write("""
            insert into llm_usage (created, model, purpose, actor_id, input_tokens, output_tokens)
            values (:created, :model, :purpose, {actor_id}, :input_tokens, :output_tokens)
            """.format(actor_id=":actor_id" if actor_id else "null"), {
                "created": time.time(),
                "model": self.model.model_id,
                "purpose": self.purpose,
                "actor_id": actor_id,
                "input_tokens": usage.input,
                "output_tokens": usage.output,
            })

        await response.on_done(done)
        return response

    def __repr__(self):
        return f"WrappedModel: {self.model.model_id}"


class LLM:
    def __init__(self, datasette):
        self.datasette = datasette

    def get_async_models(self):
        return [
            WrappedModel(model, self.datasette)
            for model in llm.get_async_models()
        ]

    def get_async_model(self, model_id=None, purpose=None):
        return WrappedModel(
            llm.get_async_model(model_id), self.datasette, purpose=purpose
        )

Usage:

from datasette.app import Datasette
from datasette_llm import LLM
ds = Datasette()

await ds.invoke_startup()
llm = LLM(ds)
m = llm.get_async_model("gpt-4o-mini")
r = await m.prompt("hi")
await r.text()


simonw commented Dec 2, 2024

This design stores timestamps as a float, which I haven't actually done before. I don't want to store them as strings because it's too wasteful. Integers are also an option, but I'd like to capture time more finely grained than one second, so I could use ms-since-epoch (int(time.time() * 1000)).

REAL in SQLite is always 8 bytes. Integer can be 1, 2, 3, 4, 6 or 8 depending on size: https://www.sqlite.org/datatype3.html

So that's currently 4 bytes for a seconds-since-epoch integer timestamp and 6 bytes for a milliseconds-since-epoch integer timestamp.


simonw commented Dec 2, 2024

I'm going to do integer ms since epoch, which may then prompt other features in the Datasette ecosystem to better support that format.
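A minimal sketch of the round-trip, using only the standard library:

```python
import time
from datetime import datetime, timezone


def now_ms() -> int:
    # Milliseconds since the Unix epoch, stored as a SQLite INTEGER
    return int(time.time() * 1000)


def ms_to_datetime(ms: int) -> datetime:
    # Convert back to a timezone-aware datetime for display
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```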


simonw commented Dec 2, 2024

To make that llm object available I considered two patterns:

async def my_view(datasette):
    model = datasette.llm.get_async_model("gpt-4o-mini")

This would work by having a startup() plugin hook that set up datasette.llm as a new property for other plugins to access.

Second option:

from datasette_llm_usage import LLM

llm = LLM(datasette)

This avoids the datasette.llm extra property trick, at the expense of a more verbose import.

I'm leaning to the second option, because of Python's optional typing. I'll implement that first, then maybe add a datasette.llm shortcut in the future if it feels right.


simonw commented Dec 2, 2024

Did some brainstorming with Claude about how allowances should work: https://gist.github.com/simonw/5339a4bc71508e553cee73ab00b350eb

A tricky thing about allowances is that I want to have one per model family - so a user might get 100,000 free OpenAI tokens per day, 10,000 Anthropic etc - but that needs to take into account the difference in price between models. Comparing per-million-token input prices, GPT-4o-mini is 2.5 / 0.15 = 16.7 times cheaper than GPT-4o.

I'm going to go with a "credits" system where we count credits internally that then map to token allowances, but implement a UI feature that shows you things like "16,000 GPT-4o-mini tokens (1,000 GPT-4o tokens) left" - so users never have to think about those raw credit numbers out of context.
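A sketch of how that display string could be derived from a raw credit count. The rates here are illustrative (treating one credit as one GPT-4o-mini token and 16 credits as one GPT-4o token), not the plugin's actual configuration:

```python
# Hypothetical credits-per-token rates; real values would come from configuration
CREDITS_PER_TOKEN = {"gpt-4o-mini": 1, "gpt-4o": 16}


def describe_remaining(credits: int) -> str:
    # Translate an internal credit balance into model-specific token counts
    mini = credits // CREDITS_PER_TOKEN["gpt-4o-mini"]
    full = credits // CREDITS_PER_TOKEN["gpt-4o"]
    return f"{mini:,} GPT-4o-mini tokens ({full:,} GPT-4o tokens) left"
```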


simonw commented Dec 2, 2024

Minor problem: where does the information about the relative prices of the model families live?

I could put it in the LLM plugins but I've avoided baking in pricing information so far because I don't want to ship a new plugin version any time the prices change.

I think for the moment that stuff goes in this plugin, and can be over-ridden in configuration.


simonw commented Dec 2, 2024

We can only track usage and allowances against a model that we've got pricing information for, so we can specify that a model won't be available via this plugin unless it has been configured.


simonw commented Dec 2, 2024

I'm tempted to track credits as floating point numbers. I know that's bad practice for real money accounting, but here I don't think there's any harm in the occasional floating point inaccuracy creeping in.


simonw commented Dec 2, 2024

If I track usage as integers, maybe I do it in the equivalent of thousandths-of-a-cent?

Gemini 1.5 Flash 8B is so cheap that even a 100 token input costs less than 1/1000th of a cent though. Round that up to 1? I think that would be OK.
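A sketch of that round-up-to-1 behaviour, assuming prices expressed in USD per million tokens (the helper name is my own, not part of the plugin):

```python
import math


def cost_thousandths_of_cent(input_tokens, output_tokens,
                             input_usd_per_m, output_usd_per_m):
    # Prices are USD per million tokens; result is an integer count of
    # thousandths-of-a-cent, rounded up so no call ever costs zero
    usd = (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000
    return max(1, math.ceil(usd * 100 * 1000))
```

A 100-token Flash 8B input at $0.0375/M works out to less than one thousandth of a cent and rounds up to 1.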


simonw commented Dec 2, 2024

I just got Flash 8B to write me a Haiku and it cost:

6 input, 22 output:
$0.000004 or 0.0004 cents


simonw commented Dec 2, 2024

Wow Gemini Flash 8B is cheap.

llm -m gemini-1.5-flash-8b-latest 'describe image in detail' -a https://static.simonwillison.net/static/2024/recraft-ai.jpg -u

The image is a digital illustration of a cartoon raccoon. 

The raccoon is light grayish-brown with distinctive white stripes on its tail and body. It has large, expressive eyes and a cheerful, slightly open-mouthed expression. It is holding a sign that says "I LOVE TRASH" in a simple, bold font. 

The background is a light, neutral beige or gray color. 

Small, light brown hearts are scattered lightly around the raccoon.

The image is presented within a digital interface, likely for design or creation purposes, as there are controls and options for resolution, file format (PNG, JPG, SVG, Lottie), style diversity settings, visibility, a Christmas customization option, and a "re-craft" button. There are also color settings showing hex code colors and a count of 7 colors.
Token usage: 263 input, 169 output


That's:

Total cost: $0.000035
Total cost: 0.0035 cents
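The arithmetic checks out against Flash 8B's published prices ($0.0375/M input, $0.15/M output):

```python
# Verify the cost of the 263-input / 169-output image description above
input_tokens, output_tokens = 263, 169
usd = (input_tokens * 0.0375 + output_tokens * 0.15) / 1_000_000
print(f"${usd:.6f}")             # ≈ $0.000035
print(f"{usd * 100:.4f} cents")  # ≈ 0.0035 cents
```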


simonw commented Dec 2, 2024

How do I map a user to an allowance?

Maybe I punt on that for the moment, and just support global allowances for a specific Datasette instance. I can add per-user allowances later on.


simonw commented Dec 2, 2024

The credit mechanism would make it possible to have an allowance that spans multiple models, which is probably better overall. I'm going to implement that.

So the simple initial version of allowances says that:

  • An allowance is enforced over the entire instance
  • An allowance counts credits, different models then cost different credits per token to use
  • Only models that are explicitly configured to be usable (with pricing information provided) can be used with an allowance
  • Allowances can optionally be configured to reset at midnight UTC

I'm also going to have a little bit of a denormalization where the number of remaining credits in the allowance is stored on that table.
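A sketch of what that denormalized allowance table could look like in plain SQL (the column names are my guesses, not the shipped schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table llm_allowance (
    id integer primary key,
    purpose text,               -- optional: scopes this allowance to one feature
    credits_total integer,
    credits_remaining integer,  -- denormalized running balance
    daily_reset integer,        -- 1 = reset at midnight UTC
    last_reset integer          -- ms since epoch
)
""")
```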


simonw commented Dec 2, 2024

Another challenge: the Gemini models charge differently for prompts up to 128,000 tokens vs. above 128,000 tokens.


simonw commented Dec 2, 2024

Actually I do want to support multiple allowances - if a user goes wild with the enrichments feature and burns through all their tokens I'd still like them to be able to use the MUCH cheaper query assistant out of a separate budget.


simonw commented Dec 2, 2024

I think allowances have an optional purpose which, if present, causes that allowance to be used in preference to the general one.
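That selection logic might look something like this sketch (the helper name and dict shape are illustrative):

```python
def pick_allowance(allowances, purpose=None):
    # Prefer an allowance whose purpose matches the request;
    # fall back to the general (purpose=None) allowance
    for allowance in allowances:
        if purpose is not None and allowance.get("purpose") == purpose:
            return allowance
    for allowance in allowances:
        if allowance.get("purpose") is None:
            return allowance
    return None
```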


simonw commented Dec 2, 2024

@dataclass
class Price:
    name: str
    model_id: str
    size_limit: Optional[int]
    input_token_cost_10000th_cent: int
    output_token_cost_10000th_cent: int

    def cost_in_cents(self, input_tokens: int, output_tokens: int):
        return (
            input_tokens * self.input_token_cost_10000th_cent
            + output_tokens * self.output_token_cost_10000th_cent
        ) / 1000000


PRICES = [
    Price("gemini-1.5-flash", "gemini-1.5-flash", 128000, 7, 30),
    Price("gemini-1.5-flash-128k", "gemini-1.5-flash", None, 15, 60),
    Price("gemini-1.5-flash-8b", "gemini-1.5-flash-8b", 128000, 3, 15),
    Price("gemini-1.5-flash-8b-128k", "gemini-1.5-flash-8b", None, 7, 30),
    Price("gemini-1.5-pro", "gemini-1.5-pro", 128000, 125, 500),
    Price("gemini-1.5-pro-128k", "gemini-1.5-pro", None, 250, 1000),
    Price("claude-3.5-sonnet", "claude-3.5-sonnet", None, 300, 1500),
    Price("claude-3-opus", "claude-3-opus", None, 1500, 7500),
    Price("claude-3-haiku", "claude-3-haiku", None, 25, 125),
    Price("claude-3.5-haiku", "claude-3.5-haiku", None, 100, 500),
    Price("gpt-4o", "gpt-4o", None, 250, 1000),
    Price("gpt-4o-mini", "gpt-4o-mini", None, 15, 60),
    Price("o1-preview", "o1-preview", None, 1500, 6000),
    Price("o1-mini", "o1-mini", None, 300, 1200),
]

Using prices from https://tools.simonwillison.net/llm-prices

>>> from datasette_llm_usage import PRICES
>>> for price in PRICES:
...     print(price.model_id, price.cost_in_cents(1000, 100))
... 
gemini-1.5-flash 0.01
gemini-1.5-flash 0.021
gemini-1.5-flash-8b 0.0045
gemini-1.5-flash-8b 0.01
gemini-1.5-pro 0.175
gemini-1.5-pro 0.35
claude-3.5-sonnet 0.45
claude-3-opus 2.25
claude-3-haiku 0.0375
claude-3.5-haiku 0.15
gpt-4o 0.35
gpt-4o-mini 0.021
o1-preview 2.1
o1-mini 0.42
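Given the size_limit column on those Price rows, picking the correct Gemini pricing tier for a prompt could work like this sketch (price_for is my own name, not part of the plugin; the Price dataclass is re-declared here for self-containment):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Price:  # same shape as the Price dataclass above
    name: str
    model_id: str
    size_limit: Optional[int]
    input_token_cost_10000th_cent: int
    output_token_cost_10000th_cent: int


def price_for(prices, model_id, input_tokens):
    # Among rows for this model, pick the tier whose size_limit covers the
    # prompt, falling back to the unlimited (size_limit=None) row
    candidates = [p for p in prices if p.model_id == model_id]
    for p in candidates:
        if p.size_limit is not None and input_tokens <= p.size_limit:
            return p
    for p in candidates:
        if p.size_limit is None:
            return p
    return None
```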

simonw added a commit that referenced this issue Dec 2, 2024
simonw added a commit that referenced this issue Dec 2, 2024

simonw commented Dec 2, 2024

I pushed an initial alpha, mainly to reserve the name on PyPI. Still a bunch more needed:

  • Mechanism for cutting off users if they run out of credits
  • Mechanism for populating the credit allowance
  • The daily refresh thing

simonw added a commit that referenced this issue Jan 9, 2025
Added /-/llm-usage-simple-prompt and /-/llm-usage-credits pages

Documented new TokensExhausted exception

Refs #1, #2