
Abstract out token usage numbers #610

Closed
simonw opened this issue Nov 6, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

simonw commented Nov 6, 2024

Most APIs return the number of input and output tokens used (and sometimes a more detailed breakdown of those categories). I want to provide an abstraction over those to make it easier to implement tools on top of LLM that do token accounting.

simonw added the enhancement label Nov 6, 2024

simonw commented Nov 6, 2024

The most complex form of token accounting right now is OpenAI's - their "usage" blocks look like this as of yesterday, when they added Predicted Outputs:

{
  "completion_tokens": 9,
  "prompt_tokens": 8,
  "total_tokens": 17,
  "prompt_tokens_details": {
    "cached_tokens": 0,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}

simonw commented Nov 6, 2024

If my eventual goal is to support code that can calculate dollar spend, I'll also need to consider how I store the fact that a prompt was executed in batch mode, which often provides a 50% discount. LLM doesn't have a mechanism for batch mode yet, but it's definitely a potentially useful feature.
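
A minimal sketch of the kind of calculation this would enable, with made-up per-million-token prices and batch mode modelled as a flat multiplier:

def cost_in_dollars(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    batch_multiplier: float = 1.0,
) -> float:
    # Prices are hypothetical placeholders, expressed per million tokens;
    # batch mode is a flat multiplier, e.g. 0.5 for a 50% discount.
    cost = (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )
    return cost * batch_multiplier

# e.g. 8 input + 9 output tokens at $2.50 / $10.00 per million, no batch discount
print(cost_in_dollars(8, 9, 2.50, 10.00))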

simonw commented Nov 6, 2024

Given the complexity of the OpenAI response I'm tempted to add a JSON column to store this. But the above example would be a waste of JSON, since the only things that actually matter in there are completion_tokens=9 and prompt_tokens=8 - everything else is safe to ignore.

I could have columns for input_tokens and output_tokens (better names than "completion" and "prompt" in my opinion) and a nullable JSON column for token_details, which I would leave blank in this case but populate when those other token categories are worth recording.
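
A rough sketch of what those columns could look like, using sqlite3 directly (LLM's real logs schema lives elsewhere; the table and column names here just mirror the proposal above):

import sqlite3

# Hypothetical sketch only - not LLM's actual schema.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        model TEXT,
        input_tokens INTEGER,
        output_tokens INTEGER,
        token_details TEXT  -- nullable JSON, populated only when extra categories matter
    )
    """
)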

simonw commented Nov 6, 2024

Gemini 1.5 Pro usage looks like this (according to llm logs -m gemini-1.5-pro-latest --json):

"usageMetadata": {
  "promptTokenCount": 14,
  "candidatesTokenCount": 26,
  "totalTokenCount": 40
}

Anthropic is even simpler:

"usage": {
  "input_tokens": 8,
  "output_tokens": 18
}

simonw commented Nov 6, 2024

xAI/grok-beta:

"usage": {
  "completion_tokens": 349,
  "prompt_tokens": 12,
  "total_tokens": 361
}

lambdalabs/hermes3-405b:

"usage": {
  "prompt_tokens": 15,
  "completion_tokens": 1,
  "total_tokens": 16,
  "prompt_tokens_details": null,
  "completion_tokens_details": null
}
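
All of these shapes boil down to the same two numbers, which is what makes the abstraction feasible. A rough sketch of the normalization, using only the key names shown in the examples above (everything else is illustrative):

def normalize_usage(raw: dict) -> tuple[int, int]:
    """Map a provider-specific usage block to (input_tokens, output_tokens)."""
    if "promptTokenCount" in raw:  # Gemini usageMetadata
        return raw["promptTokenCount"], raw["candidatesTokenCount"]
    if "input_tokens" in raw:  # Anthropic
        return raw["input_tokens"], raw["output_tokens"]
    # OpenAI-compatible responses (OpenAI, xAI, Lambda Labs, ...)
    return raw["prompt_tokens"], raw["completion_tokens"]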

simonw commented Nov 14, 2024

Maybe start with something like this:

from dataclasses import dataclass
from typing import Dict

@dataclass
class Usage:
    model_id: str
    input_tokens: int
    output_tokens: int
    details: Dict[str, int]

(Update: I ditched this idea)

simonw pinned this issue Nov 18, 2024

simonw commented Nov 18, 2024

I'm going to add a method to the Response class called .set_usage() with arguments input=, output= and details=.

def set_usage(self, input: int, output: int, details: dict = None) -> None:
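
A sketch of how a model plugin might feed an OpenAI-style usage block into that method (the response object and usage dict here are assumed to come from the plugin's own API call):

def record_usage(response, usage: dict) -> None:
    # Sketch only: set_usage() is the method proposed above; everything that
    # isn't one of the three top-level counts goes into details.
    details = {
        k: v
        for k, v in usage.items()
        if k not in ("prompt_tokens", "completion_tokens", "total_tokens")
    }
    response.set_usage(
        input=usage["prompt_tokens"],
        output=usage["completion_tokens"],
        details=details or None,
    )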

simonw commented Nov 20, 2024

OpenAI detailed usage blocks are pretty lengthy:

{
  "completion_tokens": 462,
  "prompt_tokens": 11,
  "total_tokens": 473,
  "prompt_tokens_details": {
    "cached_tokens": 0,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}

I'm going to trim out any keys that have a value of 0, and any nested blocks where everything is zero.

I'm also going to pull out total_tokens, completion_tokens and prompt_tokens, because they are stored separately.

simonw commented Nov 20, 2024

{
  "completion_tokens": 421,
  "prompt_tokens": 30791,
  "total_tokens": 31212,
  "prompt_tokens_details": {
    "cached_tokens": 30592,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}

Becomes:

{
  "prompt_tokens_details": {"cached_tokens": 30592}
}

Using code I got Code Interpreter to write and test for me: https://chatgpt.com/share/673d4727-a148-8006-a11a-485bbe2822d0

def simplify_usage_dict(d):
    # Recursively remove keys with value 0 and empty dictionaries
    def remove_empty_and_zero(obj):
        if isinstance(obj, dict):
            cleaned = {
                k: remove_empty_and_zero(v)
                for k, v in obj.items()
                if v != 0 and v != {}
            }
            return {k: v for k, v in cleaned.items() if v is not None and v != {}}
        return obj

    return remove_empty_and_zero(d) or {}
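
A quick example against the usage block above, assuming the top-level prompt_tokens / completion_tokens / total_tokens have already been popped off separately:

usage = {
    "prompt_tokens_details": {"cached_tokens": 30592, "audio_tokens": 0},
    "completion_tokens_details": {
        "reasoning_tokens": 0,
        "audio_tokens": 0,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0,
    },
}
print(simplify_usage_dict(usage))
# {'prompt_tokens_details': {'cached_tokens': 30592}}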

simonw commented Nov 20, 2024

Idea: add -u/--usage option to llm prompt so you can see usage information directly after running a prompt.

simonw commented Nov 20, 2024

Got this working:

[screenshot: llm prompt --usage output showing token counts, 2024-11-19]

simonw added a commit that referenced this issue Nov 20, 2024
simonw added a commit that referenced this issue Nov 20, 2024
simonw closed this as completed in cfb10f4 Nov 20, 2024
simonw added a commit that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-gemini that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-gemini that referenced this issue Nov 20, 2024
simonw unpinned this issue Nov 20, 2024
simonw added a commit to simonw/llm-mistral that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-mistral that referenced this issue Nov 20, 2024
@shivakanthsujit

Oh, I was working on hacking something together with Claude for exactly this. I'm doing something very similar, the only difference being that I wanted it to pop up as a status indicator in zsh, similar to how it does for execution time. I was only modifying llm-gemini since that's what I use. Specifically:

from dataclasses import dataclass
from typing import Optional

@dataclass
class ResponseMetadata:
    msg: str
    prompt_token_count: Optional[int] = None
    candidate_token_count: Optional[int] = None
    total_token_count: Optional[int] = None

    def __str__(self):
        return self.msg

    def __repr__(self):
        return self.msg

Then in cli.py I write the relevant info to a text file in user_dir(), which zsh can check for:

# in the cli function
if hasattr(chunk, 'prompt_token_count'):
    input_tokens = chunk.prompt_token_count
    output_tokens = chunk.candidate_token_count
    total_tokens = chunk.total_token_count
    write_token_info(input_tokens, output_tokens)

def write_token_info(input_tokens, output_tokens):
    """Write token counts to a temporary file"""
    token_file = os.path.join(user_dir() / "llm_last_tokens")
    with open(token_file, "w") as f:
        f.write(f"{input_tokens}:{output_tokens}")

Claude suggested this for hooking into zsh; I don't know if this is the recommended way, as I haven't used zsh hooks before.

# Function to capture and display token information
function llm_token_hook() {
    # Get the last command
    local last_cmd="$(fc -ln -1)"
    # Check if 'llm' appears anywhere in the command
    if [[ $last_cmd =~ "llm" ]]; then
        # Double quotes so that $HOME actually expands
        local token_file="$HOME/Library/Application Support/io.datasette.llm/llm_last_tokens"
        if [[ -f "$token_file" ]]; then
            local token_info="$(cat "$token_file")"
            if [[ $token_info =~ "([0-9]+):([0-9]+)" ]]; then
                local input_tokens=$match[1]
                local output_tokens=$match[2]
                local total_tokens=$((input_tokens + output_tokens))
                print -P "%F{cyan}tokens%f %F{yellow}${total_tokens}%f (in: ${input_tokens}, out: ${output_tokens})"
                rm "$token_file"
            fi
        fi
    fi
}

# Add the hook to zsh
autoload -U add-zsh-hook
add-zsh-hook precmd llm_token_hook

It produces output like this:

[screenshot: token counts displayed in the zsh prompt]
