
Abstract out token usage numbers #610

Closed
simonw opened this issue Nov 6, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

simonw commented Nov 6, 2024

Most APIs return the number of input and output tokens used (and sometimes a more detailed breakdown of those categories). I want to provide an abstraction over those to make it easier to implement tools on top of LLM that do token accounting.

simonw added the enhancement label Nov 6, 2024

simonw commented Nov 6, 2024

The most complex form of token accounting right now is OpenAI's - their "usage" blocks look like this as of yesterday, when they added Predicted Outputs:

{
  "completion_tokens": 9,
  "prompt_tokens": 8,
  "total_tokens": 17,
  "prompt_tokens_details": {
    "cached_tokens": 0,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}

simonw commented Nov 6, 2024

If my eventual goal is to support code that can calculate dollar spend, I'll also need to consider how I store the fact that a prompt was executed in batch mode, which often provides a 50% discount. LLM doesn't have a mechanism for batch mode yet, but it's definitely a potentially useful feature.
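
A minimal sketch of the kind of calculation this would enable, with made-up per-million-token prices and batch mode modelled as a flat multiplier:

def cost_in_dollars(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    batch_multiplier: float = 1.0,
) -> float:
    # Prices are hypothetical placeholders, expressed per million tokens;
    # batch mode is a flat multiplier, e.g. 0.5 for a 50% discount.
    cost = (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )
    return cost * batch_multiplier

# e.g. 8 input + 9 output tokens at $2.50 / $10.00 per million, no batch discount
print(cost_in_dollars(8, 9, 2.50, 10.00))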

simonw commented Nov 6, 2024

Given the complexity of the OpenAI response I'm tempted to add a JSON column to store this. But the above example would be a waste of JSON, since the only things that actually matter in there are completion_tokens=9 and prompt_tokens=8 - everything else is safe to ignore.

I could have columns for input_tokens and output_tokens (better names than "completion" and "prompt" in my opinion) and a nullable JSON column for token_details, which I would leave blank in this case but populate when those other token categories are worth recording.
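
A rough sketch of what those columns could look like, using sqlite3 directly (LLM's real logs schema lives elsewhere; the table and column names here just mirror the proposal above):

import sqlite3

# Hypothetical sketch only - not LLM's actual schema.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        model TEXT,
        input_tokens INTEGER,
        output_tokens INTEGER,
        token_details TEXT  -- nullable JSON, populated only when extra categories matter
    )
    """
)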

simonw commented Nov 6, 2024

Gemini 1.5 Pro usage looks like this (according to llm logs -m gemini-1.5-pro-latest --json):

"usageMetadata": {
  "promptTokenCount": 14,
  "candidatesTokenCount": 26,
  "totalTokenCount": 40
}

Anthropic is even simpler:

"usage": {
  "input_tokens": 8,
  "output_tokens": 18
}

simonw commented Nov 6, 2024

xAI/grok-beta:

"usage": {
  "completion_tokens": 349,
  "prompt_tokens": 12,
  "total_tokens": 361
}

lambdalabs/hermes3-405b:

"usage": {
  "prompt_tokens": 15,
  "completion_tokens": 1,
  "total_tokens": 16,
  "prompt_tokens_details": null,
  "completion_tokens_details": null
}
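
All of these shapes boil down to the same two numbers, which is what makes the abstraction feasible. A rough sketch of the normalization, using only the key names shown in the examples above (everything else is illustrative):

def normalize_usage(raw: dict) -> tuple[int, int]:
    """Map a provider-specific usage block to (input_tokens, output_tokens)."""
    if "promptTokenCount" in raw:  # Gemini usageMetadata
        return raw["promptTokenCount"], raw["candidatesTokenCount"]
    if "input_tokens" in raw:  # Anthropic
        return raw["input_tokens"], raw["output_tokens"]
    # OpenAI-compatible responses (OpenAI, xAI, Lambda Labs, ...)
    return raw["prompt_tokens"], raw["completion_tokens"]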

simonw commented Nov 14, 2024

Maybe start with something like this:

from dataclasses import dataclass
from typing import Dict

@dataclass
class Usage:
    model_id: str
    input_tokens: int
    output_tokens: int
    details: Dict[str, int]

(Update: I ditched this idea)

simonw pinned this issue Nov 18, 2024

simonw commented Nov 18, 2024

I'm going to add a method to the Response class called .set_usage() with arguments input=, output= and details=.

def set_usage(self, input: int, output: int, details: dict = None) -> None:
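
A sketch of how a model plugin might feed an OpenAI-style usage block into that method (the response object and usage dict here are assumed to come from the plugin's own API call):

def record_usage(response, usage: dict) -> None:
    # Sketch only: set_usage() is the method proposed above; everything that
    # isn't one of the three top-level counts goes into details.
    details = {
        k: v
        for k, v in usage.items()
        if k not in ("prompt_tokens", "completion_tokens", "total_tokens")
    }
    response.set_usage(
        input=usage["prompt_tokens"],
        output=usage["completion_tokens"],
        details=details or None,
    )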

simonw commented Nov 20, 2024

OpenAI detailed usage blocks are pretty lengthy:

{
  "completion_tokens": 462,
  "prompt_tokens": 11,
  "total_tokens": 473,
  "prompt_tokens_details": {
    "cached_tokens": 0,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}

I'm going to trim out any keys that have a value of 0, and any nested blocks where everything is zero.

I'm also going to pull out total_tokens, completion_tokens and prompt_tokens, because they are stored separately.

simonw commented Nov 20, 2024

{
  "completion_tokens": 421,
  "prompt_tokens": 30791,
  "total_tokens": 31212,
  "prompt_tokens_details": {
    "cached_tokens": 30592,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}

Becomes:

{
  "prompt_tokens_details": {"cached_tokens": 30592}
}

Using code I got Code Interpreter to write and test for me: https://chatgpt.com/share/673d4727-a148-8006-a11a-485bbe2822d0

def simplify_usage_dict(d):
    # Recursively remove keys with value 0 and empty dictionaries
    def remove_empty_and_zero(obj):
        if isinstance(obj, dict):
            cleaned = {
                k: remove_empty_and_zero(v)
                for k, v in obj.items()
                if v != 0 and v != {}
            }
            return {k: v for k, v in cleaned.items() if v is not None and v != {}}
        return obj

    return remove_empty_and_zero(d) or {}
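
A quick example against the usage block above, assuming the top-level prompt_tokens / completion_tokens / total_tokens have already been popped off separately:

usage = {
    "prompt_tokens_details": {"cached_tokens": 30592, "audio_tokens": 0},
    "completion_tokens_details": {
        "reasoning_tokens": 0,
        "audio_tokens": 0,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0,
    },
}
print(simplify_usage_dict(usage))
# {'prompt_tokens_details': {'cached_tokens': 30592}}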

simonw commented Nov 20, 2024

Idea: add -u/--usage option to llm prompt so you can see usage information directly after running a prompt.

simonw commented Nov 20, 2024

Got this working:

[screenshot: llm prompt --usage output showing token counts, 2024-11-19]

simonw added a commit that referenced this issue Nov 20, 2024
simonw added a commit that referenced this issue Nov 20, 2024
simonw closed this as completed in cfb10f4 Nov 20, 2024
simonw added a commit that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-claude-3 that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-gemini that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-gemini that referenced this issue Nov 20, 2024
simonw unpinned this issue Nov 20, 2024
simonw added a commit to simonw/llm-mistral that referenced this issue Nov 20, 2024
simonw added a commit to simonw/llm-mistral that referenced this issue Nov 20, 2024
@shivakanthsujit

Oh, I was working on hacking something together with Claude for exactly this. I'm doing something very similar, the only difference being that I wanted it to pop up as a status indicator in zsh, similar to how it does for execution time. I was only modifying llm-gemini since that's what I use. Specifically:

from dataclasses import dataclass
from typing import Optional

@dataclass
class ResponseMetadata:
    msg: str
    prompt_token_count: Optional[int] = None
    candidate_token_count: Optional[int] = None
    total_token_count: Optional[int] = None

    def __str__(self):
        return self.msg

    def __repr__(self):
        return self.msg

Then in cli.py I write the relevant info to a text file in user_dir(), which zsh can check for:

# in the cli function
if hasattr(chunk, 'prompt_token_count'):
    input_tokens = chunk.prompt_token_count
    output_tokens = chunk.candidate_token_count
    total_tokens = chunk.total_token_count
    write_token_info(input_tokens, output_tokens)

def write_token_info(input_tokens, output_tokens):
    """Write token counts to a temporary file"""
    token_file = os.path.join(user_dir() / "llm_last_tokens")
    with open(token_file, "w") as f:
        f.write(f"{input_tokens}:{output_tokens}")

Claude suggested this for hooking into zsh; I don't know if this is the recommended way, as I haven't used zsh hooks before.

# Function to capture and display token information
function llm_token_hook() {
    # Get the last command
    local last_cmd="$(fc -ln -1)"
    # Check if 'llm' appears anywhere in the command
    if [[ $last_cmd =~ "llm" ]]; then
        # Double quotes so that $HOME actually expands
        local token_file="$HOME/Library/Application Support/io.datasette.llm/llm_last_tokens"
        if [[ -f "$token_file" ]]; then
            local token_info="$(cat "$token_file")"
            if [[ $token_info =~ "([0-9]+):([0-9]+)" ]]; then
                local input_tokens=$match[1]
                local output_tokens=$match[2]
                local total_tokens=$((input_tokens + output_tokens))
                print -P "%F{cyan}tokens%f %F{yellow}${total_tokens}%f (in: ${input_tokens}, out: ${output_tokens})"
                rm "$token_file"
            fi
        fi
    fi
}

# Add the hook to zsh
autoload -U add-zsh-hook
add-zsh-hook precmd llm_token_hook

It produces output like this:

[screenshot: token counts displayed in the zsh prompt]
