Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub

unifyai/aibench-llm-endpoints

AIBench LLM Endpoints

Overview

This repository provides AIBench-LLM, a benchmarking runner for evaluating the performance of large language model (LLM) inference endpoints. The benchmark measures metrics such as Time to First Token (TTFT), End-to-End Latency, Inter-Token Latency (ITL), and Output Tokens per Second; the full list is given under Metrics.

The AIBench Runner is in charge of collecting metrics from LLM inference endpoints for the Unify Hub. More information about the full methodology is available here 📑

Contributions and discussions around the methodology and the runner are welcome. If this sounds interesting, you can join the Unify Discord!

Metrics

The benchmark runner collects the following metrics:

  • load: Number of concurrent requests.
  • input_policy: Input policy used (short or long).
  • ttft: Time-to-first-token for each request.
  • e2e_latency: End-to-end latency for each request.
  • itl: Inter-token Latency.
  • cold_start: Cold start time (if applicable).
  • prompt_tokens: Number of tokens in the input prompt.
  • output_tokens: Number of tokens in the LLM output.
  • total_tokens: Total number of tokens (input + output).
  • output_tks_per_sec: Output tokens per second.
  • failed_queries: Number of failed queries.
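As a rough illustration of how the per-request metrics above relate to each other, here is a minimal Python sketch that derives them from a request's send time and the arrival time of each streamed token. This is not the runner's actual implementation: the names `RequestMetrics`, `compute_request_metrics`, and `summarize` are hypothetical, and two definitions are assumptions rather than something stated here, namely that `itl` is the mean gap between consecutive output tokens and that `output_tks_per_sec` is `output_tokens` divided by `e2e_latency`. The `cold_start` metric is left out because its measurement is not described.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class RequestMetrics:
    """Per-request metrics, named after the fields in the list above."""
    ttft: float                # seconds from sending the request to the first token
    e2e_latency: float         # seconds from sending the request to the last token
    itl: float                 # ASSUMPTION: mean gap between consecutive tokens
    prompt_tokens: int
    output_tokens: int
    total_tokens: int          # prompt_tokens + output_tokens
    output_tks_per_sec: float  # ASSUMPTION: output_tokens / e2e_latency


def compute_request_metrics(
    request_start: float,
    token_times: Sequence[float],
    prompt_tokens: int,
) -> RequestMetrics:
    """Derive metrics from the send time and per-token arrival times (seconds)."""
    if not token_times:
        raise ValueError("no tokens received; treat this request as a failed query")
    output_tokens = len(token_times)
    ttft = token_times[0] - request_start
    e2e_latency = token_times[-1] - request_start
    # Gaps between consecutive tokens; empty when only one token was produced.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    tks_per_sec = output_tokens / e2e_latency if e2e_latency > 0 else 0.0
    return RequestMetrics(
        ttft=ttft,
        e2e_latency=e2e_latency,
        itl=itl,
        prompt_tokens=prompt_tokens,
        output_tokens=output_tokens,
        total_tokens=prompt_tokens + output_tokens,
        output_tks_per_sec=tks_per_sec,
    )


def summarize(
    load: int,
    input_policy: str,
    results: Sequence[Optional[RequestMetrics]],
) -> dict:
    """Aggregate one benchmark run; a None entry stands for a failed query."""
    ok = [r for r in results if r is not None]
    return {
        "load": load,
        "input_policy": input_policy,
        "failed_queries": len(results) - len(ok),
        "ttft": [r.ttft for r in ok],
        "e2e_latency": [r.e2e_latency for r in ok],
        "itl": [r.itl for r in ok],
        "output_tks_per_sec": [r.output_tks_per_sec for r in ok],
    }


if __name__ == "__main__":
    # Four tokens arriving 0.5 s, 0.75 s, 1.0 s and 1.25 s after the request was sent.
    metrics = compute_request_metrics(0.0, [0.5, 0.75, 1.0, 1.25], prompt_tokens=10)
    print(metrics)
    print(summarize(load=2, input_policy="short", results=[metrics, None]))
```

In this sketch a failed request is represented as `None` so that `failed_queries` can be counted without mixing partial timings into the latency lists. Other definitions are possible, for example computing throughput over the generation window only (excluding TTFT), so the numbers should be read against whatever definitions the methodology actually specifies.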

Usage and Examples

To be added this week!
