Add LLM model server metrics #1103

achandrasekar · 2024-05-31T18:19:08Z

This change adds common model server metrics that we want to standardize on. It starts off with two common latency metrics: time per output token and time to first token.

Fixes #1102
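
For readers unfamiliar with the two latency metrics named in the description, here is a minimal sketch of how they could be derived from per-token timestamps. It is not taken from this PR: the function name `summarize_latency`, the `LatencySummary` type, and the choice to exclude the first token from the per-token average are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LatencySummary:
    # Seconds from request start until the first output token was produced.
    time_to_first_token: float
    # Average seconds per output token after the first one;
    # None when fewer than two tokens were produced.
    time_per_output_token: Optional[float]


def summarize_latency(request_start: float,
                      token_times: Sequence[float]) -> LatencySummary:
    """Derive both latency values from one request's token timestamps.

    Hypothetical helper: `request_start` and `token_times` are clock
    readings in seconds taken by the model server (assumed monotonic).
    """
    if not token_times:
        raise ValueError("at least one output token timestamp is required")

    first = token_times[0]
    ttft = first - request_start

    if len(token_times) < 2:
        # A single token gives no inter-token interval to average.
        return LatencySummary(ttft, None)

    # Assumption: the first token is excluded, so the average covers
    # only the decode phase that follows it.
    tpot = (token_times[-1] - first) / (len(token_times) - 1)
    return LatencySummary(ttft, tpot)
```

A server would typically record each value into a histogram per request, for example `summarize_latency(1.0, [1.5, 2.0, 2.5])` for a request that started at t=1.0 and emitted three tokens.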

Merge requirement checklist

- CONTRIBUTING.md guidelines followed.
- Change log entry added, according to the guidelines in When to add a changelog entry.
  - If your PR does not need a change log, start the PR title with [chore]
- schema-next.yaml updated with changes to existing conventions.

achandrasekar · 2024-05-31T18:30:01Z

cc @lmolkova @SergeyKanzhelev

docs/gen-ai/gen-ai-metrics.md

model/metrics/gen-ai.yaml

.chloggen/1102.yaml

achandrasekar · 2024-06-06T19:03:12Z

cc @jsuereth to take a look as well

drewby

LGTM

achandrasekar requested review from a team May 31, 2024 18:19

github-actions bot assigned joaopgrassi May 31, 2024

achandrasekar requested a review from a team May 31, 2024 18:27

SergeyKanzhelev reviewed May 31, 2024

docs/gen-ai/gen-ai-metrics.md (outdated, resolved)

SergeyKanzhelev reviewed May 31, 2024

model/metrics/gen-ai.yaml (outdated, resolved)

SergeyKanzhelev reviewed May 31, 2024

model/metrics/gen-ai.yaml (outdated, resolved)

SergeyKanzhelev reviewed May 31, 2024

model/metrics/gen-ai.yaml (outdated, resolved)

lmolkova reviewed Jun 3, 2024

achandrasekar mentioned this pull request Jun 4, 2024

[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes vllm-project/vllm#5041

Closed

drewby self-requested a review June 13, 2024 06:39

drewby reviewed Jun 13, 2024

docs/gen-ai/gen-ai-metrics.md (resolved)

docs/gen-ai/gen-ai-metrics.md (outdated, resolved)

docs/gen-ai/gen-ai-metrics.md (outdated, resolved)

gyliu513 reviewed Jun 19, 2024

docs/gen-ai/gen-ai-metrics.md (resolved)

lmolkova mentioned this pull request Jun 20, 2024

GenAI (LLM): how to capture streaming #1170

Open

achandrasekar and others added 11 commits June 20, 2024 21:27

Add LLM model server metrics

c277263

This change adds common model server metrics that we want to standardize on. It starts off with two common latency metrics: time per output token and time to first token.

Add changelog

3b81d8f

Address typos and clarify description

57f9e89

Add request_duration metric and fix description of other ones

479d2dc

Add markdown toc

1d2341f

Update .chloggen/1102.yaml to fix typo

b3117f4

Co-authored-by: Liudmila Molkova

Update docs/gen-ai/gen-ai-metrics.md metric description

8e10b93

Co-authored-by: Liudmila Molkova

Drop latency from the metric names

c250635

Render attribute table for the metrics

993411a

Addressed error type, buckets and descriptions

28548d8

Fix formatting after addressing conflicts

2873d9f

achandrasekar force-pushed the model-server-metrics branch from 4036f71 to 2873d9f June 20, 2024 21:49

drewby requested changes Jun 21, 2024

model/metrics/gen-ai.yaml (outdated, resolved)

model/metrics/gen-ai.yaml (outdated, resolved)

lmolkova reviewed Jun 21, 2024

model/metrics/gen-ai.yaml (outdated, resolved)

lmolkova reviewed Jun 21, 2024

model/metrics/gen-ai.yaml (outdated, resolved)

achandrasekar and others added 2 commits June 20, 2024 20:16

Apply suggestions from code review

2c8c835

Co-authored-by: Drew Robbins
Co-authored-by: Liudmila Molkova

Fix formatting and naming

9cd3b0d

drewby approved these changes Jun 21, 2024

lmolkova approved these changes Jun 21, 2024

Merge branch 'main' into model-server-metrics

a955cde

joaopgrassi approved these changes Jun 26, 2024

model/metrics/gen-ai.yaml (resolved)

jsuereth approved these changes Jun 26, 2024

model/metrics/gen-ai.yaml (resolved)

Merge branch 'main' into model-server-metrics

0cd2bdd

joaopgrassi added the area:gen-ai label Jun 27, 2024

joaopgrassi merged commit a328d73 into open-telemetry:main Jun 27, 2024
13 checks passed
