ORCA Format KV Cache Utilization in Inference Response Header #7839

Draft

BenjaminBraunDev wants to merge 7 commits into triton-inference-server:r24.10 from BenjaminBraunDev:r24.10

Commits on Nov 22, 2024

pulled metrics in inference request class SetResponseHeader to pull metrics and add kv_utilization to the response header
BenjaminBraunDev committed Nov 22, 2024 (07071c1)

Commits on Nov 27, 2024

pulled metrics in inference request class HandleGenerate to pull metrics and add kv_utilization and max_token_capacity to the inference request response header.
BenjaminBraunDev committed Nov 27, 2024 (cb6434d)

Merge branch 'r24.10' of https://github.com/BenjaminBraunDev/server-fork-orca-header into r24.10
BenjaminBraunDev committed Nov 27, 2024 (f228fcc)

Moved kv-cache metrics from SetResponseHeader() to HandleGenerate(), verified functionality.
BenjaminBraunDev committed Nov 27, 2024 (2a89d03)

Merge branch 'r24.10' of https://github.com/BenjaminBraunDev/server-fork-orca-header into HEAD
BenjaminBraunDev committed Nov 27, 2024 (6361175)

Merge branch 'r24.10' of https://github.com/BenjaminBraunDev/server-fork-orca-header into r24.10
BenjaminBraunDev committed Nov 27, 2024 (049917f)

Merge branch 'r24.10' of https://github.com/BenjaminBraunDev/server-fork-orca-header into r24.10
BenjaminBraunDev committed Nov 27, 2024 (ef2f7c6)
