ORCA Format KV Cache Utilization in Inference Response Header #7839

Draft: wants to merge 7 commits into base r24.10

Commits on Nov 22, 2024

  1. pulled metrics in inference request class SetResponseHeader to pull metrics and add kv_utilization to the response header
    BenjaminBraunDev committed Nov 22, 2024
    07071c1

Commits on Nov 27, 2024

  1. pulled metrics in inference request class HandleGenerate to pull metrics and add kv_utilization and max_token_capacity to the inference request response header.
    BenjaminBraunDev committed Nov 27, 2024
    cb6434d
  2. f228fcc
  3. 2a89d03
  4. 6361175
  5. 049917f
  6. ef2f7c6