[MISC] Add lora requests to metrics #9477

coolkp · 2024-10-17T22:39:30Z

This PR adds LoRA request information to the logged metrics. The metric is only logged when LoRA is enabled. Here is an example:

# HELP vllm:lora_requests_info Running stats on lora requests waiting and under process.
# TYPE vllm:lora_requests_info gauge
vllm:lora_requests_info{max_lora="1",running_adapters="",waiting_adapters=""} 1.0

We plan to leverage this information for routing decisions in https://github.com/kubernetes-sigs/llm-instance-gateway
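For readers who want to consume this metric on the routing side, here is a minimal sketch of parsing the sample line above using only the Python standard library. The function name `parse_lora_info` is hypothetical, and the assumption that multiple adapter names would appear comma-separated inside a label value is mine, not something stated in this PR.

```python
import re
from typing import Dict, List, Optional, Union

# Matches one labelled sample line: name{labels} value
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Matches one key="value" pair, allowing escaped characters inside the quotes.
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

LoraInfo = Dict[str, Union[int, float, List[str]]]


def _split_adapters(raw: str) -> List[str]:
    # Assumption: adapter names are comma-separated; empty string means none.
    return [name for name in raw.split(",") if name]


def parse_lora_info(line: str) -> Optional[LoraInfo]:
    """Parse a vllm:lora_requests_info sample; return None for other lines."""
    match = _SAMPLE_RE.match(line.strip())
    if match is None or match.group("name") != "vllm:lora_requests_info":
        return None
    labels = dict(_LABEL_RE.findall(match.group("labels")))
    return {
        "max_lora": int(labels.get("max_lora", "0")),
        "running_adapters": _split_adapters(labels.get("running_adapters", "")),
        "waiting_adapters": _split_adapters(labels.get("waiting_adapters", "")),
        "value": float(match.group("value")),
    }


if __name__ == "__main__":
    sample = ('vllm:lora_requests_info{max_lora="1",running_adapters="",'
              'waiting_adapters=""} 1.0')
    print(parse_lora_info(sample))
```

Comment lines such as `# HELP` and `# TYPE` do not match the pattern, so the function returns None for them and a caller can simply skip those results.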

github-actions · 2024-10-17T22:39:43Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

comaniac

Otherwise LGTM

vllm/engine/metrics.py

vllm/engine/llm_engine.py

comaniac · 2024-10-17T23:17:47Z

vllm/engine/llm_engine.py

+        max_lora_stat = "0"
+        if self.lora_config:
+            max_lora_stat = str(self.lora_config.max_loras)


This value seems to always be fixed? In that case, can we skip dumping it?

coolkp · 2024-10-18

This value is hard to obtain across multiple deployments, and it helps determine how many LoRA adapters can fit on a server. You are right that it is static right now: it is initialized at runtime and that's it. I considered moving it to a separate info metric, like the cache config info. But I think there may be value in the future in enabling dynamic adjustment of max LoRAs, much like base_model, which is also static right now.
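To make the discussion concrete, here is a rough sketch of how the label values for the gauge could be assembled, reusing the `max_lora_stat` fallback from the diff above. `LoRAConfigStub` and `build_lora_labels` are hypothetical names for illustration, and the choice to de-duplicate and comma-join adapter names is an assumption rather than the PR's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class LoRAConfigStub:
    """Hypothetical stand-in for the engine's LoRA config; only max_loras is used."""
    max_loras: int


def build_lora_labels(
    lora_config: Optional[LoRAConfigStub],
    running: Iterable[str],
    waiting: Iterable[str],
) -> Dict[str, str]:
    # Same fallback as the reviewed diff: report "0" when LoRA is not configured.
    max_lora_stat = "0"
    if lora_config:
        max_lora_stat = str(lora_config.max_loras)
    return {
        "max_lora": max_lora_stat,
        # Assumption: adapter names are de-duplicated, sorted, and comma-joined.
        "running_adapters": ",".join(sorted(set(running))),
        "waiting_adapters": ",".join(sorted(set(waiting))),
    }


if __name__ == "__main__":
    print(build_lora_labels(LoRAConfigStub(max_loras=1), [], []))
```

Because `max_lora` travels as a label alongside the adapter lists, a consumer gets capacity and current load from a single sample, which is the trade-off against moving it to a separate info metric.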

comaniac

LGTM

Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Signed-off-by: charlifu <[email protected]>

Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Signed-off-by: Vinay Damodaran <[email protected]>

Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Signed-off-by: Alvant <[email protected]>

Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Signed-off-by: Amit Garg <[email protected]>

Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Signed-off-by: qishuai <[email protected]>

Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal> Signed-off-by: Sumit Dubey <[email protected]>

Kunjan Patel added 2 commits October 17, 2024 21:54

Add lora information metrics

80f57dc

Add lora information metrics

22758be

coolkp requested review from WoosukKwon, zhuohan123, youkaichao, alexm-neuralmagic, comaniac and njhill as code owners October 17, 2024 22:39

Add lora information metrics formatting

1fae9e8

comaniac reviewed Oct 17, 2024

View reviewed changes

Kunjan Patel added 3 commits October 18, 2024 00:01

Add lora information metrics Resolve comments I

7541aa4

Add lora information metrics Resolve comments II

3b37cb4

Add lora information metrics Formatting

5e9418b

coolkp changed the title ~~[MISC] Add lora requestsd to metrics~~ [MISC] Add lora requests to metrics Oct 18, 2024

Formatting: sort imports

15e703b

coolkp requested a review from comaniac October 18, 2024 16:50

Kunjan Patel added 3 commits October 18, 2024 17:17

Add lora information metrics Formatting

5b63981

Add lora information metric, max-lora metric reason

c5ff381

Add lora information metric, max-lora metric reason

5bac26d

comaniac approved these changes Oct 18, 2024

View reviewed changes

comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 18, 2024

comaniac enabled auto-merge (squash) October 18, 2024 18:51

comaniac merged commit 9bb10a7 into vllm-project:main Oct 18, 2024
70 of 71 checks passed

liu-cong mentioned this pull request Nov 6, 2024

[Feature]: Enhance integration with advanced LB/gateways with better load/cost reporting and LoRA management #10086

Open

7 tasks

coolkp commented Oct 17, 2024

github-actions bot commented Oct 17, 2024

comaniac left a comment

comaniac Oct 17, 2024

coolkp Oct 18, 2024

comaniac left a comment
