
FEAT: Prometheus metrics exporter #906

Merged
merged 24 commits into xorbitsai:main on Jan 19, 2024

Conversation

@codingl2k1 (Contributor) commented Jan 17, 2024

  • Add a record_metrics method to SupervisorActor, WorkerActor and ModelActor.
  • Start a metrics exporter server on each worker.
  • Expose --metrics-exporter-host and --metrics-exporter-port as command-line options.
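For example, a worker could be launched with the new options like this (a sketch; the `xinference-worker` entry point and the `--endpoint` value are illustrative assumptions, only the two metrics flags come from this PR):

```shell
# Start a worker and bind its metrics exporter to 0.0.0.0:9998
# (entry point and supervisor endpoint below are illustrative).
xinference-worker \
  --endpoint http://127.0.0.1:9997 \
  --metrics-exporter-host 0.0.0.0 \
  --metrics-exporter-port 9998
```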

For Prometheus scraping:

  • Each worker launches a metrics export server; its host and port can be specified with --metrics-exporter-host and --metrics-exporter-port.
  • The supervisor's {endpoint}/metrics is also a metrics export server; it collects the metrics of the RESTful API.
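Given those two endpoints, a Prometheus scrape configuration could look like the following sketch (the job names, hosts, and ports are illustrative examples, not part of this PR):

```yaml
# Illustrative scrape config; adjust targets to your deployment.
scrape_configs:
  - job_name: "xinference-supervisor"
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:9997"]   # supervisor {endpoint}/metrics
  - job_name: "xinference-workers"
    static_configs:
      - targets: ["127.0.0.1:9998"]   # worker --metrics-exporter-port
```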

Known issue:

  • Some backends do not report token information.

Metrics exporter server example:

# HELP xinference:exceptions_total_counter Total number of requests which generated an exception.
# TYPE xinference:exceptions_total_counter counter
# HELP xinference:generate_tokens_per_s Generate throughput in tokens/s.
# TYPE xinference:generate_tokens_per_s gauge
xinference:generate_tokens_per_s{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 0.2784720574189427
# HELP xinference:input_tokens_total_counter Total number of input tokens.
# TYPE xinference:input_tokens_total_counter counter
xinference:input_tokens_total_counter{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 20
# HELP xinference:output_tokens_total_counter Total number of output tokens.
# TYPE xinference:output_tokens_total_counter counter
xinference:output_tokens_total_counter{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 7
# HELP xinference:requests_total_counter Total number of requests received.
# TYPE xinference:requests_total_counter counter
xinference:requests_total_counter{method="GET",path="/ui"} 1
xinference:requests_total_counter{method="POST",path="/v1/models"} 1
xinference:requests_total_counter{method="GET",path="/v1/models/"} 2
xinference:requests_total_counter{method="GET",path="/v1/models"} 2
xinference:requests_total_counter{method="HEAD",path="/qwen-chat"} 1
xinference:requests_total_counter{method="POST",path="/v1/ui/{model_uid}"} 1
xinference:requests_total_counter{method="GET",path="/qwen-chat"} 4
xinference:requests_total_counter{method="POST",path="/qwen-chat"} 4
xinference:requests_total_counter{method="None",path="/qwen-chat"} 1
xinference:requests_total_counter{method="GET",path="/v1/cluster/auth"} 1
xinference:requests_total_counter{method="GET",path="/v1/models/{model_uid}"} 1
xinference:requests_total_counter{method="POST",path="/v1/chat/completions"} 1
# HELP xinference:responses_total_counter Total number of responses sent.
# TYPE xinference:responses_total_counter counter
xinference:responses_total_counter{method="GET",path="/v1/model_registrations/{model_type}"} 1
xinference:responses_total_counter{method="GET",path="/v1/cluster/devices"} 1
xinference:responses_total_counter{method="GET",path="/ui"} 1
xinference:responses_total_counter{method="POST",path="/v1/models"} 1
xinference:responses_total_counter{method="GET",path="/v1/models/"} 2
xinference:responses_total_counter{method="GET",path="/v1/models"} 2
xinference:responses_total_counter{method="HEAD",path="/qwen-chat"} 1
xinference:responses_total_counter{method="POST",path="/v1/ui/{model_uid}"} 1
xinference:responses_total_counter{method="GET",path="/qwen-chat"} 4
xinference:responses_total_counter{method="POST",path="/qwen-chat"} 4
xinference:responses_total_counter{method="GET",path="/v1/cluster/auth"} 1
xinference:responses_total_counter{method="GET",path="/v1/models/{model_uid}"} 1
xinference:responses_total_counter{method="POST",path="/v1/chat/completions"} 1
# HELP xinference:status_codes_counter Total number of response status codes.
# TYPE xinference:status_codes_counter counter
xinference:status_codes_counter{method="GET",path="/v1/model_registrations/{model_type}",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/cluster/devices",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/ui",status_code="404"} 1
xinference:status_codes_counter{method="POST",path="/v1/models",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/models/",status_code="307"} 2
xinference:status_codes_counter{method="GET",path="/v1/models",status_code="200"} 2
xinference:status_codes_counter{method="HEAD",path="/qwen-chat",status_code="404"} 1
xinference:status_codes_counter{method="POST",path="/v1/ui/{model_uid}",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/qwen-chat",status_code="307"} 1
xinference:status_codes_counter{method="GET",path="/qwen-chat",status_code="200"} 3
xinference:status_codes_counter{method="POST",path="/qwen-chat",status_code="200"} 4
xinference:status_codes_counter{method="GET",path="/v1/cluster/auth",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/models/{model_uid}",status_code="200"} 1
xinference:status_codes_counter{method="POST",path="/v1/chat/completions",status_code="200"} 1
# HELP xinference:time_to_first_token_ms First token latency in ms.
# TYPE xinference:time_to_first_token_ms gauge
xinference:time_to_first_token_ms{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 20076.820135116577
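Output like the above follows the Prometheus text exposition format: `metric_name{label="value",...} sample`. A minimal sketch of parsing one such line (illustrative only; real scrapers should use a Prometheus client library rather than regexes):

```python
import re

# Matches a labeled sample line such as:
#   xinference:input_tokens_total_counter{model="qwen-chat"} 20
LINE_RE = re.compile(r'^(?P<name>[\w:]+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metric_line(line):
    """Return (name, labels dict, float value), or None for
    comment lines (# HELP / # TYPE) and non-matching input."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))

sample = ('xinference:input_tokens_total_counter'
          '{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",'
          'quantization="none",type="LLM"} 20')
name, labels, value = parse_metric_line(sample)
print(name, labels["model"], value)
# xinference:input_tokens_total_counter qwen-chat 20.0
```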

@XprobeBot XprobeBot added this to the v0.8.1 milestone Jan 17, 2024
@codingl2k1 codingl2k1 marked this pull request as ready for review January 18, 2024 07:26
@aresnow1 aresnow1 merged commit c1e1c5a into xorbitsai:main Jan 19, 2024
12 checks passed