
FEAT: Prometheus metrics exporter #906

Merged
merged 24 commits into xorbitsai:main on Jan 19, 2024

Conversation

@codingl2k1 (Contributor) commented Jan 17, 2024

  • Add a record_metrics method to SupervisorActor, WorkerActor and ModelActor.
  • Start a metrics exporter server on each worker.
  • Expose --metrics-exporter-host and --metrics-exporter-port as command-line options.
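For example, a worker could be launched with the new options like this (a sketch; the `xinference-worker` entry point and the `--endpoint` value are illustrative assumptions, only the two metrics flags come from this PR):

```shell
# Start a worker and bind its metrics exporter to 0.0.0.0:9998
# (entry point and supervisor endpoint below are illustrative).
xinference-worker \
  --endpoint http://127.0.0.1:9997 \
  --metrics-exporter-host 0.0.0.0 \
  --metrics-exporter-port 9998
```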

For Prometheus scraping:

  • Each worker launches a metrics export server; its host and port can be specified with --metrics-exporter-host and --metrics-exporter-port.
  • The supervisor's {endpoint}/metrics is also a metrics export server; it collects the metrics of the RESTful API.
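Given those two endpoints, a Prometheus scrape configuration could look like the following sketch (the job names, hosts, and ports are illustrative examples, not part of this PR):

```yaml
# Illustrative scrape config; adjust targets to your deployment.
scrape_configs:
  - job_name: "xinference-supervisor"
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:9997"]   # supervisor {endpoint}/metrics
  - job_name: "xinference-workers"
    static_configs:
      - targets: ["127.0.0.1:9998"]   # worker --metrics-exporter-port
```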

Known issue:

  • Some backends do not report token information.

Metrics exporter server example:

# HELP xinference:exceptions_total_counter Total number of requests which generated an exception.
# TYPE xinference:exceptions_total_counter counter
# HELP xinference:generate_tokens_per_s Generate throughput in tokens/s.
# TYPE xinference:generate_tokens_per_s gauge
xinference:generate_tokens_per_s{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 0.2784720574189427
# HELP xinference:input_tokens_total_counter Total number of input tokens.
# TYPE xinference:input_tokens_total_counter counter
xinference:input_tokens_total_counter{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 20
# HELP xinference:output_tokens_total_counter Total number of output tokens.
# TYPE xinference:output_tokens_total_counter counter
xinference:output_tokens_total_counter{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 7
# HELP xinference:requests_total_counter Total number of requests received.
# TYPE xinference:requests_total_counter counter
xinference:requests_total_counter{method="GET",path="/ui"} 1
xinference:requests_total_counter{method="POST",path="/v1/models"} 1
xinference:requests_total_counter{method="GET",path="/v1/models/"} 2
xinference:requests_total_counter{method="GET",path="/v1/models"} 2
xinference:requests_total_counter{method="HEAD",path="/qwen-chat"} 1
xinference:requests_total_counter{method="POST",path="/v1/ui/{model_uid}"} 1
xinference:requests_total_counter{method="GET",path="/qwen-chat"} 4
xinference:requests_total_counter{method="POST",path="/qwen-chat"} 4
xinference:requests_total_counter{method="None",path="/qwen-chat"} 1
xinference:requests_total_counter{method="GET",path="/v1/cluster/auth"} 1
xinference:requests_total_counter{method="GET",path="/v1/models/{model_uid}"} 1
xinference:requests_total_counter{method="POST",path="/v1/chat/completions"} 1
# HELP xinference:responses_total_counter Total number of responses sent.
# TYPE xinference:responses_total_counter counter
xinference:responses_total_counter{method="GET",path="/v1/model_registrations/{model_type}"} 1
xinference:responses_total_counter{method="GET",path="/v1/cluster/devices"} 1
xinference:responses_total_counter{method="GET",path="/ui"} 1
xinference:responses_total_counter{method="POST",path="/v1/models"} 1
xinference:responses_total_counter{method="GET",path="/v1/models/"} 2
xinference:responses_total_counter{method="GET",path="/v1/models"} 2
xinference:responses_total_counter{method="HEAD",path="/qwen-chat"} 1
xinference:responses_total_counter{method="POST",path="/v1/ui/{model_uid}"} 1
xinference:responses_total_counter{method="GET",path="/qwen-chat"} 4
xinference:responses_total_counter{method="POST",path="/qwen-chat"} 4
xinference:responses_total_counter{method="GET",path="/v1/cluster/auth"} 1
xinference:responses_total_counter{method="GET",path="/v1/models/{model_uid}"} 1
xinference:responses_total_counter{method="POST",path="/v1/chat/completions"} 1
# HELP xinference:status_codes_counter Total number of response status codes.
# TYPE xinference:status_codes_counter counter
xinference:status_codes_counter{method="GET",path="/v1/model_registrations/{model_type}",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/cluster/devices",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/ui",status_code="404"} 1
xinference:status_codes_counter{method="POST",path="/v1/models",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/models/",status_code="307"} 2
xinference:status_codes_counter{method="GET",path="/v1/models",status_code="200"} 2
xinference:status_codes_counter{method="HEAD",path="/qwen-chat",status_code="404"} 1
xinference:status_codes_counter{method="POST",path="/v1/ui/{model_uid}",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/qwen-chat",status_code="307"} 1
xinference:status_codes_counter{method="GET",path="/qwen-chat",status_code="200"} 3
xinference:status_codes_counter{method="POST",path="/qwen-chat",status_code="200"} 4
xinference:status_codes_counter{method="GET",path="/v1/cluster/auth",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/models/{model_uid}",status_code="200"} 1
xinference:status_codes_counter{method="POST",path="/v1/chat/completions",status_code="200"} 1
# HELP xinference:time_to_first_token_ms First token latency in ms.
# TYPE xinference:time_to_first_token_ms gauge
xinference:time_to_first_token_ms{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 20076.820135116577
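Output like the above follows the Prometheus text exposition format: `metric_name{label="value",...} sample`. A minimal sketch of parsing one such line (illustrative only; real scrapers should use a Prometheus client library rather than regexes):

```python
import re

# Matches a labeled sample line such as:
#   xinference:input_tokens_total_counter{model="qwen-chat"} 20
LINE_RE = re.compile(r'^(?P<name>[\w:]+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metric_line(line):
    """Return (name, labels dict, float value), or None for
    comment lines (# HELP / # TYPE) and non-matching input."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))

sample = ('xinference:input_tokens_total_counter'
          '{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",'
          'quantization="none",type="LLM"} 20')
name, labels, value = parse_metric_line(sample)
print(name, labels["model"], value)
# xinference:input_tokens_total_counter qwen-chat 20.0
```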

@XprobeBot XprobeBot added this to the v0.8.1 milestone Jan 17, 2024
@codingl2k1 codingl2k1 marked this pull request as ready for review January 18, 2024 07:26
@aresnow1 aresnow1 merged commit c1e1c5a into xorbitsai:main Jan 19, 2024
12 checks passed