Skip to content
This repository has been archived by the owner on Mar 17, 2024. It is now read-only.

Add /health endpoint #49

Closed
analytically opened this issue Aug 12, 2019 · 14 comments
Closed

Add /health endpoint #49

analytically opened this issue Aug 12, 2019 · 14 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@analytically
Copy link

Currently / and /metrics return the full metrics. It'd be handy if there was a simple /health endpoint returning 200 OK to indicate service availability instead of returning metrics at all time.

@seglo seglo added enhancement New feature or request good first issue Good for newcomers labels Aug 14, 2019
@seglo
Copy link
Owner

seglo commented Aug 14, 2019

Yes, it's an expensive health check when there are lots of metrics to return. I'll give this some thought. Thanks for creating the issue.

@anbarasantr
Copy link
Contributor

@seglo I would like to work on this. Can you assign to me ?

@seglo
Copy link
Owner

seglo commented Sep 14, 2019

@anbarasantr I've assigned it to you. Thanks for volunteering to work on it!

@seglo
Copy link
Owner

seglo commented Sep 14, 2019

Another idea is to write a file to the container and use a shell command to check for its presence. Then we wouldn't need to add a library to create an endpoint.

@anbarasantr
Copy link
Contributor

@seglo I thought that the health end point will be useful to configure liveness and readiness probes for Containers. So should not we return success only when the prometheus client is exposing metrics and application is polling from Kafka successfully?

@seglo
Copy link
Owner

seglo commented Sep 14, 2019

@anbarasantr Yes, that's the reason for the health check. Right now the check will return a full scrape of all metrics, which is not ideal. I would like to avoid adding another HTTP library and endpoint to the project, or adding too much logic to assert "healthiness".

I think touching a file every poll interval, and updating the health checks to assert that the file exists, or has been updated recently is good enough.

Another idea would be to add a kafka-lag-exporter meta metric. For example in #46 I describe adding a metric that tracks how long poll intervals take, and exporting that to the prometheus endpoint. Then the container health check could make an HTTP call to filter just for that metric (https://github.com/lightbend/kafka-lag-exporter#filtering-metrics-without-prometheus-server), and optionally fail the health check if it's greater than some value (longer than the poll interval itself?)

@anbarasantr
Copy link
Contributor

@seglo Thanks for the suggestion. I personally like the second idea as it gives more visibility. I would like to pick #46 first which unblocks the /health endpoint task.

@seglo
Copy link
Owner

seglo commented Sep 14, 2019

@anbarasantr Ok, great. I'll assign that issue to you as well. Thanks.

@analytically
Copy link
Author

analytically commented Nov 13, 2019

Can this issue get some attention? We use the lag-exporter with Consul which is sensitive to health check size (snapshots) and thus this introduces unnecessary load for checking health.

@anbarasantr
Copy link
Contributor

@analytically sure. I will raise a pull request in few days.

@seglo
Copy link
Owner

seglo commented Nov 14, 2019

Thanks @anbarasantr !

@seglo
Copy link
Owner

seglo commented Jan 26, 2020

The new metadata poll timer metric should serve this purpose nicely (#105). It's been released in 0.6.0

@seglo seglo closed this as completed Jan 26, 2020
@alonisser
Copy link

Thanks, so what is the canonical way to use this as an http health check?

@analytically
Copy link
Author

/metrics?name[]=kafka_consumergroup_poll_time_ms

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants