Add /health endpoint #49

analytically · 2019-08-12T19:28:41Z

Currently / and /metrics return the full metrics. It'd be handy to have a simple /health endpoint that returns 200 OK to indicate service availability, instead of returning the full metrics every time.

seglo · 2019-08-14T02:25:55Z

Yes, it's an expensive health check when there are lots of metrics to return. I'll give this some thought. Thanks for creating the issue.

anbarasantr · 2019-09-14T06:29:44Z

@seglo I would like to work on this. Can you assign it to me?

seglo · 2019-09-14T13:22:54Z

@anbarasantr I've assigned it to you. Thanks for volunteering to work on it!

seglo · 2019-09-14T16:13:13Z

Another idea is to write a file to the container and use a shell command to check for its presence. Then we wouldn't need to add a library to create an endpoint.

anbarasantr · 2019-09-14T16:21:02Z

@seglo I thought the health endpoint would be useful for configuring liveness and readiness probes for containers. So shouldn't we return success only when the Prometheus client is exposing metrics and the application is polling from Kafka successfully?

seglo · 2019-09-14T16:37:56Z

@anbarasantr Yes, that's the reason for the health check. Right now the check will return a full scrape of all metrics, which is not ideal. I would like to avoid adding another HTTP library and endpoint to the project, or adding too much logic to assert "healthiness".

I think touching a file every poll interval, and updating the health checks to assert that the file exists or has been updated recently, is good enough.
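The file-based idea above could be sketched roughly as follows. This is only an illustration in Python, not code from kafka-lag-exporter: the heartbeat path, the function names `touch_heartbeat` and `is_healthy`, and the 60-second threshold are all assumptions made for the example.

```python
import time
from pathlib import Path

# Hypothetical location; the exporter itself does not define this path.
HEARTBEAT_FILE = Path("/tmp/kafka-lag-exporter.heartbeat")


def touch_heartbeat(path: Path = HEARTBEAT_FILE) -> None:
    """Would be called by the exporter at the end of every poll interval."""
    # Creates the file if missing, otherwise updates its modification time.
    path.touch()


def is_healthy(path: Path = HEARTBEAT_FILE, max_age_seconds: float = 60.0) -> bool:
    """Return True if the heartbeat file exists and was modified recently."""
    try:
        age = time.time() - path.stat().st_mtime
    except FileNotFoundError:
        # No heartbeat has ever been written.
        return False
    return age <= max_age_seconds
```

A container health check could then run a small script that calls `is_healthy()` and exits non-zero when it returns `False`. Choosing `max_age_seconds` somewhat larger than the poll interval would avoid flapping when a single poll runs slightly long.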

Another idea would be to add a kafka-lag-exporter meta metric. For example, in #46 I describe adding a metric that tracks how long poll intervals take, and exporting that to the Prometheus endpoint. Then the container health check could make an HTTP call to filter just for that metric (https://github.com/lightbend/kafka-lag-exporter#filtering-metrics-without-prometheus-server), and optionally fail the health check if it's greater than some value (longer than the poll interval itself?).

anbarasantr · 2019-09-14T16:49:23Z

@seglo Thanks for the suggestion. I personally like the second idea, as it gives more visibility. I would like to pick up #46 first, which unblocks the /health endpoint task.

seglo · 2019-09-14T18:03:01Z

@anbarasantr Ok, great. I'll assign that issue to you as well. Thanks.

analytically · 2019-11-13T20:37:10Z

Can this issue get some attention? We use the lag-exporter with Consul, which is sensitive to health check size (snapshots), so this introduces unnecessary load just for checking health.

anbarasantr · 2019-11-14T06:21:37Z

@analytically Sure. I will raise a pull request in a few days.

seglo · 2019-11-14T08:42:10Z

Thanks @anbarasantr !

seglo · 2020-01-26T18:50:42Z

The new metadata poll timer metric should serve this purpose nicely (#105). It was released in 0.6.0.

alonisser · 2020-05-03T16:58:02Z

Thanks, so what is the canonical way to use this as an HTTP health check?

analytically · 2020-05-11T18:54:30Z

/metrics?name[]=kafka_consumergroup_poll_time_ms
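One way the filtered endpoint above might be turned into a health check is sketched below. This is a hedged Python illustration, not an official recipe: the address `localhost:8000`, the 30-second threshold, and the helper names `parse_samples`, `is_healthy`, and `main` are assumptions. It treats the exporter as healthy only when at least one `kafka_consumergroup_poll_time_ms` sample is present and every sample is at or below the threshold.

```python
import urllib.request
from typing import List

METRIC = "kafka_consumergroup_poll_time_ms"
# Assumed address; adjust to wherever the exporter is actually listening.
URL = f"http://localhost:8000/metrics?name[]={METRIC}"


def parse_samples(text: str, metric: str = METRIC) -> List[float]:
    """Extract sample values for `metric` from Prometheus text-format output."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric:
            continue
        # The value is the first field after the metric name or label set.
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        values.append(float(rest.split()[0]))
    return values


def is_healthy(text: str, max_poll_ms: float = 30_000.0) -> bool:
    """Healthy if the metric is present and no sample exceeds the threshold."""
    samples = parse_samples(text)
    return bool(samples) and max(samples) <= max_poll_ms


def main() -> int:
    """Return a process exit code: 0 when healthy, 1 otherwise."""
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            body = resp.read().decode("utf-8")
    except OSError:
        return 1  # exporter unreachable or request timed out
    return 0 if is_healthy(body) else 1
```

A probe script could pass the result of `main()` to `sys.exit`. Treating an empty response as unhealthy is a deliberate choice in this sketch: if the filter matches nothing, the exporter has probably not completed a poll yet.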

seglo added the enhancement and good first issue labels Aug 14, 2019

seglo assigned anbarasantr Sep 14, 2019

anbarasantr mentioned this issue Sep 14, 2019

Metadata poll timer metric #46

Closed

anbarasantr mentioned this issue Nov 24, 2019

Add Metadata poll timer metric #105

Merged

seglo closed this as completed Jan 26, 2020

This was referenced May 7, 2021

Use kafka_consumergroup_poll_time_ms metric as healthcheck #230

Closed

Use kafka_consumergroup_poll_time_ms metric as healthcheck #231

Merged
