Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use upstream Prometheus client and Spring integration #1609

Merged
merged 5 commits into from
Oct 10, 2017

Conversation

abesto
Copy link
Member

@abesto abesto commented Jun 10, 2017

This change replaces our custom logic for exposing metrics with the implementation in the upstream Prometheus client, as noted in #1144 (comment). The notable improvement (other than less code) is the addition of histogram data of response times, using a servlet filter (the data looks something like this: https://gist.github.com/abesto/46aeb18ab45e5126ccebc515eb3fc99f)

Metrics comparison: before / after

There were two issues that blocked us from merging this for a while:

@abesto
Copy link
Member Author

abesto commented Jun 11, 2017

The Spring counter metrics don't become, and shouldn't become Prometheus counter metrics, since in Spring Boot, counters can decrease. Current practice is to just make metrics like this gauges in Prometheus (prometheus/client_java#254 (comment)).

We should also see more JVM metrics exposed, in a way more Prometheus friendly way, letting up improve the dashboard further.

<scope>test</scope>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient</artifactId>
<version>0.0.23</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems ok to have an unstable dep here because the server-side dependency is optional (which makes it a part of the all-jar, but not a strict dep for customizers)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't speak fluent Maven. By unstable dep, do you mean we shouldn't pin the version of io.prometheus:simpleclient? Or that I shouldn't remove org.springframework:spring-test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unstable I meant the patch-only version number :)

nit: maybe use a property to coordinate these? ex.

in the properties section
<simpleclient.version>0.0.23</simpleclient.version>

then in places like here:
${simpleclient.version}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see.

Yep, will set up a property for the version number.

@codefromthecrypt
Copy link
Member

might be worth making some sample traffic against this to ensure that collector metrics show up (they are probably the most important metric). ex run one of the example projects like https://github.com/openzipkin/pyramid_zipkin-example

@abesto
Copy link
Member Author

abesto commented Jun 12, 2017

The upstream client exposes all the Spring Boot Actuator metrics, in almost exactly the same way as the previous Zipkin implementation. Still, due diligence is due, I'll check.

@abesto
Copy link
Member Author

abesto commented Jun 12, 2017

Metrics after sending a few spans using pyramid_zipkin-example: https://gist.github.com/abesto/93bd70bf907d16d3b7ef4e408408f69f

Methinks we're fine:

# HELP counter_zipkin_collector_messages_http counter_zipkin_collector_messages_http
# TYPE counter_zipkin_collector_messages_http gauge
counter_zipkin_collector_messages_http 13.0

they are probably the most important metric

Oh! Then I guess I should add them to the Grafana dashboard.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jun 12, 2017 via email

@abesto
Copy link
Member Author

abesto commented Jun 12, 2017

Important note, just realized: this change does actually break the currently used metric names. The Zipkin implementation trims the counter_ and gauge_ prefixes (and sets the Prometheus metric type accordingly), while the upstream one does not trim those prefixes, and makes everything a gauge in Prometheus (see #1609 (comment)).

Assumption: not a lot of people are currently using these metrics. Proposal: let's drop as much custom logic as we can, and break the metric names now, before publishing the dashboard using the metric names.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jun 12, 2017 via email

@abesto
Copy link
Member Author

abesto commented Jun 12, 2017

@klette Having implemented the original Zipkin Prometheus exporter, what are your thoughts on this PR? Especially #1609 (comment)?

(That exhausts the list of people who contributed to the Prometheus exporter 😄 )

@@ -27,6 +27,7 @@

<properties>
<main.basedir>${project.basedir}/../..</main.basedir>
<prometheus_simpleclient.version>0.0.23</prometheus_simpleclient.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fix to the aforementioned bug is in master now, so guessing 0.0.24 will sort it out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.0.26 is out, so probably time to see if we're sorted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😨 Sad response time is sad. Yep, I'll check it out today.

Copy link
Member Author

@abesto abesto Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed, Prometheus can successfully scrape data with 0.0.26. Pushed a commit; I'll start updating the Grafana dashboard for the new metric names, upload once this is merged, then we can hopefully also merge openzipkin-attic/docker-zipkin#135

abesto added a commit to openzipkin-attic/docker-zipkin that referenced this pull request Aug 16, 2017
@codefromthecrypt
Copy link
Member

@klette nagtime!

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge after testing against dashboard

codefromthecrypt pushed a commit to openzipkin-attic/docker-zipkin that referenced this pull request Oct 10, 2017
@codefromthecrypt codefromthecrypt merged commit ed2a05d into master Oct 10, 2017
@codefromthecrypt codefromthecrypt deleted the prometheus-use-upstream branch October 10, 2017 08:31
@codefromthecrypt
Copy link
Member

gracias!

abesto added a commit to abesto/zipkin that referenced this pull request Sep 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants