Fix #121: Add example prometheus setup #135
On second thought, maybe this doesn't satisfy the original intention, i.e. to show what these metrics mean. A good way to solve that could be to release a Zipkin dashboard on https://grafana.net/dashboards. |
unless both are required in one step, probably good to merge this (as at least it helps people understand and test the integration of metrics regardless of how they are interpreted) |
While this should work fine without a dashboard, I went ahead and drafted a dashboard that includes all the currently exported data. I tried to organize it to the best of my ability: https://grafana.net/dashboards/1598 (currently draft, meaning it's viewable via this link, but doesn't show up in searches. At least I think that's what it means) I also created an OpenZipkin org on grafana.net so that we can share write access: https://grafana.net/openzipkin. Drop me your grafana.net username to get access. So I'd say: let's do one round of iteration on the dashboard (either by review or direct edit). I'll go ahead and update the README accordingly; once done, we can release the dashboard and merge this. |
@abesto Looks good! Could we pre-configure the grafana docker container with the Zipkin dashboard you made on grafana.net? I can have a look at the dashboard. My grafana.net username is kristofa. |
@kristofa Initially I thought the only way to do that would be to commit the database file into the repo, but on second reading there may be a better way ( Update: that doesn't work for the dashboard format used by grafana.net, and it doesn't support adding data sources. Created a small shell script to set up the data source and the dashboard on startup (pulling the dashboard from grafana.net) |
I just signed up in grafana as adriancole
@adriancole added you on grafana.net to the org. |
I noticed we don't seem to expose any metrics that indicate failures in the dashboard. Also, the If we want to make the dashboard production ready we might also
|
@kristofa I'm pretty sure we can make our api 500 :) (then answer the question about metrics path). It is standard spring boot metrics.. At any rate, we ought to think about testing this integration at some point. For example, we have some docker tests already for storage. It would be neat to have a test containers .. test.. to ensure whatever we say here actually works moving forward. |
@adriancole shouldn't be hard :) I'm a bit out of context, but let me know if you need any help configuring TestContainers for it :) |
@kristofa Totally agree on the Side-note on doing a |
@abesto We stepped away from metric names that embed dimensions of interest for filtering, like statusClass, and use labels instead. So the We don't typically do aggregation at the client / grafana side and we often define Prometheus recording rules. These are pre-calculated on the server side so that expensive, frequently used queries are not recalculated for every client request. Triggering expensive queries and getting more detailed data to the client might work, depending on how many time series your Prometheus server has to maintain and serve. But indeed, the more you aggregate, the more likely it is that you have to invoke a separate query to get more details, for example to find a single badly behaving instance. |
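For readers following along, a recording rule of the kind described in the comment above might look roughly like this. It is only a sketch: it assumes the Prometheus 2.x YAML rule format, and the file name, group name, rule name, `status_class` label and `http_requests_total` metric are all hypothetical, not Zipkin's actual metric names.

```shell
#!/bin/sh
# Sketch: write an example recording-rule file in the Prometheus 2.x YAML format.
# The file name, group name, rule name, status_class label and
# http_requests_total metric are hypothetical; Zipkin's real names may differ.
rules_file="${RULES_FILE:-example.rules.yml}"

cat > "$rules_file" <<'EOF'
groups:
  - name: example_http
    rules:
      # Precompute the per-job request rate, split by a status_class label,
      # so dashboards can query the cheap recorded series instead.
      - record: job_status_class:http_requests:rate5m
        expr: sum by (job, status_class) (rate(http_requests_total[5m]))
EOF

echo "wrote $rules_file"
```

The dimension lives in a label rather than in the metric name, and the server evaluates the expression on a schedule instead of once per dashboard request.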
@kristofa So the way Zipkin exposes metrics currently is not following the established best practices of the Prometheus community / devs. Thanks for educating! Let me test my understanding. Even after we restructure the metrics as you described (for instance, to have one metric called Is that correct? Is this, do you think, the idiomatic way to approach this? |
README.md
Outdated
Zipkin comes with a built-in Prometheus metric exporter. The main
`docker-compose.yml` file starts Prometheus configured to scrape Zipkin, exposes
it on port `9090`. You can open `$DOCKER_HOST_IP:9090` and start exploring the
metrics (which are available on the `/promethes` endpoint of Zipkin).
Typo here?
Typo there. Fixing, thanks!
Sorry for the late response. We can use an expression like The rate function calculates the per-second average rate. See rate explanation. The recording rules I talked about make sense when the expression is expensive to calculate and so the rule moves the calculation to the server side where it will be calculated at scheduled intervals and avoid calculation for every client request. Here are the Prometheus metric / label name conventions: Metric and label naming |
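To make "per-second average rate" concrete, here is a simplified arithmetic sketch using two made-up counter samples. It is not how Prometheus implements `rate()`: the real function also handles counter resets and extrapolates to the edges of the range window, neither of which is modelled here. `per_second_rate` is a hypothetical helper name.

```shell
#!/bin/sh
# Simplified illustration of a per-second average rate between two counter samples.
# Prometheus's rate() additionally handles counter resets and extrapolation;
# this sketch does not. All sample values below are made up.
per_second_rate() {
  # $1, $2: counter value and Unix timestamp of the earlier sample
  # $3, $4: counter value and Unix timestamp of the later sample
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" \
    'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}

# 300 additional requests observed over a 300-second window
per_second_rate 1000 0 1300 300   # prints 1.00
```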
I didn't put one and one together, thanks for explaining things to a Prometheus newbie! I've just understood how the whole aggregation / I see two roads ahead at this point, looking mostly to @adriancole for advice:
|
I would get the dashboard out as it is now. It is already useful to show the integration and existing metrics. We can always iterate later. |
fyi I noticed that since we added prometheus to zipkin, upstream formalized it.. maybe we should consider their metrics endpoint before formalizing this (or maybe we do it after) |
Rebased on top of master. Re. replacing metrics collection with the upstream client: that won't change the format of the metrics exposed; the official client pretty much does the same as what our current exporter does. It does add some new metrics, with which we can extend the dashboard later (see openzipkin/zipkin#1609) We can also do some more magic on the Prometheus server side to get nicer response count metrics, will try to do that now (see prometheus/client_java#255 (comment)) |
With the relabeling rules the dashboard can now be significantly smarter, with automatically populated response code count and response time graphs. Updated the dashboard on grafana.net, looks something like this: @kristofa @adriancole Some time has passed, and there are new changes. I think this is ready to merge, waiting for a nod from you :) |
http://grafana:3000/api/dashboards/import
echo '{"dashboard": ' > data.json
curl -s https://grafana.com/api/dashboards/${dashboard_id}/revisions/${last_revision}/download >> data.json
echo ', "inputs": [{"name": "DS_PROM", "pluginId": "prometheus", "type": "datasource", "value": "prom"}], "overwrite": false}' >> data.json
This rewrite was needed because as the dashboard JSON grows (more graphs added), we ran into this issue:
xargs: argument line too long
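A minimal sketch of the file-based approach, for illustration only: the import envelope is assembled on disk, so the dashboard JSON never has to fit on a command line. `build_import_payload` is a hypothetical helper name and the commented-out `curl` invocation is an assumption about how the file would be posted, not the script from this PR.

```shell
#!/bin/sh
# Sketch: wrap a dashboard JSON file in the import envelope, writing to disk
# rather than passing the JSON as a command-line argument.
# build_import_payload is a hypothetical helper name.
build_import_payload() {
  dashboard_file="$1"   # file holding the raw dashboard JSON
  payload_file="$2"     # file to receive the wrapped import request body
  {
    printf '{"dashboard": '
    cat "$dashboard_file"
    printf ', "inputs": [{"name": "DS_PROM", "pluginId": "prometheus", "type": "datasource", "value": "prom"}], "overwrite": false}\n'
  } > "$payload_file"
}

# Hypothetical usage; curl reads the request body from the file via @file,
# so the size of the dashboard does not affect the argument list:
#   build_import_payload dashboard.json data.json
#   curl -s -X POST -H 'Content-Type: application/json' \
#        --data-binary @data.json http://grafana:3000/api/dashboards/import
```

Because the body is streamed from a file, adding more graphs to the dashboard only grows `data.json`, not the argument list.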
After updating the rewrite rules for the changes in openzipkin/zipkin#1609, with https://gist.github.com/abesto/642cd049cc75643213b6e4c23bad7734, here's the current state. Things to note:
|
Meaning that we should name it the same way as the relevant metric from Prometheus itself.
very nice.. going out for zipkin 2.2 |
Cool! |
uploaded!
|
Add Prometheus, Grafana to `docker-compose.yml`. Document basic usage and setup. See #121 for reasoning.

I attempted to "ship" with a state where Grafana is already configured, but it's not completely trivial (where do you store the initial state? How do you get it to Grafana?), leading to complexity. It would also remove some of the learning we expect to provide: automating it here means users will need to re-discover it later.