roachtest: grafana API for annotations #119821

Closed
srosenberg opened this issue Mar 1, 2024 · 4 comments · Fixed by #120019
Labels
A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team

Comments

@srosenberg
Member

srosenberg commented Mar 1, 2024

Grafana annotations [1] are typically used to superimpose contextual information onto existing dashboard(s). E.g., a roachtest executing some workload, say TPCC, could create one annotation to denote the range of time during which data is loaded, and another annotation to denote the benchmarking phase. This new contextual information facilitates presentation of the (visual) data; specifically, it delineates unique events which may or may not be correlated. Thus, it makes it easier to find event-local patterns.

There are many other use-cases for annotations. However, our intended primary use-case is to facilitate "chaos" events during failure injection. Note that the DRT team is already using a makeshift script to create such annotations [2].

The centralized grafana instance is behind GCE's Identity-Aware Proxy (IAP). Thus, the internal API will need to generate a fresh identity (oauth2) token; the project-wide default limits the expiration time for these tokens to 1 hour. Also note that the internal grafana instance (i.e., one that's optionally provisioned for the duration of a roachtest) doesn't require authentication. Thus, the roachtest grafana API for annotations should support both methods of authentication.

[1] https://grafana.com/docs/grafana/latest/developers/http_api/annotations/#annotations-api
[2] https://gist.github.com/srosenberg/8e2758d3a9f071b72cd55116c2128a9e

Jira issue: CRDB-36333

@srosenberg srosenberg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-testing Testing tools and infrastructure T-testeng TestEng Team labels Mar 1, 2024

blathers-crl bot commented Mar 1, 2024

cc @cockroachdb/test-eng

@DarrylWong
Contributor

DarrylWong commented Mar 1, 2024

One detail that makes things a bit trickier is that grafana annotations can only be applied over the entire org, a dashboard, or a panel. However, we set up our metrics such that a single roachtest is split up across different dashboards based on the type of metric (e.g., SQL/Queues/Storage).

Adding the annotation over the entire org would be easiest, but we have non-roachtest dashboards, like the aforementioned DRT cluster, that we wouldn't want to add annotations to.

Actually, I had a misunderstanding about how adding to the entire org works. I thought it would add the annotation to every dashboard, but instead it adds it to no dashboard and you have to query it in. That's the same amount of work as adding to a single dashboard, but the annotation doesn't live on any one dashboard, which seems better.

I think the best way to do it is to add annotations to the overview dashboard, and then, for the other relevant dashboards, add an annotation query that filters annotations by tags matching the current test_run_id and test_name.

The main drawback to this is that we now need to keep track of test_run_id in the roachtest so we can create the annotation with those tags in the first place. This is currently a test_runner-only detail, and we'd need to pass it somewhere.

test.Test makes the most sense to me, since we already need to know test.Name(). The unfortunate part is that it ties creating grafana annotations to knowing the test spec, which doesn't seem right to me since they aren't really related. Maybe not the biggest deal in the world, though 🤷 Open to other ideas.

@srosenberg
Member Author

Adding the annotation over the entire org would be easiest, but we have non roachtest dashboards like the aforementioned DRT cluster that we wouldn't want to add annotations to.

What if we just rely on tags to filter out irrelevant org-wide annotations? E.g., for roachtest-specific annotations we could filter on a roachtest tag, for DRT annotations on a drt tag, etc.

The main drawback to this is that we now need to keep track of test_run_id in the roachtest

What if we use cluster (name) in conjunction with test_name instead of exposing test_run_id? (Recall that no two tests can time-share the same cluster.) It seems that the tags can be filtered by using template variables [1], and most of our dashboards already use the cluster variable.

[1] https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/annotate-visualizations/#filter-queries-by-tag

@DarrylWong
Contributor

What if we just rely on tags to filter out irrelevant org-wide annotations?

Ahh, thinking about it more, I don't think this is really an issue. We're going to need filters anyway so we don't see the annotations from every test at once.

The way annotation filtering works in grafana seems a bit clunky: from what I can tell, you can't really filter annotations out, only query them in. That is, you can't add a query that excludes all annotations not matching a tag; you have to query for all annotations that do match it. Which is why the approach I mentioned of adding to just one dashboard wouldn't be any extra work compared to adding org-wide, now that I think about it.

What if we use cluster (name) in conjunction with test_name instead of exposing test_run_id?

It looks like we have two sets of dashboards? One filtered by cluster and one filtered by test run and name? I'm not sure which one people use more, but I'd guess the latter, since that's what the roachtest output spits out:

https://github.com/cockroachdb/cockroach/blob/b4b889347fb12215bb09307fab89296916c85c1f/pkg/cmd/roachtest/test_runner.go#L982-L983

I'd need to test it, but I don't think those dashboards have the cluster variable:

[screenshot]
