roachtest: name clusters after running test #98658

Closed
tbg opened this issue Mar 15, 2023 · 2 comments · Fixed by #107965
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team

Comments

@tbg
Member

tbg commented Mar 15, 2023

Is your feature request related to a problem? Please describe.

The clusters get scraped by our internal Prom/Grafana instance, so it's helpful if the cluster name "means something".

Describe the solution you'd like

Switch from `$USER-1678829488-01-n10cpu8` to something like `$USER-nameoftestbutsanitizedandshortenedifnecessary-YYMMDD-nonce`
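A minimal sketch of how such a name could be assembled. Everything here is an assumption for illustration: `clusterName`, the 20-character cap, and the choice to drop (rather than replace) disallowed characters are hypothetical and are not roachtest's actual naming code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nonAlnum matches every run of characters that is not a lowercase
// ASCII letter or digit.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// maxTestNameLen is an assumed cap on the test-name part of the cluster name.
const maxTestNameLen = 20

// demoDate is a fixed date used only to make the example reproducible.
var demoDate = time.Date(2023, time.March, 15, 0, 0, 0, 0, time.UTC)

// clusterName is a hypothetical helper that builds
// <user>-<sanitized test name>-<YYMMDD>-<nonce>.
func clusterName(user, testName string, now time.Time, nonce string) string {
	// Lowercase, then drop everything outside [a-z0-9].
	s := nonAlnum.ReplaceAllString(strings.ToLower(testName), "")
	// The string is pure ASCII at this point, so byte slicing is safe.
	if len(s) > maxTestNameLen {
		s = s[:maxTestNameLen]
	}
	// "060102" is Go's reference layout for YYMMDD.
	return fmt.Sprintf("%s-%s-%s-%s", user, s, now.Format("060102"), nonce)
}

func main() {
	fmt.Println(clusterName("tbg", "kv0/enc=false/nodes=3", demoDate, "a1b2"))
	// prints "tbg-kv0encfalsenodes3-230315-a1b2"
}
```

The nonce is passed in rather than generated so that the function stays deterministic; a caller would be expected to supply a short random suffix.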

One problem with this approach is that we'd have to rethink cluster reuse in roachtest; it would be confusing if a cluster for test A were to be reused by test B. It's unclear if reuse is something we need to keep. (I'm uneasy about cross-pollution between tests because we increasingly do random systemd-run stuff that roachprod wipe won't clear up).

Describe alternatives you've considered

We could also try to export the name of the running test as a label. But I'm not sure how feasible this is.

Additional context

Slack (internal)

Jira issue: CRDB-25424

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team labels Mar 15, 2023
@blathers-crl

blathers-crl bot commented Mar 15, 2023

cc @cockroachdb/test-eng

@tbg
Member Author

tbg commented Jul 3, 2023

#105894 makes this obsolete (which in turn requires #104366)

@tbg tbg closed this as completed Jul 3, 2023
smg260 pushed a commit to smg260/cockroach that referenced this issue Aug 2, 2023
This commit will add a `test_name` label to each VM when a
particular roachtest is about to be executed on the cluster, with
the label being removed at the end of the roachtest.

The `test_name` label is being scraped by Prometheus to allow
filtering of dashboards based on the roachtest name. GCE labelling
rules mean that test names are sanitised to match `[a-zA-Z-]`.

Epic: none
Fixes: cockroachdb#98658

Release note: None
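The sanitisation step described in the commit message could look roughly like the sketch below. The commit message only states the target character set `[a-zA-Z-]`; the function name `sanitizeTestName`, replacing disallowed runs with a single `-`, and trimming leading and trailing dashes are all assumptions, not the code from the commit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidLabelChars matches every run of characters outside [a-zA-Z-].
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z-]+`)

// sanitizeTestName is a hypothetical helper that maps a roachtest name
// onto the [a-zA-Z-] character set mentioned in the commit message.
func sanitizeTestName(name string) string {
	// Assumption: each run of disallowed characters becomes one dash.
	s := invalidLabelChars.ReplaceAllString(name, "-")
	// Assumption: dashes at either end carry no information, so drop them.
	return strings.Trim(s, "-")
}

func main() {
	fmt.Println(sanitizeTestName("kv0/enc=false/nodes=3"))
	// prints "kv-enc-false-nodes"
}
```

Note that a mapping like this is lossy: digits are discarded, so two tests that differ only in a numeric parameter would end up with the same label value.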
smg260 pushed a commit to smg260/cockroach that referenced this issue Aug 10, 2023
craig bot pushed a commit that referenced this issue Aug 14, 2023
107965: roachtest: roachprod: add test name and run id vm labels for metrics r=herkolategan a=smg260

These 2 commits add labels to clusters running roachtests, so that metrics can be better filtered in various dashboards. 

1. Adds `test_name` label to each cluster, and removes the label at the end of the test. Thus, each cluster would have this label updated for each test that it runs during a particular roachtest invocation. The test name will be simplified to conform to cloud labelling rules `[a-zA-Z-]`

2. Adds `test_run_id` label to each VM, *once*, for the duration of the run. Thus, each cluster would have this label added once at the beginning of a roachtest run (which would include multiple tests), and removed only after deregistration at the end.
   In TeamCity this would take the form `<TC_USER>-<TC_BUILD_ID>`, and when run locally, `<USER>-<UNIX_TS>`.
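As an illustration of the two `test_run_id` forms, here is a hedged sketch. Treating `TC_USER`, `TC_BUILD_ID`, and `USER` as environment variable names is an assumption based on the placeholders above, and `testRunID` and `mapEnv` are hypothetical helpers rather than functions from the PR.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// demoNow is a fixed timestamp used only to make the example reproducible.
var demoNow = time.Unix(1678829488, 0)

// mapEnv returns a lookup function backed by a map, standing in for os.Getenv.
func mapEnv(m map[string]string) func(string) string {
	return func(key string) string { return m[key] }
}

// testRunID is a hypothetical helper. It returns <TC_USER>-<TC_BUILD_ID>
// when a TeamCity build ID is present, and <USER>-<UNIX_TS> otherwise.
// The environment variable names are assumptions.
func testRunID(getenv func(string) string, now time.Time) string {
	if buildID := getenv("TC_BUILD_ID"); buildID != "" {
		return fmt.Sprintf("%s-%s", getenv("TC_USER"), buildID)
	}
	return fmt.Sprintf("%s-%d", getenv("USER"), now.Unix())
}

func main() {
	// Uses the real environment and clock.
	fmt.Println(testRunID(os.Getenv, time.Now()))

	// Local run with a fixed clock.
	local := mapEnv(map[string]string{"USER": "tbg"})
	fmt.Println(testRunID(local, demoNow)) // prints "tbg-1678829488"
}
```

Computing the ID once at the start of the invocation and reusing it for every cluster is what would make the label constant for the whole run, in contrast to `test_name`, which changes per test.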

These 2 labels combined will make it easy for a user to find metrics for a particular run of roachtest (e.g. a specific GCE nightly).

Here is a [copy of an existing dashboard](https://grafana.testeng.crdb.io/d/qdkBruq4k/crdb-console-runtime-by-test?orgId=1&from=now-3h&to=now), modified to utilise the new labels.

Epic: None
Fixes: #98658
Release note: None

108037: server: return authoritative span statistics for db details endpoint r=THardy98 a=THardy98

Resolves: #96163

This change makes the admin API endpoint getting database statistics scan KV for span statistics instead of using the range descriptor cache. This provides authoritative output, helping deflake `TestMultiRegionDatabaseStats`.

Release note (sql change): admin API database details endpoint now returns authoritative range statistics.

108711: upgrades: deflake TestRoleMembersIDMigration1500Users r=rafiss a=rafiss

TeamCity has a new machine type where this test has started to time out more, so this change will make it take less time.

fixes #108539
Release note: None

Co-authored-by: Miral Gadani <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
smg260 pushed a commit to smg260/cockroach that referenced this issue Aug 21, 2023
smg260 pushed a commit to smg260/cockroach that referenced this issue Sep 18, 2023