Update Grafana to version 11 and provision dashboards from usegalaxy-eu/grafana-dashboards #1235

Merged
merged 12 commits into from
Jul 23, 2024

@kysrpex kysrpex commented Jun 17, 2024

Update Grafana to version 11 and switch from deprecated role cloudalchemy.grafana to the official Grafana role from the grafana.grafana collection. Update usegalaxy_eu.grafana_matrix_forwarder. Instead of disabling firewalld on the Grafana host, open the nginx ports.

Before merging this PR, the Grafana database must be migrated from SQLite to PostgreSQL. This is rather simple using pgloader and following this guide from Jamie Ly. The postgres database information can be found here. It can be accessed from stats, sn06 and maintenance (see usegalaxy-eu/infrastructure-playbook#1235).

Migration in a nutshell:

  1. Make sure the grafana database exists in Postgres and the grafana user has all privileges on it. You may run reset.sql as postgres to do this ( ⚠️ this drops the grafana database).
  2. Run schema.sql as grafana to create all tables Grafana v9.2.10 (c37dcaf0da) needs. Grafana v11 will run database migrations on top of it.
  3. Install pgloader. Luckily, there are packages available in the Postgres community repository. To enable the repo, you just need the right repository package. You may (and should) delete the repository package once you have completed this migration.
  4. Open grafana.load and adjust the path to /data/monitoring/grafana_data/grafana.db (if running the migration steps on stats.galaxyproject.eu) and the PostgreSQL connection string if needed.
  5. Run pgloader grafana.load. Warnings and even errors are expected and are not a problem.
  6. Merge this PR and run the grafana.yml playbook.
  7. Enable the stats-grafana Jenkins project.
  8. Some dashboards (the provisioned ones) will be in the wrong folder. Move them to the correct folder (see pictures below). All the remaining dashboards that do not belong to an existing folder should be moved to a new folder "General".
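
Since pgloader is expected to print warnings and errors in step 5, a quick sanity check can help confirm that nothing important was lost. The sketch below is not part of this PR: it only reads the SQLite side (opened read-only) and prints a row count per table, which can then be compared by hand against `SELECT COUNT(*)` results in Postgres. The function name `sqlite_row_counts` is hypothetical.

```python
import sqlite3


def sqlite_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # mode=ro opens the file read-only, so the live grafana.db is never modified.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Quote the identifier in case a table name needs it.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


# Example call, using the path from step 4:
# for table, count in sqlite_row_counts("/data/monitoring/grafana_data/grafana.db").items():
#     print(f"{table}\t{count}")
```

Tables that Grafana v11 migrations create or rewrite later will legitimately differ, so this comparison is most useful right after step 5 and before step 6.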

Screenshot from 2024-07-15 15-21-52
Screenshot from 2024-07-15 15-22-19
Screenshot from 2024-07-15 15-22-36

Closes usegalaxy-eu/issues#558.

@kysrpex kysrpex self-assigned this Jun 17, 2024
Update Grafana to version 11 and switch from deprecated role `cloudalchemy.grafana` to the official Grafana role from the `grafana.grafana` collection. Update `usegalaxy_eu.grafana_matrix_forwarder`.

Instead of disabling firewalld on the Grafana host, open the nginx ports.
@kysrpex kysrpex force-pushed the grafana_update_version_11 branch from 33ddf43 to acc9b9a Compare June 17, 2024 10:41
Add usegalaxy-eu/grafana-dashboards as a submodule so that the Ansible role `grafana.grafana.grafana` can provision the dashboards from this folder.
@kysrpex kysrpex force-pushed the grafana_update_version_11 branch from 8de8a08 to ec5391c Compare June 17, 2024 12:50
@kysrpex kysrpex changed the title Update Grafana to version 11 Update Grafana to version 11 and provision dashboards from usegalaxy-eu/grafana-dashboards Jun 17, 2024
@kysrpex kysrpex force-pushed the grafana_update_version_11 branch 2 times, most recently from 18c801b to 514808d Compare June 21, 2024 13:47
@kysrpex kysrpex force-pushed the grafana_update_version_11 branch from 514808d to bbe09a8 Compare July 3, 2024 10:30
@kysrpex kysrpex force-pushed the grafana_update_version_11 branch from 365c7b4 to 00e8fc1 Compare July 15, 2024 13:29
@kysrpex kysrpex marked this pull request as ready for review July 15, 2024 13:30
@@ -1,3 +1,6 @@
[submodule "mounts"]
path = mounts
url = https://github.com/usegalaxy-eu/mounts
[submodule "files/grafana"]
path = files/grafana
url = https://github.com/usegalaxy-eu/grafana-dashboards.git
Contributor

I am not sure whether the .git suffix could lead to problems. I am no expert, I just noticed that the mounts submodule doesn't have it.

Contributor Author

Should be fine afaik, that's even what GitHub suggests

image

Contributor

ok :)
I just saw it without the suffix in the Git handbook and was not sure about this.
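
For anyone who wants to check the two submodule entries side by side, here is a minimal sketch (not part of this PR) that parses the `.gitmodules` content from the diff above with Python's `configparser` and strips a trailing `.git` so the URLs can be compared in one form. GitHub accepts clone URLs both with and without the suffix. The function name `submodule_urls` is hypothetical.

```python
import configparser

# Content of .gitmodules as shown in the diff above.
GITMODULES = """\
[submodule "mounts"]
path = mounts
url = https://github.com/usegalaxy-eu/mounts
[submodule "files/grafana"]
path = files/grafana
url = https://github.com/usegalaxy-eu/grafana-dashboards.git
"""


def submodule_urls(text):
    """Map each submodule path to its URL with any trailing '.git' removed."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    urls = {}
    for section in parser.sections():
        url = parser[section]["url"]
        if url.endswith(".git"):
            url = url[: -len(".git")]
        urls[parser[section]["path"]] = url
    return urls


for path, url in submodule_urls(GITMODULES).items():
    print(path, url)
```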

group_vars/grafana/vars.yml
grafana_matrix_forwarder_port: 6000
grafana_matrix_forwarder_username: "{{ vault_grafana_matrix_forwarder_username }}"
grafana_matrix_forwarder_password: "{{ vault_grafana_matrix_forwarder_password }}"
grafana_matrix_forwarder_resolve_mode: reaction
Contributor

Thanks for updating / improving this!

Contributor

@mira-miracoli mira-miracoli left a comment

Thank you!

@bgruening
Member

Cool beans! Thanks a lot!

I have a technical question, but please deploy.
Old links to dashboards will probably not work anymore, I assume. Is there a way in Grafana to keep stable links? Or should we use bitly or, even better, gx.io to create stable links to dashboards?

@kysrpex
Contributor Author

kysrpex commented Jul 16, 2024

> Cool beans! Thanks a lot!
>
> I have a technical question, but please deploy. Old links to dashboards will probably not work anymore, I assume. Is there a way in Grafana to keep stable links? Or should we use bitly or, even better, gx.io to create stable links to dashboards?

With this migration path, all dashboards keep the same UIDs, and Grafana has not changed how it forms the URLs, so links do not break (see screenshots below).

Screenshot from 2024-07-16 09-01-57
Screenshot from 2024-07-16 09-02-13
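
To illustrate why links survive: Grafana dashboard URLs have the form `/d/<uid>/<slug>`, so a link keeps resolving as long as the UID is unchanged. The sketch below is not part of this PR; it pulls the UID out of a dashboard URL so that old bookmarks can be compared against the migrated instance. The function name `dashboard_uid` is hypothetical, and it assumes Grafana is served from the root path of the host.

```python
from typing import Optional
from urllib.parse import urlsplit


def dashboard_uid(url: str) -> Optional[str]:
    """Return the UID from a Grafana dashboard URL of the form /d/<uid>/<slug>."""
    # Split the path into non-empty segments, ignoring query string and fragment.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "d":
        return segments[1]
    return None  # Not a dashboard URL (for example an /explore link).


print(dashboard_uid("https://stats.galaxyproject.eu/d/000000004/galaxy?orgId=1&refresh=10s"))
# prints 000000004
```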

@bgruening
Member

Oh nice!!!

@bgruening
Member

Can you please also sum that all up in a nice blog post and, in addition, in an operations update?

Thanks a lot!

Workaround for certbot error below.

```
The server will not issue certificates for the identifier :: Invalid identifiers requested :: Cannot issue for "stats-galaxyproject-eu.novalocal": Domain name does not end with a valid public suffix (TLD)", "Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.
```
@kysrpex kysrpex merged commit a22f19d into master Jul 23, 2024
2 checks passed
@kysrpex kysrpex deleted the grafana_update_version_11 branch July 23, 2024 13:42
@mira-miracoli
Contributor

oh I guess my github was stuck

@kysrpex
Contributor Author

kysrpex commented Jul 23, 2024

@sanjaysrikakulam @mira-miracoli Important detail. You may find that some panels do not work. You can very easily fix them like this:

  1. Find a panel that does not work.
    Screenshot from 2024-07-23 15-59-20

  2. Click "Edit".
    Screenshot from 2024-07-23 15-59-34

  3. Click the pencil on the bottom right (raw query mode). The preview will start working instantly.
    Screenshot from 2024-07-23 16-00-01
    Screenshot from 2024-07-23 16-00-20

  4. Save the dashboard (the screenshot is wrong, use "Save", not "Apply").
    Screenshot from 2024-07-23 16-00-34

Don't ask me why this works. In fact, if you switch the panel back to visual editor mode, it breaks again.

@bgruening
Member

This one https://stats.galaxyproject.eu/d/000000004/galaxy?orgId=1&refresh=10s&viewPanel=7 does not seem to work. Maybe wrong host?

@kysrpex
Contributor Author

kysrpex commented Jul 24, 2024

> This one https://stats.galaxyproject.eu/d/000000004/galaxy?orgId=1&refresh=10s&viewPanel=7 does not seem to work. Maybe wrong host?

The list of hosts is populated by the InfluxQL query `SHOW TAG VALUES FROM "cluster.queue" WITH KEY = "host"`, but the only possible outcome of that query right now is "maintenance.galaxyproject.eu", see the InfluxDB measurement query below.

> SELECT * FROM "cluster.queue" ORDER BY time desc LIMIT 10
name: cluster.queue
time                 count engine host                         schedd                state
----                 ----- ------ ----                         ------                -----
2024-07-24T12:38:00Z 0     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu suspended
2024-07-24T12:38:00Z 539   condor maintenance.galaxyproject.eu sn06.galaxyproject.eu running
2024-07-24T12:38:00Z 426   condor maintenance.galaxyproject.eu sn06.galaxyproject.eu idle
2024-07-24T12:38:00Z 2     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu held
2024-07-24T12:38:00Z 0     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu removed
2024-07-24T12:38:00Z 0     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu completed
2024-07-24T12:37:00Z 0     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu removed
2024-07-24T12:37:00Z 0     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu completed
2024-07-24T12:37:00Z 0     condor maintenance.galaxyproject.eu sn06.galaxyproject.eu suspended
2024-07-24T12:37:00Z 543   condor maintenance.galaxyproject.eu sn06.galaxyproject.eu running

Changing to `SHOW TAG VALUES FROM "cluster.queue" WITH KEY = "schedd"` allows choosing sn06.galaxyproject.eu too, and then the route timings work.
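
To make the difference concrete, here is a small sketch (not run against InfluxDB) that mimics what `SHOW TAG VALUES ... WITH KEY = ...` returns, namely the distinct values of one tag, using only the ten sample rows above. It assumes that `host`, `schedd` and `state` are tags in `cluster.queue`; the function name `show_tag_values` is hypothetical.

```python
# State column of the ten "cluster.queue" rows shown above, in order.
STATES = [
    "suspended", "running", "idle", "held", "removed",
    "completed", "removed", "completed", "suspended", "running",
]

# Every sample row carries the same host and schedd values.
POINTS = [
    {
        "host": "maintenance.galaxyproject.eu",
        "schedd": "sn06.galaxyproject.eu",
        "state": state,
    }
    for state in STATES
]


def show_tag_values(points, key):
    """Return the distinct values of one tag key across all points, sorted."""
    return sorted({point[key] for point in points if key in point})


print(show_tag_values(POINTS, "host"))    # ['maintenance.galaxyproject.eu']
print(show_tag_values(POINTS, "schedd"))  # ['sn06.galaxyproject.eu']
```

With `host` as the key, the dropdown can only ever offer the maintenance node, while `schedd` exposes sn06.galaxyproject.eu, which is the value the panels filter on.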

This more likely has to do with the setup of the maintenance node and the HTCondor migration than with the Grafana update.

Note to all: if you see fixable problems like these on the dashboards go ahead and fix them.
