fix(grafana): repair node selection & metrics name #158

aslafy-z · 2024-03-27T15:57:59Z

Update 'dns' & 'clusters' Grafana dashboards to fix node selection
Update 'pod-level' Grafana dashboard to fix metrics names, pod selection and datasource templating
Update datasource variable to DS_PROMETHEUS convention

Note: The pod-level Grafana dashboard still has some old metrics to update.

vakalapa · 2024-03-27T15:59:56Z

@huntergregory to review.

github-actions · 2024-05-03T00:17:27Z

This PR will be closed in 7 days due to inactivity.

aslafy-z · 2024-05-03T01:20:10Z

Please have a look @vakalapa @huntergregory @rbtr

rbtr · 2024-05-03T16:55:07Z

hey @aslafy-z, thanks for working on this fix. I see that you have the DCO "signed-off-by" on all your commits, but we also need a cryptographic sig to be able to guarantee origin. Would you update these with a signature? Here's how: https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
I'm looking for this "Verified" tag on the commits in the PR once you've done that:

Signed-off-by: Zadkiel AHARONIAN <[email protected]>

aslafy-z · 2024-05-03T18:06:07Z

@rbtr I just rebased, squashed and signed my commits :)

rbtr

lgtm, just need final sign-off from @huntergregory

huntergregory

Hi @aslafy-z, sorry for the delay. I missed the notifications from March 😕. Thanks for your PR and the interest in improving these dashboards.

Do you mind updating the PR description with details/examples as needed for the bugs/fixes? Also, if you have a working pod-level dashboard, could you help fix #271?

Added more details in the comment, but I don't think it would make sense to filter by node in the "Fleet View". I'm also not sure that we can/should change datasource to DS_PROMETHEUS.

huntergregory · 2024-05-03T23:08:43Z

deploy/grafana/dashboards/clusters.json

          },
          "editorMode": "code",
-          "expr": "sum(rate(networkobservability_forward_count{direction=\"egress\", cluster=\"$cluster\", instance=~\"$Nodes\"}[$__rate_interval]))",
+          "expr": "sum(rate(networkobservability_forward_count{direction=\"egress\", cluster=\"$cluster\", instance=~\"($Nodes):[0-9]+\"}[$__rate_interval]))",


what does the : do in this regex? Also, could you help me understand scenarios where the node selection is broken?

aslafy-z · 2024-07-08

The $Nodes variable holds one or more node IPs, formatted like 1.1.1.1 or 1.1.1.1|2.2.2.2.
The instance label, however, has the form node:port, e.g. 1.1.1.1:1234.
This edit appends the :port part to the pattern, which makes it possible to select multiple nodes.
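To make the matching behaviour concrete, here is a minimal Python sketch. It assumes `$Nodes` expands to the plain `ip|ip` form described above, and it uses `re.fullmatch` because Prometheus anchors label-matcher regexes at both ends. The helper name `instance_matches` and the sample values are illustrative only.

```python
import re


def instance_matches(pattern: str, instance: str) -> bool:
    # Prometheus label matchers are fully anchored, so fullmatch is the
    # closest Python analogue of instance=~"<pattern>".
    return re.fullmatch(pattern, instance) is not None


nodes = "1.1.1.1|2.2.2.2"        # assumed expansion of $Nodes for two nodes

old = nodes                      # instance=~"$Nodes"
new = f"({nodes}):[0-9]+"        # instance=~"($Nodes):[0-9]+"
ungrouped = f"{nodes}:[0-9]+"    # same suffix, but without the parentheses

for instance in ("1.1.1.1:1234", "2.2.2.2:1234"):
    print(
        instance,
        instance_matches(old, instance),
        instance_matches(new, instance),
        instance_matches(ungrouped, instance),
    )
```

Under these assumptions the old pattern matches neither instance, the new one matches both, and the ungrouped variant only matches the last alternative, which is why the parentheses around `$Nodes` matter.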

huntergregory · 2024-05-03T23:24:33Z

deploy/grafana/dashboards/clusters.json

          },
          "editorMode": "code",
-          "expr": "sum by (cluster) (rate(networkobservability_drop_count[$__rate_interval]))",
+          "expr": "sum by (cluster) (rate(networkobservability_drop_count{instance=~\"($Nodes):[0-9]+\"}[$__rate_interval]))",


This and above queries are part of the "Fleet View" panels, where the dashboard summarizes metrics across clusters. I'm not sure it makes sense to filter based on node here, since the Nodes variable only contains nodes for the selected cluster (there is always exactly one cluster selected):

retina/deploy/grafana/dashboards/clusters.json

Lines 3727 to 3732 in 6946bab

        "name": "Nodes",
        "options": [],
        "query": {
          "query": "label_values(kube_node_info{cluster=\"$cluster\"},node)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },

There are some analogous panels below where someone can filter by node.
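The concern can be illustrated with a small simulation. This is only a sketch with made-up samples: the cluster names, IPs, port, and the helper `sum_by_cluster` are hypothetical, and it assumes the node filter holds the IPs of the one selected cluster.

```python
from collections import defaultdict

# Hypothetical samples: (cluster label, instance label, drop rate).
samples = [
    ("cluster-a", "10.0.0.1:1234", 2.0),
    ("cluster-a", "10.0.0.2:1234", 3.0),
    ("cluster-b", "10.1.0.1:1234", 5.0),
]


def sum_by_cluster(samples, node_ips=None):
    """Mimic `sum by (cluster)`, optionally keeping only the given node IPs."""
    totals = defaultdict(float)
    for cluster, instance, rate in samples:
        ip = instance.rsplit(":", 1)[0]
        if node_ips is not None and ip not in node_ips:
            continue
        totals[cluster] += rate
    return dict(totals)


# The Nodes variable only knows the nodes of the selected cluster (cluster-a).
nodes_of_selected_cluster = {"10.0.0.1", "10.0.0.2"}

fleet = sum_by_cluster(samples)
filtered = sum_by_cluster(samples, nodes_of_selected_cluster)
print(fleet)     # every cluster is present
print(filtered)  # only the selected cluster survives the node filter
```

With the node filter applied, `cluster-b` drops out of the result entirely, so a "Fleet View" panel would no longer summarize the fleet.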

huntergregory · 2024-05-04T00:05:42Z

deploy/grafana/dashboards/pod-level.json

Turns out that this dashboard is broken and not importable #271 (at least on my grafana setup). If you have a working version, would you actually be able to export it for sharing externally?

aslafy-z · 2024-07-08

I left my previous job and have no access to a cluster where I can install retina right now. I'll try on a kind cluster when I'm back with my personal laptop in a few days and see how it goes.

huntergregory · 2024-05-04T00:10:38Z

deploy/grafana/dashboards/pod-level.json

          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
-        "regex": "/.*_podname=\"([^\"]*).*/",
+        "regex": "/.*podname=\"([^\"]*).*/",


nice catch. You must have been using the advanced local-context metric mode. Just noting how this prompted initial thoughts on #344
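A quick way to see what the regex change does is to run both patterns against sample series strings. The sketch below uses Python's `re` as a stand-in for Grafana's JavaScript regex, and the metric name and label sets are hypothetical; it only assumes that some series carry a bare `podname` label while others carry a prefixed one such as `source_podname`.

```python
import re

OLD = re.compile(r'.*_podname="([^"]*).*')  # requires an underscore before podname
NEW = re.compile(r'.*podname="([^"]*).*')   # accepts bare and prefixed label names


def extract_pod(regex, series):
    """Return the captured pod name, or None when the series does not match."""
    match = regex.search(series)
    return match.group(1) if match else None


# Hypothetical series strings, as a variable query might return them.
bare = 'example_metric{podname="web-0",namespace="default"}'
prefixed = 'example_metric{source_podname="web-0",namespace="default"}'

for series in (bare, prefixed):
    print(series, extract_pod(OLD, series), extract_pod(NEW, series))
```

The old pattern finds nothing in the series with the bare `podname` label, while the new pattern extracts the pod name from both.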

huntergregory · 2024-05-04T00:21:36Z

deploy/grafana/dashboards/clusters.json

@@ -107,7 +107,7 @@
    {
      "datasource": {
        "type": "prometheus",
-        "uid": "${datasource}"
+        "uid": "${DS_PROMETHEUS}"


Update datasource variable to DS_PROMETHEUS convention

Do you have a link to this convention? I just glanced at the top dashboard on Grafana.com, and it uses datasource rather than DS_PROMETHEUS as the variable. Seems the same for the built-in dashboards in Azure's managed Grafana.

I'm also afraid that this might be a breaking change to someone's existing dashboard setup.

aslafy-z · 2024-07-08

I've observed this "naming convention" widely used in recent years. If the dashboard needs to incorporate another type of datasource in the future, the name will clearly indicate the type.

I can revert the change if you prefer.

huntergregory · 2024-08-07

I see. Added some thoughts here #158 (review)

github-actions · 2024-06-04T00:18:04Z

This PR will be closed in 7 days due to inactivity.

github-actions · 2024-07-07T00:20:39Z

This PR will be closed in 7 days due to inactivity.

cmergenthaler · 2024-07-17T06:57:05Z

deploy/grafana/dashboards/clusters.json

        "hide": 0,
        "includeAll": true,
        "label": "Nodes",
        "multi": true,
        "name": "Nodes",
        "options": [],
        "query": {
-          "query": "label_values(kube_node_info{cluster=\"$cluster\"},node)",
+          "query": "label_values(kube_node_info{cluster=\"$cluster\"},internal_ip)",


I think we could use named capture groups here to separate displayed value and used value, e.g.

        "query": {
          "qryType": 3,
          "query": "query_result(kube_node_info)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "/node=\"(?<text>[^\"]+)|internal_ip=\"(?<value>[^\"]+)/g",

This would allow users to select nodes based on names but filter panels by underlying IPs
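As a rough illustration of the text/value split, the sketch below applies the same pattern to one sample `kube_node_info` result. It is written in Python, so the groups use `(?P<name>...)` instead of JavaScript's `(?<name>...)`. The sample series and the helper `to_option` are hypothetical, and merging the matches of one series into a single option is an assumption about how the variable is built.

```python
import re

# Same alternation as in the suggestion, in Python named-group syntax.
PATTERN = re.compile(r'node="(?P<text>[^"]+)|internal_ip="(?P<value>[^"]+)')


def to_option(series: str) -> dict:
    """Merge all matches in one series into a single {text, value} option."""
    option = {}
    for match in PATTERN.finditer(series):  # finditer plays the role of /g
        for key, captured in match.groupdict().items():
            if captured is not None:
                option[key] = captured
    return option


# Hypothetical query_result(kube_node_info) row.
series = 'kube_node_info{internal_ip="10.0.0.1",node="aks-nodepool1-0"} 1'
option = to_option(series)
print(option)
```

Here `text` (the node name) would be what the user sees in the dropdown, and `value` (the internal IP) would be what gets substituted into the panel queries.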

huntergregory

The dashboard files have moved since #432. Could you please apply the changes in the new files? Also, I think @cmergenthaler makes a good suggestion about displaying the node names.

If changing datasource.uid would break someone's dashboard when updating the dashboard to the new version, then I would prefer we not change it. Either way, it would be nice to keep this PR's scope smaller and make a change to datasource.uid in another PR (there are also test files that depend on that value).
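One way to gauge the impact of a `datasource.uid` change before making it is to list every uid a dashboard file references. The following is a minimal sketch, not the repository's test code: the helper `datasource_uids` and the inline dashboard fragment are hypothetical.

```python
import json


def datasource_uids(node):
    """Recursively collect every datasource uid found in a dashboard document."""
    found = set()
    if isinstance(node, dict):
        datasource = node.get("datasource")
        if isinstance(datasource, dict) and "uid" in datasource:
            found.add(datasource["uid"])
        for value in node.values():
            found |= datasource_uids(value)
    elif isinstance(node, list):
        for item in node:
            found |= datasource_uids(item)
    return found


# Hypothetical fragment that mixes both variable names.
dashboard = json.loads('''
{"panels": [
  {"datasource": {"type": "prometheus", "uid": "${datasource}"},
   "targets": [{"datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"}}]}
]}
''')

uids = datasource_uids(dashboard)
print(sorted(uids))
```

A dashboard that reports more than one uid, or a uid that differs from what existing deployments and tests expect, would be a signal to keep the rename in its own PR.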

github-actions · 2024-09-07T00:20:00Z

This PR will be closed in 7 days due to inactivity.

github-actions · 2024-09-14T00:20:20Z

Pull request closed due to inactivity.

aslafy-z · 2024-09-26T16:08:11Z

@rbtr @huntergregory @cmergenthaler
My priorities are shifting; I'm now working on another project without any Azure clusters on hand. Feel free to take over if you're able.

rbtr · 2024-09-26T16:30:36Z

@ibezrukavyi

…e groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>

…re groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>

@aslafy-z

…dash (#797)

# Description

This PR is to fix #158

* reduce scope of PR
* [make it possible to select multiple nodes on clusters dash](#158 (comment))
* [fix pod-level regex](#158 (comment))
* [~~use named capture groups here to separate displayed value and used value in clusters dash~~](#158 (comment))

>NOTE: I have reverted the change to DS_PROMETHEUS not to break existing deployments and tests. This was requested in [this comment](#158 (review))

## Related Issue

fix #271

If this pull request is related to any issue, please mention it here. Additionally, make sure that the issue is assigned to you before submitting this pull request.

## Checklist

- [x] I have read the [contributing documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes made.

### All dashboards

![Screenshot 2024-10-01 152822](https://github.com/user-attachments/assets/6b15f10d-dc12-4405-9898-7da59b2fcdd9)
![Screenshot 2024-10-01 152846](https://github.com/user-attachments/assets/5e1763ce-2a48-4dd9-b4c5-f2b52a7cb3d5)
![Screenshot 2024-10-01 152917](https://github.com/user-attachments/assets/3e4aab9d-7b44-4357-a709-d137e3bb8e47)

### Node selection fix

![Screenshot 2024-10-02 103738](https://github.com/user-attachments/assets/5b61ce34-6a1e-414b-8c9e-1b35f89f7efb)
![Screenshot 2024-10-02 103802](https://github.com/user-attachments/assets/529d6b9f-85a9-48e6-be52-252ecadd066b)

## Additional Notes

Thanks to @aslafy-z for the original PR #158

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project.

Signed-off-by: Simone Rodigari <[email protected]>

aslafy-z marked this pull request as ready for review March 27, 2024 15:58

aslafy-z requested a review from a team as a code owner March 27, 2024 15:58

aslafy-z changed the title ~~fix(grafana): repair node selection~~ fix(grafana): repair node selection & metrics name Mar 27, 2024

aslafy-z force-pushed the patch-3 branch from 29c8a12 to 1c9e567 Compare March 27, 2024 16:19

rbtr requested a review from huntergregory March 27, 2024 16:23

rbtr added type/fix Fixes something area/infra Test, Release, or CI Infrastructure labels Mar 27, 2024

aslafy-z force-pushed the patch-3 branch from 65a5b62 to 7d2fddf Compare March 28, 2024 08:38

rbtr added the priority/1 P1 label Mar 28, 2024

aslafy-z force-pushed the patch-3 branch 3 times, most recently from 8a7d8aa to 2416abb Compare April 2, 2024 07:53

github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label May 3, 2024

aslafy-z force-pushed the patch-3 branch from 2416abb to 92e2934 Compare May 3, 2024 01:20

rbtr assigned huntergregory May 3, 2024

aslafy-z force-pushed the patch-3 branch from 92e2934 to 2ae4bef Compare May 3, 2024 18:04

fix(grafana): repair node selection & metrics name

f03f0ef

Signed-off-by: Zadkiel AHARONIAN <[email protected]>

aslafy-z force-pushed the patch-3 branch from 2ae4bef to f03f0ef Compare May 3, 2024 18:05

rbtr approved these changes May 3, 2024

github-actions bot removed the meta/waiting-for-author Blocked and waiting on the author label May 4, 2024

huntergregory requested changes May 4, 2024

github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Jun 4, 2024

nddq removed the meta/waiting-for-author Blocked and waiting on the author label Jun 6, 2024

github-actions bot added meta/waiting-for-author Blocked and waiting on the author and removed meta/waiting-for-author Blocked and waiting on the author labels Jul 7, 2024

aslafy-z requested a review from huntergregory July 9, 2024 18:25

cmergenthaler reviewed Jul 17, 2024

huntergregory requested changes Aug 7, 2024

huntergregory added the area/data-ingestion-and-visualization label Aug 7, 2024

github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Sep 7, 2024

github-actions bot closed this Sep 14, 2024

aslafy-z deleted the patch-3 branch September 26, 2024 16:08

ibezrukavyi self-assigned this Sep 26, 2024

aslafy-z restored the patch-3 branch September 26, 2024 20:06

SRodi reopened this Sep 30, 2024

SRodi added a commit to SRodi/retina that referenced this pull request Sep 30, 2024

(fix microsoft#158)rebase, revert datasource ref and use named captur…

8a25e8f

…e groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>

SRodi added a commit to SRodi/retina that referenced this pull request Sep 30, 2024

(fix microsoft#158)rebase, revert datasource ref and use named captur…

72c16bd

…e groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>

SRodi mentioned this pull request Sep 30, 2024

fix(grafana/PR#158): fix node selection, metrics name dns, pod-level dash #797

Merged

7 tasks

SRodi added a commit to SRodi/retina that referenced this pull request Sep 30, 2024

fix(microsoft#158): rebase, revert datasource ref and use named captu…

9fb1fa1

…re groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>

aslafy-z closed this Sep 30, 2024

aslafy-z deleted the patch-3 branch September 30, 2024 17:48
