New Dashboard: Simple Cluster Status Overview #2343
base: main
Commits:
- Tweaks the probe success timeline to be easier to read
- Some Pod status that may be relevant
- Last 24h status based on feedback
Viktor 57e5c10 to dac57df
Thanks for adding this. I'm in contact with a user as well to get more feedback.
Question: I realized that this will show the probes present in whichever datasource is selected. For user Grafana, this means it will default to the probes in the workload cluster (wc). That might be fine, since it already shows most of the important services in the service cluster (sc) as well. But I wonder whether it will be obvious to users that they can find more or other probes by switching to the service cluster datasource.
Do you have any thoughts about this? Should there be a text box explaining this (and potentially other things about the dashboard)?
I got word that the user is on vacation and will be back in 2 weeks, so expect some feedback from them after that.
Some user feedback (adding it as a file comment to make it a thread):
I took a look at the dashboard when I was running some database jobs last week. I think I came in with the wrong expectations, because the dashboard is clearly not targeting that use-case.
The latency graphs for the systems in the dashboard seem really good for pinpointing or ruling out those systems as culprits for performance problems. The most immediate panel “in the same spirit” that I felt was missing was S3. I don’t know how you probe, so it’s possible that a misbehaving storage solution would show up as spiking response times for Harbor or similar, but it would be better to show the status of the storage solution the same way as for the other systems, so one doesn’t have to infer its health.
A small thing is that the namespace selector at the top doesn’t seem to have any effect. I don’t know how hard it would be to remove it and other unused UI elements, but I think that would make the dashboard slightly easier to use.
Warning
This is a public repository; ensure you do not disclose:
What kind of PR is this?
Required: Mark one of the following that is applicable:
Optional: Mark one or more of the following that are applicable:
Important
Breaking changes should be marked kind/admin-change or kind/dev-change depending on type.
Critical security fixes should be marked with kind/security.
What does this PR do / why do we need this PR?
Adds a new dashboard intended to provide a better user-facing overview of the overall state of the cluster.
It should answer the questions "Is the cluster working?" and "Have there been recent problems?"
It's based on the Prometheus Blackbox Exporter Dashboard, with some panels at the top added and tweaked.
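To make the "Has there been recent problems?" question concrete, here is a minimal sketch of one way a "last 24h status" summary could be derived from Blackbox Exporter probe results. It assumes the input is a list of (target, probe_success) samples, where 1.0 means the probe succeeded; the per-target average mirrors what a PromQL expression such as avg_over_time(probe_success[24h]) computes per series. The function names, target names, and the 99% threshold are hypothetical and not necessarily how the dashboard panels are implemented.

```python
from collections import defaultdict


def availability(samples):
    """Return the fraction of successful probes per target.

    Hypothetical input: (target, probe_success) pairs collected over the
    window of interest, where 1.0 = probe succeeded and 0.0 = probe failed.
    This is the same per-series average that avg_over_time(probe_success[24h])
    would produce in PromQL.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for target, value in samples:
        totals[target] += value
        counts[target] += 1
    return {target: totals[target] / counts[target] for target in totals}


def status(ratio, threshold=0.99):
    """Classify a success ratio; the 0.99 threshold is an assumption."""
    return "ok" if ratio >= threshold else "degraded"


if __name__ == "__main__":
    # Hypothetical samples: one failed Harbor probe out of four.
    samples = [
        ("harbor", 1.0), ("harbor", 0.0), ("harbor", 1.0), ("harbor", 1.0),
        ("grafana", 1.0), ("grafana", 1.0),
    ]
    for target, ratio in sorted(availability(samples).items()):
        print(target, ratio, status(ratio))
```

Averaging the 0/1 probe results gives a success ratio that is easy to threshold into a single "ok/degraded" cell per service, which fits the goal of a quick overview rather than detailed latency analysis.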
Information to reviewers
Does it tell you if the cluster is working?
Does it tell you if the cluster has had recent issues?
Are there any other indicators that should be added?
Should this be merged back into the Prometheus Blackbox Exporter?
Checklist
NetworkPolicy Dashboard