-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework system metrics overview and host overview #3630
Conversation
🌐 Coverage report
|
Added @joshdover as reviewer to get someone with more Kibana experience to look at this, Josh feel free to assign someone else to look at this if needed. |
@flash1293 This is a huge improvement in visual layout and use of new Kibana features. I exported the dashboard to my personal cluster and I found it a lot easier to use. 🎉
I think being able to find the top memory consumers is important, regardless of CPU usage. If we have to do a separate table for this we probably should or we should add a similar heatmap for Top Hosts by Memory Usage.
For this comment + all others regarding last value mode: I think this is tricky because there are two use cases:
The previous behavior showed the "realtime" value. My guess is the most helpful metric is the "what is happening right now" case, but makes the tables useless for the other use case. Should we have separate dashboards or visualizations for these different use cases? I'm guessing this is a problem we need to help solve for all integrations. IMO this is the biggest outstanding problem with this PR. Question about the drilldowns: it seems that clicking on the table rows doesn't open the host drilldown, but clicking the context menu works. Is there a bug here, because I see the drilldown is configured for "table row click" but that doesn't seem to work? Related to that, it'd be great for us to expand on this in the future and have a process dashboard that can be drilled down into from the host overview board. |
Also, can the CPU heatmap on the overview dashboard also drilldown to the host dashboard on click? |
Thanks for the review @joshdover . I think I can address most of your points somehow - I will report back on these once I have a solution. about
An important note is that the tsvb vis would show the average of the last few seconds/minutes, but still order the list by the overall average, so the shown entries even in the current dashboard are not necessarily the top consumers. We have a feature on our near term list (8.5 or even 8.4 if we hurry) to allow to do the same thing in Lens (order by overall time range, show the last few minutes in the table). However I’m not sure whether it’s the best behavior as it’s hard to understand for a user - it’s a bit averaged over the full time range and a bit “current state”. to make it consistent there are two approaches:
What do you think? Should we keep the current tsvb behavior or go one of the other ways? |
Thanks for clarifying how this currently works and I do think this likely matches what we want, but the downside is we'd need to bump the minimum package version to 8.4 or 8.5 which isn't super desirable. I'd be ok keeping the current behavior in the PR over increasing the minimum version right now. As you mentioned the user can still get the latest value using a shorter time span, which I think is better than only solving one of the use cases or increasing the minimum version right now. We should track some of the improvements we'd like to make the next time we increase the minimum version. I think we'll want to try supporting at least 2-3 minor releases back from the latest public one. Today the latest release is 8.3.x, so bumping the minimum to 8.1 seems fine. |
actually that’s an important point - in the original pr description I listed things that can be improved in future versions (most notably new metric vis) . When discussing this with @akshay-saraswat we concluded bumping to the latest minor would he justifiable (as users of older stack versions can still use the existing legacy assets). Do you think we should make a lag of 2 minors a rule? |
Another important point I just realized - the url drilldown used to link from the overview to the detail dashboard is not part of the basic subscription. I guess we need to make sure to only use basic features, is that assumption correct? |
We should probably have some policy on this, but I don't think we do. I think the owning team of this package should chime in on how often we make bug fixes to this package that need to be backported to previous minors. cc @cmacknz
Hmm that will be a breaking change of sorts since the current dashboard does support this (is it also only on basic?). I still think it'd be good to include it assuming that there is some graceful fallback when not on basic and nothing breaks. May need to update the copy to not mention the drilldown feature though. |
The problem with drilldown is that I can’t link the filter up with the input control on the details dashboard. But in 8.3 there’s a new input controls feature that might not have this problem (I have to check though) |
Addressed some points:
The biggest open question is
@cmacknz @joshdover could you have a look and check wether it makes sense like this? |
@flash1293 What does this mean exactly?
A couple questions on this feature:
|
When navigating from one dashboard to another via filter drilldown, then the filter will be set on the target dashboard, but if there is an input control on the same field it won't be picked up as the "current value". So the user ends up with a regular filter pill and an empty select box - if they select something from the select box, a second filter pill is added, effectively filtering out everything. The user would need to remove the filter pill first and reselect which is pretty confusing.
I merged it yesterday, so it will be in 8.5 We can show the overall average and the very last value (top metric of the field sorted by timestamp descending) - would that make sense? |
I think if there were two separate columns and clearly labeled it would make sense ("Average CPU" and "Last CPU"?). We could compare this to what other visualizations are doing in other packages but I think we're trying to define what should be the best practice pattern here and I'm not sure looking at what we've done elsewhere is helpful. WDYT? |
Agreed. I like "average" vs "last" - it's less vague than "average over the last few seconds" which is what TSVB is doing at the moment. Gonna update the PR |
@joshdover Split up all the metrics in the tables into "Average" and "Last value": It seems helpful to me - the space is there to show another value and it's better information than just the current value or just the average |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - did not manually test the latest iteration. Thanks for all your work on this @flash1293! 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these changes. I haven't been able to personally view the changes but based on the discussions looks like a great improvement. Once I have access to 8.5, i'll provide more feedback if required.
@nimarezainia As mentioned, we can have some folks test this before we merge these changes. This can be done by downloading these two files and then importing them into Kibana from Stack Management > Saved Objects. I'd recommend creating a test space to do this in since it will override the dashboards from the integration that is currently installed: https://raw.githubusercontent.com/flash1293/integrations/system-dashboard-rework/packages/system/kibana/dashboard/system-79ffd6e0-faa0-11e6-947f-177f697178b8.json |
Slight correction - the files can’t be imported directly as they aren’t in the right format (elastic-package is doing some transformations) I will provide a proper export tomorrow and send it around. |
OK, as discussed offline I reverted the Lens tables on the system overview dashboard back to the TSVB top n visualizations for one-click-drilldown functionality (with adjusted color schemes): @joshdover @nimarezainia The latest state as importable file can be found here: https://gist.githubusercontent.com/flash1293/d3a8b167ad91576f9c9a770d163e1b20/raw/505cfe2a74083f957f29e260a435bc14be0560b3/export.ndjson Save this link as an |
@joshdover @cmacknz @nimarezainia @jlind23 It would be great to get this over the line. This is not only about the system dashboard itself which is a huge improvement but it will also serve as an example for many other integrations on how we should build the dashboards. |
I'm good with this being merged. @nimarezainia did you still want to get additional feedback from SAs or are you happy with the modifications @flash1293 made? |
@joshdover sorry I haven't been able to take care of this. if you are all happy let's merge and I will try and find SAs to review it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 Benchmarks reportPackage
|
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
syslog |
16393.44 | 9433.96 | -6959.48 (-42.45%) | 💔 |
To see the full report comment with /test benchmark fullreport
@flash1293 I say we ship this. If we get user complaints, it's not hard to revert and release another update. |
Alright, I just removed the unused visualizations - if the build goes green I'm going to merge. |
@cmacknz could you take it from here in terms of releasing? |
Yes I can promote the integration, thanks! |
What does this PR do?
This PR reworks the system overview and host overview dashboards, following the best practices documented here: https://docs.google.com/document/d/1uyyFGx6xA5Kvl8c-ZdvXdvBGrHTylxU9F69TGqfzdmw/edit
Checklist
I have reviewed tips for building integrations and this pull request is aligned with them.I have verified that all data streams collect metrics or logs.changelog.yml
file.Detailed changes in this PR
System overview:
Host overview:
Divergence from datavis proposal
cc @gvnmagni
Problems/Leftovers
I couldn't do some things because they are either not available in Lens yet at all or not at the current version
How to test this PR locally
Screenshots