Improve loading time of Kubernetes package Dashboards #31021

Closed
MichaelKatsoulis opened this issue Mar 28, 2022 · 29 comments · Fixed by elastic/integrations#3115
Labels
enhancement Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@MichaelKatsoulis
Contributor

MichaelKatsoulis commented Mar 28, 2022

We have found some opportunities for optimization in the dashboards:

  1. Top CPU intensive pods gets the max of kubernetes.container.cpu.usage.core.ns, then uses derivative aggregation over it and keeps the positive values. We can simply use kubernetes.container.cpu.usage.node.pct instead and group by the pod name.

  2. Same for Top Memory intensive pods

  3. CPU Usage by node sums all cpu usage nanocores per container, then uses a painless script to normalise it to the metricset period and groups by the node name. Instead we can use the node metric kubernetes.node.cpu.usage.nanocores and divide it by kubernetes.node.cpu.allocatable.cores. The same approach is used in the Metrics UI.

  4. Same for Memory Usage by node. We can divide kubernetes.node.memory.usage.bytes by kubernetes.node.memory.allocatable.bytes

  5. Same approach for network in and out bytes

I tested this by creating a separate dashboard with all those visualisations optimised, and the loading time for a 24h range decreased from 1 minute 10 seconds to 30 seconds.
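
As a rough illustration of the arithmetic behind points 3 and 4, here is a minimal Python sketch. The function names and sample values are hypothetical; the only assumption about units is that usage is reported in nanocores (1 core = 10^9 nanocores) while allocatable CPU is reported in whole cores, so the usage has to be scaled before dividing.

```python
NANOCORES_PER_CORE = 1_000_000_000  # 1 core = 10^9 nanocores


def node_cpu_utilisation(usage_nanocores: float, allocatable_cores: float) -> float:
    """Fraction of a node's allocatable CPU that is in use.

    Inputs correspond to kubernetes.node.cpu.usage.nanocores and
    kubernetes.node.cpu.allocatable.cores.
    """
    if allocatable_cores <= 0:
        raise ValueError("allocatable_cores must be positive")
    return usage_nanocores / (allocatable_cores * NANOCORES_PER_CORE)


def node_memory_utilisation(usage_bytes: float, allocatable_bytes: float) -> float:
    """Fraction of a node's allocatable memory that is in use.

    Inputs correspond to kubernetes.node.memory.usage.bytes and
    kubernetes.node.memory.allocatable.bytes.
    """
    if allocatable_bytes <= 0:
        raise ValueError("allocatable_bytes must be positive")
    return usage_bytes / allocatable_bytes


# Hypothetical sample: a 4-core node using 1.5 cores, and 2 GiB of 8 GiB memory.
cpu = node_cpu_utilisation(1_500_000_000, 4)
mem = node_memory_utilisation(2 * 1024**3, 8 * 1024**3)
print(f"cpu={cpu:.1%} mem={mem:.1%}")
```

Because both inputs are already node-level metrics, no per-container sum or derivative is needed, which is where the saving in query cost would come from.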

@MichaelKatsoulis MichaelKatsoulis added enhancement Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team labels Mar 28, 2022
@MichaelKatsoulis MichaelKatsoulis self-assigned this Mar 28, 2022
@MichaelKatsoulis MichaelKatsoulis changed the title Optimise Kubernetes Dashboards Optimize Kubernetes Dashboards Mar 28, 2022
@MichaelKatsoulis
Contributor Author

My suggestion regarding the default kubernetes dashboard optimization is to split it into 2 different dashboards.
The split can be based on the concept of each visualisation's data.
That is, some of them make sense displayed over time, while others make more sense displaying the current value as a number.

For example, for the number of available/desired/unavailable pods or the number of nodes, it is most important to display the current situation in the cluster.

For other visualisations, like Top CPU intensive pods or CPU utilization per node, it would be insightful to display the evolution of the value over time. A user would like to see how the CPU utilisation of a specific pod or node has changed over the past week.

The two dashboards could look like this:

Time series dashboard:

kubernetes 2 1

kubernetes 2 2

Current state dashboard:
Kubernetes last value

This grouping can make the dashboards more performant, as fewer queries will be performed simultaneously.
Also, costly queries with aggregations over a large time range will only be run for the visualisations where they make sense.

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Mar 30, 2022

@ruflin
Member

ruflin commented Mar 31, 2022

++ on splitting up the dashboards. Will the dashboards link to each other?

@ChrsMark
Member

That would be great. We do the same for the Istio module to split the control plane from the data plane views:
https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-istio.html#_dashboard_30

@MichaelKatsoulis
Contributor Author

Yes, I was thinking of something like the Istio tab view! That would be great! Does this allow setting a different time range for each one?

@ruflin
Member

ruflin commented Mar 31, 2022

I thought by now there is a new, better way to link dashboards together. @alexfrancoeur You might be able to point us in the right direction here?

@alexfrancoeur

Thanks for the ping @ruflin. I think there are a number of best practices these integration dashboards can start to leverage. I've listed a bunch here in the past.

Kibana has drilldown capabilities in a dashboard (https://www.elastic.co/guide/en/kibana/current/drilldowns.html#create-drilldowns). This is great for creating workflows from dashboard to dashboard. An overview dashboard to a details dashboard for example. We support dashboard to dashboard and dashboard to external URLs (paid feature). For the integrations dashboard, a combination of markdown for general navigation and drilldowns for workflows is probably the best option.

If you'd like to sit down with some Kibana folks and discuss best practices, we're happy to engage. For example, we no longer need to ship hundreds of visualizations referenced by a dashboard; we can now simplify by packaging everything in a single dashboard JSON.

This is off topic, but while we're on the topic of dashboard linking, I think it's worth raising whether we should be linking to solutions as well. There is probably some low hanging fruit here. Rather than taking that context and navigating to a dashboard, we could apply it to a solution view to create solution drilldowns. I hack this together all the time for demos using the URL drilldowns. Meaning if there's a host IP in a dashboard, let's click into that to navigate to a filtered view in the metrics app. Building these experiences as part of our integrations makes for a much more integrated experience when onboarding a new data source. If there's interest in collaborating on something like this, let's have a quick chat with me and @sixstringcode

@ruflin
Member

ruflin commented Apr 1, 2022

Thanks for the list @alexfrancoeur . The drilldown one is the one I was looking for. There are lots of other great hints in the issue you linked.

About "as value" I just had a conversation with your team and we should figure out ways to convert it automatically. @ChrsMark @MichaelKatsoulis If we redo the k8s dashboards, let's use these best practices directly as an example and also switch to "value".

On the linking to solutions, ++. It is a topic we should also involve @jasonrhodes from unified observability.

@ruflin
Member

ruflin commented Apr 1, 2022

I attended the TSDB meeting yesterday and @Mpdreamz showed off a demo for using TSDB in APM. A first version of the metrics parts is merged into main in Elasticsearch and available in the snapshot builds. I think it is also worth trying this out for the k8s data to see what impact it has.

My understanding is that currently the storage and query parts are available but we can't make use of them yet in Kibana. Also, in the package-spec we don't support the time series fields yet. This means we have to adjust the templates manually and see what effect it has.
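
To give an idea of what "adjusting the templates manually" could involve, here is a hedged sketch of an index template body built as a Python dict. It assumes the Elasticsearch time series settings `index.mode` and `index.routing_path` and the mapping parameters `time_series_dimension` and `time_series_metric`; the index pattern and the choice of fields are illustrative only, and the exact options may differ between versions while the feature is in preview.

```python
import json

# Sketch of a composable index template body for a time series data stream.
# The index pattern below is an illustrative assumption, not the package's template.
index_template = {
    "index_patterns": ["metrics-kubernetes.node-*"],
    "data_stream": {},
    "template": {
        "settings": {
            # Switch the backing indices to time series mode.
            "index.mode": "time_series",
            # Documents are routed by their dimension fields.
            "index.routing_path": ["kubernetes.node.name"],
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "kubernetes": {"properties": {"node": {"properties": {
                    # Dimension: identifies the time series.
                    "name": {"type": "keyword", "time_series_dimension": True},
                    "cpu": {"properties": {"usage": {"properties": {
                        # Metric: a value that can go up and down.
                        "nanocores": {"type": "long", "time_series_metric": "gauge"},
                    }}}},
                }}}},
            }
        },
    },
}

print(json.dumps(index_template, indent=2))
```

The printed JSON is only a starting point for experiments; every field that identifies a series would need to be marked as a dimension for the routing to be meaningful.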

@imotov I tried to find some public docs I can point the team to around mappings and TSDI but was not successful. Is this already available?

@ruflin
Member

ruflin commented Apr 1, 2022

I filed elastic/package-spec#311 to get support for TSDB in the package-spec.

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Apr 1, 2022

I read in the best practices and in @ruflin's suggestions that moving to Lens is the way forward. I don't see, or maybe I don't know, how some of our TSVB visualisations can be moved to Lens.

I will give an example regarding desired pods. The field kubernetes.deployment.replicas.desired has a value per deployment like

kube_deployment_spec_replicas{namespace="default",deployment="hello-python"} 1
kube_deployment_spec_replicas{namespace="kube-system",deployment="coredns"} 2
kube_deployment_spec_replicas{namespace="kube-system",deployment="kube-state-metrics"} 1
kube_deployment_spec_replicas{namespace="local-path-storage",deployment="local-path-provisioner"} 1

and we want to sum up the last values of this field across all deployments.
If we compare seemingly the same dashboards with the same query in TSVB and Lens, we can spot huge differences.

desired pods tsvb

lens desired pods

Neither result is correct, but the Lens one is extreme!
The TSVB result is actually affected by the interval.

There were discussions about this in elastic/integrations#2159 (comment) and @ChrsMark updated the TSVB query by using series aggregation and grouping by deployment name.

tsvb2

But as long as this is not supported in Lens, I don't see how we can use it for such cases.
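
To make the intended calculation concrete, here is a small Python sketch of "sum of the last value per deployment", using the four sample deployments above. The three scrape timestamps are made up, and the naive total is shown only to illustrate how a plain sum over every document grows with the time range; it is not a claim about what Lens or TSVB compute internally.

```python
from typing import Dict, Iterable, Tuple

# (timestamp, namespace, deployment, desired replicas)
Sample = Tuple[int, str, str, int]


def sum_of_last_values(samples: Iterable[Sample]) -> int:
    """Keep the most recent value per (namespace, deployment), then sum them."""
    latest: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for ts, namespace, deployment, desired in samples:
        key = (namespace, deployment)
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, desired)
    return sum(value for _, value in latest.values())


DEPLOYMENTS = [
    ("default", "hello-python", 1),
    ("kube-system", "coredns", 2),
    ("kube-system", "kube-state-metrics", 1),
    ("local-path-storage", "local-path-provisioner", 1),
]

# Hypothetical: the same values reported in three consecutive scrapes.
samples = [
    (ts, namespace, name, desired)
    for ts in (0, 10, 20)
    for namespace, name, desired in DEPLOYMENTS
]

naive_total = sum(sample[3] for sample in samples)  # sums every document
correct_total = sum_of_last_values(samples)         # one value per deployment

print(naive_total, correct_total)  # prints "15 5"
```

Grouping by deployment before summing is the same idea as the series aggregation grouped by deployment name mentioned above.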

@MichaelKatsoulis
Contributor Author

Additional thoughts following @ChrsMark's suggestions in https://github.com/elastic/enhancements/issues/14008#issuecomment-1088524593

We could have:

  1. Different dashboards per Kubernetes resource (Deployments, DaemonSets, StatefulSets) with useful information for the pods controlled by them (cpu, memory, network, disk).
  2. Each dashboard could have a dropdown menu where the user can choose the namespace and the deployment/daemonset/statefulset name whose pods they want to see metrics for.
  3. Separate dashboard for node metrics of the cluster with a dropdown menu for each node name.
  4. An overview dashboard with some cluster-wide information (number of deployment replicas available, number of daemonset replicas available, number of nodes). Each of the visualisations on this dashboard can be a drilldown that leads to the more detailed dashboards of steps 1 and 3.
  5. Stream/log k8s events should also be part of the dashboards.

@ruflin
Member

ruflin commented Apr 7, 2022

@MichaelKatsoulis I like your proposal above. It would be great if we could work on these dashboards in collaboration with the team from @flash1293. We can't necessarily achieve everything with Lens and all the other great features in Kibana now, but we should be able to eventually. Please keep the conversations going.

@MichaelKatsoulis
Contributor Author

I played around with drill down. It doesn't work exactly as demonstrated and documented in https://www.elastic.co/guide/en/kibana/current/drilldowns.html#_create_the_dashboard_drilldown. I was expecting a Go to dashboard option when creating a new drill down. But I only see a Go to URL. I use version 8.1.2 and also tried with 8.2.0-SNAPSHOT.
Drilldown 8 1 2

@flash1293

@MichaelKatsoulis This is expected for metric visualizations - the "Go to Dashboard" drilldown is tied to the filter trigger, that means it's only shown in case the visualization can place a filter (and if it happens, it will prompt the user to go to the other dashboard instead). There are no plans for TSVB, but we do plan to add this functionality for Lens metric visualizations: elastic/kibana#122879

@MichaelKatsoulis
Contributor Author

@flash1293 thanks for the clarification. To be honest I don't understand why it is tied to the filter trigger.
In our case I want an overview dash like
Overview Dash

and when the user wants to see more detailed info for the Nodes (currently TSVB but could be Lens) it will point to
Node details

I have currently created a drilldown with Go to URL, but this won't work for an out-of-the-box dashboard as the URL of the dashboard will be different.

@flash1293

Completely agree, that’s what we will work on for Lens in 8.3

@MichaelKatsoulis
Contributor Author

An extra thing that could be discussed regarding drilldown is the user experience. Instead of the user having to press the options button and then select the drill down name like:
drilldown

There should be an easier and clearer way.
If I were the user I would not understand that this red 1 on the visualisation means that there is a drilldown, and that in order to see it I need two more steps.
Probably pressing on the 1 (or whatever makes more sense) should navigate them to the dashboard.

@flash1293

  • The 1 is only visible in edit mode, it's not shown in view mode (which should be the common case for users)
  • The easiest integration for elastic/kibana#122879 ("[Lens] Allow metric visualization to drill down") is to allow the user to click into the visualization (e.g. on the "pods" text), then getting a context menu which allows them to navigate. We can think about how to provide an affordance during implementation

@MichaelKatsoulis
Contributor Author

@flash1293 I agree with your second bullet. That would be a good way. As it is now, there is no way a user can understand there is something more. The word drilldown also does not mean anything to someone that doesn't know what it is.

@MichaelKatsoulis
Contributor Author

As an update for the effort so far:

I have created the following dashboards, which are connected with drilldowns (using Go to URL for now, while waiting for Go to Dashboard in Lens):
Kubernetes Overview
Nodes details
Pods Details
Deployments details
DaemonSets details

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Apr 18, 2022

@flash1293 Could we arrange a zoom call whenever possible to ask you about best ways to show some metrics in Kibana?
I want to create some nice gauges but to get those numbers, series aggregations are needed and then mathematical formulas like division.
Gauge

I can do things like this, but I cannot use those two numbers in a division to get the percentage.
Cores requested
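
For reference, the calculation the gauge would need is simple once both numbers are available in the same place. A minimal Python sketch, assuming the last value per container and per node has already been extracted; the keys and values are hypothetical, and treating the inputs as per-container CPU requests in cores and per-node kubernetes.node.cpu.allocatable.cores is an assumption.

```python
from typing import Mapping


def requested_share(requested_cores: Mapping[str, float],
                    allocatable_cores: Mapping[str, float]) -> float:
    """Share of the cluster's allocatable cores that containers have requested."""
    total_allocatable = sum(allocatable_cores.values())
    if total_allocatable <= 0:
        raise ValueError("cluster has no allocatable cores")
    return sum(requested_cores.values()) / total_allocatable


# Hypothetical last values: CPU requests per container, allocatable cores per node.
requests = {"default/hello-python": 0.25, "kube-system/coredns": 0.5,
            "kube-system/kube-state-metrics": 0.25}
allocatable = {"node-1": 4.0, "node-2": 4.0}

share = requested_share(requests, allocatable)
print(f"{share:.1%}")  # prints "12.5%"
```

The difficulty described above is not the division itself but that each sum comes from a separate series aggregation, so the visualization has no single place to combine them.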

@jasonrhodes
Member

@katefarrar @mlunadia it would be great for us to try to understand what it is about these dashboards that people want/need/use as we try to think through the infrastructure UI.

@MichaelKatsoulis
Contributor Author

We had a nice discussion with @flash1293 about ways to create some visualizations and we concluded that some things are not possible yet. But they can be in the near future.
Until then we can use some workarounds with the Markdown option when showing information like memory reserved, memory used, cores reserved, cores used and pods reserved.
pods
memory

Ideally we would like to be able to do math calculations with the numbers in each visualisation to get the percentage.
Also, we are waiting for the drilldown option to be available in Lens in the 8.3 or 8.4 release to better connect the dashboards with each other.

@mlunadia

@jasonrhodes 100%. We have plans to tackle this holistically and will for now address any low hanging fruit. We have already started working on establishing a baseline with different discovery activities, one of which will be bringing in your input.

@jasonrhodes
Member

@elastic/infra-monitoring-ui I wonder if there are things we can learn from the optimizations in this issue that could be applied to any other querying we are doing for infra UI.

@miltonhultgren
Contributor

miltonhultgren commented Apr 25, 2022

I guess there are two things we can do:

  1. From a joint product perspective look at which visualizations we have in the UI today that could be changed to a gauge or single value instead of a trend line. I think today we almost only use trend lines? Changing that could allow us to change to a more performant query while also giving better feedback about the data to the user. Having multiple types of visualizations feels natural.

  2. Optimize the queries themselves. I'm a bit hesitant about this since it also requires in-depth domain knowledge about which field means what and which aggregations cause that field to mean something else. What we could do, however, is take stock of the queries we run and how they perform (similar to the SM work we're doing), then pick the top X and see if we can optimize them or feed them into point 1.

@jasonrhodes
Member

@miltonhultgren thanks! These sound like good ideas to me. I'm wondering if there are specific optimizations made in the work related to this issue (from @MichaelKatsoulis and others) that we could use to inform how we might optimize our own queries, but you're right that there is likely some work we'll need to do to understand whether that kind of overlap exists.

@mlunadia mlunadia changed the title Optimize Kubernetes Dashboards Improve loading time of Kubernetes package Dashboards Apr 28, 2022
@ChrsMark
Member

ChrsMark commented Jun 14, 2022

Further improvements will take place with the usage of TSDB features. Investigations will take place using the Rally framework.

8 participants