[Proposal] Memory Leak detection system #119023

Open
timroes opened this issue Nov 18, 2021 · 4 comments
Labels
impact:needs-assessment (Product and/or Engineering needs to evaluate the impact of the change) · Team:Operations (Team label for Operations Team) · Team:QA (Team label for QA Team)

Comments

timroes (Contributor) commented Nov 18, 2021

We have seen a history of memory leaks in Kibana (#99473, #99471, elastic/elastic-charts#1148, #20342, #59454), which often impact users who keep Kibana running for a long time, especially with auto refresh on. This can cause the browser tab to crash after a couple of days of running. While we can't reliably automate memory leak detection during development, I'd suggest we build a system that at least tries to detect whether a recent snapshot introduced a memory leak.

That system could do the following:

  • Run our most recent snapshot of Kibana/ES in a VM.
  • Inject a couple of test dashboards (containing a broad range of visualizations, maps, and other embeddables) into Kibana.
  • Have them all running in headless Chrome instances with auto-refresh set to a rather low value (5-10s).
  • Monitor a couple of key metrics (memory, DOM node count, registered listeners, ...) over time via the browser instrumentation APIs (which hopefully give access to those) and log them (into an ES cluster, obviously ;) ); see the collection sketch after this list.
  • Set up some alerting that detects when any of these metrics is not properly being "garbage collected" but simply counts up over time. This would be a strong indicator of a memory leak in Kibana; see the trend-check sketch further below.
  • Redeploy that system with a new snapshot e.g. every 2 days, or whatever time frame we feel confident a memory leak would have clearly shown up in the charts.
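
To make the metric-collection step a bit more concrete, here is a minimal sketch of what one of those headless Chrome monitors could look like. It assumes Puppeteer for driving Chrome and the official Elasticsearch JS client for logging; the dashboard URL, index name, and sampling interval are placeholders, not something this proposal prescribes.

```ts
// Minimal sketch only: Puppeteer's page.metrics() exposes Chrome's
// instrumentation counters (JSHeapUsedSize, Nodes, JSEventListeners, ...),
// which we periodically index into ES so growth over days can be charted.
import puppeteer from 'puppeteer';
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://localhost:9200' });
// Hypothetical test dashboard URL and index name.
const DASHBOARD_URL = 'http://localhost:5601/app/dashboards#/view/test-dashboard-1';
const SAMPLE_INTERVAL_MS = 60_000;

async function monitorDashboard() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(DASHBOARD_URL, { waitUntil: 'networkidle2' });

  // Sample the instrumentation metrics periodically and log them.
  setInterval(async () => {
    const m = await page.metrics();
    await es.index({
      index: 'kibana-memleak-metrics',
      document: {
        '@timestamp': new Date().toISOString(),
        dashboard: DASHBOARD_URL,
        jsHeapUsedBytes: m.JSHeapUsedSize,
        domNodes: m.Nodes,
        eventListeners: m.JSEventListeners,
      },
    });
  }, SAMPLE_INTERVAL_MS);
}

monitorDashboard().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

One browser/page pair per test dashboard, with auto-refresh configured in the dashboard itself, should be enough; the important part is that all samples land in one place where they can be charted per snapshot.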

This would of course not allow us to pinpoint a memory leak directly to a PR, which would be nice. But it would at least let us recognize after 2 days that some change introduced a new memory leak in Kibana, so someone can start investigating that snapshot manually to find the (newly introduced) problematic code.
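
For the "simply counts up over time" check mentioned above, here is a rough sketch of what the alerting condition could evaluate over the samples collected per metric and dashboard; the window size and growth threshold are made-up placeholders, not tuned values.

```ts
// Rough leak heuristic: fit a least-squares line to the samples of one
// metric (e.g. JS heap size or DOM node count) over the whole window and
// flag it if the fitted trend keeps growing by a significant fraction of
// the starting value.
interface Sample {
  timestampMs: number;
  value: number;
}

export function looksLikeLeak(samples: Sample[], minRelativeGrowth = 0.5): boolean {
  const n = samples.length;
  if (n < 10) return false; // not enough data to call it a trend

  const xs = samples.map((s) => s.timestampMs);
  const ys = samples.map((s) => s.value);
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;

  // Least-squares slope of metric value over time.
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const slope = den === 0 ? 0 : num / den;

  // Growth predicted by the trend over the observed window, relative to the
  // first sample. A healthy metric should hover around a plateau after GC,
  // so sustained relative growth is suspicious.
  const windowMs = xs[n - 1] - xs[0];
  const relativeGrowth = (slope * windowMs) / Math.max(ys[0], 1);

  return slope > 0 && relativeGrowth > minRelativeGrowth;
}
```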

cc @pauldotpower

timroes added the Team:Operations and Team:QA labels Nov 18, 2021
elasticmachine (Contributor)

Pinging @elastic/kibana-qa (Team:QA)

elasticmachine (Contributor)

Pinging @elastic/kibana-operations (Team:Operations)

LeeDr commented Nov 19, 2021

@marius-dr did this a few times on an ad-hoc basis. I think we just had Stack Monitoring on the Kibana server and metricbeat data collection on the browser process CPU and memory. Maybe a bit more. @marius-dr could you describe what you did here? Was the monitoring data and metricbeat data sent to a different cluster? I think we need that so we can compare over time.

Dosant (Contributor) commented Dec 2, 2021

Was debugging my branch thinking that my changes caused a memory leak; it turned out the leak already exists in main.
Filed a couple more issues:

exalate-issue-sync bot added the impact:needs-assessment and loe:small labels Feb 16, 2022
tylersmalley removed the loe:small and impact:needs-assessment labels Mar 16, 2022
exalate-issue-sync bot added the impact:needs-assessment label Mar 22, 2022
lizozom changed the title from "Memory Leak detection system" to "[Proposal] Memory Leak detection system" Jun 23, 2022