[Monitoring] Thread pool rejections alert #79433
Pinging @elastic/stack-monitoring (Team:Monitoring)
Awesome to see this ready already! Great work so far! I have a few comments and I'm going to keep testing, but I wanted to start talking through them now.
x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert.ts
x-pack/plugins/monitoring/server/lib/alerts/fetch_thread_pool_rejections_stats.ts
Nice work! In the second screenshot there is a link, "Tune thread pools". Where does it link to? (Sorry, can't find it in the code.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html I agree 💯 that we should guide on how to treat the cause/problem and not the symptom. But I still wanted them to be aware of the thread pool API (in case it's applicable).
…dpool_rejection_alert
@igoristic thanks for clarifying.
Functionally, this is looking great!! Nice work separating these two out. I have a couple of suggestions and will take a deeper look in the code next.
x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert_base.ts
A few things I found in the code, but looking great!
Also, I wanted to bring up this code: https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/server/alerts/missing_monitoring_data_alert.ts#L109. I wonder if we need to do this for all alerts that support a customizable duration (which I think they all do). We hard-code the query to fetch clusters to look 2m in the past, but that wasn't sufficient for the missing monitoring data alert, and I'm wondering if it's sufficient for the others too. WDYT?
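To make the lookback question concrete, here is a minimal sketch of one way the cluster query window could follow the alert's duration instead of a fixed 2m. It is not the code at the linked line; `DEFAULT_LOOKBACK_MS`, `parseDurationMs`, and `clusterLookbackMs` are hypothetical names, and the duration format (`30s`, `5m`, `1h`, `2d`) is an assumption.

```typescript
// Hypothetical helper names; none of these come from the Kibana codebase.
const DEFAULT_LOOKBACK_MS = 2 * 60 * 1000; // the hard-coded 2m mentioned above

const UNIT_MS: Record<string, number> = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Parses an assumed duration format such as "30s", "5m", "1h" or "2d".
function parseDurationMs(duration: string): number {
  const match = /^(\d+)([smhd])$/.exec(duration);
  if (match === null) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  const unitMs = UNIT_MS[match[2] as string];
  if (unitMs === undefined) {
    throw new Error(`Unsupported duration unit in: ${duration}`);
  }
  return Number(match[1]) * unitMs;
}

// Looks back at least as far as the alert's own duration,
// and never less than the 2m default.
function clusterLookbackMs(alertDuration: string): number {
  return Math.max(DEFAULT_LOOKBACK_MS, parseDurationMs(alertDuration));
}
```

With this shape, an alert configured for `5m` would fetch clusters over a 5 minute window, while one configured for `30s` would still use the 2m default.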
x-pack/plugins/monitoring/server/lib/alerts/fetch_thread_pool_rejections_stats.ts
x-pack/plugins/monitoring/server/alerts/thread_pool_rejections_alert_base.ts
packages/kbn-optimizer/limits.yml
@@ -54,7 +54,7 @@ pageLoadAssetSize:
   mapsLegacy: 116817
   mapsLegacyLicensing: 20214
   ml: 82187
-  monitoring: 268612
+  monitoring: 288612
Our unofficial goal is to ultimately limit all page load asset sizes to 200kb in the next year or so, as teams continue working to reduce the size of their bundles. Is anyone from the monitoring team working to reduce the page load asset size of this bundle? A 20kb increase feels reasonable for now, but I'd like to ask that someone from the monitoring team take a look at x-pack/plugins/monitoring/target/public/stats.json in one of the many webpack analyzers or visualizers after running node scripts/build_kibana_platform_plugins --focus monitoring --profile.
When I run it I see that the server code is being bundled in the page load bundle. Fixing that seems like the best way to address the limit issue, rather than raising the limit.
PS, I'm working on unifying the docs for the best way to diagnose and deal with this stuff now.
This is awesome! Thank you @spalger 🙇
@@ -81,9 +81,9 @@
  *
  * @param {[type]} prov [description]
  */
-import _ from 'lodash';
+import { partial, uniqueId, isObject } from 'lodash';
FYI, this has no impact on the size of the bundles since we always load and share a single, complete, lodash instance.
@chrisronline Re-requested your already approved review, since I had to remove all the |
💚 Build Succeeded
Metrics [docs]
@kbn/optimizer bundle module count
async chunk count
async chunks size
distributable file count
page load bundle size
Oh snap, love the bundle limit decrease! 💯
LGTM! Great work!
* Thread pool rejections first draft
* Split search and write rejections to separate alerts
* Code review feedback
* Optimized page loading and bundle size
* Increased monitoring bundle limit
* Removed server app import into the frontend
* Fixed tests and bundle size
Co-authored-by: Kibana Machine <[email protected]>
# Conflicts:
# packages/kbn-optimizer/limits.yml
Backport: |
Resolves #74822
The check evaluates each data node to make sure that the thread pool rejections MAX within the last 5m range is below the implied threshold.
This is part of the "Additional Alerting" effort for Stack Monitoring.
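As a rough illustration of the check described above, the sketch below takes per-node rejection samples, keeps those inside the 5m window, computes the MAX per node, and reports nodes over the threshold. The `RejectionSample` shape and `findNodesOverThreshold` are hypothetical and not taken from this PR. It also assumes each sample already holds a rejection count for its interval, and that a node fires when its MAX is strictly greater than the threshold.

```typescript
// Hypothetical sample shape; the real alert reads node_stats monitoring documents.
interface RejectionSample {
  nodeId: string;
  timestamp: number; // epoch milliseconds
  rejected: number; // assumed: rejection count reported by this sample
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Returns the per-node MAX for every node whose MAX within the window
// exceeds the threshold (assumption: strictly greater than).
function findNodesOverThreshold(
  samples: RejectionSample[],
  threshold: number,
  now: number,
  windowMs: number = FIVE_MINUTES_MS
): Record<string, number> {
  const maxByNode: Record<string, number> = {};
  for (const sample of samples) {
    // Skip samples that fall outside the lookback window.
    if (sample.timestamp < now - windowMs || sample.timestamp > now) {
      continue;
    }
    const previous = maxByNode[sample.nodeId];
    maxByNode[sample.nodeId] =
      previous === undefined ? sample.rejected : Math.max(previous, sample.rejected);
  }

  const firing: Record<string, number> = {};
  for (const nodeId of Object.keys(maxByNode)) {
    const max = maxByNode[nodeId];
    if (max !== undefined && max > threshold) {
      firing[nodeId] = max;
    }
  }
  return firing;
}
```

A node with samples of 3 and 12 rejections inside the window and a threshold of 10 would be reported with a MAX of 12. A sample older than 5 minutes is ignored no matter how large it is.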
Testing:
search or write threshold to a value and make sure it's enabled
eg: thread_pool.search.rejected in node_stats document (that's within the last 5 minutes) in the .monitoring-es* index
*Note: it might take a couple of minutes for the notification to show up in the UI
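For the testing steps, a small helper like the one below could build a node_stats-style document with a chosen rejection count and a timestamp inside the last 5 minutes. This is only a sketch. The nesting `node_stats.thread_pool.<pool>.rejected` and the `type` and `timestamp` fields are assumptions about the document layout, and `buildNodeStatsTestDoc` is a hypothetical name. A real document in .monitoring-es* carries more fields (cluster and node identifiers, for example) that are omitted here.

```typescript
type ThreadPoolType = 'search' | 'write';

// Assumed, simplified document layout; real monitoring documents contain more fields.
interface NodeStatsTestDoc {
  type: 'node_stats';
  timestamp: string; // ISO 8601
  node_stats: {
    thread_pool: {
      search: { rejected: number };
      write: { rejected: number };
    };
  };
}

// Builds a test document dated `ageMs` before `now` (default: one minute ago),
// so that it falls inside the alert's 5 minute window.
function buildNodeStatsTestDoc(
  pool: ThreadPoolType,
  rejected: number,
  now: Date = new Date(),
  ageMs: number = 60 * 1000
): NodeStatsTestDoc {
  return {
    type: 'node_stats',
    timestamp: new Date(now.getTime() - ageMs).toISOString(),
    node_stats: {
      thread_pool: {
        search: { rejected: pool === 'search' ? rejected : 0 },
        write: { rejected: pool === 'write' ? rejected : 0 },
      },
    },
  };
}
```

Calling `buildNodeStatsTestDoc('search', 500)` yields a document with 500 search rejections dated one minute ago. It could then be indexed by hand to see whether the alert fires.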