[Monitoring] Fetch shard data more efficiently #54028
💔 Build Failed
To update your PR or re-run it, just comment with:
Pinging @elastic/stack-monitoring (Team:Monitoring)
    }
  ],
  "shardStats": {
    "nodes": {
      "jUT5KdxfRbORSCWkb5zjmA": {
        "shardCount": 38,
        "indexCount": 20,
        "shardCount": 5,
I'm pretty sure these are different because the new query now limits the shard data to the specific index instead of fetching it across all indices.
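To illustrate the kind of scoping described in this comment, here is a minimal sketch, not the query this PR actually uses. The helper name `buildShardFilter` and the field names `shard.index` and `shard.node` are assumptions made for illustration; only the `bool`/`filter`/`term` structure is standard Elasticsearch query DSL.

```javascript
// Hypothetical helper: build a query that only matches shard documents
// for one index and/or one node, instead of every shard in the cluster.
// ASSUMPTION: `shard.index` and `shard.node` are illustrative field names.
function buildShardFilter({ indexName, nodeUuid } = {}) {
  const filter = [];
  if (indexName) {
    // Restrict to shards belonging to a single index.
    filter.push({ term: { 'shard.index': indexName } });
  }
  if (nodeUuid) {
    // Restrict to shards allocated to a single node.
    filter.push({ term: { 'shard.node': nodeUuid } });
  }
  // An empty filter array matches everything, i.e. the unscoped case.
  return { bool: { filter } };
}

// Usage: scope the query to one node, as a node detail page might.
const nodeQuery = buildShardFilter({ nodeUuid: 'jUT5KdxfRbORSCWkb5zjmA' });
console.log(JSON.stringify(nodeQuery));
```

Scoping the filter this way would explain why a fixture's counts shrink: the aggregation only sees the shards that match the filter.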
@@ -32,7 +32,7 @@ export default function({ getService }) {
     it('should summarize node with metrics', async () => {
       const { body } = await supertest
         .post(
-          '/api/monitoring/v1/clusters/YCxj-RAgSZCP6GuOQ8M1EQ/elasticsearch/nodes/jxcP6ue7eRCieNNitFTT0EA'
+          '/api/monitoring/v1/clusters/YCxj-RAgSZCP6GuOQ8M1EQ/elasticsearch/nodes/jUT5KdxfRbORSCWkb5zjmA'
I have no idea why this was changed. The original node id doesn't actually exist in the archived data! See #23715
const esIndexPattern = '*';
const cluster = {};
const stats = await getIndicesUnassignedShardStats(req, esIndexPattern, cluster);
expect(stats.indices).toEqual(indices);
Is it possible to also test `status` here? It looks like you already have the right replica/primary counts to test for all three colors 💚 💛 ❤️
It should test for it now. There is a `status` field here -> https://github.com/elastic/kibana/pull/54028/files/ffdb7d79aa65d7694eb3f2d88d45c16bedfcfc27#diff-3429702abd39406ddf3dc4c1ad63f5a6R12
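For readers following this thread, a minimal sketch of what testing all three colors could look like. It assumes the conventional Elasticsearch health rule (any unassigned primary means red, otherwise any unassigned replica means yellow, otherwise green). The function `deriveIndexStatus` and its parameter names are hypothetical and are not taken from this PR's code.

```javascript
// Hypothetical helper, assuming the conventional health rule:
//   unassigned primary  -> 'red'
//   unassigned replica  -> 'yellow'
//   nothing unassigned  -> 'green'
function deriveIndexStatus({ unassignedPrimary = 0, unassignedReplica = 0 }) {
  if (unassignedPrimary > 0) return 'red';
  if (unassignedReplica > 0) return 'yellow';
  return 'green';
}

// One illustrative fixture per color (index names are made up).
const cases = [
  { name: 'all-assigned', unassignedPrimary: 0, unassignedReplica: 0 },
  { name: 'missing-replica', unassignedPrimary: 0, unassignedReplica: 2 },
  { name: 'missing-primary', unassignedPrimary: 1, unassignedReplica: 1 },
];

const statuses = cases.map((c) => ({ name: c.name, status: deriveIndexStatus(c) }));
console.log(statuses);
```

Checking primaries before replicas matters: an index with both kinds of unassigned shards should report the more severe status.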
This is awesome stuff @chrisronline! 🏆
My benchmarks were a little faster overall, but were still within similar margins. Maybe that's because I ran it from Docker (or my computer is > than yours).
@elasticmachine merge upstream
💚 Build Succeeded
To update your PR or re-run it, just comment with:
* For the nodes listing page, do not fetch shard data for indices
* Optimize our shard queries for the index and node listing pages
* This change isn't necessary
* Rename file and function
* Use optimized query for ml jobs and es overview
* Apply to node/index detail page, and more renaming
* Unnecessary change
* Fix tests
* Add basic tests

Co-authored-by: Elastic Machine <[email protected]>
Backport: 7.x: d811e4d
* master: (69 commits)
  [Graph] Fix various a11y issues (elastic#54097)
  Add ApplicationService app status management (elastic#50223)
  logs in one time (elastic#54447)
  Deprecate using `elasticsearch.ssl.certificate` without `elasticsearch.ssl.key` and vice versa (elastic#54392)
  [Optimizer] Fix a stack overflow with watch_cache when it attempts to delete very large folders. (elastic#54457)
  Security - Role Mappings UI (elastic#53620)
  [SIEM] [Detection engine] Permission II (elastic#54292)
  Allow User to Cleanup Repository from UI (elastic#53047)
  [Detection engine] Some UX for rule creation (elastic#54471)
  share specific instances of some ui packages (elastic#54079)
  [ML] APM modules configs for RUM Javascript and NodeJS (elastic#53792)
  [APM] Delay rendering invalid license notification (elastic#53924)
  [Graph] Improve error message on graph requests (elastic#54230)
  [ILM] Kibana should allow a min_age setting of 0ms in ILM policy phases (elastic#53719)
  Unit Tests for common/lib (elastic#53736)
  [Graph] Only show explorable fields (elastic#54101)
  remove linting rule exception for markdown (elastic#54232)
  [Monitoring] Fetch shard data more efficiently (elastic#54028)
  [Maps] Add hiddenLayers option to embeddable map input (elastic#54355)
  Pass termOrder and hasTermsAgg properties to serializeThresholdWatch function (elastic#54391)
  ...
While debugging some performance issues on an ESMS cluster with @pickypg, we discovered that our query to fetch shard data (in an oversharded environment) performed very poorly. It turns out that there are a few major issues with our existing query:
This PR fixes all of these issues and drastically improves the loading time of various ES monitoring pages that slow down for large clusters.
Performance
On a sample ESMS cluster (which is severely oversharded) in a constant, absolute time period, I tested the timing to fetch shard stats data.
Current
Indices listing: ~23s
Nodes listing: ~23s
ML jobs listing: ~23s
ES cluster overview: ~23s
Index detail page: ~23s
Node detail page: ~23s
PR
Indices listing: ~1.7s
Nodes listing: ~1.7s
ML jobs listing: ~1.7s
ES cluster overview: ~1.7s
Index detail page: ~215ms
Node detail page: ~1.2s
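As a rough reading of the numbers above, the approximate speedup per page can be computed directly. This is only arithmetic on the reported timings, which are themselves approximate; the `timings` array and the `speedup` helper are illustrative names, not part of the PR.

```javascript
// Timings copied from the "Current" and "PR" lists, in milliseconds.
// All values are approximate, as reported.
const timings = [
  { page: 'Indices listing', before: 23000, after: 1700 },
  { page: 'Nodes listing', before: 23000, after: 1700 },
  { page: 'ML jobs listing', before: 23000, after: 1700 },
  { page: 'ES cluster overview', before: 23000, after: 1700 },
  { page: 'Index detail page', before: 23000, after: 215 },
  { page: 'Node detail page', before: 23000, after: 1200 },
];

// Speedup factor: how many times faster the new timing is than the old one.
function speedup(before, after) {
  return before / after;
}

for (const { page, before, after } of timings) {
  console.log(`${page}: ~${speedup(before, after).toFixed(1)}x faster`);
}
```

On these figures the listing and overview pages come out roughly 13x to 14x faster, the node detail page roughly 19x, and the index detail page roughly 107x.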
Testing
This is a bit tricky. The UI should be unaffected - the API should return the same data the UI needs, so we're just looking to ensure we didn't miss something.
Notes