-
Notifications
You must be signed in to change notification settings - Fork 794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics to auto scale based on indexing pressure #904
Conversation
…sure.memory.current.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests due to the indxing request memory buffer reaching capacity. Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>
8f425be
to
40b06f9
Compare
Hi. I'm personally putting a $60 U.S. Dollar bounty on merging this PR (or an equivalent change) because my team needs it and seeing two other issues and a PR means I think there is some demand for this beyond just our team (and because open source maintainers are under appreciated). If you merge the PR please make sure you have your sponsor stuff setup in Github and I'll send you a one time $60 thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall, this looks good.
Based on the other PR not having any activity in the last year, I would be happy to move forward with this PR.
With the change to just include the cluster, host, and node labels rather than all the default labels, here is what the (sanitized) output from /metrics looks like: elasticsearch_exporter % curl http://localhost:9114/metrics | grep pressure TYPE elasticsearch_indexing_pressure_current_all_in_bytes gaugeelasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 0 HELP elasticsearch_indexing_pressure_limit_in_bytes Configured memory limit, in bytes, for the indexing requestsTYPE elasticsearch_indexing_pressure_limit_in_bytes gaugeelasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 8.24180736e+08 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think this looks good and can be merged, but the DCO check is failing. Can you please amend your latest commit with a sign off?
…de, and name to save on storage space. Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>
b9fea3e
to
d607c9f
Compare
I've added the sign off to the last commit. I have not had to amend a commit on a fork before. So I used this command: git commit --amend --signoff Then "git push origin issue/638" gave me a message that "Updates were rejected because the tip of your current branch is behind". So I had to push the update to the branch using this command: git push --force-with-lease origin issue/638 I hope that is correct in this scenario. The lines to be merged still look correct. Many thanks for your help getting this merged! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Woo hoo! I promised a $60 bounty to get this merged. Where do you want it to go? If you have sponsorship setup in Github I can send it direct to you that way or via another channel like buymeacoffee.com. Or, I can donate it to the charity or open source foundation of your choice. |
@tac-emil-andresen I don't have sponsorship set up and I'm not sure it's worth it for me. If you want to donate, this is currently my charity of choice: https://www.thefarmette.org/donate |
Add metrics indexing_pressure.memory.limit_in_bytes and indexing_pressure.memory.current.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests due to the indexing request memory buffer reaching capacity.
This change may address the following two issues:
#638
#875
There is an open pull request from over a year ago attempting to address issue 638:
#727
In pull request 727 a large number of indexing_pressure related metrics are included in the code changes. The comment chain indicates a concern that this could unnecessarily increase cardinality. In this pull request we only add the two metrics required to address the need to auto-scale based on indexing pressure.
We (Telus Agriculture and Consumer Goods) have a production ES cluster that we are upgrading from v7 to v8. As part of this upgrade the "elasticsearch_thread_pool_rejected_count" metric has been removed by the ES Dev Team because they switched from using a fixed length queue of indexing requests with a maximums size to using a memory buffer that defaults to 10% of available memory. In the past, when the queue would reach capacity, the cluster would start rejecting indexing requests and you could auto-scale up the cluster to address that pressure. Since the queue was eliminated we need a way to new way scale up based on indexing pressure so that we don't get behind on processing incoming requests. Based on our investigation, the new way to do this is to compare the total size of the indexing memory buffer to the current used amount of buffer. In this PR we add just the two metrics required to achieve auto scaling based on indexing pressure.