Add metrics to auto scale based on indexing pressure #904

Merged
merged 2 commits into from
Jul 11, 2024

Conversation

tac-emil-andresen
Contributor

Add the metrics indexing_pressure.memory.limit_in_bytes and indexing_pressure.memory.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests because the indexing request memory buffer has reached capacity.
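
For readers unfamiliar with where these two values live, here is a minimal Go sketch of decoding them from a node stats response. It is an illustration only, not the code in this PR: the type name `nodeStats`, the helper `parseNodeStats`, the node ID `abc123`, and the sample body are all hypothetical, and the struct keeps only the two fields named above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeStats is a hypothetical, trimmed-down view of a node stats response.
// It keeps only the two indexing_pressure fields discussed in this PR.
type nodeStats struct {
	Nodes map[string]struct {
		Name             string `json:"name"`
		IndexingPressure struct {
			Memory struct {
				Current struct {
					AllInBytes int64 `json:"all_in_bytes"`
				} `json:"current"`
				LimitInBytes int64 `json:"limit_in_bytes"`
			} `json:"memory"`
		} `json:"indexing_pressure"`
	} `json:"nodes"`
}

// parseNodeStats decodes a node stats response body into nodeStats.
func parseNodeStats(body []byte) (nodeStats, error) {
	var s nodeStats
	err := json.Unmarshal(body, &s)
	return s, err
}

// sampleBody is a made-up response fragment for illustration; the node ID
// and values are assumptions, not captured from a real cluster.
const sampleBody = `{"nodes":{"abc123":{"name":"magenta","indexing_pressure":{"memory":{"current":{"all_in_bytes":768},"limit_in_bytes":824180736}}}}}`

func main() {
	stats, err := parseNodeStats([]byte(sampleBody))
	if err != nil {
		panic(err)
	}
	for id, n := range stats.Nodes {
		m := n.IndexingPressure.Memory
		fmt.Printf("%s (%s): %d of %d bytes in use\n", n.Name, id, m.Current.AllInBytes, m.LimitInBytes)
	}
}
```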

This change may address the following two issues:

#638
#875

There is an open pull request from over a year ago attempting to address issue 638:

#727

Pull request 727 includes a large number of indexing_pressure-related metrics in its code changes. The comment chain indicates a concern that this could unnecessarily increase cardinality. In this pull request we only add the two metrics required to auto-scale based on indexing pressure.

We (Telus Agriculture and Consumer Goods) have a production ES cluster that we are upgrading from v7 to v8. As part of this upgrade, the "elasticsearch_thread_pool_rejected_count" metric has been removed by the ES Dev Team because they switched from a fixed-length queue of indexing requests with a maximum size to a memory buffer that defaults to 10% of the heap. In the past, when the queue reached capacity, the cluster would start rejecting indexing requests and you could auto-scale the cluster up to relieve that pressure. Since the queue was eliminated, we need a new way to scale up based on indexing pressure so that we don't fall behind on processing incoming requests. Based on our investigation, the new way to do this is to compare the total size of the indexing memory buffer with the amount of the buffer currently in use. In this PR we add just the two metrics required to achieve auto-scaling based on indexing pressure.
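
To make the comparison concrete, here is a rough Go sketch of the scaling check we have in mind: divide the bytes currently in use by the configured limit for each node, and scale up when any node crosses a threshold. The names `nodePressure`, `utilization`, and `shouldScaleUp`, the node names, and the 0.8 threshold are assumptions for illustration; none of this is part of the exporter.

```go
package main

import "fmt"

// nodePressure holds the two values exposed by this PR for one node.
// The type is hypothetical and exists only for this sketch.
type nodePressure struct {
	Name         string
	CurrentBytes float64
	LimitBytes   float64
}

// utilization returns currentBytes/limitBytes as a fraction.
// ok is false when the limit is not positive, so callers can skip the node.
func utilization(currentBytes, limitBytes float64) (ratio float64, ok bool) {
	if limitBytes <= 0 {
		return 0, false
	}
	return currentBytes / limitBytes, true
}

// shouldScaleUp reports whether any node's utilization is at or above
// the given threshold (for example 0.8 for 80% of the buffer).
func shouldScaleUp(nodes []nodePressure, threshold float64) bool {
	for _, n := range nodes {
		if u, ok := utilization(n.CurrentBytes, n.LimitBytes); ok && u >= threshold {
			return true
		}
	}
	return false
}

func main() {
	// Made-up values: node-b is using 87.5% of its indexing memory limit.
	nodes := []nodePressure{
		{Name: "node-a", CurrentBytes: 1e8, LimitBytes: 8e8},
		{Name: "node-b", CurrentBytes: 7e8, LimitBytes: 8e8},
	}
	fmt.Println("scale up:", shouldScaleUp(nodes, 0.8))
}
```

Whether to trigger on the busiest node, as here, or on a cluster-wide average is a policy choice; the busiest-node check is simply the more conservative of the two.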

@tac-emil-andresen tac-emil-andresen changed the title Add metrics auto scale based on indexing pressure Add metrics to auto scale based on indexing pressure Jun 24, 2024

…sure.memory.current.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests due to the indxing request memory buffer reaching capacity.

Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>
@tac-emil-andresen
Contributor Author

tac-emil-andresen commented Jun 27, 2024

Hi. I'm personally putting a $60 U.S. Dollar bounty on merging this PR (or an equivalent change) because my team needs it, because two other issues and a PR suggest there is demand for this beyond just our team, and because open source maintainers are underappreciated. If you merge the PR, please make sure you have your sponsor stuff set up in GitHub and I'll send you a one-time $60 thank you.

Contributor

@sysadmind sysadmind left a comment


I think overall, this looks good.

Based on the other PR not having any activity in the last year, I would be happy to move forward with this PR.

collector/nodes.go (outdated review thread, resolved)
@sysadmind sysadmind self-assigned this Jul 6, 2024
@tac-emil-andresen
Contributor Author

With the change to just include the cluster, host, and node labels rather than all the default labels, here is what the (sanitized) output from /metrics looks like:

elasticsearch_exporter % curl http://localhost:9114/metrics | grep pressure
# HELP elasticsearch_indexing_pressure_current_all_in_bytes Memory consumed, in bytes, by indexing requests in the coordinating, primary, or replica stage.
# TYPE elasticsearch_indexing_pressure_current_all_in_bytes gauge
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.15",indexing_pressure="memory",name="orange"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.16",indexing_pressure="memory",name="yellow"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.17",indexing_pressure="memory",name="green"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.18",indexing_pressure="memory",name="blue"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.19",indexing_pressure="memory",name="violet"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.23",indexing_pressure="memory",name="cyan"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 768
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.37",indexing_pressure="memory",name="amber"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.4",indexing_pressure="memory",name="white"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.43",indexing_pressure="memory",name="brown"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.44",indexing_pressure="memory",name="black"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.45",indexing_pressure="memory",name="gray"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.46",indexing_pressure="memory",name="aqua"} 280
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.48",indexing_pressure="memory",name="maroon"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.50",indexing_pressure="memory",name="seafoam"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.53",indexing_pressure="memory",name="chartruese"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.8",indexing_pressure="memory",name="goldenrod"} 0

# HELP elasticsearch_indexing_pressure_limit_in_bytes Configured memory limit, in bytes, for the indexing requests
# TYPE elasticsearch_indexing_pressure_limit_in_bytes gauge
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.15",indexing_pressure="memory",name="orange"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.16",indexing_pressure="memory",name="yellow"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.17",indexing_pressure="memory",name="green"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.18",indexing_pressure="memory",name="blue"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.19",indexing_pressure="memory",name="violet"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.23",indexing_pressure="memory",name="cyan"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.37",indexing_pressure="memory",name="amber"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.4",indexing_pressure="memory",name="white"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.43",indexing_pressure="memory",name="brown"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.44",indexing_pressure="memory",name="black"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.45",indexing_pressure="memory",name="gray"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.46",indexing_pressure="memory",name="aqua"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.48",indexing_pressure="memory",name="maroon"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.50",indexing_pressure="memory",name="seafoam"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.53",indexing_pressure="memory",name="chartruese"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.8",indexing_pressure="memory",name="goldenrod"} 8.24180736e+08
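
As a quick sanity check on output like this, the two series can be paired by the name label to get a per-node ratio. Below is a small Go sketch that does so with plain string parsing. It is not a full parser for the exposition format: it assumes label values contain no commas or escaped quotes, and `labelValue`, `pressureByNode`, and `sampleMetrics` are hypothetical names used only here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const (
	currentMetric = "elasticsearch_indexing_pressure_current_all_in_bytes"
	limitMetric   = "elasticsearch_indexing_pressure_limit_in_bytes"
)

// labelValue extracts one label's value from the text between { and }.
// Assumption: label values contain no commas or escaped quotes.
func labelValue(labels, key string) string {
	for _, pair := range strings.Split(labels, ",") {
		k, v, ok := strings.Cut(pair, "=")
		if ok && k == key {
			return strings.Trim(v, `"`)
		}
	}
	return ""
}

// pressureByNode maps each node name to current/limit, using only the
// two indexing pressure series and skipping comments and other metrics.
func pressureByNode(text string) map[string]float64 {
	current := map[string]float64{}
	limit := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		open := strings.Index(line, "{")
		end := strings.LastIndex(line, "}")
		if open < 0 || end < open {
			continue
		}
		value, err := strconv.ParseFloat(strings.TrimSpace(line[end+1:]), 64)
		if err != nil {
			continue
		}
		name := labelValue(line[open+1:end], "name")
		switch line[:open] {
		case currentMetric:
			current[name] = value
		case limitMetric:
			limit[name] = value
		}
	}
	ratios := map[string]float64{}
	for name, lim := range limit {
		if cur, ok := current[name]; ok && lim > 0 {
			ratios[name] = cur / lim
		}
	}
	return ratios
}

// sampleMetrics reuses two sample lines from the sanitized output above.
const sampleMetrics = `# TYPE elasticsearch_indexing_pressure_current_all_in_bytes gauge
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 768
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 8.24180736e+08
`

func main() {
	for name, ratio := range pressureByNode(sampleMetrics) {
		fmt.Printf("%s: %.8f of the indexing memory limit in use\n", name, ratio)
	}
}
```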

Contributor

@sysadmind sysadmind left a comment


Thanks! I think this looks good and can be merged, but the DCO check is failing. Can you please amend your latest commit with a sign off?

…de, and name to save on storage space.

Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>
@tac-emil-andresen
Contributor Author

> Thanks! I think this looks good and can be merged, but the DCO check is failing. Can you please amend your latest commit with a sign off?

I've added the sign-off to the last commit. I have not had to amend a commit on a fork before, so I used this command:

git commit --amend --signoff

Then "git push origin issue/638" gave me a message that "Updates were rejected because the tip of your current branch is behind". So I had to push the update to the branch using this command:

git push --force-with-lease origin issue/638

I hope that is correct in this scenario. The lines to be merged still look correct.

Many thanks for your help getting this merged!

Contributor

@sysadmind sysadmind left a comment


LGTM. Thanks!

@sysadmind sysadmind merged commit d13c555 into prometheus-community:master Jul 11, 2024
4 checks passed
@tac-emil-andresen
Contributor Author

> LGTM. Thanks!

Woo hoo! I promised a $60 bounty to get this merged. Where do you want it to go? If you have sponsorship set up in GitHub, I can send it directly to you that way, or via another channel like buymeacoffee.com. Or I can donate it to the charity or open source foundation of your choice.

@sysadmind
Contributor

@tac-emil-andresen I don't have sponsorship set up and I'm not sure it's worth it for me. If you want to donate, this is currently my charity of choice: https://www.thefarmette.org/donate
Thanks


2 participants