Add metrics to auto scale based on indexing pressure #904

tac-emil-andresen · 2024-06-24T17:14:17Z

Add the metrics indexing_pressure.memory.limit_in_bytes and indexing_pressure.memory.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests because the indexing request memory buffer has reached capacity.
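For readers unfamiliar with where these two values live, here is a minimal sketch of decoding them from one node's `indexing_pressure` object in the Elasticsearch node stats response (`memory.current.all_in_bytes` and `memory.limit_in_bytes`). The Go type and function names are hypothetical and are not the code in collector/nodes.go; the sample JSON is trimmed to the two fields of interest.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexingPressure mirrors only the two fields this PR exposes.
// The type and field names are hypothetical, chosen for illustration.
type indexingPressure struct {
	Memory struct {
		Current struct {
			AllInBytes int64 `json:"all_in_bytes"`
		} `json:"current"`
		LimitInBytes int64 `json:"limit_in_bytes"`
	} `json:"memory"`
}

// parseIndexingPressure decodes one node's "indexing_pressure" object and
// returns the bytes currently in use and the configured limit.
func parseIndexingPressure(raw []byte) (current, limit int64, err error) {
	var ip indexingPressure
	if err = json.Unmarshal(raw, &ip); err != nil {
		return 0, 0, err
	}
	return ip.Memory.Current.AllInBytes, ip.Memory.LimitInBytes, nil
}

func main() {
	// Trimmed sample payload; a real response carries many more fields.
	sample := []byte(`{"memory":{"current":{"all_in_bytes":768},"limit_in_bytes":824180736}}`)
	current, limit, err := parseIndexingPressure(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(current, limit)
}
```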

This change may address the following two issues:

#638
#875

There is an open pull request from over a year ago attempting to address issue 638:

#727

Pull request 727 includes a large number of indexing_pressure-related metrics in its code changes. The comment chain indicates a concern that this could unnecessarily increase cardinality. In this pull request we add only the two metrics required to auto-scale based on indexing pressure.

We (Telus Agriculture and Consumer Goods) have a production ES cluster that we are upgrading from v7 to v8. As part of this upgrade the "elasticsearch_thread_pool_rejected_count" metric has been removed by the ES Dev Team because they switched from a fixed-length queue of indexing requests with a maximum size to a memory buffer that defaults to 10% of the JVM heap. In the past, when the queue reached capacity, the cluster would start rejecting indexing requests and you could auto-scale the cluster up to address that pressure. Since the queue was eliminated, we need a new way to scale up based on indexing pressure so that we don't fall behind on processing incoming requests. Based on our investigation, the new way to do this is to compare the total size of the indexing memory buffer to the amount of the buffer currently in use. In this PR we add just the two metrics required to achieve auto-scaling based on indexing pressure.
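The comparison described above can be sketched as a ratio of used bytes to the limit, checked per node against a threshold. This is only an illustration: the function names and the 0.8 threshold are assumptions, not part of the exporter or of this PR, and the sample values are taken from the /metrics output posted later in this conversation.

```go
package main

import "fmt"

// utilization returns currentBytes/limitBytes, or 0 when the limit is not
// positive (for example, when a node did not report one).
func utilization(currentBytes, limitBytes float64) float64 {
	if limitBytes <= 0 {
		return 0
	}
	return currentBytes / limitBytes
}

// shouldScaleUp reports whether any node's indexing buffer utilization is at
// or above the given threshold. Both maps are keyed by node name.
func shouldScaleUp(current, limit map[string]float64, threshold float64) bool {
	for node, cur := range current {
		if utilization(cur, limit[node]) >= threshold {
			return true
		}
	}
	return false
}

func main() {
	current := map[string]float64{"magenta": 768, "aqua": 280}
	limit := map[string]float64{"magenta": 8.24180736e+08, "aqua": 8.22922444e+08}

	// 0.8 is a hypothetical threshold; pick one that suits your workload.
	fmt.Println(shouldScaleUp(current, limit, 0.8))
}
```

With the sample values both nodes are far below the limit, so the check reports no need to scale.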

…sure.memory.current.current.all_in_bytes to allow auto-scaling based on how close the cluster nodes are to dropping indexing requests due to the indxing request memory buffer reaching capacity. Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>

tac-emil-andresen · 2024-06-27T16:07:52Z

Hi. I'm personally putting a $60 U.S. dollar bounty on merging this PR (or an equivalent change) because my team needs it, because two other issues and a PR suggest there is demand for this beyond just our team, and because open source maintainers are underappreciated. If you merge the PR, please make sure you have your sponsor setup in place on GitHub and I'll send you a one-time $60 thank-you.

sysadmind

I think overall, this looks good.

Based on the other PR not having any activity in the last year, I would be happy to move forward with this PR.

collector/nodes.go

tac-emil-andresen · 2024-07-06T20:45:04Z

With the change to just include the cluster, host, and node labels rather than all the default labels, here is what the (sanitized) output from /metrics looks like:

elasticsearch_exporter % curl http://localhost:9114/metrics | grep pressure
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
# HELP elasticsearch_indexing_pressure_current_all_in_bytes Memory consumed, in bytes, by indexing requests in the coordinating, primary, or replica stage.
# TYPE elasticsearch_indexing_pressure_current_all_in_bytes gauge

elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.15",indexing_pressure="memory",name="orange"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.16",indexing_pressure="memory",name="yellow"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.17",indexing_pressure="memory",name="green"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.18",indexing_pressure="memory",name="blue"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.19",indexing_pressure="memory",name="violet"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.23",indexing_pressure="memory",name="cyan"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 768
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.37",indexing_pressure="memory",name="amber"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.4",indexing_pressure="memory",name="white"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.43",indexing_pressure="memory",name="brown"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.44",indexing_pressure="memory",name="black"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.45",indexing_pressure="memory",name="gray"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.46",indexing_pressure="memory",name="aqua"} 280
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.48",indexing_pressure="memory",name="maroon"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.50",indexing_pressure="memory",name="seafoam"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.53",indexing_pressure="memory",name="chartruese"} 0
elasticsearch_indexing_pressure_current_all_in_bytes{cluster="production-cluster",host="10.1.2.8",indexing_pressure="memory",name="goldenrod"} 0

# HELP elasticsearch_indexing_pressure_limit_in_bytes Configured memory limit, in bytes, for the indexing requests
# TYPE elasticsearch_indexing_pressure_limit_in_bytes gauge

elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.14",indexing_pressure="memory",name="red"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.15",indexing_pressure="memory",name="orange"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.16",indexing_pressure="memory",name="yellow"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.17",indexing_pressure="memory",name="green"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.18",indexing_pressure="memory",name="blue"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.19",indexing_pressure="memory",name="violet"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.23",indexing_pressure="memory",name="cyan"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.27",indexing_pressure="memory",name="magenta"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.37",indexing_pressure="memory",name="amber"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.4",indexing_pressure="memory",name="white"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.43",indexing_pressure="memory",name="brown"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.44",indexing_pressure="memory",name="black"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.45",indexing_pressure="memory",name="gray"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.46",indexing_pressure="memory",name="aqua"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.48",indexing_pressure="memory",name="maroon"} 8.22922444e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.50",indexing_pressure="memory",name="seafoam"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.53",indexing_pressure="memory",name="chartruese"} 8.24180736e+08
elasticsearch_indexing_pressure_limit_in_bytes{cluster="production-cluster",host="10.1.2.8",indexing_pressure="memory",name="goldenrod"} 8.24180736e+08
100 1296k 0 1296k 0 0 2897k 0 --:--:-- --:--:-- --:--:-- 2901k

sysadmind

Thanks! I think this looks good and can be merged, but the DCO check is failing. Can you please amend your latest commit with a sign off?

…de, and name to save on storage space. Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>

tac-emil-andresen · 2024-07-11T16:06:55Z

> Thanks! I think this looks good and can be merged, but the DCO check is failing. Can you please amend your latest commit with a sign off?

I've added the sign-off to the last commit. I have not had to amend a commit on a fork before, so I used this command:

git commit --amend --signoff

Then "git push origin issue/638" was rejected with the message "Updates were rejected because the tip of your current branch is behind", so I had to push the update to the branch using this command:

git push --force-with-lease origin issue/638

I hope that is correct in this scenario. The lines to be merged still look correct.

Many thanks for your help getting this merged!

sysadmind

LGTM. Thanks!

tac-emil-andresen · 2024-07-11T18:01:18Z

> LGTM. Thanks!

Woo hoo! I promised a $60 bounty to get this merged. Where do you want it to go? If you have sponsorship set up on GitHub I can send it directly to you that way, or via another channel like buymeacoffee.com. Or I can donate it to the charity or open source foundation of your choice.

sysadmind · 2024-09-28T16:35:48Z

@tac-emil-andresen I don't have sponsorship set up and I'm not sure it's worth it for me. If you want to donate, this is currently my charity of choice: https://www.thefarmette.org/donate
Thanks

tac-emil-andresen changed the title ~~Add metrics auto scale based on indexing pressure~~ Add metrics to auto scale based on indexing pressure Jun 24, 2024

tac-emil-andresen force-pushed the issue/638 branch from 8f425be to 40b06f9 Compare June 24, 2024 18:07

sysadmind requested changes Jul 6, 2024

sysadmind self-assigned this Jul 6, 2024

sysadmind reviewed Jul 7, 2024

Reduce labels per metric for indexing pressure metrics to cluster, no…

d607c9f

…de, and name to save on storage space. Signed-off-by: emilandresentac <emil.andresen@telusagcg.com>

tac-emil-andresen force-pushed the issue/638 branch from b9fea3e to d607c9f Compare July 11, 2024 15:53

sysadmind approved these changes Jul 11, 2024

sysadmind merged commit d13c555 into prometheus-community:master Jul 11, 2024
4 checks passed
