[Metricbeat] Exponential backoff for http timeout in elasticsearch module #17948

Open
imotov opened this issue Apr 23, 2020 · 11 comments
Labels
discuss, enhancement, Team:Service-Integrations, [zube]: Investigate

Comments

@imotov

imotov commented Apr 23, 2020

At the moment, the Elasticsearch stats methods used by Metricbeat's elasticsearch module don't have any internal timeouts, which means that Elasticsearch will keep trying to complete a request until it gets responses from all nodes or the unresponsive nodes die. We have recently observed some cases (elastic/elasticsearch#50241, for example) where a data node in a small cluster was responding very slowly but didn't disconnect from the cluster. Meanwhile, Metricbeat was sending requests to Elasticsearch every 10 seconds with a 10-second response timeout (the default settings). Because a request that timed out on the Metricbeat side kept running on the Elasticsearch side, we were effectively adding 6 in-flight requests per minute. These in-flight stats requests eventually accumulated on the master node and caused it to crash with an OOM error. We are addressing this issue on the Elasticsearch side in elastic/elasticsearch#55550, but I was hoping we could improve Metricbeat's behavior as well by introducing an exponential backoff for the timeout value.
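To make the proposal concrete, here is a minimal sketch of one possible reading of it: after each consecutive timeout, the wait before the next stats request doubles up to a cap, and it resets once a request succeeds. This is not Metricbeat code. `BackoffSchedule`, `requests_sent`, the growth factor of 2, and the 300-second cap are all hypothetical choices made for illustration; only the 10-second base comes from the defaults mentioned above.

```python
from dataclasses import dataclass


@dataclass
class BackoffSchedule:
    """Hypothetical helper for illustration; not part of Metricbeat."""

    base: float = 10.0      # seconds; the default period/timeout from the issue
    factor: float = 2.0     # assumed growth factor per consecutive timeout
    maximum: float = 300.0  # assumed upper bound on the delay, in seconds
    failures: int = 0       # consecutive timeouts seen so far

    def record_timeout(self) -> None:
        self.failures += 1

    def record_success(self) -> None:
        # A healthy response resets the schedule to the base delay.
        self.failures = 0

    def next_delay(self) -> float:
        # base * factor^failures, capped at `maximum`.
        return min(self.base * self.factor ** self.failures, self.maximum)


def requests_sent(window: float, schedule: BackoffSchedule) -> int:
    """Count requests issued within `window` seconds if every request times out."""
    elapsed, sent = 0.0, 0
    while elapsed < window:
        sent += 1                  # issue a request; assume it times out
        schedule.record_timeout()
        elapsed += schedule.next_delay()
    return sent


if __name__ == "__main__":
    # factor=1.0 models a fixed 10-second interval with no backoff.
    print("fixed interval:", requests_sent(600, BackoffSchedule(factor=1.0)))
    print("with backoff:  ", requests_sent(600, BackoffSchedule()))
```

Under these assumptions, a node that never answers receives far fewer new stats requests over a 10-minute window than with a fixed 10-second interval, which limits how quickly in-flight requests can pile up on the master node.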

@ycombinator added the discuss, enhancement, and Team:Services (Deprecated) labels Apr 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@ycombinator
Contributor

If/when we go down this path, we'd probably want to implement it for all HTTP-based modules.

@andresrc
Contributor

We would need to consolidate this across the different Beats. This is also related to #16856.

@botelastic

botelastic bot commented Mar 28, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Mar 28, 2021
@willemdh

Pinging this as we are seeing similar issues.

@botelastic botelastic bot removed the Stalled label Mar 29, 2021
@botelastic

botelastic bot commented Mar 29, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Mar 29, 2022
@jlind23 added the Team:Service-Integrations label and removed the Team:Services (Deprecated) label Mar 31, 2022
@botelastic botelastic bot removed the Stalled label Mar 31, 2022
@willemdh

willemdh commented Apr 2, 2022

IMHO this is still interesting.

@botelastic

botelastic bot commented Apr 2, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Apr 2, 2023
@willemdh

willemdh commented Apr 4, 2023

.

@botelastic botelastic bot removed the Stalled label Apr 4, 2023
@botelastic

botelastic bot commented Apr 3, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Apr 3, 2024
@willemdh

willemdh commented Apr 4, 2024

Quite sure this is still needed

@botelastic botelastic bot removed the Stalled label Apr 4, 2024

6 participants