[Metricbeat] Exponential backoff for http timeout in elasticsearch module #17948
Comments
Pinging @elastic/integrations-services (Team:Services)
If/when we go down this path, we'd probably want to implement it for all HTTP-based modules.
We would need to consolidate this among the different beats. Also related to #16856
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Pinging this as we are seeing similar issues.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Imho still interesting
Quite sure this is still needed |
At the moment, the Elasticsearch stats methods used by Metricbeat's elasticsearch module don't have any internal timeouts. This means that Elasticsearch will keep trying to complete a request until it gets responses from all nodes or the unresponsive nodes die.

We have recently observed some cases (elastic/elasticsearch#50241, for example) where a data node in a small cluster was responding very slowly but didn't disconnect from the cluster. Meanwhile, Metricbeat was sending requests to Elasticsearch every 10 seconds with a 10-second response timeout (the default settings). In effect, we were adding 6 in-flight requests per minute. In-flight stats requests eventually accumulated on the master node and caused it to crash with an OOM error.

We are addressing this on the Elasticsearch side in elastic/elasticsearch#55550, but I was hoping we could improve Metricbeat's behavior as well by introducing an exponential backoff for the timeout value.
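To make the proposal concrete, here is a minimal sketch of what such a backoff could look like. It is not Metricbeat code: `TimeoutBackoff`, `requests_sent`, the doubling factor, and the 160-second cap are all hypothetical choices made for illustration. Only the 10-second base comes from the default settings described above.

```python
from dataclasses import dataclass


@dataclass
class TimeoutBackoff:
    """Hypothetical helper that tracks the delay to use for the next stats request."""

    base: float = 10.0      # seconds; the default period/timeout mentioned in the issue
    factor: float = 2.0     # assumed multiplier applied after each timeout
    maximum: float = 160.0  # assumed upper bound on the delay
    failures: int = 0       # consecutive timeouts seen so far

    def current(self) -> float:
        # Delay grows geometrically with consecutive timeouts, up to the cap.
        return min(self.base * self.factor ** self.failures, self.maximum)

    def on_timeout(self) -> float:
        # A request timed out: back off further.
        self.failures += 1
        return self.current()

    def on_success(self) -> float:
        # A request completed: return to the base delay.
        self.failures = 0
        return self.current()


def requests_sent(window: float, backoff: TimeoutBackoff) -> int:
    """Count requests issued within `window` seconds if every request times out."""
    elapsed, sent = 0.0, 0
    while elapsed < window:
        sent += 1
        elapsed += backoff.current()
        backoff.on_timeout()
    return sent
```

Under these assumptions, `requests_sent(60, TimeoutBackoff(factor=1.0))` reproduces the fixed-interval case from the issue (6 requests in the first minute), while `requests_sent(60, TimeoutBackoff())` issues 3. A backoff like this only lowers the rate at which new requests pile up. It does not by itself cancel requests that are already in flight on the Elasticsearch side.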