Investigate options to support a better recovery of overloaded backends #1515

Open
a-thaler opened this issue Oct 10, 2024 · 1 comment
Labels
area/logs LogPipeline kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@a-thaler
Collaborator

Description

When SAP Cloud Logging with the dev plan is used as the log backend, a high log throughput can easily overload the instance. If the refusal rate gets too high, the fluent-bit buffer fills up as expected, and the buffered items are retried repeatedly with an exponential backoff period. These retries put the already overloaded system under even more load, which makes a recovery unlikely.
In particular, even if the input log load is reduced, the retries of the buffered items can keep the backend under so much pressure that the recovery might not succeed.

Goal: Investigate options to support the recovery of an overloaded backend

Ideas:

  • Support some kind of "dev" mode where the buffer is switched off or only one retry is performed
  • Support a configurable "time-in-queue" setting to make the number of retries configurable
  • Increase the minimum time between retries
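
The "time-in-queue" idea can be read as a drop policy based on the age of a buffered item rather than only on its attempt count. A minimal sketch of that reading, written in Python for illustration only; `should_retry`, `max_time_in_queue_s` and `max_retries` are hypothetical names and not existing fluent-bit or LogPipeline settings:

```python
from typing import Optional


def should_retry(
    enqueued_at_s: float,
    now_s: float,
    attempts: int,
    max_time_in_queue_s: float,
    max_retries: Optional[int] = None,
) -> bool:
    """Decide whether a buffered item may be sent again (hypothetical policy).

    Returns False when the item should be dropped instead of retried.
    """
    # Drop items that have been waiting in the buffer for too long.
    if now_s - enqueued_at_s >= max_time_in_queue_s:
        return False
    # Optionally also cap the number of attempts; 0 would mean "never retry".
    if max_retries is not None and attempts >= max_retries:
        return False
    return True
```

With such a policy, a "dev" mode could simply be a very small `max_time_in_queue_s` or `max_retries` value, so that all three ideas map onto the same decision point.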

First experiments showed that an increased minimum retry period has no big effect. Instead, reducing the number of retries to a low value (3) relaxed the situation very quickly.
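
One possible way to reason about this observation is a simplified backoff model. The sketch below is in Python for illustration and assumes a deterministic delay of min(cap, base * 2^n) without jitter and a backend that refuses every retry; it is not fluent-bit's actual scheduler, and `retry_delays`, `retry_load` and their parameters are hypothetical.

```python
def retry_delays(base_s, cap_s, retry_limit):
    """Delays before each retry in a simplified exponential backoff (assumed model)."""
    return [min(cap_s, base_s * 2 ** n) for n in range(retry_limit)]


def retry_load(buffered_chunks, base_s, cap_s, retry_limit):
    """Worst-case extra load if every retry of every buffered chunk is refused."""
    delays = retry_delays(base_s, cap_s, retry_limit)
    return {
        # Each buffered chunk is sent again once per allowed retry.
        "extra_requests": buffered_chunks * len(delays),
        # Longest time a single chunk stays in the buffer before it is dropped.
        "max_time_in_queue_s": sum(delays),
    }
```

In this model, raising `base_s` only spreads the same number of extra requests over a longer time, while lowering `retry_limit` reduces both the number of extra requests and the time an item stays in the buffer.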

@a-thaler a-thaler added kind/feature Categorizes issue or PR as related to a new feature. area/logs LogPipeline labels Oct 10, 2024

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024
@a-thaler a-thaler added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 10, 2024