Investigate options to support a better recovery of overloaded backends #1515
Labels
area/logs
LogPipeline
kind/feature
Categorizes issue or PR as related to a new feature.
lifecycle/frozen
Indicates that an issue or PR should not be auto-closed due to staleness.
Description
When using SAP Cloud Logging with the `dev` plan as the log backend, a high log throughput can easily overload the instance. If the refusal rate gets too high, the buffer used in Fluent Bit fills up as expected, and the buffered items are retried repeatedly with an exponential backoff period. These retries put the already overloaded system under even more load, which makes a recovery unlikely. In particular, even if the input log load is reduced, the retries of the buffered items keep the backend under so much pressure that the recovery might not succeed.
Goal: Investigate options to support the recovery of an overloaded backend.
Ideas:
First experiments showed that an increased minimum retry period has no big effect. Instead, reducing the number of retries to a low value (3) relaxed the situation very quickly.
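For reference, a minimal sketch of where these two knobs live in a Fluent Bit classic-mode configuration. The `http` output, the host name, and the port are placeholders assumed for illustration and are not taken from the actual LogPipeline setup; the `scheduler.*` values shown are the Fluent Bit defaults, not the values used in the experiments.

```
[SERVICE]
    # Backoff window for retries, in seconds (Fluent Bit defaults shown).
    # Raising scheduler.base is the "increased minimum retry period" variant.
    scheduler.base 5
    scheduler.cap  2000

[OUTPUT]
    # Placeholder output; the real backend settings are an assumption here.
    Name        http
    Match       *
    Host        logs.example.invalid
    Port        443
    tls         On
    # Give up on a chunk after 3 failed retries instead of retrying it further.
    Retry_Limit 3
```

Note the trade-off: with a low `Retry_Limit`, chunks that exhaust their retries are discarded, so the backend recovers faster at the cost of losing the logs in those chunks.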