Cherry-pick #21493 to 7.x: [libbeat] Add configurable exponential backoff for disk queue write errors #21497
…rrors (elastic#21493) (cherry picked from commit b0236ee)
Pinging @elastic/integrations (Team:Integrations)
Cherry-pick of PR #21493 to 7.x branch. Original message:
What does this PR do?
This PR adds user-configurable fields `retry_interval` and `max_retry_interval` to the disk queue, and uses them to perform exponential backoff when encountering fatal errors writing to disk.

I'm aware that there are some existing helper wrappers for this functionality, e.g. `ExpBackoff` in `libbeat/common/backoff`. Unfortunately they didn't fit the cancellation or error handling model in the queue, so the backoff here is done "by hand." I've tried to restrict the moving parts to self-contained helper functions.

- I have made corresponding changes to the documentation
- I have made corresponding changes to the default configuration files
- I have added tests that prove my fix is effective or that my feature works
- I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

How to test this PR locally
Enable the disk queue (with e.g. `queue.disk.max_size: 1GB` in the beat config) and start the beat. While it's running, remove write permissions to `data/diskqueue`. This should log errors for the writer and deleter (if applicable).

By default, any such errors should start 1 second apart, with the interval doubling after each failure up to a maximum of 30 seconds. These defaults can be changed by setting `queue.disk.{retry_interval, max_retry_interval}`.