-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate retriable/haltable errors from compactor #1183
Conversation
Also, although this will fix compact exiting, the work the compactor has written to disk is thrown away when the compaction is retried. I'm thinking about a good way to avoid this - the simplest thing would be to just retry the bucket requests more times before returning an error. |
|
@GiedriusS sure, I can use that instead 👍 |
Updated, I'll squash the commits if we're happy to proceed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion, otherwise LGTM.
Thanks for this
Previously, a new error was created from all worker errors so the custom type was not propagated. If all errors returned are retriable, we assume it is safe to retry the compaction. Since the compactor may still return a single error this logic is preserved.
a1e8191
to
3dfb893
Compare
Rebased this against master and updated the import path against tsdb 0.8.0 👍 |
f89f92f
to
a658dbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect to me, LGTM thanks!
Dissmissing @povilasv review as all his suggestions were addressed and he is chilling on vacations (:
* Propagate retriable/haltable errors from compactor Previously, a new error was created from all worker errors so the custom type was not propagated. If all errors returned are retriable, we assume it is safe to retry the compaction. Since the compactor may still return a single error this logic is preserved. * Update MultiError import for tsdb 0.8.0 * Update import path in tests
Changes
We observed thanos compactor was exiting on upload errors. Previously, a new
error was created from all worker errors so the custom type was not propagated.
This introduces
go-multierror
to preserve the error types.If all errors returned are retriable, it is safe to retry the compaction.
Since the compactor may still return a single error this logic is preserved.
(Happy to refactor out the duplicated logic with individual error checks.)
Verification
Tests are passing and I'll run this change with real workloads shortly.