Propagate retriable/haltable errors from compactor #1183

mattrco · 2019-05-28T15:52:38Z

Changes

We observed the Thanos compactor exiting on upload errors. Previously, a new
error was created from all worker errors, so the custom error type was not
propagated. This introduces go-multierror to preserve the error types.

If all errors returned are retriable, it is safe to retry the compaction.

Since the compactor may still return a single error, the existing single-error check is preserved.

(Happy to refactor out the duplicated logic with individual error checks.)
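To make the rule above concrete, here is a minimal sketch of the "retry only if every worker error is retriable" check. It is not the code from this PR: `retryError`, `MultiError`, `IsRetryError` and the two sample errors are hypothetical stand-ins, and it uses the standard library's `errors.As` (Go 1.13+) rather than whichever error helpers the compactor actually relies on.

```go
package main

import (
	"errors"
	"strings"
)

// Sample errors used only for illustration (hypothetical).
var (
	errUploadTimeout = errors.New("upload timed out")
	errCorruptBlock  = errors.New("block is corrupt")
)

// retryError marks an error as safe to retry. It is a stand-in for the
// compactor's custom error type, not the real one.
type retryError struct{ err error }

func (e retryError) Error() string { return e.err.Error() }
func (e retryError) Unwrap() error { return e.err }

// retry wraps err so callers can recognise it as retriable.
func retry(err error) error {
	if err == nil {
		return nil
	}
	return retryError{err: err}
}

// MultiError keeps every worker error as-is instead of flattening them
// into one new error, so the custom types survive. It is a stand-in for
// a multi-error type such as the one discussed in this PR.
type MultiError []error

func (es MultiError) Error() string {
	msgs := make([]string, 0, len(es))
	for _, e := range es {
		msgs = append(msgs, e.Error())
	}
	return strings.Join(msgs, "; ")
}

// IsRetryError reports whether err is retriable. A MultiError counts as
// retriable only if it is non-empty and every error in it is retriable;
// a single error is checked directly, as before.
func IsRetryError(err error) bool {
	if err == nil {
		return false
	}
	var multi MultiError
	if errors.As(err, &multi) {
		if len(multi) == 0 {
			return false
		}
		for _, e := range multi {
			if !IsRetryError(e) {
				return false
			}
		}
		return true
	}
	var r retryError
	return errors.As(err, &r)
}
```

With this shape, `IsRetryError(MultiError{retry(errUploadTimeout), errCorruptBlock})` is false, so one non-retriable worker error is enough to stop the compaction from being retried.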

Verification

Tests are passing and I'll run this change with real workloads shortly.

mattrco · 2019-05-28T15:58:20Z

Also, although this will fix the compactor exiting, the work the compactor has written to disk is thrown away when the compaction is retried. I'm thinking about a good way to avoid this; the simplest thing would be to retry the bucket requests more times before returning an error.
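As a rough illustration of the "retry the bucket requests more times" idea, the sketch below wraps a single operation in a bounded retry loop with a doubling wait. `withRetries` and `errTransient` are hypothetical names, not part of Thanos or its bucket client, and a real version would probably also take a `context.Context` so that shutdown can interrupt the wait.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a sample error used only for illustration (hypothetical).
var errTransient = errors.New("transient bucket error")

// withRetries calls op up to attempts times, sleeping between failed
// attempts and doubling the wait each time. It returns nil on the first
// success, or the last error (wrapped) once all attempts are used up.
func withRetries(attempts int, initialWait time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	wait := initialWait
	for i := 0; i < attempts; i++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		time.Sleep(wait)
		wait *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}
```

A caller would wrap only the bucket request itself, for example `withRetries(5, time.Second, func() error { return upload() })` with a hypothetical `upload` function, so that a transient failure does not discard the compaction work already on disk.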

GiedriusS · 2019-05-29T07:08:34Z

prometheus/tsdb already has a type for this, tsdb.MultiError, which we already use in some places. Would it be possible to reuse it here?

mattrco · 2019-05-29T12:01:35Z

@GiedriusS sure, I can use that instead 👍

mattrco · 2019-05-29T13:24:30Z

Updated, I'll squash the commits if we're happy to proceed.

bwplotka

One suggestion, otherwise LGTM.

Thanks for this

cmd/thanos/compact.go

mattrco · 2019-06-04T10:11:25Z

@bwplotka @povilasv if you have a moment to take another look at this, that'd be much appreciated 🙇

Previously, a new error was created from all worker errors so the custom type was not propagated. If all errors returned are retriable, we assume it is safe to retry the compaction. Since the compactor may still return a single error this logic is preserved.

mattrco · 2019-06-05T14:57:11Z

Rebased this against master and updated the import path for tsdb 0.8.0 👍

bwplotka

Perfect to me, LGTM thanks!

Dismissing @povilasv's review, as all his suggestions were addressed and he is chilling on vacation (:


* Propagate retriable/haltable errors from compactor

  Previously, a new error was created from all worker errors so the custom type was not propagated. If all errors returned are retriable, we assume it is safe to retry the compaction. Since the compactor may still return a single error this logic is preserved.
* Update MultiError import for tsdb 0.8.0
* Update import path in tests

bwplotka requested review from povilasv, bwplotka and GiedriusS May 28, 2019 18:28

mattrco mentioned this pull request May 29, 2019

Ensure retriable errors are propagated monzo/thanos#2

Closed

bwplotka requested changes May 30, 2019


cmd/thanos/compact.go (outdated, resolved)

povilasv previously requested changes May 31, 2019


cmd/thanos/compact.go (outdated, resolved)

cmd/thanos/compact.go (outdated, resolved)

mattrco force-pushed the mattrco/propagate-retries branch from a1e8191 to 3dfb893 Compare June 5, 2019 14:24

Update MultiError import for tsdb 0.8.0

1913d0b

Update import path in tests

a658dbf

mattrco force-pushed the mattrco/propagate-retries branch from f89f92f to a658dbf Compare June 5, 2019 15:02

bwplotka approved these changes Jun 6, 2019


bwplotka merged commit ce1b22a into thanos-io:master Jun 6, 2019

mattrco deleted the mattrco/propagate-retries branch June 6, 2019 15:18
