Retry and prevent cleanup of local blocks for upto 5 iterations #3894

Closed
wants to merge 6 commits into from
27 changes: 22 additions & 5 deletions pkg/block/block.go
@@ -99,6 +99,12 @@ func Download(ctx context.Context, logger log.Logger, bucket objstore.Bucket, id
// TODO(bplotka): Ensure bucket operations have reasonable backoff retries.
// NOTE: Upload updates `meta.Thanos.File` section.
func Upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc) error {
return UploadWithRetry(ctx, logger, bkt, bdir, hf, 0)
}

// UploadWithRetry is a utility function for upload and acts as a workaround for absence of default parameters (which in this case is retryCounter = 0).
func UploadWithRetry(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc, retryCounter int) error {
Member:
From the name UploadWithRetry I would think it runs the Upload function with retries, but that is not the case here. It would have been better to have a function Upload(...) which handles the upload logic without retries, and then an UploadWithRetries function that retries that Upload fn itself.

But since we only want to retry when an upload fails, instead of having the retries here, doesn't it make more sense to have an objstore.UploadDirWithRetries that handles this transparently?

Contributor Author:

It is doing exactly that, isn't it? @prmsrswt 😕

Contributor Author:

The problem with a separate function would be that we'd have to carry over the error which caused the failure, and keeping track of which error made it fail would otherwise become difficult. The only reason I made the UploadWithRetry function was to keep the signature of Upload intact and not add any other parameter to it.
It would have been really great if Go supported default params 😁
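For reference, the usual Go workaround for missing default parameters is the functional-options pattern, which lets a default (here, zero retries) live inside the function while keeping existing call sites unchanged. All names below are illustrative, not from the PR:

```go
package main

import "fmt"

// uploadOpts holds the optional knobs; maxRetries defaults to 0.
type uploadOpts struct {
	maxRetries int
}

// UploadOption mutates the options struct (functional-options pattern).
type UploadOption func(*uploadOpts)

// WithMaxRetries overrides the default retry count.
func WithMaxRetries(n int) UploadOption {
	return func(o *uploadOpts) { o.maxRetries = n }
}

// Upload keeps its short signature; options carry the extras.
func Upload(dir string, opts ...UploadOption) string {
	o := uploadOpts{maxRetries: 0} // the "default parameter"
	for _, opt := range opts {
		opt(&o)
	}
	return fmt.Sprintf("uploading %s with %d retries", dir, o.maxRetries)
}

func main() {
	fmt.Println(Upload("blockdir"))                    // uploading blockdir with 0 retries
	fmt.Println(Upload("blockdir", WithMaxRetries(5))) // uploading blockdir with 5 retries
}
```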

flag := false
df, err := os.Stat(bdir)
if err != nil {
return err
@@ -120,29 +126,40 @@ func Upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir st
}

if meta.Thanos.Labels == nil || len(meta.Thanos.Labels) == 0 {
return errors.New("empty external labels are not allowed for Thanos block.")
return errors.New("empty external labels are not allowed for Thanos block")
}

meta.Thanos.Files, err = gatherFileStats(bdir, hf, logger)
if err != nil {
return errors.Wrap(err, "gather meta file stats")
}

metaEncoded := strings.Builder{}
if err := meta.Write(&metaEncoded); err != nil {
return errors.Wrap(err, "encode meta file")
}

if err := bkt.Upload(ctx, path.Join(DebugMetas, fmt.Sprintf("%s.json", id)), strings.NewReader(metaEncoded.String())); err != nil {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload debug meta file"))
if retryCounter == 5 {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload debug meta file"))
}
flag = true
Member:

Do we really need this flag? We can simply do return UploadWithRetry(..., retryCounter+1) here to get the same result.

Contributor Author:

My first commit was exactly that, after which I shifted to this flag method.
Let's say we encounter an error and call Upload once again with retry x+1. If that call executes successfully, we'll be returned back to the scope of the function with retry value x.
Actually I feel the flag check should take place at the end, here, rather than

if flag && retryCounter < 5 {

Contributor:

Rather than using recursive calls for this, you could create a loop in which you retry up to maxRetryCount times (in this case, 5) and break out of the loop on success. That would be easier to follow, and you wouldn't need a flag variable.

So something like this:

for i := 0; i < maxRetryCount; i++ {
     if err := bkt.Upload(...); err == nil {
           break // bucket upload was successful
     }
     level.Debug(logger).Log("msg", "retrying upload")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense, and instead of a plain for loop, I'll also go for exponential backoff between upload attempts.

Member:

Why can't we leverage/improve the runutil package for this?

e.g. https://github.com/thanos-io/thanos/blob/main/pkg/runutil/runutil.go#L86

}

if err := objstore.UploadDir(ctx, logger, bkt, path.Join(bdir, ChunksDirname), path.Join(id.String(), ChunksDirname)); err != nil {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload chunks"))
if retryCounter == 5 {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload chunks"))
}
flag = true
}

if err := objstore.UploadFile(ctx, logger, bkt, path.Join(bdir, IndexFilename), path.Join(id.String(), IndexFilename)); err != nil {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload index"))
if retryCounter == 5 {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload index"))
}
flag = true
}
if flag && retryCounter < 5 {
Member:

Why 5?

You can answer the question here, but for anyone visiting this code in the future it's a mystery number. We should at least create a const for this; something like const UploadMaxRetries = 5 will do. Ideally, though, this value should be configurable by users with a reasonable default, so you'd pass it to the function dynamically as an argument.

Contributor Author:

Yes, it was only a first draft to check whether the logic itself works in the first place. I'd definitely go for a user-defined retryCounter value if the logic receives a green flag 💯

return UploadWithRetry(ctx, logger, bkt, bdir, hf, retryCounter+1)
}

// Meta.json always need to be uploaded as a last item. This will allow to assume block directories without meta file to be pending uploads.
2 changes: 1 addition & 1 deletion pkg/block/block_test.go
@@ -207,7 +207,7 @@ func TestUpload(t *testing.T) {
testutil.Ok(t, err)
err = Upload(ctx, log.NewNopLogger(), bkt, path.Join(tmpDir, b2.String()), metadata.NoneFunc)
testutil.NotOk(t, err)
testutil.Equals(t, "empty external labels are not allowed for Thanos block.", err.Error())
testutil.Equals(t, "empty external labels are not allowed for Thanos block", err.Error())
testutil.Equals(t, 4, len(bkt.Objects()))
}
}