
Retry and prevent cleanup of local blocks for up to 5 iterations #3894

Closed
wants to merge 6 commits

Conversation

Biswajitghosh98
Contributor

Proof of concept for #3744.

Signed-off-by: Biswajit Ghosh <[email protected]>
@Biswajitghosh98 Biswajitghosh98 marked this pull request as draft March 7, 2021 19:01
@Biswajitghosh98
Contributor Author

cc @bwplotka @kakkoyun

Signed-off-by: Biswajit Ghosh <[email protected]>
@Biswajitghosh98 Biswajitghosh98 marked this pull request as ready for review March 8, 2021 06:30
Signed-off-by: Biswajit Ghosh <[email protected]>
pkg/block/block.go (outdated)
Contributor

@yashrsharma44 left a comment

I think we can use a retry package, https://pkg.go.dev/github.com/cenkalti/backoff/v4#Retry, which abstracts away the boilerplate of writing retry functions and offers different types of retry logic out of the box.
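
For illustration, a minimal sketch of what using that package could look like here, under stated assumptions: uploadBlock is a hypothetical stand-in for the block upload call and is not code from this PR.

package main

import (
	"errors"
	"log"

	"github.com/cenkalti/backoff/v4"
)

// uploadBlock is a hypothetical stand-in for the bucket upload call in pkg/block.
func uploadBlock() error {
	return errors.New("simulated transient upload failure")
}

func main() {
	// Retry uploadBlock with exponential backoff, capped at 5 retries to
	// mirror the retryCounter == 5 check in this PR.
	policy := backoff.WithMaxRetries(backoff.NewExponentialBackOff(), 5)
	if err := backoff.Retry(uploadBlock, policy); err != nil {
		log.Println("upload failed after retries:", err)
	}
}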

@Biswajitghosh98
Contributor Author

I think we can use a retry package, https://pkg.go.dev/github.com/cenkalti/backoff/v4#Retry, which abstracts away the boilerplate of writing retry functions and offers different types of retry logic out of the box.

Do we need to import another package and add that overhead, or are we good with this logic itself? WDYT? @yashrsharma44

Signed-off-by: Biswajit Ghosh <[email protected]>
Member

@onprem left a comment

I have added a few comments on this implementation, but I feel this might not be the right place to solve this problem. Implementing retries for the objstore.Upload/UploadDir functions might be a better idea.

Also, the idea of retrying uploads with exponential backoff is worth a look, as upload errors can result from hitting the object storage rate limit.
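
For what it's worth, a rough sketch of what a transparent retrying helper inside pkg/objstore might look like; the name UploadWithRetries, the backoff parameters, and the ReadSeeker requirement are all illustrative assumptions, not an existing API.

package objstore

import (
	"context"
	"io"
	"time"
)

// UploadWithRetries is a hypothetical helper that retries Bucket.Upload with
// exponential backoff between attempts and gives up after maxRetries tries.
func UploadWithRetries(ctx context.Context, bkt Bucket, name string, r io.ReadSeeker, maxRetries int) error {
	var err error
	delay := time.Second
	for i := 0; i < maxRetries; i++ {
		// Rewind before every attempt so a retry re-uploads the whole object.
		if _, err = r.Seek(0, io.SeekStart); err != nil {
			return err
		}
		if err = bkt.Upload(ctx, name, r); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
			delay *= 2 // back off longer each time, e.g. when hitting rate limits
		}
	}
	return err
}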

if retryCounter == 5 {
return cleanUp(logger, bkt, id, errors.Wrap(err, "upload debug meta file"))
}
flag = true
Member

Do we really need this flag? We can simply do return UploadWithRetry(..., retryCounter+1) here to get the same result.

Contributor Author

My first commit was exactly that, after which I shifted to this flag method.
Let's say we encounter an error and call Upload once again with retry x+1. If that executes successfully, we'll be returned back to the scope of the function call with retry value x.
Actually, I feel the flag check should take place at the end, here

rather than
if flag && retryCounter < 5 {

Contributor

Rather than using recursive calls for this, you could create a loop that retries up to maxRetryCount times (in this case, 5) and breaks out of the loop on success. That would be easier to follow, and you might not need a flag variable at all.

So something like this:

for i := 0; i < maxRetryCount; i++ {
	if err := bkt.Upload(...); err == nil {
		break // bucket upload was successful
	}
	log.Debug("retrying again...")
}

Contributor Author

Yes, that makes sense, and instead of the plain for loop I'll also go for exponential backoff for each upload attempt.

Member

Why can't we leverage/improve the runutil package for this?

e.g. https://github.com/thanos-io/thanos/blob/main/pkg/runutil/runutil.go#L86
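
For reference, a sketch of what reusing that helper might look like, assuming runutil.Retry keeps its current shape, Retry(interval, stopc, f); uploadOnce is a hypothetical single-attempt upload, not code from this PR.

package main

import (
	"errors"
	"log"
	"time"

	"github.com/thanos-io/thanos/pkg/runutil"
)

// uploadOnce is a hypothetical stand-in for one bucket upload attempt.
func uploadOnce() error {
	return errors.New("simulated upload failure")
}

func main() {
	// runutil.Retry re-runs the closure every interval until it returns nil
	// or stopc is closed; here we stop trying after 30 seconds.
	stopc := make(chan struct{})
	time.AfterFunc(30*time.Second, func() { close(stopc) })

	if err := runutil.Retry(5*time.Second, stopc, uploadOnce); err != nil {
		log.Println("upload did not succeed within the retry window:", err)
	}
}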

}

// UploadWithRetry is a utility function for upload and acts as a workaround for absence of default parameters (which in this case is retryCounter = 0).
func UploadWithRetry(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc, retryCounter int) error {
Member

From the name UploadWithRetry I would think it runs the Upload function with retries, but that is not the case here. It would have been better to have a function Upload(...) which handles the upload logic without retries, and then an UploadWithRetries function that retries that Upload fn itself.

But we are only trying to retry when the upload fails, so instead of having retries here, doesn't it make more sense to have an objstore.UploadDirWithRetries that handles this transparently?
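
A small skeleton of the split being suggested, reusing the types from the UploadWithRetry signature quoted above; the retry loop, the maxRetries parameter, and the import paths are illustrative assumptions, not this PR's code.

package block

import (
	"context"

	"github.com/go-kit/kit/log"
	"github.com/thanos-io/thanos/pkg/block/metadata"
	"github.com/thanos-io/thanos/pkg/objstore"
)

// Upload would hold the existing upload logic, with no retry handling at all.
func Upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc) error {
	// ... existing upload logic goes here ...
	return nil
}

// UploadWithRetries re-invokes Upload until it succeeds or the retry budget
// is exhausted, returning the last error it saw.
func UploadWithRetries(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc, maxRetries int) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = Upload(ctx, logger, bkt, bdir, hf); err == nil {
			return nil
		}
	}
	return err
}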

Contributor Author

It is doing exactly that, isn't it? @prmsrswt 😕

Contributor Author

The problem with a separate function would be: we'd have to carry over the error that caused the failure, and keeping track of which error made it fail would become difficult otherwise. The only reason I made the UploadWithRetries function was to keep Upload itself intact and not add any other parameter to it.
It would have been really great if Golang supported default params 😁

}
flag = true
}
if flag && retryCounter < 5 {
Member

Why 5?

You can answer the question here, but for anyone visiting this code in the future, it's a mystery number. We need to at least create a const for this; something like const UploadMaxRetries = 5 will do. But ideally, this value should be configurable by users with a reasonable default, so you would need to pass it to the function dynamically as an argument.

Contributor Author

Yes, it was only a first draft to check if the logic itself works in the first place. I'd definitely go for a user-defined retryCounter value if the logic receives a green flag 💯

@Biswajitghosh98
Contributor Author

I have added a few comments on this implementation, but I feel this might not be the right place to solve this problem. Implementing retries for the objstore.Upload/UploadDir functions might be a better idea.

Also, the idea of retrying uploads with exponential backoff is worth a look, as upload errors can result from hitting the object storage rate limit.

I'd surely try to have a look at this as well 🤗

Member

@bwplotka left a comment

Hey, I am really sorry, but I think I misled you in the #3744 (comment)

I added a comment there to explain it all. This ticket is mostly about upload removing the need for deletions on the sidecar, not about the retry logic. The sidecar will retry in the next cycle, so we don't need this. In fact, such retries at this level are not recommended; see #3907 for the bigger discussion. 🤗

Hope that makes sense. Also, we have another contributor trying to fix the same thing, so ideally we could help with #3756 (review, comment) before creating another PR. What do you think? (: We can talk about this in tomorrow's 1:1.

@Biswajitghosh98
Contributor Author

Hey, I am really sorry, but I think I misled you in the #3744 (comment)

I added a comment there to explain it all. This ticket is mostly about upload removing the need for deletions on the sidecar, not about the retry logic. The sidecar will retry in the next cycle, so we don't need this. In fact, such retries at this level are not recommended; see #3907 for the bigger discussion. 🤗

Hope that makes sense. Also, we have another contributor trying to fix the same thing, so ideally we could help with #3756 (review, comment) before creating another PR. What do you think? (: We can talk about this in tomorrow's 1:1.

No worries. We'll see where to go with this tomorrow. I was anyway looking to shake off the initial jitters of contributing to the codebase, and I guess I'm past that by now, so maybe we can also move forward with issues concerning vertical block sharding 😃

@bwplotka
Member

Definitely 🤗
Should we close this then?

@Biswajitghosh98
Contributor Author

Definitely 🤗
Should we close this then?

Yes, sure. I'll close this one
