Retry and prevent cleanup of local blocks for up to 5 iterations #3894
Conversation
Signed-off-by: Biswajit Ghosh <[email protected]>
I think we can use a retry package, https://pkg.go.dev/github.com/cenkalti/backoff/v4#Retry, which abstracts away our boilerplate for writing retry functions and offers different types of retry logic out of the box.
Do we need to import another package as extra overhead, or are we good with this logic itself? WDYT? @yashrsharma44
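For reference, a rough sketch of what the suggested backoff.Retry approach could look like; this assumes github.com/cenkalti/backoff/v4, and uploadOnce is a hypothetical closure standing in for the actual upload call:

// Sketch only (not this PR's code): retry an upload with exponential backoff.
// Assumes github.com/cenkalti/backoff/v4 imported as backoff, plus "context".
// uploadOnce is a hypothetical closure wrapping the real upload call.
func uploadWithBackoff(ctx context.Context, uploadOnce func() error) error {
    // Exponential backoff, capped at 5 retries and cancelled with the context.
    b := backoff.WithContext(backoff.WithMaxRetries(backoff.NewExponentialBackOff(), 5), ctx)
    return backoff.Retry(uploadOnce, b)
}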
I have added a few comments on this implementation, but I feel this might not be the right place to solve this problem. Implementing retries for the objstore.Upload/UploadDir functions might be a better idea.
Also, the idea of retrying uploads with exponential backoff is worth a look, as upload errors can result from hitting the object storage rate limit.
if retryCounter == 5 {
    return cleanUp(logger, bkt, id, errors.Wrap(err, "upload debug meta file"))
}
flag = true
Do we really need this flag? We can simply do return UploadWithRetry(..., retryCounter+1) here to get the same result.
My first commit was exactly that; after that I shifted to this flag method. Let's say we encounter an error and call Upload once again with retry x+1. If that call executes successfully, we'll still be returned to the scope of the function with retry value x. Actually, I feel the flag check should take place at the end, here:
Line 173 in fcb6293
Line 161 in fcb6293
if flag && retryCounter < 5 {
Rather than using recursive calls for this, you could create a loop in which you retry up to maxRetryCount times (in this case, 5) and break out of the loop on success. That would be easier to follow, and you might not need a flag variable at all.
So something like this:
for i := 0; i < maxRetryCount; i++ {
    if err := bkt.Upload(...); err == nil {
        break // bucket upload was successful
    }
    log.Debug("retrying again...")
}
Yes, that makes sense. Instead of a plain for loop, I'll also go for exponential backoff for each upload.
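A minimal sketch of that idea with a hand-rolled doubling delay; maxRetryCount and the upload closure are assumptions rather than the final design:

// Sketch only: retry loop with exponential backoff between attempts.
// Needs only the "time" import; upload stands in for the real call.
func retryWithBackoff(maxRetryCount int, upload func() error) error {
    delay := time.Second
    var err error
    for i := 0; i < maxRetryCount; i++ {
        if err = upload(); err == nil {
            return nil // upload succeeded
        }
        time.Sleep(delay)
        delay *= 2 // double the wait before the next attempt
    }
    return err // last error after exhausting all retries
}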
Why can't we leverage/improve the runutil package for this?
e.g. https://github.com/thanos-io/thanos/blob/main/pkg/runutil/runutil.go#L86
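If the helper at the linked line is a generic retry function, reusing it could look roughly like the snippet below. Both the runutil.Retry signature and the underlying block.Upload call are assumptions to verify, not code from this PR:

// Sketch only: assumes runutil.Retry(interval, stopc, f) keeps calling f until
// it returns nil or stopc is closed; check the actual helper before relying on this.
// The upload call reuses the parameters this PR already passes to UploadWithRetry.
err := runutil.Retry(5*time.Second, ctx.Done(), func() error {
    return block.Upload(ctx, logger, bkt, bdir, hf)
})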
}

// UploadWithRetry is a utility function for upload and acts as a workaround for the absence of default parameters (which in this case is retryCounter = 0).
func UploadWithRetry(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc, retryCounter int) error {
From the name UploadWithRetry, I would think it runs the Upload function with retries, but that is not the case here. It would have been better to have a function Upload(...) which handles the upload logic without retry, and then an UploadWithRetries function that retries that Upload fn itself.
But since we are only trying to retry when the upload fails, instead of having retries here, doesn't it make more sense to have an objstore.UploadDirWithRetries that will handle this transparently?
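A rough sketch of what such a wrapper could look like if it lived inside pkg/objstore; UploadDirWithRetries does not exist today, and UploadDir's exact signature should be double-checked:

// Hypothetical helper inside pkg/objstore, not part of the package today:
// retry UploadDir a fixed number of times before giving up.
func UploadDirWithRetries(ctx context.Context, logger log.Logger, bkt Bucket, srcdir, dstdir string, maxRetries int) error {
    var err error
    for i := 0; i < maxRetries; i++ {
        if err = UploadDir(ctx, logger, bkt, srcdir, dstdir); err == nil {
            return nil // whole directory uploaded successfully
        }
        level.Warn(logger).Log("msg", "upload dir failed, retrying", "attempt", i+1, "err", err)
    }
    return err // give up and surface the last error
}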
It is doing exactly that, isn't it? @prmsrswt 😕
The problem with a separate function would be that we'd have to carry over the error which caused the failure, and keeping track of which error made it fail would otherwise become difficult. The only reason I made the UploadWithRetries function was to keep the integrity of Upload intact and not add any other parameter to it.
It would have been really great if Golang supported default params 😁
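For illustration, the wrapper pattern being described presumably keeps Upload untouched and delegates with a zero retry counter, something along these lines (a sketch, not the exact code in this PR):

// Sketch of the "default parameter via wrapper" pattern described above:
// Upload keeps its original signature and delegates with retryCounter = 0.
func Upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc) error {
    return UploadWithRetry(ctx, logger, bkt, bdir, hf, 0)
}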
    }
    flag = true
}
if flag && retryCounter < 5 {
Why 5?
You can answer the question here, but for anyone visiting this code in the future it's a mystery number. We need to at least create a const for this; something like const UploadMaxRetries = 5 will do. But ideally, this value should be configurable by users with a reasonable default, so you need to pass it to the function dynamically as an argument.
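If this does become user-configurable, the wiring could be a CLI flag along these lines; the flag name is invented for illustration, assuming the usual kingpin setup Thanos commands use:

// Hypothetical flag, name made up for illustration: lets users override the
// maximum number of upload retries, defaulting to 5.
uploadMaxRetries := cmd.Flag("shipper.upload-max-retries",
    "Maximum number of times a block upload is retried before giving up.").
    Default("5").Int()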
Yes, it was only a first draft to check whether the logic itself works in the first place. I'd definitely go for a user-defined retryCounter value if the logic receives a green flag 💯
I'd surely try to have a look at this as well 🤗
Hey, I am really sorry, but I think I misled you in #3744 (comment).
I added a comment to explain it all. This ticket is mostly about upload removing the need for deletions on the sidecar, not about the retry logic. The sidecar will retry in the next cycle, so we don't need this. In fact, such retries at this level are not recommended; see #3907 for the bigger discussion. 🤗
Hope that makes sense. Also, we have another contributor trying to fix the same thing, so ideally we could help with #3756 (review, comment) before creating another PR. What do you think? (: We can talk about this in tomorrow's 1:1.
No worries. We'll see where we go with this tomorrow. I was anyway looking to break the initial jitters of committing to the codebase, and I guess I'm past that by now, so maybe we can also move forward with issues concerning vertical block sharding 😃
Definitely 🤗
Yes, sure. I'll close this one.
Checking Proof of concept for #3744