downsample: retry objstore related errors #7194

Merged: 3 commits merged on Mar 18, 2024

Conversation

@xBazilio (Contributor) commented on Mar 7, 2024

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

The pull-request-size bot added the size/S label and removed the size/XS label on Mar 7, 2024
@xBazilio marked this pull request as ready for review on March 7, 2024 at 14:57
@@ -419,7 +420,7 @@ func processDownsampling(

 	err = block.Upload(ctx, logger, bkt, resdir, hashFunc)
 	if err != nil {
-		return errors.Wrapf(err, "upload downsampled block %s", id)
+		return compact.NewRetryError(errors.Wrapf(err, "upload downsampled block %s", id))
Contributor

From scanning the code, I don't think this error is consumed anywhere right now, is it? If not, this won't currently lead to retries!

Contributor Author

As I understand it, processDownsampling returns the error to downsampleBucket, which returns it to compactMainFn in cmd/thanos/compact.go. There, the if compact.IsRetryError(err) check should trigger a retry.

Contributor Author

This assumes we run the compactor with the --wait flag.

Contributor

Maybe it's better to retry right here instead of returning an error. This way the compactor will not have to go through the whole cycle again and downsample the block from the beginning.

Contributor Author

I wanted to keep the change less intrusive and retry the same way the compaction process is retried.
I could retry the upload/download calls instead. But then it would be good to apply the same logic in compaction (retrying upload/download calls), and maybe expose a parameter such as --objstore.file-retries ("Maximum number of retries for fetching/uploading block files from object storage").
What do you think, @fpetkovski?

Contributor

You are right, my bad!

Contributor

@xBazilio sounds good, we can keep things consistent for now. Nothing is set in stone anyway, so we can improve if needed.

@xBazilio
Contributor Author

What are the next steps? Can it be merged?

@fpetkovski merged commit 6df670f into thanos-io:main on Mar 18, 2024
17 of 19 checks passed
nicolastakashi pushed a commit to nicolastakashi/thanos that referenced this pull request on Mar 23, 2024

3 participants