
archival: remove redundant manifest upload after GC #10130

Merged: 2 commits merged into redpanda-data:dev from pr-10099-followup on Apr 20, 2023

Conversation

@jcsp (Contributor) commented Apr 17, 2023:

The important upload is the one that happens inside garbage_collect(), because we need to update the manifest to avoid external readers (scrubbers and read replicas) getting upset that they got a 404 reading a segment that's meant to exist.

The upload after garbage collection only serves to trim the 'segments' and/or 'replaced' vectors in the remote copy of the manifest, which has no logical impact. We can rely on the next periodic manifest upload (manifest_upload_interval) to pick that up.

Followup to #10099
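
For illustration, a minimal self-contained sketch of the flow described above, with stand-in types and bodies rather than the actual ntp_archiver implementation (only the names garbage_collect, the manifest upload, and manifest_upload_interval come from this description):

```cpp
// Simplified stand-in for the archiver flow described above; not Redpanda code.
#include <iostream>

struct manifest_model {
    int segments = 10; // segments the remote manifest still references
    int replaced = 4;  // 'replaced' entries waiting to be trimmed
};

void upload_manifest(const char* reason) {
    // Stand-in for the real PUT of the partition manifest to object storage.
    std::cout << "manifest upload: " << reason << "\n";
}

void garbage_collect(manifest_model& m) {
    m.segments -= 2; // drop segments that fell out of retention
    // The important upload: readers must not be left with a manifest that still
    // lists segments we have just deleted, or they hit 404s on segments that
    // are meant to exist.
    upload_manifest("inside garbage_collect");
}

void housekeeping(manifest_model& m) {
    garbage_collect(m);
    // Before this PR, a second upload happened here just to trim the
    // 'segments'/'replaced' vectors in the remote copy. It had no logical
    // impact, so it is removed; the periodic upload below picks the trim up.
}

void periodic_upload(manifest_model& m) {
    // Runs every manifest_upload_interval in the archiver loop.
    m.replaced = 0;
    upload_manifest("periodic");
}

int main() {
    manifest_model m;
    housekeeping(m);    // exactly one upload, inside GC
    periodic_upload(m); // eventually trims the manifest, lazily
    return 0;
}
```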

Backports Required

By hand, together with #10099

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@@ -142,7 +142,8 @@ def __init__(self, test_context):
                         default_retention_segments * self.segment_size)

        si_settings = SISettings(test_context,
-                                log_segment_size=self.segment_size)
+                                log_segment_size=self.segment_size,
+                                fast_uploads=True)
A reviewer (Contributor) commented:

Would we avoid this if we called flush_manifest_clean_offset in garbage_collect() after uploading the manifest? Otherwise this seems mildly concerning (though not necessarily blocking) since we're now delaying local GC.
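
A minimal sketch of the ordering being suggested here; the bodies are stand-ins and only the names come from this thread, so this is not the actual Redpanda code:

```cpp
// Hypothetical illustration of the suggestion: flush the clean offset right
// after the GC-time manifest upload, instead of waiting for a later flush.
#include <iostream>

void upload_manifest() {
    std::cout << "manifest uploaded inside garbage_collect()\n";
}

void flush_manifest_clean_offset() {
    // Persisting the clean offset is what lets local log GC move forward.
    std::cout << "clean offset flushed; local GC no longer waiting\n";
}

void garbage_collect_with_immediate_flush() {
    upload_manifest();              // already happens inside garbage_collect()
    flush_manifest_clean_offset();  // the suggested extra step
}

int main() {
    garbage_collect_with_immediate_flush();
    return 0;
}
```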

@jcsp (Contributor, Author) replied:

Hmm. When I added this I was thinking of the test needing to see up to date manifests, but let me look into the impact on local GC...

@jcsp (Contributor, Author) replied:

So, I think I might not have needed to use fast_uploads at all, looking again at the failure mode (it was failing because it didn't have the latest bucket scan fixes for respecting start offsets).

However, your comment got me thinking, and I've added another commit to respect local_storage_pressure() in maybe_flush_manifest_clean_offset, so that we don't risk subtle issues where background manifest uploads could leave us delaying local log prefix truncation.

Commit message of the added commit:

This periodic check is done in the ntp archiver loop, and we rely on it in cases where the manifest was uploaded in the background for GC. Usually this can be very lazy, but if the local log is waiting on us to advance our max_collectible offset, we should flush as soon as we can.
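
A simplified, self-contained sketch of that behaviour: usually the flush can be lazy, but local storage pressure forces it immediately. The lazy path is modelled here as time-based purely for illustration; only the names local_storage_pressure, maybe_flush_manifest_clean_offset, and max_collectible offset come from this thread, and the bodies are stand-ins rather than the actual ntp_archiver code:

```cpp
// Stand-in model of a pressure-aware maybe_flush_manifest_clean_offset();
// not the actual Redpanda implementation.
#include <chrono>
#include <iostream>

using sclock = std::chrono::steady_clock;

bool local_storage_pressure() {
    // Stand-in: the real check looks at local disk/log state to decide whether
    // the local log is waiting on us to advance the max_collectible offset.
    return true;
}

void flush_manifest_clean_offset() {
    std::cout << "clean offset flushed; local log prefix truncation can proceed\n";
}

void maybe_flush_manifest_clean_offset(sclock::time_point last_flush,
                                       std::chrono::seconds lazy_interval) {
    const bool lazy_deadline_hit = (sclock::now() - last_flush) >= lazy_interval;
    // Usually this can be very lazy, but under local storage pressure we flush
    // as soon as we can, so a background manifest upload (e.g. the one done for
    // GC) does not end up delaying local log prefix truncation.
    if (lazy_deadline_hit || local_storage_pressure()) {
        flush_manifest_clean_offset();
    }
}

int main() {
    maybe_flush_manifest_clean_offset(sclock::now(), std::chrono::seconds{60});
    return 0;
}
```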
jcsp marked this pull request as ready for review April 20, 2023 10:23
Comment on lines -1808 to -1809:

    const auto retention_updated_manifest = co_await apply_retention();
    const auto gc_updated_manifest = co_await garbage_collect();
A reviewer (Contributor) commented:

Can we also remove the ntp_archiver_service::manifest_updated type? It's not used anywhere else.

jcsp merged commit 4d89112 into redpanda-data:dev on Apr 20, 2023
jcsp deleted the pr-10099-followup branch April 20, 2023 13:50
@vshtokman (Contributor) commented:

@jcsp, I think we are missing backports here - would you mind taking a look?

@vshtokman (Contributor) commented:

/backport v23.1.x

@vshtokman (Contributor) commented:

/backport v22.3.x

@vbotbuildovich (Collaborator) commented:

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 534063382844a9b5b7fd2c2add850213b57beb3c 408b7caea9b9fd521122d0a7bde401c9d868c8c4

Workflow run logs.

@vbotbuildovich (Collaborator) commented:

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 534063382844a9b5b7fd2c2add850213b57beb3c 408b7caea9b9fd521122d0a7bde401c9d868c8c4

Workflow run logs.
