archival: remove redundant manifest upload after GC #10130
Conversation
@@ -142,7 +142,8 @@ def __init__(self, test_context):
                          default_retention_segments * self.segment_size)

         si_settings = SISettings(test_context,
-                                 log_segment_size=self.segment_size)
+                                 log_segment_size=self.segment_size,
+                                 fast_uploads=True)
Would we avoid this if we called flush_manifest_clean_offset in garbage_collect() after uploading the manifest? Otherwise this seems mildly concerning (though not necessarily blocking), since we're now delaying local GC.
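For concreteness, here is a minimal, self-contained sketch of what that suggestion might look like. It uses simplified stand-in types rather than the real ntp_archiver code; the offsets, upload_manifest() helper, and the exact place where the flush would happen are assumptions for illustration only.

#include <cstdint>
#include <iostream>

// Simplified stand-in for the archiver state; not the real ntp_archiver.
struct archiver_model {
    int64_t insync_offset{0}; // last in-memory manifest update
    int64_t clean_offset{-1}; // last update known to be persisted remotely

    bool manifest_dirty() const { return clean_offset < insync_offset; }

    void upload_manifest() {
        // Stand-in for the remote PUT of the partition manifest.
        std::cout << "uploading manifest at offset " << insync_offset << "\n";
    }

    void flush_manifest_clean_offset() {
        // Persisting the clean offset is what lets max_collectible advance
        // and local GC proceed.
        clean_offset = insync_offset;
    }

    void garbage_collect() {
        insync_offset += 1;            // GC changed the manifest in memory
        upload_manifest();             // the upload GC already performs
        flush_manifest_clean_offset(); // the suggestion: flush right away
    }
};

int main() {
    archiver_model a;
    a.garbage_collect();
    std::cout << std::boolalpha << "dirty after GC: " << a.manifest_dirty() << "\n";
}

Under these assumptions, flushing inside garbage_collect() means the manifest is never left "dirty" after GC, so local log truncation would not have to wait for a later periodic flush.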
Hmm. When I added this I was thinking of the test needing to see up-to-date manifests, but let me look into the impact on local GC...
So, I think I might not have needed to use fast_uploads at all. Looking again at the failure mode, it was failing because it didn't have the latest bucket scan fixes for respecting start offsets.
However, your comment got me thinking, and I've added another commit to respect local_storage_pressure() in maybe_flush_manifest_clean_offset, so that we don't risk subtle issues where background manifest uploads could leave us delaying local log prefix truncation.
The important upload is the one that happens inside garbage_collect(), because we need to update the manifest to avoid external readers (scrubbers and read replicas) getting upset that they got a 404 reading a segment that's meant to exist. The upload _after_ garbage collection only serves to trim the 'segments' and/or 'replaced' vectors in the remote copy of the manifest, which has no logical impact. We can rely on the next periodic manifest upload (manifest_upload_interval) to pick that up. Followup to redpanda-data#10099
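To make that distinction concrete, here is a small self-contained model of the flow described above. The manifest fields, the upload_manifest() helper, and the exact ordering inside the real garbage_collect() are simplified assumptions based on the description, not the actual ntp_archiver_service code.

#include <cstddef>
#include <iostream>
#include <vector>

// Simplified stand-in for the partition manifest; not the real type.
struct manifest {
    size_t start_offset{0};          // first offset readers should expect to find
    std::vector<size_t> segments;    // base offsets of listed segments
    std::vector<size_t> replaced;    // segments superseded by reuploads
};

struct housekeeping_model {
    manifest m;

    void upload_manifest(const char* why) {
        std::cout << "PUT manifest (" << why << "), start_offset=" << m.start_offset
                  << ", segments=" << m.segments.size()
                  << ", replaced=" << m.replaced.size() << "\n";
    }

    // Retention only updates the in-memory manifest; nothing is uploaded or
    // deleted yet.
    void apply_retention(size_t new_start_offset) { m.start_offset = new_start_offset; }

    void garbage_collect() {
        // The upload that matters: publish a manifest that no longer claims
        // the expired segments before their objects are deleted, so external
        // readers don't 404 on a segment the manifest says should exist.
        upload_manifest("gc");

        // Delete the objects that fell out of retention and trim the
        // bookkeeping vectors in memory.
        std::erase_if(m.segments, [&](size_t base) { return base < m.start_offset; });
        m.replaced.clear();

        // No second upload here: trimming 'segments'/'replaced' in the remote
        // copy has no logical impact, and the next periodic manifest upload
        // (manifest_upload_interval) picks it up.
    }
};

int main() {
    housekeeping_model h;
    h.m.segments = {0, 1, 2};
    h.apply_retention(/*new_start_offset=*/2);
    h.garbage_collect();
}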
This periodic check is done in the ntp archiver loop, and we rely on it in cases where the manifest was uploaded in the background for GC. Usually this can be very lazy, but if the local log is waiting on us to advance our max_collectible offset, we should flush as soon as we can.
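A rough sketch of that policy follows, again with simplified stand-ins rather than the real archiver loop; the interval value and the pressure flag are assumptions for illustration. The flush of the clean offset stays lazy by default, but becomes immediate when local storage pressure indicates the local log is blocked on max_collectible.

#include <chrono>
#include <iostream>

// Simplified model of the decision a maybe_flush_manifest_clean_offset-style
// helper makes in the archiver housekeeping loop; not the real implementation.
struct clean_offset_policy {
    using clock = std::chrono::steady_clock;

    std::chrono::seconds lazy_interval{600}; // stand-in for manifest_upload_interval
    clock::time_point last_flush{};

    // Normally lazy, but flush as soon as we can when local GC is waiting on
    // the clean offset (i.e. local storage is under pressure).
    bool should_flush(clock::time_point now, bool local_storage_pressure) const {
        if (local_storage_pressure) {
            return true;
        }
        return now - last_flush >= lazy_interval;
    }
};

int main() {
    clean_offset_policy p;
    const auto now = clean_offset_policy::clock::now();
    p.last_flush = now; // pretend we just flushed

    std::cout << std::boolalpha
              << p.should_flush(now, /*local_storage_pressure=*/true) << "\n"   // true: flush eagerly
              << p.should_flush(now, /*local_storage_pressure=*/false) << "\n"; // false: stay lazy
}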
const auto retention_updated_manifest = co_await apply_retention();
const auto gc_updated_manifest = co_await garbage_collect();
Can we also remove the ntp_archiver_service::manifest_updated type? It's not used anywhere else.
@jcsp, I think we are missing backports here - would you mind taking a look?
/backport v23.1.x
/backport v22.3.x
Failed to run cherry-pick command. I executed the below command:
Failed to run cherry-pick command. I executed the below command:
Backports Required
By hand, together with #10099
Release Notes