
archival: remove redundant manifest upload after GC #10130

Merged: 2 commits merged into redpanda-data:dev from pr-10099-followup on Apr 20, 2023

Conversation

@jcsp (Contributor) commented Apr 17, 2023:

The important upload is the one that happens inside garbage_collect(), because we need to update the manifest to avoid external readers (scrubbers and read replicas) getting upset that they got a 404 reading a segment that's meant to exist.

The upload after garbage collection only serves to trim the 'segments' and/or 'replaced' vectors in the remote copy of the manifest, which has no logical impact. We can rely on the next periodic manifest upload (manifest_upload_interval) to pick that up.

Followup to #10099
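
For illustration, a minimal self-contained sketch of the flow described above, with stand-in types and bodies rather than the actual ntp_archiver implementation (only the names garbage_collect, the manifest upload, and manifest_upload_interval come from this description):

```cpp
// Simplified stand-in for the archiver flow described above; not Redpanda code.
#include <iostream>

struct manifest_model {
    int segments = 10; // segments the remote manifest still references
    int replaced = 4;  // 'replaced' entries waiting to be trimmed
};

void upload_manifest(const char* reason) {
    // Stand-in for the real PUT of the partition manifest to object storage.
    std::cout << "manifest upload: " << reason << "\n";
}

void garbage_collect(manifest_model& m) {
    m.segments -= 2; // drop segments that fell out of retention
    // The important upload: readers must not be left with a manifest that still
    // lists segments we have just deleted, or they hit 404s on segments that
    // are meant to exist.
    upload_manifest("inside garbage_collect");
}

void housekeeping(manifest_model& m) {
    garbage_collect(m);
    // Before this PR, a second upload happened here just to trim the
    // 'segments'/'replaced' vectors in the remote copy. It had no logical
    // impact, so it is removed; the periodic upload below picks the trim up.
}

void periodic_upload(manifest_model& m) {
    // Runs every manifest_upload_interval in the archiver loop.
    m.replaced = 0;
    upload_manifest("periodic");
}

int main() {
    manifest_model m;
    housekeeping(m);    // exactly one upload, inside GC
    periodic_upload(m); // eventually trims the manifest, lazily
    return 0;
}
```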

Backports Required

By hand, together with #10099

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@@ -142,7 +142,8 @@ def __init__(self, test_context):
                         default_retention_segments * self.segment_size)

        si_settings = SISettings(test_context,
-                                log_segment_size=self.segment_size)
+                                log_segment_size=self.segment_size,
+                                fast_uploads=True)
A reviewer (Contributor) commented:

Would we avoid this if we called flush_manifest_clean_offset in garbage_collect() after uploading the manifest? Otherwise this seems mildly concerning (though not necessarily blocking) since we're now delaying local GC.
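
A minimal sketch of the ordering being suggested here; the bodies are stand-ins and only the names come from this thread, so this is not the actual Redpanda code:

```cpp
// Hypothetical illustration of the suggestion: flush the clean offset right
// after the GC-time manifest upload, instead of waiting for a later flush.
#include <iostream>

void upload_manifest() {
    std::cout << "manifest uploaded inside garbage_collect()\n";
}

void flush_manifest_clean_offset() {
    // Persisting the clean offset is what lets local log GC move forward.
    std::cout << "clean offset flushed; local GC no longer waiting\n";
}

void garbage_collect_with_immediate_flush() {
    upload_manifest();              // already happens inside garbage_collect()
    flush_manifest_clean_offset();  // the suggested extra step
}

int main() {
    garbage_collect_with_immediate_flush();
    return 0;
}
```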

@jcsp (Contributor, Author) replied:

Hmm. When I added this I was thinking of the test needing to see up to date manifests, but let me look into the impact on local GC...

@jcsp (Contributor, Author) replied:

So, I think I might not have needed to use fast_uploads at all, looking again at the failure mode (it was failing because it didn't have the latest bucket scan fixes for respecting start offsets).

However, your comment got me thinking, and I've added another commit to respect local_storage_pressure() in maybe_flush_manifest_clean_offset, so that we don't risk subtle issues where background manifest uploads could leave us delaying local log prefix truncation.

Commit message of the added commit:

This periodic check is done in the ntp archiver loop, and we rely on it in cases where the manifest was uploaded in the background for GC. Usually this can be very lazy, but if the local log is waiting on us to advance our max_collectible offset, we should flush as soon as we can.
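
A simplified, self-contained sketch of that behaviour: usually the flush can be lazy, but local storage pressure forces it immediately. The lazy path is modelled here as time-based purely for illustration; only the names local_storage_pressure, maybe_flush_manifest_clean_offset, and max_collectible offset come from this thread, and the bodies are stand-ins rather than the actual ntp_archiver code:

```cpp
// Stand-in model of a pressure-aware maybe_flush_manifest_clean_offset();
// not the actual Redpanda implementation.
#include <chrono>
#include <iostream>

using sclock = std::chrono::steady_clock;

bool local_storage_pressure() {
    // Stand-in: the real check looks at local disk/log state to decide whether
    // the local log is waiting on us to advance the max_collectible offset.
    return true;
}

void flush_manifest_clean_offset() {
    std::cout << "clean offset flushed; local log prefix truncation can proceed\n";
}

void maybe_flush_manifest_clean_offset(sclock::time_point last_flush,
                                       std::chrono::seconds lazy_interval) {
    const bool lazy_deadline_hit = (sclock::now() - last_flush) >= lazy_interval;
    // Usually this can be very lazy, but under local storage pressure we flush
    // as soon as we can, so a background manifest upload (e.g. the one done for
    // GC) does not end up delaying local log prefix truncation.
    if (lazy_deadline_hit || local_storage_pressure()) {
        flush_manifest_clean_offset();
    }
}

int main() {
    maybe_flush_manifest_clean_offset(sclock::now(), std::chrono::seconds{60});
    return 0;
}
```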
jcsp marked this pull request as ready for review April 20, 2023 10:23
Comment on lines -1808 to -1809:

    const auto retention_updated_manifest = co_await apply_retention();
    const auto gc_updated_manifest = co_await garbage_collect();
A reviewer (Contributor) commented:

Can we also remove the ntp_archiver_service::manifest_updated type? It's not used anywhere else.

jcsp merged commit 4d89112 into redpanda-data:dev on Apr 20, 2023
jcsp deleted the pr-10099-followup branch April 20, 2023 13:50
@vshtokman (Contributor) commented:

@jcsp, I think we are missing backports here - would you mind taking a look?

@vshtokman (Contributor) commented:

/backport v23.1.x

@vshtokman (Contributor) commented:

/backport v22.3.x

@vbotbuildovich (Collaborator) commented:

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 534063382844a9b5b7fd2c2add850213b57beb3c 408b7caea9b9fd521122d0a7bde401c9d868c8c4

Workflow run logs.

@vbotbuildovich (Collaborator) commented:

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 534063382844a9b5b7fd2c2add850213b57beb3c 408b7caea9b9fd521122d0a7bde401c9d868c8c4

Workflow run logs.
