cloud_storage: Use columnar projection to store spillover manifests #11294

Lazin · 2023-06-08T12:48:44Z

The list of spillover manifests is stored in the partition manifest using the column-store data structure.
Each spillover manifest is represented by the segment_meta struct (which is also used for segments).

Backports Required

Release Notes

Improvements

A compressed columnar representation is used to store the tiered-storage metadata that describes spillover manifests.

VladLazar

nice stuff

VladLazar · 2023-06-13T10:12:04Z

src/v/cloud_storage/materialized_segments.h

-    // Permit probe to query object counts
-    friend class remote_probe;
+    /// Cache used to store materialized spillover manifests
+    ss::shared_ptr<materialized_manifest_cache> _manifest_cache;


nit: could we use a unique_ptr here, since get_materialized_manifest_cache returns a ref? Or did you mean to return a shared_ptr from get_materialized_manifest_cache?

VladLazar · 2023-06-13T10:13:43Z

src/v/cloud_storage/materialized_segments.cc

-    return ss::now();
+    co_await _manifest_cache->start();
+
+    co_return;


nit: no need for co_return here

src/v/cloud_storage/segment_meta_cstore.cc

src/v/cloud_storage/segment_meta_cstore.h

src/v/cloud_storage/async_manifest_view.cc

VladLazar · 2023-06-13T10:46:40Z

src/v/cloud_storage/partition_manifest.cc

    }
+    return *_segments.at_index(target_ix);
 }


Could you update the comment on this function please? So this turned out to be faster? It's not entirely obvious to me, since the previous version did fewer steps but had a worse memory access pattern.

src/v/cloud_storage/partition_manifest.h

src/v/cloud_storage/partition_manifest.cc

src/v/cloud_storage/async_manifest_view.cc

The spillover command was serialized with the wrong key. Because of that the spillover was never applied.

Add the list of spillover manifests to the partition manifest. The list is meant to be used instead of the ListObjectsV2 API in S3. The 'segment_meta' structure is used to represent individual manifests. The compressed column store that is used to store segments is also used to store spillover manifests.

Use list of spillover manifests from the partition manifest in the async_manifest_view.

Move materialized_manifest_cache into a dedicated .cc to avoid cyclic dependency.

... per shard and not per partition. The object is moved to materialized_segments. Some cache methods are changed to accept the retry_chain_logger of the caller (async_manifest_view). Previously the materialized_manifest_cache received a retry_chain_logger reference through its constructor.

... to materialized_resources since it manages not only segments but also spillover manifests.

Add column accessors for c-store. The columns can be used to search by individual fields without materializing whole rows of data. This allows us to speed up individual operations on metadata.
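To illustrate the idea of a column accessor, here is a minimal sketch. It is not the actual segment_meta_cstore: `toy_cstore`, `row`, and `find_first_with_term` are hypothetical names, the columns are plain uncompressed vectors, and the row type carries only three of the many fields a real segment_meta has.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical row type; stands in for a much larger metadata struct.
struct row {
    int64_t base_offset;
    int64_t committed_offset;
    int64_t segment_term;
};

// Hypothetical, uncompressed stand-in for a column store:
// one vector per field instead of one vector of rows.
class toy_cstore {
public:
    void append(const row& r) {
        _base_offset.push_back(r.base_offset);
        _committed_offset.push_back(r.committed_offset);
        _segment_term.push_back(r.segment_term);
    }

    // Column accessor: exposes a single field without building any rows.
    const std::vector<int64_t>& segment_term_column() const {
        return _segment_term;
    }

    size_t size() const { return _base_offset.size(); }

    // Materialize a full row only for the index the caller needs.
    row at_index(size_t ix) const {
        assert(ix < size());
        return {_base_offset[ix], _committed_offset[ix], _segment_term[ix]};
    }

private:
    std::vector<int64_t> _base_offset;
    std::vector<int64_t> _committed_offset;
    std::vector<int64_t> _segment_term;
};

// Scan one column to find the first row with the given term.
inline std::optional<size_t>
find_first_with_term(const toy_cstore& s, int64_t term) {
    const auto& col = s.segment_term_column();
    for (size_t i = 0; i < col.size(); ++i) {
        if (col[i] == term) {
            return i;
        }
    }
    return std::nullopt;
}
```

The search touches only the term column; a row is assembled once, after the index is known.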

... for spillover manifests.

Use column-store to perform a timequery. The search uses only the selected columns (base_timestamp and max_timestamp) and is a linear search.
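A linear timequery over two timestamp columns could look roughly like the sketch below. The matching rule is an assumption, not taken from the PR: the first entry whose max_timestamp is at or after the target is returned, and base_timestamp is used only to report whether the target falls inside that entry's range. The names `timequery` and `timequery_hit` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type for the sketch.
struct timequery_hit {
    size_t index;         // position of the matching entry
    bool contains_target; // true if base_timestamp <= target <= max_timestamp
};

// Linear scan over two parallel columns (timestamps in milliseconds).
// Assumed rule: pick the first entry that ends at or after the target.
inline std::optional<timequery_hit> timequery(
  const std::vector<int64_t>& base_timestamp,
  const std::vector<int64_t>& max_timestamp,
  int64_t target) {
    assert(base_timestamp.size() == max_timestamp.size());
    for (size_t i = 0; i < max_timestamp.size(); ++i) {
        if (max_timestamp[i] >= target) {
            return timequery_hit{i, base_timestamp[i] <= target};
        }
    }
    // Every entry ends before the target.
    return std::nullopt;
}
```

Only the two timestamp columns are read; no other field of the metadata has to be decoded during the scan.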

Add metric for uploads and downloads. Add new manifest_type variant.

This commit fixes a bug that causes async_manifest_view to put a manifest into the cache twice and return an empty manifest to the caller. This triggers an assertion because async_manifest_view works only with non-empty manifests. It also fixes the manifest download code path, which interpreted spillover manifests as JSON.

src/v/cloud_storage/materialized_manifest_cache.h

src/v/cloud_storage/async_manifest_view.cc

src/v/archival/ntp_archiver_service.cc

tests/rptest/tests/e2e_shadow_indexing_test.py

src/v/cloud_storage/async_manifest_view.cc

src/v/cluster/archival_metadata_stm.cc

Add the 'cloud_storage_spillover_manifest_max_segments' parameter. The parameter is similar to 'cloud_storage_spillover_manifest_size', but instead of forcing manifest spillover based on the byte size of the manifest it uses the number of segments in the manifest.
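The difference between the two triggers can be sketched as follows. This is an illustration only: `spillover_limits` and `should_spill` are hypothetical names, and both the precedence between the two limits (the segment-count limit wins when set) and the use of `>=` are assumptions that the commit message does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical holder for the two settings named in the commit message.
struct spillover_limits {
    // stands in for cloud_storage_spillover_manifest_size
    std::optional<size_t> max_bytes;
    // stands in for cloud_storage_spillover_manifest_max_segments
    std::optional<size_t> max_segments;
};

// Decide whether the manifest should spill.
// Assumption: the segment-count limit takes precedence when it is set.
inline bool should_spill(
  size_t manifest_bytes, size_t num_segments, const spillover_limits& lim) {
    if (lim.max_segments.has_value()) {
        return num_segments >= *lim.max_segments;
    }
    if (lim.max_bytes.has_value()) {
        return manifest_bytes >= *lim.max_bytes;
    }
    // Neither limit is configured: never spill.
    return false;
}
```

A count-based limit is convenient in tests, since a spillover can be forced after a known number of segments regardless of how large the serialized manifest is.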

Various fixes from the prev code review. A lot of renamed methods/variables. The semaphore in the materialized_manifest_cache is now named.

Persist list of spillover manifests in the archival STM snapshot

In the 'segment_meta_cstore' change 'get_archive_term_column' to 'get_archiver_term_column' to match the name of the field.

Avoid full metadata scan by using 'get_segment_term_column' to locate the manifest that contains required term id.

...used by the materialized_manifest_cache. The cache is used by several partitions simultaneously, so it has to be able to store manifests with the same base offsets. Update the ducktape test to use more than one partition. Previously, this test passed because it used only one partition.
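Why a shared cache needs more than the base offset in its key can be shown with a small sketch. The types here are hypothetical: `toy_manifest_cache` is not the real materialized_manifest_cache, the partition is identified by a plain string, and the cached value is a string rather than a manifest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical composite key: (partition id, base offset of the manifest).
using manifest_key = std::pair<std::string, int64_t>;

class toy_manifest_cache {
public:
    // Returns false if an entry with the same key is already cached.
    bool put(const std::string& ntp, int64_t base_offset, std::string payload) {
        return _entries
          .emplace(manifest_key{ntp, base_offset}, std::move(payload))
          .second;
    }

    // Returns nullptr when the manifest is not cached.
    const std::string* get(const std::string& ntp, int64_t base_offset) const {
        auto it = _entries.find(manifest_key{ntp, base_offset});
        return it == _entries.end() ? nullptr : &it->second;
    }

    size_t size() const { return _entries.size(); }

private:
    std::map<manifest_key, std::string> _entries;
};
```

With a key made of the base offset alone, the second partition's manifest at offset 0 would collide with the first partition's, which is the kind of failure a single-partition test cannot expose.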

Move cache tests into a separate translation unit. Rename cloud_storage_basic to cloud_storage and move the cache test there.

Do not retry in the remote_partition::init_cursor because the async_manifest_view::get_cursor retries internally.

VladLazar

Looks good. I'm curious about the performance of cloud timequeries (original comment).

VladLazar · 2023-06-16T08:55:28Z

tests/rptest/tests/e2e_shadow_indexing_test.py

+        def all_partitions_spilled():
+            return self.num_manifests_uploaded() > 0


nit: this check doesn't ensure all partitions have spilled, but we can come back to it

VladLazar · 2023-06-16T12:37:01Z

Failure is:

CI Failure (Redpanda process unexpectedly stopped) in MemoryStressTest.test_fetch_with_many_partitions #11458
CI Failure (timeout waiting for end offsets to be updated for all partitions) in OffsetForLeaderEpochTest.test_offset_for_leader_epoch #11169

Lazin marked this pull request as draft June 8, 2023 12:48

github-actions bot added the area/redpanda label Jun 8, 2023

Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch 6 times, most recently from a4b56ec to 3381669 Compare June 12, 2023 18:47

Lazin requested review from VladLazar and andijcr June 12, 2023 18:50

Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch from 3381669 to fb8a59c Compare June 12, 2023 22:17

VladLazar reviewed Jun 13, 2023


src/v/cloud_storage/partition_manifest.h

andijcr reviewed Jun 13, 2023


src/v/cloud_storage/partition_manifest.cc

src/v/cloud_storage/async_manifest_view.cc (outdated)

Lazin marked this pull request as ready for review June 14, 2023 23:06

Lazin changed the title ~~[DRAFT] cloud_storage: Use columnar projection to store spillover manifests~~ cloud_storage: Use columnar projection to store spillover manifests Jun 14, 2023

Lazin added 11 commits June 15, 2023 05:25

cluster: Fix spillover in archival_metadata_stm

a72c0e3

The spillover command was serialized with the wrong key. Because of that the spillover was never applied.

cloud_storage: Use list of spillover manifests

fbb0174

Use list of spillover manifests from the partition manifest in the async_manifest_view.

cloud_storage: Move materialized_manifest_cache

42cfa29

Move materialized_manifest_cache into a dedicated .cc to avoid cyclic dependency.

cloud_storage: Rename materialized_segments...

111e301

... to materialized_resources since it manages not only segments but also spillover manifests.

cloud_storage: Add accessors for individual columns

b4d49b3

Add column accessors for c-store. The columns can be used to search by individual fields without materializing whole rows of data. This allows us to speed up individual operations on metadata.

cloud_storage: Use individual columns to search...

4370429

... for spillover manifests.

cloud_storage: Update timequery

6b48931

Use column-store to perform a timequery. The search uses only the selected columns (base_timestamp and max_timestamp) and is a linear search.

cloud_storage: Add metric for spillover manifests

ab43ecd

Add metric for uploads and downloads. Add new manifest_type variant.

Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch from 83614d3 to 516ab52 Compare June 15, 2023 09:31

Lazin requested review from andijcr and VladLazar June 15, 2023 09:32

VladLazar reviewed Jun 15, 2023


Lazin added 9 commits June 15, 2023 18:29

cloud_storage: Async manifest view fixes

b704011

Various fixes from the prev code review. A lot of renamed methods/variables. The semaphore in the materialized_manifest_cache is now named.

cloud_storage: Add spillover manifests to snapshot

590cc77

Persist list of spillover manifests in the archival STM snapshot

rptest: Add spillover end to end test

302a70f

cloud_storage: Rename get_archive_term_column method

b60e202

In the 'segment_meta_cstore' change 'get_archive_term_column' to 'get_archiver_term_column' to match the name of the field.

cloud_storage: Optimize 'get_term_last_offset'

8961f90

Avoid full metadata scan by using 'get_segment_term_column' to locate the manifest that contains required term id.

cloud_storage: Extract materialized manifest cache tests

c32888d

Move cache tests into a separate translation unit. Rename cloud_storage_basic to cloud_storage and move the cache test there.

cloud_storage: Remove retries from init_cursor

51eaa66

Do not retry in the remote_partition::init_cursor because the async_manifest_view::get_cursor retries internally.

Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch from fb1bd28 to 51eaa66 Compare June 15, 2023 22:31

Lazin requested a review from VladLazar June 15, 2023 22:31

VladLazar approved these changes Jun 16, 2023


andijcr approved these changes Jun 16, 2023


piyushredpanda merged commit bf9ef7d into redpanda-data:dev Jun 16, 2023

