Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: Use columnar projection to store spillover manifests #11294

Merged

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Jun 8, 2023

The list of spillover manifests is stored in the partition manifest using the column-store data structure.
Each spillover manifest is represented using segment_meta struct (which is used for segments as well).

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • Compressed columnar representation is used to store tiered-storage metadata that describes spillover manifests

@Lazin Lazin marked this pull request as draft June 8, 2023 12:48
@Lazin Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch 6 times, most recently from a4b56ec to 3381669 Compare June 12, 2023 18:47
@Lazin Lazin requested review from VladLazar and andijcr June 12, 2023 18:50
@Lazin Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch from 3381669 to fb8a59c Compare June 12, 2023 22:17
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice stuff

// Permit probe to query object counts
friend class remote_probe;
/// Cache used to store materialized spillover manifests
ss::shared_ptr<materialized_manifest_cache> _manifest_cache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we use a unique_ptr here since get_materialized_manifest_cache returns a ref. Or did you mean to return shared_ptr in get_materialized_manifest_cache?

return ss::now();
co_await _manifest_cache->start();

co_return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for co_return here

src/v/cloud_storage/segment_meta_cstore.cc Outdated Show resolved Hide resolved
src/v/cloud_storage/segment_meta_cstore.h Show resolved Hide resolved
src/v/cloud_storage/async_manifest_view.cc Outdated Show resolved Hide resolved
}
return *_segments.at_index(target_ix);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the comment on this function please? So this turned out being faster? It's not entirely obvious to me since the previous version did less steps, but had a worse memory access pattern.

@Lazin Lazin marked this pull request as ready for review June 14, 2023 23:06
@Lazin Lazin changed the title [DRAFT] cloud_storage: Use columnar projection to store spillover manifests cloud_storage: Use columnar projection to store spillover manifests Jun 14, 2023
Lazin added 11 commits June 15, 2023 05:25
The spillover command was serialized with the wrong key. Because of that
the spillover was never applied.
Add the list of spillover manifests to the partition manifest. The list
is supposed to be used instead of the ListObjectsV2 api in S3. The
'segment_meta' structure is used to represent individual manifests.
The compressed column-store which is used to store segments is also used
to store spillover manifests.
Use list of spillover manifests from the partition manifest in the
async_manifest_view.
Move materialized_manifest_cache into a dedicated .cc to avoid cyclic
dependency.
... per shard and not per partition. The object is moved to the
materialized_segments. Some cache methods are changed to accept
retry_chain_logger of the caller (async_manifest_view). Previously the
materialized_manifest_cache received retry_chain_logger reference
through constructor.
... to materialized_resources since it manages not only segments but
also spillover manifests.
Add column accessors for c-store. The columns can be used to search by
individual fields without materializing the whole rows of data. This
allows us to speedup individual operations on metadata.
Use column-store to perform a timequery. The search is performed using
only selectred columns (base_timestamp and max_timestamp) using linear
search.
Add metric for uploads and downloads. Add new manifest_type variant.
This commit fixes a bug in the code that causes async_manifest_view to
put manifest into cache twice and returning empty manifest to the
caller. This triggers assertion because async_manifest_view works only
witn non-empty manifests. It also fixes manifest download code path that
interpreted spillover manifests as json.
@Lazin Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch from 83614d3 to 516ab52 Compare June 15, 2023 09:31
@Lazin Lazin requested review from andijcr and VladLazar June 15, 2023 09:32
src/v/cloud_storage/materialized_manifest_cache.h Outdated Show resolved Hide resolved
src/v/cloud_storage/async_manifest_view.cc Show resolved Hide resolved
src/v/archival/ntp_archiver_service.cc Outdated Show resolved Hide resolved
tests/rptest/tests/e2e_shadow_indexing_test.py Outdated Show resolved Hide resolved
src/v/cloud_storage/async_manifest_view.cc Show resolved Hide resolved
src/v/cluster/archival_metadata_stm.cc Show resolved Hide resolved
Add 'cloud_storage_spillover_manifest_max_segments' parameter. The
parameter is similar to 'cloud_storage_spillover_manifest_size' but
instead of forcing manifest spillover based on byte size of the
manifest it uses number of segments in the manifest.
Various fixes from the prev code review. A lot of renamed
methods/variables. The semaphore in the materialized_manifest_cache is
now named.
Persist list of spillover manifests in the archival STM snapshot
In the 'segment_meta_cstore' change 'get_archive_term_column' to
'get_archiver_term_column' to match the name of the field.
Avoid full metadata scan by using 'get_segment_term_column' to locate
the manifest that contains required term id.
...used by the materialized_manifest_cache. The cache is used by several
partitions simultaneosly so it has to be able to store manifests with
the same base offsets.

Update ducktape test to use more than one partition. Previously, this
test was passing because it used only one parititon.
Move cache tests into a separate translation unit. Rename
cloud_storage_basic to clud_storage and move cache test there.
Do not retry in the remote_partition::init_cursor because the
async_manifest_view::get_cursor retries internally.
@Lazin Lazin force-pushed the pr/implement-ntp-archiver-spillover3 branch from fb1bd28 to 51eaa66 Compare June 15, 2023 22:31
@Lazin Lazin requested a review from VladLazar June 15, 2023 22:31
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm curious about the performance of cloud timequeries (original comment).

Comment on lines +845 to +846
def all_partitions_spilled():
return self.num_manifests_uploaded() > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this check doesn't ensure all partitions have spilled, but we can come back to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants