-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: Use columnar projection to store spillover manifests #11294
cloud_storage: Use columnar projection to store spillover manifests #11294
Conversation
a4b56ec
to
3381669
Compare
3381669
to
fb8a59c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice stuff
// Permit probe to query object counts | ||
friend class remote_probe; | ||
/// Cache used to store materialized spillover manifests | ||
ss::shared_ptr<materialized_manifest_cache> _manifest_cache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we use a unique_ptr
here since get_materialized_manifest_cache
returns a ref. Or did you mean to return shared_ptr
in get_materialized_manifest_cache
?
return ss::now(); | ||
co_await _manifest_cache->start(); | ||
|
||
co_return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for co_return here
} | ||
return *_segments.at_index(target_ix); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the comment on this function please? So this turned out being faster? It's not entirely obvious to me since the previous version did less steps, but had a worse memory access pattern.
The spillover command was serialized with the wrong key. Because of that the spillover was never applied.
Add the list of spillover manifests to the partition manifest. The list is supposed to be used instead of the ListObjectsV2 api in S3. The 'segment_meta' structure is used to represent individual manifests. The compressed column-store which is used to store segments is also used to store spillover manifests.
Use list of spillover manifests from the partition manifest in the async_manifest_view.
Move materialized_manifest_cache into a dedicated .cc to avoid cyclic dependency.
... per shard and not per partition. The object is moved to the materialized_segments. Some cache methods are changed to accept retry_chain_logger of the caller (async_manifest_view). Previously the materialized_manifest_cache received retry_chain_logger reference through constructor.
... to materialized_resources since it manages not only segments but also spillover manifests.
Add column accessors for c-store. The columns can be used to search by individual fields without materializing the whole rows of data. This allows us to speedup individual operations on metadata.
... for spillover manifests.
Use column-store to perform a timequery. The search is performed using only selectred columns (base_timestamp and max_timestamp) using linear search.
Add metric for uploads and downloads. Add new manifest_type variant.
This commit fixes a bug in the code that causes async_manifest_view to put manifest into cache twice and returning empty manifest to the caller. This triggers assertion because async_manifest_view works only witn non-empty manifests. It also fixes manifest download code path that interpreted spillover manifests as json.
83614d3
to
516ab52
Compare
Add 'cloud_storage_spillover_manifest_max_segments' parameter. The parameter is similar to 'cloud_storage_spillover_manifest_size' but instead of forcing manifest spillover based on byte size of the manifest it uses number of segments in the manifest.
Various fixes from the prev code review. A lot of renamed methods/variables. The semaphore in the materialized_manifest_cache is now named.
Persist list of spillover manifests in the archival STM snapshot
In the 'segment_meta_cstore' change 'get_archive_term_column' to 'get_archiver_term_column' to match the name of the field.
Avoid full metadata scan by using 'get_segment_term_column' to locate the manifest that contains required term id.
...used by the materialized_manifest_cache. The cache is used by several partitions simultaneosly so it has to be able to store manifests with the same base offsets. Update ducktape test to use more than one partition. Previously, this test was passing because it used only one parititon.
Move cache tests into a separate translation unit. Rename cloud_storage_basic to clud_storage and move cache test there.
Do not retry in the remote_partition::init_cursor because the async_manifest_view::get_cursor retries internally.
fb1bd28
to
51eaa66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I'm curious about the performance of cloud timequeries (original comment).
def all_partitions_spilled(): | ||
return self.num_manifests_uploaded() > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this check doesn't ensure all partitions have spilled, but we can come back to it
The list of spillover manifests is stored in the partition manifest using the column-store data structure.
Each spillover manifest is represented using
segment_meta
struct (which is used for segments as well).Backports Required
Release Notes
Improvements