cloud_storage: Use columnar projection to store spillover manifests #11294

Merged

Commits on Jun 15, 2023

  1. cluster: Fix spillover in archival_metadata_stm

    The spillover command was serialized with the wrong key, so it was
    never applied.
    Lazin committed Jun 15, 2023
    a72c0e3
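    A minimal sketch of why a wrong key turns a command into a no-op,
    assuming the state machine dispatches on a key stored next to the
    payload. The names `cmd_key`, `command` and `toy_stm` and the key values
    are hypothetical and are not taken from archival_metadata_stm.

```cpp
#include <cstdint>

// Hypothetical command keys; the real values may differ.
enum class cmd_key : int8_t {
    add_segment = 0,
    truncate = 1,
    spillover = 2,
};

// A replicated command: the key selects the handler, the rest is payload.
struct command {
    cmd_key key;
    int64_t start_offset;
};

struct toy_stm {
    int64_t archive_start_offset = 0;
    int applied_spillovers = 0;

    // Dispatch on the serialized key. A spillover payload written with
    // any other key never reaches the spillover branch.
    void apply(const command& c) {
        switch (c.key) {
        case cmd_key::spillover:
            archive_start_offset = c.start_offset;
            ++applied_spillovers;
            break;
        default:
            // Other commands are ignored in this sketch.
            break;
        }
    }
};
```

    In this toy model, applying a spillover payload tagged as `truncate`
    leaves `applied_spillovers` at zero, which mirrors the symptom described
    in the commit message.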
  2. cloud_storage: Update the partition_manifest

    Add the list of spillover manifests to the partition manifest. The list
    is supposed to be used instead of the ListObjectsV2 API in S3. The
    'segment_meta' structure is used to represent individual manifests.
    The compressed column-store which is used to store segments is also used
    to store spillover manifests.
    Lazin committed Jun 15, 2023
    e734c3f
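    A rough sketch of the idea, assuming each spillover manifest is
    summarized by one metadata record of the same shape as a segment record.
    `toy_segment_meta`, `toy_partition_manifest` and `spill` are hypothetical
    names, the field set is a guess, and plain vectors stand in for the
    compressed column-store.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of the fields a metadata record might carry.
struct toy_segment_meta {
    int64_t base_offset;
    int64_t committed_offset;
    int64_t base_timestamp;
    int64_t max_timestamp;
    std::size_t size_bytes;
};

struct toy_partition_manifest {
    std::vector<toy_segment_meta> segments;
    // One record per spillover manifest. Readers consult this list
    // instead of listing objects in the bucket.
    std::vector<toy_segment_meta> spillover;

    // Move the first n segments out of the manifest and record a summary
    // of them. Returns false if n is zero or exceeds the segment count.
    bool spill(std::size_t n) {
        if (n == 0 || n > segments.size()) {
            return false;
        }
        toy_segment_meta summary = segments.front();
        for (std::size_t i = 1; i < n; ++i) {
            const auto& s = segments[i];
            summary.committed_offset = s.committed_offset;
            summary.base_timestamp = std::min(summary.base_timestamp, s.base_timestamp);
            summary.max_timestamp = std::max(summary.max_timestamp, s.max_timestamp);
            summary.size_bytes += s.size_bytes;
        }
        spillover.push_back(summary);
        segments.erase(
          segments.begin(), segments.begin() + static_cast<std::ptrdiff_t>(n));
        return true;
    }
};
```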
  3. cloud_storage: Use list of spillover manifests

    Use list of spillover manifests from the partition manifest in the
    async_manifest_view.
    Lazin committed Jun 15, 2023
    fbb0174
  4. cloud_storage: Move materialized_manifest_cache

    Move materialized_manifest_cache into a dedicated .cc to avoid cyclic
    dependency.
    Lazin committed Jun 15, 2023
    42cfa29
  5. cloud_storage: Construct materialized_manifest_cache...

    ... per shard and not per partition. The object is moved to
    materialized_segments. Some cache methods now accept the
    retry_chain_logger of the caller (async_manifest_view). Previously,
    materialized_manifest_cache received a retry_chain_logger reference
    through its constructor.
    Lazin committed Jun 15, 2023
    5579d1c
  6. cloud_storage: Rename materialized_segments...

    ... to materialized_resources since it manages not only segments but
    also spillover manifests.
    Lazin committed Jun 15, 2023
    111e301
  7. cloud_storage: Add accessors for individual columns

    Add column accessors for c-store. The columns can be used to search by
    individual fields without materializing the whole rows of data. This
    allows us to speed up individual operations on metadata.
    Lazin committed Jun 15, 2023
    b4d49b3
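    To illustrate searching by column, here is a minimal sketch with
    uncompressed vectors standing in for the real compressed columns. The
    class `toy_cstore`, its accessors and `find_row` are hypothetical and do
    not reproduce the segment_meta_cstore interface. It assumes rows are
    sorted by base offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Column-oriented storage: one vector per field instead of a vector of rows.
class toy_cstore {
public:
    void append(int64_t base, int64_t committed, int64_t term) {
        _base_offset.push_back(base);
        _committed_offset.push_back(committed);
        _segment_term.push_back(term);
    }

    // Column accessors: callers can scan one field without building rows.
    const std::vector<int64_t>& base_offset_column() const { return _base_offset; }
    const std::vector<int64_t>& committed_offset_column() const { return _committed_offset; }
    const std::vector<int64_t>& segment_term_column() const { return _segment_term; }

private:
    std::vector<int64_t> _base_offset;
    std::vector<int64_t> _committed_offset;
    std::vector<int64_t> _segment_term;
};

// Find the row whose [base_offset, committed_offset] range contains the
// offset. Only two columns are read; the term column is never touched.
std::optional<std::size_t> find_row(const toy_cstore& cs, int64_t offset) {
    const auto& base = cs.base_offset_column();
    auto it = std::upper_bound(base.begin(), base.end(), offset);
    if (it == base.begin()) {
        return std::nullopt; // offset is below the first row
    }
    auto ix = static_cast<std::size_t>(std::distance(base.begin(), it) - 1);
    if (offset > cs.committed_offset_column()[ix]) {
        return std::nullopt; // offset falls in a gap or past the end
    }
    return ix;
}
```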
  8. cloud_storage: Use individual columns to search...

    ... for spillover manifests.
    Lazin committed Jun 15, 2023
    4370429
  9. cloud_storage: Update timequery

    Use column-store to perform a timequery. The search is a linear scan
    over only the selected columns (base_timestamp and max_timestamp).
    Lazin committed Jun 15, 2023
    6b48931
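    A possible shape for such a scan, assuming the query looks for the first
    row that can hold a record with a timestamp at or after the target.
    `toy_timequery` and `timequery_hit` are hypothetical names and the exact
    matching rule is an assumption. A linear scan does not require the
    timestamps to be sorted.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct timequery_hit {
    std::size_t index; // row index of the first candidate
    bool inside;       // true if base_timestamp <= t <= max_timestamp
};

// Linear scan over two timestamp columns of equal length. Returns the
// first row whose max_timestamp is >= t. No other column is read.
std::optional<timequery_hit> toy_timequery(
  const std::vector<int64_t>& base_timestamp,
  const std::vector<int64_t>& max_timestamp,
  int64_t t) {
    for (std::size_t i = 0; i < max_timestamp.size(); ++i) {
        if (max_timestamp[i] >= t) {
            return timequery_hit{i, base_timestamp[i] <= t};
        }
    }
    return std::nullopt; // every row ends before t
}
```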
  10. cloud_storage: Add metric for spillover manifests

    Add metric for uploads and downloads. Add new manifest_type variant.
    Lazin committed Jun 15, 2023
    ab43ecd
  11. cloud_storage: Put spillover manifest hydration

    This commit fixes a bug that caused async_manifest_view to put a
    manifest into the cache twice and return an empty manifest to the
    caller. This triggered an assertion because async_manifest_view works
    only with non-empty manifests. It also fixes the manifest download code
    path, which interpreted spillover manifests as JSON.
    Lazin committed Jun 15, 2023
    80e00ab
  12. archival: Add new spillover configuration parameter

    Add 'cloud_storage_spillover_manifest_max_segments' parameter. The
    parameter is similar to 'cloud_storage_spillover_manifest_size' but
    instead of forcing manifest spillover based on byte size of the
    manifest it uses number of segments in the manifest.
    Lazin committed Jun 15, 2023
    4a98478
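    A hedged sketch of how the two thresholds could be combined. The commit
    message does not say how the parameters interact, so this sketch simply
    assumes that either one triggers a spillover when it is set.
    `toy_spillover_config` and `should_spill` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>

// Mirrors the two configuration parameters; an unset value disables the check.
struct toy_spillover_config {
    std::optional<std::size_t> manifest_size;         // threshold in bytes
    std::optional<std::size_t> manifest_max_segments; // threshold in segments
};

// Assumption: spill when either configured threshold is reached.
bool should_spill(
  const toy_spillover_config& cfg,
  std::size_t manifest_bytes,
  std::size_t num_segments) {
    if (cfg.manifest_max_segments && num_segments >= *cfg.manifest_max_segments) {
        return true;
    }
    if (cfg.manifest_size && manifest_bytes >= *cfg.manifest_size) {
        return true;
    }
    return false;
}
```

    A segment-count threshold is convenient in tests, where the byte size of
    a serialized manifest is harder to predict than the number of uploaded
    segments.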
  13. cloud_storage: Async manifest view fixes

    Various fixes from the previous code review. Many methods and variables
    are renamed. The semaphore in the materialized_manifest_cache is now
    named.
    Lazin committed Jun 15, 2023
    b704011
  14. cloud_storage: Add spillover manifests to snapshot

    Persist list of spillover manifests in the archival STM snapshot.
    Lazin committed Jun 15, 2023
    590cc77
  15. 302a70f
  16. cloud_storage: Rename get_archive_term_column method

    In the 'segment_meta_cstore' change 'get_archive_term_column' to
    'get_archiver_term_column' to match the name of the field.
    Lazin committed Jun 15, 2023
    b60e202
  17. cloud_storage: Optimize 'get_term_last_offset'

    Avoid full metadata scan by using 'get_segment_term_column' to locate
    the manifest that contains required term id.
    Lazin committed Jun 15, 2023
    8961f90
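    The lookup can be sketched as a binary search over a term column,
    assuming terms are non-decreasing across rows. `toy_term_last_offset` is
    a hypothetical helper, and returning the committed offset of the last
    matching row is a simplification of what the real method computes. The
    same idea applies whether the rows describe segments or spillover
    manifests.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Find the last row with the given term by searching only the term column,
// then read a single value from the committed-offset column.
std::optional<int64_t> toy_term_last_offset(
  const std::vector<int64_t>& term_column,
  const std::vector<int64_t>& committed_offset_column,
  int64_t term) {
    auto [first, last] = std::equal_range(
      term_column.begin(), term_column.end(), term);
    if (first == last) {
        return std::nullopt; // no row carries this term
    }
    auto ix = static_cast<std::size_t>(
      std::distance(term_column.begin(), last) - 1);
    return committed_offset_column[ix];
}
```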
  18. cloud_storage: Include NTP into the key

    ...used by the materialized_manifest_cache. The cache is used by several
    partitions simultaneously, so it has to be able to store manifests with
    the same base offsets.
    
    Update ducktape test to use more than one partition. Previously, this
    test was passing because it used only one partition.
    Lazin committed Jun 15, 2023
    64a666d
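    A small sketch of the keying change, assuming the NTP can be represented
    as a string and a manifest as an opaque value. `toy_manifest_cache` and
    `toy_manifest` are hypothetical and omit eviction, sizing and
    concurrency control.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using toy_manifest = std::string; // stand-in for a materialized manifest

class toy_manifest_cache {
public:
    // The key is (ntp, base_offset). With base_offset alone, two partitions
    // whose manifests start at the same offset would collide.
    bool put(const std::string& ntp, int64_t base_offset, toy_manifest m) {
        return _items.emplace(std::make_pair(ntp, base_offset), std::move(m)).second;
    }

    // Returns nullptr when the manifest is not cached.
    const toy_manifest* get(const std::string& ntp, int64_t base_offset) const {
        auto it = _items.find(std::make_pair(ntp, base_offset));
        return it == _items.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return _items.size(); }

private:
    std::map<std::pair<std::string, int64_t>, toy_manifest> _items;
};
```

    With a single partition the collision cannot occur, which is consistent
    with the note above that the test only passed because it used one
    partition.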
  19. cloud_storage: Extract materialized manifest cache tests

    Move cache tests into a separate translation unit. Rename
    cloud_storage_basic to cloud_storage and move cache test there.
    Lazin committed Jun 15, 2023
    c32888d
  20. cloud_storage: Remove retries from init_cursor

    Do not retry in the remote_partition::init_cursor because the
    async_manifest_view::get_cursor retries internally.
    Lazin committed Jun 15, 2023
    51eaa66
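    A toy model of why nested retries are worth removing: when both layers
    retry, the number of backend calls on persistent failure is the product
    of the two attempt counts. The helpers `retry` and `count_calls` are
    hypothetical, and the real code uses retry_chain_node with backoff
    rather than a plain loop.

```cpp
#include <functional>

// Call op up to max_attempts times; return true on the first success.
inline bool retry(int max_attempts, const std::function<bool()>& op) {
    for (int i = 0; i < max_attempts; ++i) {
        if (op()) {
            return true;
        }
    }
    return false;
}

// Count backend calls when the backend always fails and both the outer
// layer (think init_cursor) and the inner layer (think get_cursor) retry.
int count_calls(int outer_attempts, int inner_attempts) {
    int calls = 0;
    auto backend = [&calls] {
        ++calls;
        return false; // always fails in this model
    };
    auto get_cursor = [&] { return retry(inner_attempts, backend); };
    retry(outer_attempts, get_cursor);
    return calls;
}
```

    In this model `count_calls(3, 3)` makes nine backend calls, while a
    single outer attempt, `count_calls(1, 3)`, makes three.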