feat(mito): parquet memtable reader #4967

Open
wants to merge 18 commits into
base: main

Commits on Nov 15, 2024

  1. wip: row group reader base

    v0y4g3r committed Nov 15, 2024
    d98d10b
  2. 5c5e36c
  3. Refactor MemtableRowGroupReader to streamline data fetching

     - Added early return when fetch_ranges is empty to optimize performance.
     - Replaced inline chunk data assignment with a call to `assign_dense_chunk` for cleaner code.
    v0y4g3r committed Nov 15, 2024
    068dc06
  4. wip: row group reader

    v0y4g3r committed Nov 15, 2024
    b5a5ff2
  5. wip: reuse RowGroupReader

    v0y4g3r committed Nov 15, 2024
    34478d0
  6. wip: bulk part reader

    v0y4g3r committed Nov 15, 2024
    4470d7c
  7. Enhance BulkPart Iteration with Filtering

     - Introduced `RangeBase` to `BulkIterContext` for improved filter handling.
     - Implemented filter application in `BulkPartIter` to prune batches based on predicates.
     - Updated `SimpleFilterContext::new_opt` to be public for broader access.
    v0y4g3r committed Nov 15, 2024
    69008a5
  8. chore: add prune test

    v0y4g3r committed Nov 15, 2024
    6d7c8d6
  9. fix: clippy

    v0y4g3r committed Nov 15, 2024
    cc422ab
  10. 1719213
  11. Enhance BulkPart read method to return Option<BoxedBatchIterator>

     - Modified `BulkPart::read` to return `Option<BoxedBatchIterator>` to handle cases where no row groups are selected.
     - Added logic to return `None` when all row groups are filtered out.
     - Updated tests to handle the new return type and added a test case to verify behavior when no row groups match the pr
    v0y4g3r committed Nov 15, 2024
    e60b832
  12. refactor/separate-paraquet-reader: Add helper function to parse parquet metadata and integrate it into BulkPartEncoder

    v0y4g3r committed Nov 15, 2024
    c9ceb72
  13. refactor/separate-paraquet-reader: Change BulkPartEncoder row_group_size from Option to usize and update tests

    v0y4g3r committed Nov 15, 2024
    dbb4e3e
  14. refactor/separate-paraquet-reader: Add context module for bulk memtable iteration and refactor part reading
    
     • Introduce context module to encapsulate context for bulk memtable iteration.
     • Refactor BulkPart to use BulkIterContextRef for reading operations.
     • Remove redundant code in BulkPart by centralizing context creation and row group pruning logic in the new context module.
     • Create new file context.rs with structures and logic for handling iteration context.
     • Adjust part_reader.rs and row_group_reader.rs to reference the new BulkIterContextRef.
    v0y4g3r committed Nov 15, 2024
    74c0474
  15. refactor/separate-paraquet-reader: Refactor RowGroupReader traits and implementations in memtable and parquet reader modules
    
     • Rename RowGroupReaderVirtual to RowGroupReaderContext for clarity.
     • Replace BulkPartVirt with direct usage of BulkIterContextRef in MemtableRowGroupReader.
     • Simplify MemtableRowGroupReaderBuilder by directly passing context instead of creating a BulkPartVirt instance.
     • Update RowGroupReaderBase to use context field instead of virt, reflecting the trait renaming and usage.
     • Modify FileRangeVirt to FileRangeContextRef and adjust implementations accordingly.
    v0y4g3r committed Nov 15, 2024
    0eebb5f
  16. refactor/separate-paraquet-reader: Refactor column page reader creation and remove unused code
    
     • Centralize creation of SerializedPageReader in RowGroupBase::column_reader method.
     • Remove unused RowGroupCachedReader and related code from MemtableRowGroupPageFetcher.
     • Eliminate redundant error handling for invalid column index in multiple places.
    v0y4g3r committed Nov 15, 2024
    0af7d70
  17. 356bb47
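
Commits 7 and 11 above describe pruning row groups with predicates and having `BulkPart::read` return `None` when every row group is filtered out. The sketch below illustrates that shape only. It is not the PR's code: `Part`, `RowGroupStats`, and `RangePredicate` are hypothetical stand-ins, row groups are plain `Vec<i64>` values, and a boxed standard iterator takes the place of `BoxedBatchIterator`.

```rust
/// Hypothetical min/max statistics for one row group.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Hypothetical predicate: keep values in the inclusive range [lo, hi].
#[derive(Debug, Clone, Copy)]
struct RangePredicate {
    lo: i64,
    hi: i64,
}

impl RangePredicate {
    /// A row group can contain matches only if [min, max] overlaps [lo, hi].
    fn may_match(&self, stats: &RowGroupStats) -> bool {
        stats.max >= self.lo && stats.min <= self.hi
    }
}

/// Stand-in for a bulk part: each inner Vec plays the role of one row group.
struct Part {
    row_groups: Vec<Vec<i64>>,
}

impl Part {
    /// Computes min/max for a row group; `None` for an empty row group.
    fn stats(&self, idx: usize) -> Option<RowGroupStats> {
        let rg = &self.row_groups[idx];
        let min = *rg.iter().min()?;
        let max = *rg.iter().max()?;
        Some(RowGroupStats { min, max })
    }

    /// Returns `None` when all row groups are pruned, otherwise an iterator
    /// over the matching values of the selected row groups.
    fn read<'a>(&'a self, pred: RangePredicate) -> Option<Box<dyn Iterator<Item = i64> + 'a>> {
        // Row-group level pruning based on statistics.
        let selected: Vec<usize> = (0..self.row_groups.len())
            .filter(|&i| self.stats(i).is_some_and(|s| pred.may_match(&s)))
            .collect();
        if selected.is_empty() {
            // Nothing to read, so no iterator is constructed at all.
            return None;
        }
        // Value-level filtering inside the row groups that survived pruning.
        Some(Box::new(
            selected
                .into_iter()
                .flat_map(move |i| self.row_groups[i].iter().copied())
                .filter(move |v| pred.lo <= *v && *v <= pred.hi),
        ))
    }
}
```

Returning `Option` lets a caller skip a part entirely instead of building and polling an iterator that would yield nothing; whether the PR's callers use it exactly this way is an assumption.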

Commits on Dec 1, 2024

  1. fix: some comments

    v0y4g3r committed Dec 1, 2024
    1ec83c2
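
Commit 3 of Nov 15 mentions an early return when `fetch_ranges` is empty and moving chunk assignment into a helper. A minimal sketch of that control flow follows, under loud assumptions: `Chunks`, its fields, and `assign_chunk` are hypothetical names invented for illustration (the PR's own helper is `assign_dense_chunk`, whose signature is not shown here), and a byte vector stands in for the in-memory parquet data.

```rust
use std::ops::Range;

/// Hypothetical in-memory column chunk store.
struct Chunks {
    /// Stand-in for encoded parquet bytes held in memory.
    data: Vec<u8>,
    /// Byte range of each column chunk inside `data`.
    column_ranges: Vec<Range<usize>>,
    /// `None` means the column chunk has not been loaded yet.
    loaded: Vec<Option<Vec<u8>>>,
}

impl Chunks {
    fn new(data: Vec<u8>, column_ranges: Vec<Range<usize>>) -> Self {
        let loaded = vec![None; column_ranges.len()];
        Self { data, column_ranges, loaded }
    }

    /// Loads the projected columns that are not loaded yet and returns how
    /// many chunks were fetched.
    fn fetch(&mut self, projection: &[usize]) -> usize {
        // Collect only the ranges that still need to be fetched.
        let fetch_ranges: Vec<(usize, Range<usize>)> = projection
            .iter()
            .filter(|&&idx| self.loaded[idx].is_none())
            .map(|&idx| (idx, self.column_ranges[idx].clone()))
            .collect();

        // Early return: nothing to fetch, so skip all copying below.
        if fetch_ranges.is_empty() {
            return 0;
        }

        let fetched = fetch_ranges.len();
        for (idx, range) in fetch_ranges {
            self.assign_chunk(idx, range);
        }
        fetched
    }

    /// Hypothetical helper that copies one column chunk into its slot.
    fn assign_chunk(&mut self, idx: usize, range: Range<usize>) {
        self.loaded[idx] = Some(self.data[range].to_vec());
    }
}
```

Calling `fetch` twice with the same projection fetches on the first call and hits the early return on the second, since every projected chunk is already loaded.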