perf: apply projection when reading checkpoint parquet #2717

Merged
merged 1 commit into from
Jul 30, 2024


alexwilcoxson-rel
Contributor

Description

This change applies projection pushdown when reading the checkpoint parquet file. Typically only a portion of the schema is needed for a given read: Adds + Txns, Removes, or Metadata + Protocol.
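To illustrate the idea, here is a minimal sketch of how a read group could be mapped to the top-level checkpoint columns it needs. It is not the code from this PR: the `ReadGroup` enum and `projected_root_indices` function are hypothetical names, and the column names (`add`, `txn`, `remove`, `metaData`, `protocol`) are assumed from the Delta protocol's checkpoint schema. With the `parquet` crate, indices like these could presumably be passed to `ProjectionMask::roots` and then to `with_projection` on the Arrow reader builder.

```rust
/// Hypothetical grouping of the actions a caller wants from a checkpoint.
/// These names are illustrative and do not come from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadGroup {
    AddsAndTxns,
    Removes,
    MetadataAndProtocol,
}

impl ReadGroup {
    /// Top-level checkpoint columns each group needs.
    /// Column names are assumed from the Delta protocol checkpoint schema.
    fn columns(self) -> &'static [&'static str] {
        match self {
            ReadGroup::AddsAndTxns => &["add", "txn"],
            ReadGroup::Removes => &["remove"],
            ReadGroup::MetadataAndProtocol => &["metaData", "protocol"],
        }
    }
}

/// Returns the indices of the top-level schema fields to project,
/// in schema order. Fields missing from the file are simply skipped.
fn projected_root_indices(schema_roots: &[&str], group: ReadGroup) -> Vec<usize> {
    let wanted = group.columns();
    let mut indices = Vec::new();
    for (idx, name) in schema_roots.iter().enumerate() {
        if wanted.contains(name) {
            indices.push(idx);
        }
    }
    indices
}
```

Given root fields `["txn", "add", "remove", "metaData", "protocol"]`, the `MetadataAndProtocol` group selects only the last two columns, so the reader never has to decode the much larger `add` and `remove` columns.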

Note: I also attempted to apply a RowFilter. This worked but provided no additional gain, and when every column was projected it slowed things down.

The results show that the old code performs roughly the same as the new code when projecting every column. However, when projecting just the M+P (Metadata + Protocol) actions, there is about a 2.75x improvement.

[image: benchmark results]

I have less data on this, but the full EagerSnapshot on my 1,000,000-action table went from ~600-700ms to ~150ms.

Related Issue(s)

Documentation

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Jul 29, 2024
@rtyler rtyler enabled auto-merge (rebase) July 30, 2024 10:35
@rtyler rtyler disabled auto-merge July 30, 2024 10:40
@rtyler rtyler enabled auto-merge July 30, 2024 10:40
@rtyler rtyler added this pull request to the merge queue Jul 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 30, 2024
@rtyler rtyler added this pull request to the merge queue Jul 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 30, 2024
@rtyler rtyler added this pull request to the merge queue Jul 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 30, 2024
@rtyler rtyler added this pull request to the merge queue Jul 30, 2024
Merged via the queue into delta-io:main with commit 18cad15 Jul 30, 2024
18 checks passed
@alexwilcoxson-rel alexwilcoxson-rel deleted the project-checkpoint-schema branch July 30, 2024 19:47