Add a Rust-backed engine for write_deltalake #1861
Comments
we should use this opportunity to also consolidate our writer implementations. right now we have one in The |
@roeap I am mainly exposing |
# Description
- Adds the Rust writer as an additional engine in Python
- Adds overwrite schema functionality to the Rust writer

@roeap feel free to point out improvements 😄

A couple of gaps will exist between the current Rust writer and the pyarrow writer. We will have to solve these in a later PR:
- replaceWhere (partition filter / predicate) overwrite (users can, however, work around this by calling DeltaTable.delete and then appending)

# Related Issue(s)
- closes #1861

---------

Signed-off-by: Nikolay Ulmasov
Co-authored-by: Robert Pack
Co-authored-by: David Blajda
Co-authored-by: Nikolay Ulmasov
Co-authored-by: Matthew Powers
Co-authored-by: Thomas Frederik Hoeck
Co-authored-by: Adrian Ehrsam
Co-authored-by: Will Jones
Co-authored-by: Marijn Valk
I'll re-open this to keep track until we have a more full-featured rust writer? |
@roeap yeah good one! |
Description
Right now, the Python writer is built on top of the PyArrow writers. This requires a lot of complex code that essentially duplicates logic already implemented in Rust. The main motivation for writing it this way was that the Rust implementation wasn't ready, so it was faster to build on top of PyArrow. That might not be true anymore.
We can update the signature of `write_deltalake()` to take an `engine` parameter (similar to the `engine` parameter of pandas `read_parquet`), which would let users choose between the pyarrow engine and the Rust engine for now. Eventually we can switch the default and deprecate the pyarrow implementation.

We should be on the lookout for issues that block this. First, we need to make sure the same unit tests pass with the new writer, so we should parametrize all tests by the engine.
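A minimal sketch of the idea, not the actual delta-rs implementation: an `engine` argument selects one of two backends, and the same check runs once per engine. Everything named here (`write_deltalake_sketch`, `_write_with_pyarrow`, `_write_with_rust`, `run_for_each_engine`) is hypothetical, and the backends only return a string instead of writing files.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the two writer backends. Real backends would
# write Parquet files and commit to the Delta log; these only report a call.
def _write_with_pyarrow(table_uri: str, rows: List[dict], mode: str) -> str:
    return f"pyarrow: {len(rows)} rows -> {table_uri} (mode={mode})"


def _write_with_rust(table_uri: str, rows: List[dict], mode: str) -> str:
    return f"rust: {len(rows)} rows -> {table_uri} (mode={mode})"


# Registry mapping each engine name to its backend function.
_ENGINES: Dict[str, Callable[[str, List[dict], str], str]] = {
    "pyarrow": _write_with_pyarrow,
    "rust": _write_with_rust,
}


def write_deltalake_sketch(
    table_uri: str,
    rows: List[dict],
    mode: str = "error",
    engine: str = "pyarrow",  # assumed default; could later be switched to "rust"
) -> str:
    """Dispatch a write to the backend selected by `engine`."""
    try:
        writer = _ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"Invalid engine {engine!r}; expected one of {sorted(_ENGINES)}"
        ) from None
    return writer(table_uri, rows, mode)


def run_for_each_engine(check: Callable[[str], None]) -> List[str]:
    """Run the same check against every registered engine.

    This is a plain-loop equivalent of parametrizing a test by engine.
    """
    covered = []
    for engine in _ENGINES:
        check(engine)
        covered.append(engine)
    return covered
```

In a pytest suite, the loop in `run_for_each_engine` would more naturally be expressed as `@pytest.mark.parametrize("engine", ["pyarrow", "rust"])` on each test function.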
Second, we should be on the lookout for performance issues. We have a set of benchmarks here: delta-rs/python/tests/test_benchmark.py (line 28 in 56a9728).
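As a rough illustration of how the two engines could be compared, the sketch below times interchangeable writer callables and reports the best run for each. It is not the contents of the benchmark file referenced above; `best_time`, `compare_engines`, and the two fake writers are hypothetical names, and the fake writers do in-memory work rather than writing a Delta table.

```python
import time
from typing import Callable, Dict, List


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (in seconds) over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_engines(
    writers: Dict[str, Callable[[List[dict]], object]],
    rows: List[dict],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time every writer on the same input and map engine name to best time."""
    return {
        # Bind `writer` as a default argument so each lambda keeps its own writer.
        name: best_time(lambda w=writer: w(rows), repeats)
        for name, writer in writers.items()
    }


# Hypothetical stand-ins for the real writers; they only do in-memory work.
def fake_pyarrow_writer(rows: List[dict]) -> int:
    return len(sorted(rows, key=lambda r: r["id"]))


def fake_rust_writer(rows: List[dict]) -> int:
    return len([r["id"] for r in rows])


if __name__ == "__main__":
    sample = [{"id": i} for i in range(10_000)]
    results = compare_engines(
        {"pyarrow": fake_pyarrow_writer, "rust": fake_rust_writer}, sample
    )
    for engine, seconds in results.items():
        print(f"{engine}: {seconds:.6f}s")
```

Taking the minimum over several runs reduces noise from other processes; a real comparison would also need to write to the same storage backend with identical data for both engines.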
Use Case
Related Issue(s)