[BUG] Incompatibility with deltalake v0.19.0 and above - failure in write_deltalake() #2753 #2754
This PR conflicts with #2704 and overrides it.
CodSpeed Performance Report: Merging #2754 will degrade performances by 57.59%.
Hi @igor-pechersky! Thanks for the contribution. Do you mind sharing the error you were receiving prior to this PR? It looks like we have a test case failing because the data schema and table schema differ
vs
We may still need to perform a manual cast.
deltalake==0.19.1
pyarrow==17.0.0
lancedb>=0.12.0
Try `large_dtypes=True` when writing to delta.
Hi @igor-pechersky, I didn't realize that we already had a PR open for this issue before I made #2827 😄 Thanks for contributing this! However, I think I would prefer to merge my PR instead because it maintains compatibility with past deltalake versions.
Closing this because we are merging in #2827!
Thanks @igor-pechersky for taking a stab at this!
Deltalake v0.19 changes its `_convert_pa_schema_to_delta` function to take a `schema_conversion_mode` argument instead of `large_dtypes`. Pyarrow also needed to be upgraded to v16.0.0 to be compatible with the new version of deltalake.

The difference between this PR and #2754 is that this one still maintains compatibility with older deltalake versions.

This PR also includes a change to arrow2 because, starting in version 0.19, deltalake uses arrow-rs by default instead of pyarrow to write files when calling `deltalake.write_deltalake`. We do not actually use this functionality, but our tests do, and arrow-rs writes map arrays in a way that does not conform to the parquet spec. I figured it would be good to add that compatibility in case some user is using `arrow-rs` to write their parquet files. However, there are also other issues with deltalake's rust writer, including improper encoding of partitioned binary columns, so we will use their pyarrow writer for testing.
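To illustrate the idea of supporting both signatures, here is a minimal sketch of a version gate that picks which keyword to pass. It is not the code from #2827: the helper names `parse_version` and `schema_conversion_kwargs` are hypothetical, and the string values `"large"` / `"normal"` are placeholders for whatever value deltalake actually expects for `schema_conversion_mode`.

```python
import re
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string such as '0.19.1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def schema_conversion_kwargs(deltalake_version: str, large_dtypes: bool) -> Dict[str, object]:
    """Build the keyword argument for the schema conversion call.

    Hypothetical helper: for deltalake >= 0.19 it returns a
    `schema_conversion_mode` entry (placeholder string values), and for
    older versions it returns the legacy `large_dtypes` flag.
    """
    if parse_version(deltalake_version) >= (0, 19, 0):
        # Placeholder values; the real argument type is an assumption here.
        mode = "large" if large_dtypes else "normal"
        return {"schema_conversion_mode": mode}
    return {"large_dtypes": large_dtypes}
```

A caller would then splat the result into the conversion call, for example `convert(schema, **schema_conversion_kwargs(deltalake.__version__, large_dtypes=True))`, where `convert` stands in for the deltalake function named above.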
#2753