Releases: delta-io/delta-rs
python-v0.22.0
What's Changed
- chore: bump the python package version to 0.21.0 by @rtyler in #2967
- fix: enable readerFeatures in minReaderVersion 3 by @rjancewicz in #2970
- chore(deps): update which requirement from 6 to 7 by @dependabot in #2971
- perf: close partition writers concurrently by @alexwilcoxson-rel in #2984
- chore(deps): update thiserror requirement from 1 to 2 by @dependabot in #2985
- perf: batch json decode checkpoint actions when writing to parquet by @alexwilcoxson-rel in #2983
- docs: fix typo by @RyRyRyNguyen in #2977
- fix: update object_store to 0.10.2 by @thomasfrederikhoeck in #2994
- fix: cache credential resolution with the AWS credential provider by @rtyler in #2987
- fix: jsonwriter should checkpoint by default by @jusjosj in #2993
- fix: correctly recognize existing delta tables using the transaction log by @stretchadito in #3005
- fix: fixed the deprecation warnings in spot check step of the build by @vksx in #3007
- chore: upgrade to datafusion 43 by @ion-elgreco in #2886
- docs: fix the verify table existence example in usage docs by @vksx in #3003
- fix: decimal stat rounding overflow by @gruuya in #2975
- feat: upgrade to delta_kernel 0.4.1 🎉 by @rtyler in #3016
- chore: include license files in published crates by @ankane in #3009
- docs: explain the value of deltalake on first page of docs by @braaannigan in #3017
- feat: provide direct TableProvider integration in datafusion-python by @timsaucer in #3012
- fix: expression with dates to string conversion by @JonasDev1 in #3019
- chore: fixed a bunch of warnings and deprecations by @hntd187 in #3020
- fix: num rows statistics by @lewiszlw in #2990
- docs: mention AWS_ENDPOINT_URL_DYNAMODB by @maxitg in #3021
- feat: override dynamodb config by @thomas-chauvet in #3011
New Contributors
- @rjancewicz made their first contribution in #2970
- @RyRyRyNguyen made their first contribution in #2977
- @stretchadito made their first contribution in #3005
- @vksx made their first contribution in #3007
- @ankane made their first contribution in #3009
- @timsaucer made their first contribution in #3012
- @lewiszlw made their first contribution in #2990
- @maxitg made their first contribution in #3021
- @thomas-chauvet made their first contribution in #3011
Full Changelog: python-v0.21.0...python-v0.22.0
python-v0.21.0
What's Changed
- fix: avoid reference after move when building w/o datafusion by @rtyler in #2917
- docs: updating docs for MinIO and Docker, with working conditional put support by @rwhaling in #2895
- feat: get earliest version by @ion-elgreco in #2797
- fix: cdc merge union schema order by @ion-elgreco in #2919
- chore: update the changelog with the latest rust merges by @rtyler in #2918
- fix: datafusion sanity checker passes when all files filtered out by @adamfaulkner-at in #2830
- fix: ListingSchemaProvider not forwarding custom storage_options by @Nordalf in #2924
- fix: update createdTime during schema evolution by @ion-elgreco in #2926
- fix: properly handle the different flavors of storage options keys by @rtyler in #2931
- chore: introduce some more writer tests to demonstrate schema promotion by @rtyler in #2935
- fix: panics when creating delta tables in threads by @PeterKeDer in #2940
- docs: minor fixes to the architecture documentation by @0x26res in #2934
- feat(python): reading/writing transaction identifiers in python by @PeterKeDer in #2942
- fix: improve errors on field cast failures by @jkylling in #2932
- fix: dramatically reduce checkpoint memory consumption by @rtyler in #2956
- chore: enable a dummy DCO app to unblock the merge queue by @rtyler in #2959
- docs: add merge guide by @avriiil in #2953
- fix: ensure the checkpoint decoder is regularly flushed by @rtyler in #2960
- chore: fix many warnings across the codebase by @hntd187 in #2955
- chore: be less verbose about zorder by @rtyler in #2963
New Contributors
- @rwhaling made their first contribution in #2895
- @adamfaulkner-at made their first contribution in #2830
- @Nordalf made their first contribution in #2924
- @0x26res made their first contribution in #2934
Full Changelog: python-v0.20.2...python-v0.21.0
rust-v0.20.1
What's Changed
- chore: pin the Rust baseline version to 1.80 by @rtyler in #2842
- fix: pin the build-dependencies for Python to a slightly older vendored openssl by @rtyler in #2856
- fix: escaped columns in dataskippingstatscolumns by @ion-elgreco in #2855
- fix: set put mode to overwrite in mount backend by @ion-elgreco in #2861
- fix(rust): scan schema fix for predicate by @sherlockbeard in #2869
- chore: attempt to ignore all dependabot checks for arrow and datafusion by @rtyler in #2870
- fix: stats is optional in add action by @jkylling in #2841
- fix: prepare the next 🦀 release with fixed version ranges by @rtyler in #2875
- fix: pin broken dependencies and changes in 0.19.1 by @rtyler in #2878
- chore: exclude parquet from dependabot as well by @rtyler in #2874
- chore: rearrange github actions a bit by @rtyler in #2868
- chore: cleanup codecov defaults by @rtyler in #2876
- chore(deps): update sqlparser requirement from 0.50 to 0.51 by @dependabot in #2881
- chore: set max_retries in CommitProperties by @helanto in #2826
- fix(python, rust): use require files by @ion-elgreco in #2809
- fix: re-enable optional old casting behavior in merge by @ion-elgreco in #2853
- fix: conditionally disable enable_io on non-unix based systems by @hntd187 in #2884
- docs: fix typo in delta-lake-dagster by @jessy1092 in #2883
- feat: add support for pyarrow.ExtensionType by @fecet in #2885
- feat: improve AWS credential loading between S3 and DynamoDb code paths by @rtyler in #2887
- feat(python, rust): add feature operation by @ion-elgreco in #2712
- fix: convert last checkpoint json keys to camelCase by @feniljain in #2889
- fix: ensure log store correctly identifies existing delta tables by @rtyler in #2890
- docs: updating usage/managing-tables on optimizing tables by @VillePuuska in #2892
- fix: double-encode paths during zorder optimize when they contain special characters by @rtyler in #2897
- fix: check lowercase config keys with/without aws prefix in s3logstorefactory by @ion-elgreco in #2896
- docs: typo fix by @avriiil in #2902
- docs: introduce GCS backend docs by @avriiil in #2903
- docs: correct Minio docs fix typo by @avriiil in #2904
- refactor: exposing CommitConflictError enum by @Filip-Dziuba in #2907
- docs: add adls backend docs by @avriiil in #2913
- chore: bump the patch version for a 0.20.1 release by @rtyler in #2916
New Contributors
- @helanto made their first contribution in #2826
- @jessy1092 made their first contribution in #2883
- @fecet made their first contribution in #2885
- @feniljain made their first contribution in #2889
- @VillePuuska made their first contribution in #2892
- @Filip-Dziuba made their first contribution in #2907
Full Changelog: python-v0.19.2...rust-v0.20.1
python-v0.20.1
Bug Fixes
- fix: convert last checkpoint json keys to camelCase by @feniljain in #2889
- fix: ensure log store correctly identifies existing delta tables by @rtyler in #2890
- fix: double-encode paths during zorder optimize when they contain special characters by @rtyler in #2897
- fix: check lowercase config keys with/without aws prefix in s3logstorefactory by @ion-elgreco in #2896
Other Changes
- refactor: exposing CommitConflictError enum by @Filip-Dziuba in #2907
- docs: add adls backend docs by @avriiil in #2913
- chore: bump the patch version for a 0.20.1 release by @rtyler in #2916
- docs: typo fix by @avriiil in #2902
- docs: introduce GCS backend docs by @avriiil in #2903
- docs: correct Minio docs fix typo by @avriiil in #2904
- docs: updating usage/managing-tables on optimizing tables by @VillePuuska in #2892
New Contributors
- @feniljain made their first contribution in #2889
- @VillePuuska made their first contribution in #2892
- @Filip-Dziuba made their first contribution in #2907
Full Changelog: python-v0.20.0...python-v0.20.1
python-v0.20.0
New features
- feat(python, rust): add feature operation by @ion-elgreco in #2712
- feat: add support for pyarrow.ExtensionType by @fecet in #2885
- feat: improve AWS credential loading between S3 and DynamoDb code paths by @rtyler in #2887
Bug Fixes
- fix: pin the build-dependencies for Python to a slightly older vendored openssl by @rtyler in #2856
- fix: escaped columns in dataskippingstatscolumns by @ion-elgreco in #2855
- fix: set put mode to overwrite in mount backend by @ion-elgreco in #2861
- fix(rust): scan schema fix for predicate by @sherlockbeard in #2869
- fix(python, rust): use require files by @ion-elgreco in #2809
- fix: re-enable optional old casting behavior in merge by @ion-elgreco in #2853
- fix: conditionally disable enable_io on non-unix based systems by @hntd187 in #2884
- fix: stats is optional in add action by @jkylling in #2841
- fix: prepare the next 🦀 release with fixed version ranges by @rtyler in #2875
- fix: pin broken dependencies and changes in 0.19.1 by @rtyler in #2878
Other Changes
- chore: pin the Rust baseline version to 1.80 by @rtyler in #2842
- chore: attempt to ignore all dependabot checks for arrow and datafusion by @rtyler in #2870
- docs: fix typo in delta-lake-dagster by @jessy1092 in #2883
- chore: exclude parquet from dependabot as well by @rtyler in #2874
- chore: rearrange github actions a bit by @rtyler in #2868
- chore: cleanup codecov defaults by @rtyler in #2876
- chore(deps): update sqlparser requirement from 0.50 to 0.51 by @dependabot in #2881
- chore: set max_retries in CommitProperties by @helanto in #2826
New Contributors
- @helanto made their first contribution in #2826
- @jessy1092 made their first contribution in #2883
- @fecet made their first contribution in #2885
Full Changelog: python-v0.19.2...python-v0.20.0
python-v0.19.2: objectstore conditional put
New features
- perf: conditional put for default log store (e.g. azure, gcs, minio, cloudflare) by @ion-elgreco in #2813
- feat: make Add::get_json_stats public by @gruuya in #2822
- refactor(python): add pymergebuilder by @ion-elgreco in #2823
- feat: public method to get partitions for DeltaTable (#2671) by @omkar-foss in #2816
- feat(rust): add operationMetrics to WRITE by @gavinmead in #2838
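The conditional put entry above concerns put-if-absent writes: a commit for log version N succeeds only if no other writer has already created that version's file, so concurrent writers can detect conflicts without an external lock. The minimal sketch below illustrates the idea on a local filesystem with os.open and O_CREAT | O_EXCL. It is not delta-rs's implementation (which goes through object_store), the helpers commit_path and try_commit are hypothetical, and it assumes log files named by a 20-digit zero-padded version number, as in the Delta transaction log.

```python
import json
import os
import tempfile


def commit_path(log_dir: str, version: int) -> str:
    # Hypothetical helper: log entries are named by a 20-digit, zero-padded version.
    return os.path.join(log_dir, f"{version:020d}.json")


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Hypothetical put-if-absent commit; returns False if `version` is already taken."""
    path = commit_path(log_dir, version)
    try:
        # O_CREAT | O_EXCL: creation fails atomically if the file already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False  # another writer won this version; caller should retry at N + 1
    with os.fdopen(fd, "w") as f:
        # One JSON action per line (newline-delimited JSON).
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


with tempfile.TemporaryDirectory() as log_dir:
    first = try_commit(log_dir, 0, [{"commitInfo": {"operation": "WRITE"}}])
    second = try_commit(log_dir, 0, [{"commitInfo": {"operation": "WRITE"}}])
    print(first, second)  # True False
```

Unlike a single conditional PUT on an object store, this local version creates the file and then fills it, so a concurrent reader could briefly see an empty commit; a real local log store would instead publish a fully written temporary file with an atomic no-overwrite rename or link.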
Bug Fixes
- fix: enable feature flags so that deltalake-core builds tokio with enable_io by @rtyler in #2803
- fix(rust): set token provider explicitly by @ion-elgreco in #2817
- fix(python, rust): allow "in" pushdowns in early_filter by @ion-elgreco in #2807
- fix: use table config target file size, expose target_file_size in python by @ion-elgreco in #2811
Other Changes
- chore(python): raise not implemented in from_data_catalog by @ion-elgreco in #2799
- docs: added WriterProperties documentation by @sherlockbeard in #2804
- chore(python): remove deprecated or duplicated functions by @ion-elgreco in #2801
- test(python): fix optimize call in benchmark by @ion-elgreco in #2812
- docs: fix documentation about max_spill_size by @junhl in #2835
- refactor(python): post_commit_hook_properties derive by @ion-elgreco in #2824
- docs: fix docstring of set_table_properties by @astrojuanlu in #2820
- chore: enable codecov reporting by @rtyler in #2836
- chore(aws): use backon to replace backoff by @Xuanwo in #2840
- chore: update python by @ion-elgreco in #2845
- docs: concurrent writes permission missing by @poguez in #2846
New Contributors
- @junhl made their first contribution in #2835
- @Xuanwo made their first contribution in #2840
- @gavinmead made their first contribution in #2838
- @poguez made their first contribution in #2846
Full Changelog: python-v0.19.1...python-v0.19.2
python-v0.19.1: separate IO runtime
New features
- feat: configurable IO runtime by @ion-elgreco in #2789
- feat(python, rust): added statistics_truncate_length in WriterProperties by @sherlockbeard in #2784
- feat(python, rust): add ColumnProperties And rework in python WriterProperties by @sherlockbeard in #2793
Bug Fixes
- fix: pin maturin version by @ion-elgreco in #2778
- fix(rust): max_spill_size default value by @mrjsj in #2795
- fix: trim trailing slash in url storage options (#2656) by @omkar-foss in #2775
Other Changes
- chore: update the changelog with the 0.19.0 release by @rtyler in #2774
- style: more consistent imports by @roeap in #2786
- chore: remove some file_actions call sites by @roeap in #2787
- chore(deps): update sqlparser requirement from 0.49 to 0.50 by @dependabot in #2792
New Contributors
Full Changelog: python-v0.19.0...python-v0.19.1
python-v0.19.0: complete CDF support, add column operation, faster MERGE
Breaking changes!
The default writer engine has changed to rust. Replace your partition_filters with a SQL predicate instead. The PyArrow engine is now deprecated and will be removed in v1.0.
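For callers migrating off partition_filters, the minimal sketch below turns the old (column, operator, value) tuples into a SQL predicate string. The helper partition_filters_to_predicate is hypothetical and not part of deltalake; it assumes plain, unquoted column names, string-typed partition values, and only the =, !=, in and not in operators. The resulting string is intended for the predicate argument of write_deltalake with mode="overwrite"; check the write_deltalake documentation for your version before relying on it.

```python
from typing import Any, List, Sequence, Tuple

# (column, operator, value) tuples in the style of the old partition_filters argument.
PartitionFilter = Tuple[str, str, Any]


def _sql_literal(value: Any) -> str:
    # Render a partition value as a single-quoted SQL string; embedded quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"


def partition_filters_to_predicate(filters: Sequence[PartitionFilter]) -> str:
    """Hypothetical helper: AND together (column, op, value) filters as a SQL predicate."""
    clauses: List[str] = []
    for column, op, value in filters:
        op_norm = op.strip().lower()
        if op_norm in ("=", "!="):
            clauses.append(f"{column} {op_norm} {_sql_literal(value)}")
        elif op_norm in ("in", "not in"):
            if not isinstance(value, (list, tuple, set, frozenset)) or not value:
                raise ValueError(f"{op!r} expects a non-empty collection of values")
            items = ", ".join(_sql_literal(v) for v in value)
            clauses.append(f"{column} {op_norm.upper()} ({items})")
        else:
            raise ValueError(f"unsupported partition filter operator: {op!r}")
    if not clauses:
        raise ValueError("at least one partition filter is required")
    return " AND ".join(clauses)


predicate = partition_filters_to_predicate([("year", "=", "2021"), ("month", "in", ["01", "02"])])
print(predicate)  # year = '2021' AND month IN ('01', '02')
# The string could then be passed on, e.g. (table_uri and data are placeholders):
# write_deltalake(table_uri, data, mode="overwrite", predicate=predicate)
```

Conjunctive filters map directly onto AND; if you relied on disjunctive (list-of-lists) filters, you would need to join the inner predicates with OR yourself.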
Highlights
- CDF support in write_deltalake, delete, and merge operations
- Expired log cleanup during post-commit; it can be disabled with delta.enableExpiredLogCleanup = false
- Improved MERGE performance by using predicate non-partition columns min/max for prefiltering
- ADD column operation
- Speed up log parsing
Performance improvements
- perf: apply projection when reading checkpoint parquet by @alexwilcoxson-rel in #2717
- perf: grab file size in rust by @ion-elgreco in #2734
- feat: improve merge performance by using predicate non-partition columns min/max for prefiltering by @JonasDev1 in #2513
- perf: early stop if all values in arr are null by @ion-elgreco in #2764
New features
- feat(python, rust): cdc write-support for delete operation by @ion-elgreco in #2721
- feat(python, rust): cdc write-support for overwrite and replacewhere writes by @ion-elgreco in #2722
- feat: introduce CDC generation for merge operations by @rtyler in #2747
- feat: use logical plan in delete, delta planner refactoring by @ion-elgreco in #2725
- feat: use logical plan in update, refactor/simplify CDCTracker by @ion-elgreco in #2727
- feat(python, rust): arrow large/view types passthrough, rust default engine by @ion-elgreco in #2738
- feat(python, rust): cleanup expired logs post-commit hook by @ion-elgreco in #2459
- feat(python, rust): add column operation by @ion-elgreco in #2562
- feat(python): handle PyCapsule interface objects in write_deltalake by @kylebarron in #2534
- feat(rust): fix size_in_bytes in last_checkpoint_ to i64 by @sherlockbeard in #2649
- feat(rust,python): cast each parquet file to delta schema by @HawaiianSpork in #2615
- feat: support userMetadata in CommitInfo by @jkylling in #2670
- feat(python, rust): add projection in CDF reads by @ion-elgreco in #2704
- feat(python): add DeltaTable.is_deltatable static method (#2662) by @omkar-foss in #2715
- feat: improved test fixtures by @roeap in #2749
- feat: fail fast on forked process by @Tom-Newton in #2765
- feat: restore the TryFrom for DeltaTablePartition by @rtyler in #2767
- feat: more economic data skipping with datafusion by @roeap in #2772
Bug Fixes
- fix(rust): inconsistent order of partitioning columns (#2494) by @aditanase in #2614
- fix(rust,python): checkpoint with column nullable false by @sherlockbeard in #2680
- fix: update delta kernel version by @jeppe742 in #2685
- fix(python): empty dataset fix for "pyarrow" engine by @sherlockbeard in #2689
- fix: ensure DataFusion SessionState Parquet options are applied to DeltaScan by @alexwilcoxson-rel in #2702
- fix(python, rust): use url encoder when encoding partition values by @ion-elgreco in #2705
- fix(python, rust): use input schema to get correct schema in cdf reads by @ion-elgreco in #2723
- fix: change arrow map root name to follow with parquet root name by @sclmn in #2538
- fix: schema adapter doesn't map partial batches correctly by @alexwilcoxson-rel in #2735
- fix: optimize Spark written tables by @rtyler in #1650
- fix(python, rust): cdc in writer not creating inserts by @ion-elgreco in #2751
- fix(python, rust): don't flatten fields during cdf read by @ion-elgreco in #2763
- fix: column parsing to include nested columns and enclosing char by @gtrawinski in #2737
Other Changes
- chore: missed one macos runner reference in actions by @rtyler in #2645
- chore: add a reproduction case for merge failures with struct by @rtyler in #2644
- ci: update CODEOWNERS by @hntd187 in #2650
- chore: increase subcrate versions by @rtyler in #2648
- docs: fix bullets on hdfs docs by @Kimahriman in #2653
- docs: improve navigation fixes by @avriiil in #2660
- docs: add integration docs for s3 backend by @avriiil in #2658
- chore: bump ruff to 0.5.2 by @fpgmaas in #2673
- chore: enable RUF ruleset for ruff by @fpgmaas in #2677
- chore: pin ruff and mypy versions in the lint stage in the CI pipeline by @fpgmaas in #2679
- chore: update README.md by @veronewra in #2684
- chore: create separate action to setup python and rust in the cicd pipeline by @fpgmaas in #2687
- chore: add test coverage command to Makefile by @fpgmaas in #2688
- chore: improve contributing.md by @fpgmaas in #2672
- chore: remove stale code for conditional import of Literal by @fpgmaas in #2676
- chore: remove references to black from the project by @fpgmaas in #2674
- chore: refactor write_deltalake in writer.py by @fpgmaas in #2695
- chore: upgrade to datafusion 40 by @rtyler in #2661
- chore: prepare python release 0.18.3 by @ion-elgreco in #2707
- chore: enabling actions for merge groups by @rtyler in #2718
- chore(deps): update sqlparser requirement from 0.47 to 0.49 by @dependabot in #2714
- chore: try an alternative docker compose invocation syntax by @rtyler in #2724
- chore(deps): update which requirement from 4 to 6 by @dependabot in #2730
- chore: update changelog and versions for next release by @rtyler in #2740
- chore: add to code_owner crates by @ion-elgreco in #2741
- chore: update delta_kernel to 0.3.0 by @alexwilcoxson-rel in #2742
- docs: fix broken link in docs by @astrojuanlu in #2746
- chore: upgrade to datafusion 41 by @rtyler in #2761
- chore: prepare the next notable release of 0.19.0 by @rtyler in #2768
- chore: fix a bunch of clippy lints and re-enable tests by @rtyler in #2773
New Contributors
- @aditanase made their first contribution in #2614
- @fpgmaas made their first contribution in #2673
- @kylebarron made their first contribution in #2534
- @veronewra made their first contribution in #2684
- @jeppe742 made their first contribution in #2685
- @sclmn made their first contribution in #2538
- @astrojuanlu made their first contribution in #2746
- @gtrawinski made their first contribution in #2737
Full Changelog: python-v0.18.2...python-v0.19.0
python-v0.18.2: HDFS support
New features
- feat(#2597): allow pyarrow.dataset.Expression in filters kwarg by @giacomorebecchi in #2600
- feat(rust, python): add HDFS support via hdfs-native package by @Kimahriman in #2612
- feat: report DataFusion metrics for DeltaScan by @alexwilcoxson-rel in #2617
Bug Fixes
- fix: enable parquet pushdown for DeltaScan via TableProvider impl for DeltaTable (rebase) by @rtyler in #2637
- fix(rust, python): fix writing empty structs when creating checkpoint by @sherlockbeard in #2627
- fix(python): fixed large_dtype to schema convert by @sherlockbeard in #2635
- fix(rust, python): fix merge schema with overwrite by @sherlockbeard in #2623
- fix(python): constrain multipart upload size to fixed length by @abhiaagarwal in #2606
- fix: update changelog by @rtyler in #2599
Other Changes
- chore: migrate to pyo3 Bounds API by @abhiaagarwal in #2596
- chore(deps): update dashmap requirement from 5 to 6 by @dependabot in #2641
- chore: remove macos builders from pull request flow by @rtyler in #2638
- docs: add Daft writer by @avriiil in #2594
- chore: fix documentation generation with a pin of griffe by @rtyler in #2636
- chore: bump python 0.18.2 by @ion-elgreco in #2621
- chore: implement regression test for push down panic by @rtyler in #2604
- docs: fix typo by @avriiil in #2603
- test: reintroduce azurite SAS integration tests by @giacomorebecchi in #2598
New Contributors
- @giacomorebecchi made their first contribution in #2598
- @Kimahriman made their first contribution in #2612
- @sherlockbeard made their first contribution in #2623
Full Changelog: python-v0.18.1...python-v0.18.2
python-v0.18.1
New features
- feat: add custom dynamodb endpoint configuration by @hnaoto in #2575
- chore: bump to datafusion 39, arrow 52, pyo3 0.21 by @abhiaagarwal in #2581
Bug Fixes
- chore: bump macOS runners, maybe resolve import error by @ion-elgreco in #2588
Other Changes
- docs: improve S3 access docs by @avriiil in #2589
- chore: expose files_by_partition to public api by @edmondop in #2533
New Contributors
- @abhiaagarwal made their first contribution in #2581
- @edmondop made their first contribution in #2533
Full Changelog: python-v0.18.0...python-v0.18.1