Changelog

rust-v0.17.0 (2024-02-06)

⚠️ The release of 0.17.0 removes the legacy dynamodb lock functionality. AWS users must read these release notes! ⚠️

File handlers

The 0.17.0 release moves storage implementations into their own crates, such as deltalake-aws. A consequence of that refactoring is that custom storage and file scheme handlers must be registered/initialized at runtime. Storage subcrates conventionally define a register_handlers function which performs that task. Users may see errors such as:

thread 'main' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/deltalake-core-0.17.0/src/table/builder.rs:189:48:
The specified table_uri is not valid: InvalidTableLocation("Unknown scheme: s3")
  • Users of the meta-crate (deltalake) can register the storage handlers by calling deltalake::aws::register_handlers(None); at the entry point of their code.
  • Users who adopt core and storage crates independently (e.g. deltalake-aws) can register via deltalake_aws::register_handlers(None);.

The AWS, Azure, and GCP crates must all have their custom file schemes registered in this fashion.
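
To illustrate why the "Unknown scheme" error appears until a handler is registered, here is a minimal, standard-library-only model of a scheme registry. It is a sketch, not the deltalake-core implementation: `SchemeRegistry`, `register_aws_handlers`, and `s3_factory` are hypothetical names invented for this example, and the real registry is internal to the crates. In real code the only call needed is the `register_handlers(None)` function named above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a storage factory: turns a table URI into a
/// description of the store that would serve it.
type Factory = fn(&str) -> String;

/// Hypothetical registry that maps a URI scheme to a factory.
/// (Assumption: this only models the idea; it is not deltalake's API.)
#[derive(Default)]
pub struct SchemeRegistry {
    factories: HashMap<String, Factory>,
}

impl SchemeRegistry {
    /// Associate a scheme such as "s3" with a factory.
    pub fn register(&mut self, scheme: &str, factory: Factory) {
        self.factories.insert(scheme.to_string(), factory);
    }

    /// Look up the factory for the URI's scheme, or report an unknown scheme.
    pub fn resolve(&self, table_uri: &str) -> Result<String, String> {
        let (scheme, _rest) = table_uri
            .split_once("://")
            .ok_or_else(|| format!("No scheme in URI: {}", table_uri))?;
        match self.factories.get(scheme) {
            Some(factory) => Ok(factory(table_uri)),
            None => Err(format!("Unknown scheme: {}", scheme)),
        }
    }
}

/// Hypothetical factory for the "s3" scheme.
fn s3_factory(table_uri: &str) -> String {
    format!("s3 store for {}", table_uri)
}

/// Plays the role of a storage subcrate's `register_handlers` function:
/// it adds the subcrate's schemes to the registry at runtime.
pub fn register_aws_handlers(registry: &mut SchemeRegistry) {
    registry.register("s3", s3_factory);
}
```

In this model, resolving `s3://bucket/table` against a fresh registry returns the unknown-scheme error, and the same call succeeds once `register_aws_handlers` has run. That mirrors why registration belongs at the entry point, before any table is opened.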

dynamodblock to S3DynamoDbLogStore

The locking mechanism is fundamentally different between deltalake v0.16.x and v0.17.0. Starting with this release, the deltalake and deltalake-aws crates rely on the same protocol for concurrent writes on AWS as the Delta Lake/Spark implementation.

Fundamentally, the DynamoDB table structure changes, which is documented here. The configuration of a Rust process should continue to set the AWS_S3_LOCKING_PROVIDER environment variable to dynamodb. The new table must be specified with the DELTA_DYNAMO_TABLE_NAME environment or configuration variable, which should name the new S3DynamoDbLogStore-compatible DynamoDB table.
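
As a rough illustration of the two settings above, the following sketch validates them before a writer starts. Only the variable names AWS_S3_LOCKING_PROVIDER and DELTA_DYNAMO_TABLE_NAME and the value dynamodb come from these notes. The function `check_lock_config` and the `LockConfigError` type are hypothetical helpers and are not part of deltalake. Treating a missing provider as an error is an assumption that suits a process intending to use the DynamoDB-backed log store.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this preflight check (not a deltalake type).
#[derive(Debug, PartialEq)]
pub enum LockConfigError {
    /// AWS_S3_LOCKING_PROVIDER is not set at all.
    MissingProvider,
    /// AWS_S3_LOCKING_PROVIDER is set to something other than "dynamodb".
    WrongProvider(String),
    /// DELTA_DYNAMO_TABLE_NAME is unset or empty.
    MissingTableName,
}

/// Checks the two settings named in the release notes and returns the
/// configured DynamoDB table name when both look usable.
pub fn check_lock_config(
    env: &HashMap<String, String>,
) -> Result<String, LockConfigError> {
    match env.get("AWS_S3_LOCKING_PROVIDER").map(String::as_str) {
        Some("dynamodb") => {}
        Some(other) => return Err(LockConfigError::WrongProvider(other.to_string())),
        None => return Err(LockConfigError::MissingProvider),
    }
    match env.get("DELTA_DYNAMO_TABLE_NAME") {
        Some(name) if !name.is_empty() => Ok(name.clone()),
        _ => Err(LockConfigError::MissingTableName),
    }
}
```

A process could build the map from its real environment with `std::env::vars().collect()` and refuse to start writing if the check fails. Taking the settings as a map rather than reading the environment directly keeps the check easy to test.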

Because locking is required to ensure safe, consistent writes, there is no iterative migration: 0.16 and 0.17 writers cannot safely coexist. The following steps should be taken when upgrading:

  1. Stop all 0.16.x writers.
  2. Ensure all writes have completed and the lock table is empty.
  3. Deploy 0.17.0 writers.

Full Changelog

Implemented enhancements:

  • Expose the ability to compile DataFusion with SIMD #2118
  • Updating Table log retention configuration with write_deltalake silently changes nothing #2108
  • ALTER table, ALTER Column, Add/Modify Comment, Add/remove/rename partitions, Set Tags, Set location, Set TBLProperties #2088
  • Docs: Update docs for check constraints #2063
  • Don't ensure_table_uri when creating a table with_log_store #2036
  • Exposing custom_metadata in merge operation #2031
  • Support custom table properties via TableAlterer and write/merge #2022
  • Remove parquet2 crate support #2004
  • Merge operation that only touches necessary partitions #1991
  • store userMetadata on write operations #1990
  • Create Dask integration page #1956
  • Merge: Filtering on partitions #1918
  • Rethink the load_version and load_with_datetime interfaces #1910
  • docs: Delta Lake + Arrow Integration #1908
  • docs: Delta Lake + Polars integration #1906
  • Rethink decision to expose the public interface in namespaces #1900
  • Add documentation on how to build and run documentation locally #1893
  • Add API to create an empty Delta Lake table #1892
  • Implementing CHECK constraints #1881
  • Check Invariants are respecting table features for write paths #1880
  • Organize docs with single lefthand sidebar #1873
  • Make sure invariants are handled properly throughout the codebase #1870
  • Unable to use deltalake Schema in write_deltalake #1862
  • Add a Rust-backed engine for write_deltalake #1861
  • Run doctest in CI for Python API examples #1783
  • [RFC] Use arrow for checkpoint reading and state handling #1776
  • Expose Python exceptions in public module #1771
  • Expose cleanup_metadata or create_checkpoint_from_table_uri_and_cleanup to the Python API #1768
  • Expose convert_to_delta to Python API #1767
  • Add high-level checking for append-only tables #1759

Fixed bugs:

  • Row order no longer preserved after merge operation #2165
  • Error when reading delta table with IDENTITY column #2152
  • Merge on IS NULL condition doesn't work for empty table #2148
  • JsonWriter converts structured parsing error into plain string #2143
  • Pandas import error when merging tables #2112
  • test_repair_on_update broken in main #2109
  • WriteBuilder::with_input_execution_plan does not apply the schema to the log's metadata fields #2105
  • MERGE logical plan vs execution plan schema mismatch #2104
  • Partitions not pushed down #2090
  • Cant create empty table with write_deltalake #2086
  • Unexpected high costs on Google Cloud Storage #2085
  • Unable to read s3 table: Unknown scheme: s3 #2065
  • write_deltalake not respecting writer_properties #2064
  • Unable to read/write tables with the "gs" schema in the table_uri in 0.15.1 #2060
  • LockClient requiered error for S3 backend in 0.15.1 python #2057
  • Error while writing Pandas DataFrame to Delta Lake (S3) #2051
  • Error with dynamo locking provider on 0.15 #2034
  • Conda version 0.15.0 is missing files #2021
  • Rust panicking through Python library when a delete predicate uses a nullable field #2019
  • No snapshot or version 0 found, perhaps /Users/watsy0007/resources/test_table/ is an empty dir? #2016
  • Generic DeltaTable error: type_coercion in Struct column in merge operation #1998
  • Constraint expr not formatted during commit action #1971
  • .load_with_datetime() is incorrectly rounding to nearest second #1967
  • vacuuming log files #1965
  • Unable to merge uppercase column names #1960
  • Schema error: Invalid data type for Delta Lake: Null #1946
  • Python v0.14 wheel files not up to date #1945
  • python Release 0.14 is missing Windows wheels #1942
  • CI integration test fails randomly: test_restore_by_datetime #1925
  • Merge data freezes indefenetely #1920
  • Load DeltaTable from non-existing folder causing empty folder creation #1916
  • Reoptimizes merge bins with only 1 file, even though they have no effect. #1901
  • The Python Docs link in README.MD points to old docs #1898
  • optimize.compact() fails with bad schema after updating to pyarrow 8.0 #1889
  • Python build is broken on main #1856
  • Checkpoint error with Azure Synapse #1847
  • merge very slow compared to delete + append on larger dataset #1846
  • get_add_actions fails with deltalake 0.13 #1835
  • Handle PyArrow CVE-2023-47248 #1834
  • Delta-rs writer hangs with to many file handles open (Azure) #1832
  • Encountering NotATable("No snapshot or version 0 found, perhaps xxx is an empty dir?") #1831
  • write_deltalake is not creating checkpoints #1815
  • Problem writing tables in directory named with char ~ #1806
  • DeltaTable Merge throws in merging if there are uppercase in Schema. #1797
  • rust merge error - datafusion panics #1790
  • expose use_dictionary=False when writing Delta Table and running optimize #1772

Closed issues:

  • Is this print necessary? Can we remove this. #2110
  • Azure concurrent writes #2069
  • Fix docs deployment #1867
  • Add a header in old docs and direct users to new docs #1865

rust-v0.16.5 (2023-11-15)

Full Changelog

Implemented enhancements:

  • When will upgrade object_store to 0.8? #1858
  • No Official Help #1849
  • Auto assign GitHub issues with a "take" message #1791

Fixed bugs:

  • cargo clippy fails on core in main #1843

rust-v0.16.4 (2023-11-12)

Full Changelog

Implemented enhancements:

  • Unable to add deltalake git dependency to cargo.toml #1821

rust-v0.16.3 (2023-11-08)

Full Changelog

Implemented enhancements:

  • Docs: add release GitHub action #1799
  • Use bulk deletes where possible #1761

Fixed bugs:

  • Code Owners no longer valid #1794
  • MERGE works incorrectly with partitioned table if the data column order is not same as table column order #1787
  • errors when using pyarrow dataset as a source #1779
  • Write to Microsoft OneLake failed. #1764

rust-v0.16.2 (2023-10-21)

Full Changelog

rust-v0.16.1 (2023-10-21)

Full Changelog

rust-v0.16.0 (2023-09-27)

Full Changelog

Implemented enhancements:

  • Expose Optimize option min_commit_interval in Python #1640
  • Expose create_checkpoint_for #1513
  • integration tests regularly fail for HDFS #1428
  • Add Support for Microsoft OneLake #1418
  • add support for atomic rename in R2 #1356

Fixed bugs:

  • Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
  • [python] Different stringification of partition values in reader and writer #1653
  • Unable to interface with data written from Spark Databricks #1651
  • get_last_checkpoint does some unnecessary listing #1643
  • PartitionWriter's buffer_len doesn't include incomplete row groups #1637
  • Slack community invite link has expired #1636
  • delta-rs does not appear to support tables with liquid clustering #1626
  • Internal Parquet panic when using a Map type. #1619
  • partition_by with "$" on local filesystem #1591
  • ProtocolChanged error when perfoming append write #1585
  • Unable to cargo update using git tag or rev on Rust 1.70 #1580
  • NoMetadata error when reading detlatable #1562
  • Cannot read delta table: Delta protocol violation #1557
  • Update the CODEOWNERS to capture the current reviewers and contributors #1553
  • [Python] Incorrect file URIs when partition values contain escape character #1533
  • add documentation how to Query Delta natively from datafusion #1485
  • Python: write_deltalake to ADLS Gen2 issue #1456
  • Partition values that have been url encoded cannot be read when using deltalake #1446
  • Error optimizing large table #1419
  • Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
  • ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
  • Invalid JSON in log record missing field schemaString for DLT tables #1302
  • Special characters in partition path not handled locally #1299

Merged pull requests:

  • chore: bump rust crate version #1675 (rtyler)
  • fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
  • feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
  • docs: small consistency update in guide and readme #1666 (ion-elgreco)
  • fix: exception string in writer.py #1665 (sebdiem)
  • chore: increment python library version #1664 (wjones127)
  • docs: fix some typos #1662 (ion-elgreco)
  • fix: more consistent handling of partition values and file paths #1661 (roeap)
  • docs: add docstring to protocol method #1660 (MrPowers)
  • docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
  • fix: enable offset listing for s3 #1654 (eeroel)
  • chore: fix the incorrect Slack link in our readme #1649 (rtyler)
  • fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
  • chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
  • feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
  • fix: avoid excess listing of log files #1644 (eeroel)
  • fix: introduce support for Microsoft OneLake #1642 (rtyler)
  • fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
  • fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
  • chore: relax chrono pin to 0.4 #1635 (houqp)
  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
  • docs: update Readme #1633 (dennyglee)
  • chore: pin the chrono dependency #1631 (rtyler)
  • feat: pass known file sizes to filesystem in Python #1630 (eeroel)
  • feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
  • ci: fix python release #1624 (wjones127)
  • ci: extend azure timeout #1622 (wjones127)
  • feat: allow multiple incremental commits in optimize #1621 (kvap)
  • fix: change map nullable value to false #1620 (cmackenzie1)
  • Introduce the changelog for the last couple releases #1617 (rtyler)
  • chore: bump python version to 0.10.2 #1616 (wjones127)
  • perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
  • fix: don't re-encode paths #1613 (wjones127)
  • feat: use url parsing from object store #1592 (roeap)
  • feat: buffered reading of transaction logs #1549 (eeroel)
  • feat: merge operation #1522 (Blajda)
  • feat: expose create_checkpoint_for to the public #1514 (haruband)
  • docs: update Readme #1440 (roeap)
  • refactor: re-organize top level modules #1434 (roeap)
  • feat: integrate unity catalog with datafusion #1338 (roeap)

rust-v0.15.0 (2023-09-06)

Full Changelog

Implemented enhancements:

  • Configurable number of retries for transaction commit loop #1595

Fixed bugs:

  • Unable to read table using VM Managed Identity on Azure #1462
  • Unable to query by partition column #1445

Merged pull requests:

rust-v0.14.0 (2023-08-01)

Full Changelog

Implemented enhancements:

  • Define common dependencies in Cargo Workspace #1572
  • Make delta_datafusion::find_files public #1559

Fixed bugs:

  • Excessive integration test sizes causing builds to fail #1550
  • Slack invite link is not working #1530

Merged pull requests:

rust-v0.13.1 (2023-07-18)

Fixed bugs:

  • Revert premature merge of an attempted fix for binary column statistics #1544

rust-v0.13.0 (2023-07-15)

Full Changelog

Implemented enhancements:

  • Add nested struct supports #1518
  • Support FixedLenByteArray UUID statistics as a logical scalar #1483
  • Exposing create_add in the API #1458
  • Update features table on README #1404
  • docs(python): show data catalog options in Python API reference #1347
  • Add optimization to only list log files starting at a certain name #1252
  • Support configuring parquet compression #1235
  • parallel processing in Optimize command #1171

Fixed bugs:

  • get_add_actions() MAX is not showing complete value #1534
  • Can't get stats's minValues in add actions #1515
  • Pyarrow is_null filter not working as expected after loading using deltalake #1496
  • Can't write to table that uses generated columns #1495
  • Json error: Binary is not supported by JSON when writing checkpoint files #1493
  • _last_checkpoint size field is incorrect #1468
  • Error when Z Ordering a larger dataset #1459
  • Timestamp parsing issue #1455
  • File options are ignored when writing delta #1444
  • Slack Invite Link No Longer Valid #1425
  • cleanup_metadata doesn't remove .checkpoint.parquet files #1420
  • The test of reading the data from the blob storage located in Azurite container failed #1415
  • The test of reading the data from the bucket located in Minio container failed #1408
  • Datafusion: unreachable code reached when parsing statistics with missing columns #1374
  • vacuum is very slow on Cloudflare R2 #1366

Closed issues:

  • Expose Compression Options or WriterProperties for writing to Delta #1469
  • Support out-of-core Z-order using DataFusion #1460
  • Expose Z-order in Python #1442

Merged pull requests:

rust-v0.12.0 (2023-05-30)

Full Changelog

Implemented enhancements:

  • Release delta-rs 0.11.0 (next release after 0.10.0) #1362
  • Support writing statistics for date columns in Rust #1209

Fixed bugs:

  • Rust writer in operations makes a lot of data copies #1394
  • Unable to read timestamp fields from column statistics #1372
  • Unable to write custom metadata via configuration since version 0.9.0 #1353
  • .get_add_actions() returns wrong column statistics when dataSkippingNumIndexedCols property of the table was changed #1223
  • Ensure decimal statistics are written correctly in Rust #1208

Merged pull requests:

  • feat: add list_with_offset to DeltaObjectStore #1410 (ognis1205)
  • chore: type-check friendlier exports #1407 (roeap)
  • chore: remove ancillary crates from the git tree #1406 (rtyler)
  • chore: bump the version for the next release #1405 (rtyler)
  • feat: more efficient parquet writer and more statistics #1397 (wjones127)
  • perf: improve record batch partitioning #1396 (roeap)
  • chore: bump datafusion to 25 #1389 (roeap)
  • refactor!: remove DeltaDataType aliases #1388 (cmackenzie1)
  • feat: vacuum with concurrent requests #1382 (wjones127)
  • feat: add datafusion storage catalog #1381 (roeap)
  • docs: updated schema.rs to use the right signature for decimal data type in documentation #1377 (rahulj51)
  • fix: delete operation when partition and non partition columns are used #1375 (Blajda)
  • fix: add conversion for string for Field::TimestampMicros (#1372) #1373 (cmackenzie1)
  • fix: allow user defined config keys #1365 (roeap)
  • ci: disable full debug symbol generation #1364 (roeap)
  • fix: include stats for all columns (#1223) #1342 (mrjoe7)

rust-v0.11.0 (2023-05-12)

Full Changelog

Implemented enhancements:

  • Implement simple delete case #832

Merged pull requests:

  • chore: update Rust package version #1346 (rtyler)
  • fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
  • feat: delete operation #1176 (Blajda)
  • feat: add wasbs to known schemes #1345 (iajoiner)
  • test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
  • feat: write command improvements #1267 (roeap)
  • feat: added support for Databricks Unity Catalog #1331 (nohajc)
  • fix: double url encode of partition key #1324 (mrjoe7)

rust-v0.10.0 (2023-05-02)

Full Changelog

Implemented enhancements:

  • Support Optimize on non-append-only tables #1125

Fixed bugs:

  • DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
  • Datafusion: SQL projection returns wrong column for partitioned data #1292
  • Unable to query partitioned tables #1291

Merged pull requests:

  • chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
  • fix: handle local paths on windows #1322 (roeap)
  • fix: scan partitioned tables with datafusion #1303 (roeap)
  • fix: allow special characters in storage prefix #1311 (wjones127)
  • feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
  • Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
  • Enable the json feature for the parquet crate #1300 (rtyler)

rust-v0.9.0 (2023-04-14)

Full Changelog

Implemented enhancements:

  • hdfs support #300
  • Add decimal primitive type to document #1280
  • Improve error message when filtering on non-existant partition columns #1218

Fixed bugs:

  • Datafusion table provider: issues with timestamp types #441
  • Not matching column names when creating a RecordBatch from MapArray #1257
  • All stores created using DeltaObjectStore::new have an identical object_store_url #1188

Merged pull requests:

  • Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
  • chore: df / arrow changes after update #1288 (roeap)
  • feat: read schema from parquet files in datafusion scans #1266 (roeap)
  • HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
  • Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
  • Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
  • Simplify the Store Backend Configuration code #1265 (mrjoe7)
  • feat: optimistic transaction protocol #632 (roeap)
  • Write support for additional Arrow datatypes #1044 (chitralverma)
  • Unique delta object store url #1212 (gruuya)
  • improve err msg on use of non-partitioned column #1221 (marijncv)

rust-v0.8.0 (2023-03-10)

Full Changelog

Implemented enhancements:

  • feat(rust): support additional types for partition values #1170

Fixed bugs:

  • File pruning does not occur on partition columns #1175
  • Bug: Error loading Delta table locally #1157
  • Deltalake 0.7.0 with s3 feature compliation error due to rusoto_dynamodb version conflict #1191
  • Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186

Merged pull requests:

rust-v0.7.0 (2023-02-11)

Full Changelog

Implemented enhancements:

  • Support FSCK REPAIR TABLE Operation #1092
  • Expose the Delta Log in a DataFrame that's easy for analysis #1031
  • Provide case-insensitive storage options in backend #999
  • Support local file path in CreateBuilder::with_location() #998
  • Save operational params in the same way with delta io #1054 (ismoshkov)

Fixed bugs:

  • DeltaTable DataFusion TableProvider does not support filter pushdown #1064
  • DeltaTable DataFusion scan does not prune files properly #1063
  • deltalake.DeltaTable constructor hangs in Jupyter #1093
  • Transaction log JSON formatting issue when writing data via Python bindings #1017
  • crates.io entry is missing link to rustdoc documentation #1076
  • URL Registered with ObjectStore registry is different from url in DeltaScan #1018
  • Not able to connect to Azure Storage with client id/secret #977
  • Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
  • Overwrite mode does not work with Azure #939
  • Use Chrono without default features #914
  • cargo test does not run due to tls conflict #985
  • Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910

Merged pull requests:

  • Make rustls default across all packages #1097 (wjones127)
  • Implement filesystem check #1103 (Blajda)
  • refactor: move vacuum command to operations module #1045 (roeap)
  • feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
  • feat: improve storage location handling #1065 (roeap)
  • Fix to support UTC timezone #1022 (andrei-ionescu)
  • feat: harmonize and simplify storage configuration #1052 (roeap)
  • feat: expose function to get table of add actions #1033 (wjones127)
  • fix: change unexpected field logging level to debug #1112 (houqp)
  • fix: datafusion predicate pushdown and dependencies #1071 (roeap)
  • fix: azure sas key url encoding #1036 (roeap)
  • Add provisional workaround to support CDC #1039 #1042 (Fazzani)
  • improve debuggability of json ser/de errors #1119 (houqp)
  • Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
  • minor: optimize partition lookup for vacuum loop #1120 (houqp)
  • Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
  • add test for null_count_schema_for_fields #1135 (marijncv)
  • add test for min_max_schema_for_fields #1122 (marijncv)
  • add test for get_boolean_from_metadata #1121 (marijncv)
  • add test for left_larger_than_right #1110 (marijncv)
  • Add test for: to_scalar_value #1086 (marijncv)
  • Fix typo in delta-inspect #1072 (byteink)
  • chore: update datafusion #1114 (roeap)

rust-v0.6.0 (2022-12-16)

Full Changelog

Implemented enhancements:

  • Support Apache Arrow DataFusion 15 #1020
  • Python package: Loosen version requirements for maturin #1004
  • Remove Cargo.lock from library crates and add Cargo.lock to binary ones #1000
  • More frequent Rust releases #969
  • Thoughts on adding read_delta to pandas #869
  • Add the support of the AWS_PROFILE environment variable for S3 #986 (fvaleye)

Fixed bugs:

  • Azure SAS signatures ending in "=" don't work #1003
  • Fail to compile deltalake crate, need to update dynamodb_lock in crates.io #1002
  • error reading delta table to pandas: runtime dropped the dispatch task #975
  • MacOS arm64 wheels are generated incorrectly #972
  • Overwrite creates new file #960
  • The written delta file has corrupted structure #956
  • Write mode doesn't work with Azure storage #955
  • Python: We don't error on reader protocol v2 #886
  • Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855

Merged pull requests:

* This Changelog was automatically generated by github_changelog_generator