Releases: delta-io/delta-rs
python-v0.13.0: Repair operation and PyArrow 13+ support
New features
- feat(python): expose FSCK (repair) operation by @ion-elgreco in #1730
- feat: add VACUUM operation as commit in transaction log by @ion-elgreco in #1728
- fix(python): add support for pyarrow 13+ by @ion-elgreco in #1804
- feat(python): allow python objects to be passed as new values in .update() by @ion-elgreco in #1749
- feat(python): allow for multiple when calls in MERGE operation by @ion-elgreco in #1750
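Allowing several when calls matters because of how MERGE resolves them. This sketch assumes delta-rs follows Delta Lake's usual MERGE rule: for each source row, the clauses of the matching kind are tried in the order they were added, and only the first clause whose condition holds is applied. It models that rule with plain dicts. It is an illustration of the semantics, not the deltalake merge API, and merge_rows plus the clause tuples are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical clause shapes used only by this sketch.
MatchedClause = Tuple[Callable[[Row, Row], bool], str, Optional[Callable[[Row, Row], Row]]]
NotMatchedClause = Tuple[Callable[[Row], bool], Callable[[Row], Row]]


def merge_rows(target: List[Row], source: List[Row], key: str,
               when_matched: List[MatchedClause],
               when_not_matched: List[NotMatchedClause]) -> List[Row]:
    """Toy MERGE: clauses are tried in order and only the first true one fires."""
    by_key = {row[key]: dict(row) for row in target}  # copy so target is untouched
    for src in source:
        tgt = by_key.get(src[key])
        if tgt is not None:
            for condition, action, values in when_matched:
                if condition(tgt, src):
                    if action == "delete":
                        del by_key[src[key]]
                    else:  # "update"
                        tgt.update(values(tgt, src))
                    break  # later matched clauses are skipped for this row
        else:
            for condition, make_row in when_not_matched:
                if condition(src):
                    by_key[src[key]] = make_row(src)
                    break
    return sorted(by_key.values(), key=lambda r: r[key])


target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 0}]
source = [{"id": 1, "qty": 0}, {"id": 2, "qty": 7}, {"id": 3, "qty": 4}]
result = merge_rows(
    target, source, key="id",
    when_matched=[
        (lambda t, s: s["qty"] == 0, "delete", None),                        # checked first
        (lambda t, s: True, "update", lambda t, s: {"qty": s["qty"]}),      # fallback
    ],
    when_not_matched=[(lambda s: True, lambda s: dict(s))],
)
print(result)  # [{'id': 2, 'qty': 7}, {'id': 3, 'qty': 4}]
```

Row 1 hits the delete clause first, so the update clause never runs for it; swapping the two clauses would update it instead.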
Bug fixes
- fix: ignore binary columns for stats generation by @emcake in #1766
- fix(python): add write support explicitly for pyarrow dataset by @ion-elgreco in #1780
- fix(python): ignore infinity in stats by @wjones127 in #1784
- fix: delta scan partition ordering bug by @Blajda in #1789
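Both stats fixes above (#1766, #1784) concern which values may appear in the JSON min/max statistics of an add action. Here is a minimal sketch, assuming a column-oriented dict of Python values. Binary columns get no min/max, and infinite or NaN floats are ignored, since Python's json module would otherwise emit the non-standard token Infinity. The helper name file_stats is hypothetical. This sketch still counts nulls for binary columns, which may differ from what delta-rs does.

```python
import json
import math
from typing import Any, Dict, List


def file_stats(columns: Dict[str, List[Any]]) -> str:
    """Build Delta-style numRecords/minValues/maxValues/nullCount stats as JSON."""
    num_records = max((len(v) for v in columns.values()), default=0)
    stats: Dict[str, Any] = {"numRecords": num_records, "minValues": {},
                             "maxValues": {}, "nullCount": {}}
    for name, values in columns.items():
        stats["nullCount"][name] = sum(v is None for v in values)
        if any(isinstance(v, (bytes, bytearray)) for v in values):
            continue  # binary column: no min/max (JSON has no binary type)
        candidates = [v for v in values
                      if v is not None and not (isinstance(v, float) and not math.isfinite(v))]
        if candidates:
            stats["minValues"][name] = min(candidates)
            stats["maxValues"][name] = max(candidates)
    # allow_nan=False makes json raise instead of silently writing Infinity/NaN
    return json.dumps(stats, allow_nan=False)


print(file_stats({"id": [3, 1, 2],
                  "score": [0.5, float("inf"), None],
                  "blob": [b"\x00", None, b"\x01"]}))
```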
Other changes
- fix: relax pyarrow pin by @dhirschfeld in #1743
- fix: remove pandas pin by @dhirschfeld in #1746
- refactor!: update operations to use delta scan by @Blajda in #1639
- chore: update datafusion by @roeap in #1741
- docs: convert docs to use mkdocs by @r3stl355 in #1731
- docs: dynamodb lock configuration by @brayanjuls in #1752
- refactor: perform bulk deletes during metadata cleanup by @cmackenzie1 in #1763
- docs: enhance docs to enable multi-lingual examples by @r3stl355 in #1781
- chore: refactor into the deltalake meta crate and deltalake-core crates by @rtyler in #1774
- feat: add deltalake sql crate by @roeap in #1757
- feat: initial table features implementation by @hntd187 in #1796
- docs: add CI for docs by @r3stl355 in #1798
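The table features work (#1796) builds on the Delta protocol action. A table declares minReaderVersion and minWriterVersion. From reader version 3 and writer version 7 onward, it also lists explicit readerFeatures and writerFeatures. Below is a simplified reader-side check. The function name, the maximum version, and the supported-feature set are hypothetical and say nothing about what delta-rs itself supports.

```python
from typing import Any, Dict, Set

# Hypothetical capabilities of an example reader.
MAX_READER_VERSION = 3
SUPPORTED_READER_FEATURES: Set[str] = {"columnMapping"}


def check_readable(protocol: Dict[str, Any]) -> None:
    """Raise ValueError if a protocol action needs more than this reader supports."""
    min_reader = protocol["minReaderVersion"]
    if min_reader > MAX_READER_VERSION:
        raise ValueError(f"table requires reader version {min_reader}, "
                         f"this reader supports up to {MAX_READER_VERSION}")
    if min_reader == 3:
        # Reader version 3 replaces implied capabilities with an explicit feature list.
        missing = set(protocol.get("readerFeatures", [])) - SUPPORTED_READER_FEATURES
        if missing:
            raise ValueError(f"unsupported reader features: {sorted(missing)}")


check_readable({"minReaderVersion": 1, "minWriterVersion": 2})
check_readable({"minReaderVersion": 3, "minWriterVersion": 7,
                "readerFeatures": ["columnMapping"]})
```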
New Contributors
- @dhirschfeld made their first contribution in #1743
- @brayanjuls made their first contribution in #1752
- @emcake made their first contribution in #1766
- @hntd187 made their first contribution in #1796
Full Changelog: python-v0.12.0...python-v0.13.0
python-v0.12.0: Delete, Update, and Merge
What's Changed
New features
- feat: allow to set large dtypes for the schema check in write_deltalake by @ion-elgreco in #1668
- feat(python): expose delete operation by @guilhem-dvr in #1687
- feat(python): expose UPDATE operation by @ion-elgreco in #1694
- feat(python): expose MERGE operation by @ion-elgreco in #1685
- feat: add version number in .history() and display in reversed chronological order by @ion-elgreco in #1710
Bug fixes
- fix: exception string in writer.py by @sebdiem in #1665
- fix: change partitioning schema from large to normal string for pyarrow<12 by @ion-elgreco in #1671
- fix: use epoch instead of ce for date stats by @universalmind303 in #1672
- fix: unify environment variables referenced by Databricks docs by @rtyler in #1673
- fix!: ensure predicates are parsable by @Blajda in #1690
- fix: merge operation with string predicates by @Blajda in #1705
- fix: reorder encode_partition_value() checks and add tests by @ldacey in #1733
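Fix #1672 concerns the base date used when turning a date value into a statistic. Arrow's date32 type counts days since the Unix epoch (1970-01-01), so reading the same integer as days since the start of the common era gives a date almost two millennia too early. This stdlib-only sketch shows the two interpretations side by side, assuming (as the fix title suggests) that the bug was exactly this choice of base date. The helper names are hypothetical.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def date32_to_stat(days_since_epoch: int) -> str:
    """Correct: interpret an Arrow date32 value relative to 1970-01-01."""
    return (UNIX_EPOCH + timedelta(days=days_since_epoch)).isoformat()


def date32_to_stat_ce(days_since_epoch: int) -> str:
    """Buggy variant: treats the same integer as a day count from 0001-01-01."""
    return date.fromordinal(days_since_epoch).isoformat()


raw = (date(2000, 1, 1) - UNIX_EPOCH).days  # 10957
print(date32_to_stat(raw))     # 2000-01-01
print(date32_to_stat_ce(raw))  # 0030-12-31: the year 30, not 2000
```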
Other contributions
- perf: improve read performance by 7x with prebuffer by @ion-elgreco in #1709
- docs: small consistency update in guide and readme by @ion-elgreco in #1666
- docs: fix typo in readme by @JosiahParry in #1696
- docs: add Python API reference to mkdocs by @wjones127 in #1563
- docs(python): document the delete operation by @guilhem-dvr in #1704
- docs: add a write example to delta.rs by @r3stl355 in #1711
- chore: remove deprecated functions by @wjones127 in #1735
Breaking changes
- The DeltaTable.history() method now returns transactions in reverse chronological order. This matches the Spark implementation.
- DeltaTable.files_by_partitions() has been removed. It has been deprecated since 0.7.0. Use DeltaTable.file_uris() instead.
- DeltaTable.pyarrow_schema() has been removed. It has been deprecated since 0.7.0. Use DeltaTable.schema().to_pyarrow() instead.
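If your code relied on the old oldest-first order of DeltaTable.history(), you can sort on the version number that #1710 added to each entry. Below is a minimal sketch. It works on a list of dicts shaped like history() output, and only the "version" key is assumed. No table is needed to run it, and the helper name oldest_first is hypothetical. The two removed methods appear only as comments, mapped to their replacements listed above.

```python
from typing import Any, Dict, List

# Migration of the removed methods (from the notes above):
#   dt.files_by_partitions(...)  ->  dt.file_uris(...)
#   dt.pyarrow_schema()          ->  dt.schema().to_pyarrow()


def oldest_first(history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return commit entries sorted by ascending table version.

    Intended input: the list returned by DeltaTable.history(), which since
    python-v0.12.0 is newest-first and carries a "version" key per commit.
    """
    return sorted(history, key=lambda commit: commit["version"])


# Entries shaped like history() output (fields abbreviated for illustration).
newest_first = [
    {"version": 2, "operation": "MERGE"},
    {"version": 1, "operation": "WRITE"},
    {"version": 0, "operation": "CREATE TABLE"},
]
print([c["version"] for c in oldest_first(newest_first)])  # [0, 1, 2]
```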
New Contributors
- @sebdiem made their first contribution in #1665
- @universalmind303 made their first contribution in #1672
- @JosiahParry made their first contribution in #1696
- @r3stl355 made their first contribution in #1711
- @ldacey made their first contribution in #1733
Full Changelog: python-v0.11.0...python-v0.12.0
rust-v0.16.0
Implemented enhancements:
- Expose Optimize option min_commit_interval in Python #1640
- Expose create_checkpoint_for #1513
- integration tests regularly fail for HDFS #1428
- Add Support for Microsoft OneLake #1418
- add support for atomic rename in R2 #1356
Fixed bugs:
- Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
- [python] Different stringification of partition values in reader and writer #1653
- Unable to interface with data written from Spark Databricks #1651
- get_last_checkpoint does some unnecessary listing #1643
- PartitionWriter's buffer_len doesn't include incomplete row groups #1637
- Slack community invite link has expired #1636
- delta-rs does not appear to support tables with liquid clustering #1626
- Internal Parquet panic when using a Map type. #1619
- partition_by with "$" on local filesystem #1591
- ProtocolChanged error when performing append write #1585
- Unable to cargo update using git tag or rev on Rust 1.70 #1580
- NoMetadata error when reading deltatable #1562
- Cannot read delta table: Delta protocol violation #1557
- Update the CODEOWNERS to capture the current reviewers and contributors #1553
- [Python] Incorrect file URIs when partition values contain escape character #1533
- add documentation how to Query Delta natively from datafusion #1485
- Python: write_deltalake to ADLS Gen2 issue #1456
- Partition values that have been url encoded cannot be read when using deltalake #1446
- Error optimizing large table #1419
- Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
- ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
- Invalid JSON in log record missing field schemaString for DLT tables #1302
- Special characters in partition path not handled locally #1299
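Several of the bugs above (#1533, #1446, #1393, #1299) involve partition values with special characters. These values end up percent-encoded in Hive-style directory names such as key=value. This sketch uses urllib.parse.quote as a stand-in encoder; the exact set of characters delta-rs escapes is not specified here. It shows the invariant those fixes protect: encode once when building the path and decode once when reading it back. Encoding an already-encoded value turns % into %25. The helper names are hypothetical.

```python
from typing import Dict
from urllib.parse import quote, unquote


def partition_path(values: Dict[str, str]) -> str:
    """Build a Hive-style partition directory, percent-encoding each value once."""
    return "/".join(f"{key}={quote(value, safe='')}" for key, value in values.items())


def parse_partition_path(path: str) -> Dict[str, str]:
    """Inverse of partition_path: split the segments and decode each value once."""
    parsed = {}
    for segment in path.split("/"):
        key, _, encoded = segment.partition("=")
        parsed[key] = unquote(encoded)
    return parsed


path = partition_path({"city": "New York", "tag": "a/b"})
print(path)                        # city=New%20York/tag=a%2Fb
print(parse_partition_path(path))  # {'city': 'New York', 'tag': 'a/b'}

# Re-encoding an already-encoded value is the double-encoding failure mode.
print(quote("New%20York", safe=""))  # New%2520York
```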
Merged pull requests:
- chore: bump rust crate version #1675 (rtyler)
- fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
- feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
- docs: small consistency update in guide and readme #1666 (ion-elgreco)
- fix: exception string in writer.py #1665 (sebdiem)
- chore: increment python library version #1664 (wjones127)
- docs: fix some typos #1662 (ion-elgreco)
- fix: more consistent handling of partition values and file paths #1661 (roeap)
- docs: add docstring to protocol method #1660 (MrPowers)
- docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
- fix: enable offset listing for s3 #1654 (eeroel)
- chore: fix the incorrect Slack link in our readme #1649 (rtyler)
- fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
- chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
- feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
- fix: avoid excess listing of log files #1644 (eeroel)
- fix: introduce support for Microsoft OneLake #1642 (rtyler)
- fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
- fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
- chore: relax chrono pin to 0.4 #1635 (houqp)
- chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
- docs: update Readme #1633 (dennyglee)
- chore: pin the chrono dependency #1631 (rtyler)
- feat: pass known file sizes to filesystem in Python #1630 (eeroel)
- feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
- ci: fix python release #1624 (wjones127)
- ci: extend azure timeout #1622 (wjones127)
- feat: allow multiple incremental commits in optimize #1621 (kvap)
- fix: change map nullable value to false #1620 (cmackenzie1)
- Introduce the changelog for the last couple releases #1617 (rtyler)
- chore: bump python version to 0.10.2 #1616 (wjones127)
- perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
- fix: don't re-encode paths #1613 (wjones127)
- feat: use url parsing from object store #1592 (roeap)
- feat: buffered reading of transaction logs #1549 (eeroel)
- feat: merge operation #1522 (Blajda)
- feat: expose create_checkpoint_for to the public #1514 (haruband)
- docs: update Readme #1440 (roeap)
- refactor: re-organize top level modules #1434 (roeap)
- feat: integrate unity catalog with datafusion #1338 (roeap)
python-v0.11.0
What's Changed
New Features
- feat: expose min_commit_interval to optimize.compact and optimize.z_order by @ion-elgreco in #1645
- feat: allow multiple incremental commits in optimize by @kvap in #1621
- feat: introduce support for Microsoft OneLake by @rtyler in #1642
Performance Improvements
- feat: pass known file sizes to filesystem in Python by @eeroel in #1630
- fix: avoid excess listing of log files by @eeroel in #1644
- fix: enable offset listing for s3 by @eeroel in #1654
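Offset listing works because of how the Delta protocol names commit files. The version is zero-padded to 20 digits, so lexicographic key order matches version order, and a store can be asked to list only keys after a given name. Without the padding, "10.json" would sort before "9.json". Below is a minimal sketch over an in-memory list of keys. list_after stands in for an object store's offset listing and is hypothetical, as is commit_file.

```python
from typing import Iterable, List

LOG_DIR = "_delta_log/"


def commit_file(version: int) -> str:
    """Delta commit file key: version zero-padded to 20 digits."""
    return f"{LOG_DIR}{version:020d}.json"


def list_after(keys: Iterable[str], offset: str) -> List[str]:
    """Stand-in for an object store listing that starts after `offset`."""
    return sorted(k for k in keys if k > offset)


keys = [commit_file(v) for v in (0, 1, 2, 9, 10, 11)]
# Only commits newer than version 9 are needed, so list from that key onward:
# this returns the keys for versions 10 and 11 only.
print(list_after(keys, commit_file(9)))
```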
Other
- chore: update datafusion to 31, arrow to 46 and object_store to 0.7 by @houqp in #1634
- feat: implement parsing for the new domainMetadata actions in the commit log by @rtyler in #1629
- feat: integrate unity catalog with datafusion by @roeap in #1338
- fix: compensate for invalid log files created by Delta Live Tables by @rtyler in #1647
- docs: add docstring to protocol method by @MrPowers in #1660
- docs: fix some typos by @ion-elgreco in #1662
- feat: use url parsing from object store by @roeap in #1592
- chore: proposed updated CODEOWNERS to allow better review notifications by @rtyler in #1646
- fix: more consistent handling of partition values and file paths by @roeap in #1661
New Contributors
Full Changelog: python-v0.10.2...python-v0.11.0
python-v0.10.2
What's Changed
New features
- feat: add restore command in python binding by @loleek in #1529
- feat: buffered reading of transaction logs by @eeroel in #1549
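Conceptually, restoring a table to an earlier version (#1529) writes a new commit rather than rewriting history. Files that are live now but absent from the target version are logged as removed. Files from the target version that are no longer live are added back. Below is a set-based sketch of that diff, assuming delta-rs's restore follows this usual Delta model. It uses hypothetical names and bare file paths in place of full add and remove actions.

```python
from typing import Dict, List, Set


def restore_actions(current: Set[str], target: Set[str]) -> Dict[str, List[str]]:
    """Diff the live file sets of the current and target versions."""
    return {
        # Live now but not part of the target snapshot: tombstone them.
        "remove": sorted(current - target),
        # In the target snapshot but no longer live: add them back.
        "add": sorted(target - current),
    }


v1_files = {"part-0.parquet", "part-1.parquet"}
v3_files = {"part-1.parquet", "part-2.parquet", "part-3.parquet"}
print(restore_actions(current=v3_files, target=v1_files))
# {'remove': ['part-2.parquet', 'part-3.parquet'], 'add': ['part-0.parquet']}
```

Re-adding part-0.parquet only works if a vacuum has not already deleted it from storage.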
Bug fixes
- fix: correct whitespace in delta protocol reader minimum version error message by @polynomialherder in #1576
- fix: just make pyarrow 12 the max by @wjones127 in #1603
- fix: support partial statistics in JSON by @CurtHagenlocher in #1599
- perf: avoid holding GIL in DeltaFileSystemHandler by @wjones127 in #1615
- fix: change map nullable value to false by @cmackenzie1 in #1620
- fix: don't re-encode paths by @wjones127 in #1613
Other
- ci: don't run benchmark in debug mode by @wjones127 in #1566
- chore: update datafusion to 28 and arrow to 43 by @cmackenzie1 in #1571
- chore: move deps to [workspace.dependencies] by @cmackenzie1 in #1575
- fix: remove alpha classifier by @marcelotrevisani in #1578
- refactor: use pa.table.cast in delta_arrow_schema_from_pandas by @ion-elgreco in #1573
- feat: add metadata for operations::write::WriteBuilder by @abhimanyusinghgaur in #1584
- feat: add metadata for deletion vectors by @aersam in #1583
- refactor: clean up arrow schema defs by @polynomialherder in #1590
- fix: update python test by @wjones127 in #1608
- chore: update datafusion to 30, arrow to 45 by @scsmithr in #1606
- chore: bump python version to 0.10.2 by @wjones127 in #1616
- ci: extend azure timeout by @wjones127 in #1622
- ci: fix python release by @wjones127 in #1624
New Contributors
- @polynomialherder made their first contribution in #1576
- @marcelotrevisani made their first contribution in #1578
- @ion-elgreco made their first contribution in #1573
- @aersam made their first contribution in #1583
- @CurtHagenlocher made their first contribution in #1599
- @scsmithr made their first contribution in #1606
- @eeroel made their first contribution in #1549
Full Changelog: python-v0.10.1...python-v0.10.2
rust-v0.14.0
What's Changed
- fix: revert premature merge of an attempted fix for binary column statistics by @rtyler in #1544
- chore: disable incremental builds in CI for saving space by @rtyler in #1545
- chore: address some integration test bloat of disk usage for development by @rtyler in #1552
- chore: increment python version by @wjones127 in #1542
- docs: port docs to mkdocs by @MrPowers in #1548
- ci: install newer rust for macos python release by @wjones127 in #1565
- feat!: bulk delete for vacuum by @Blajda in #1556
- feat: make find_files public by @yjshen in #1560
- ci: don't run benchmark in debug mode by @wjones127 in #1566
- chore: update datafusion to 28 and arrow to 43 by @cmackenzie1 in #1571
- chore: move deps to [workspace.dependencies] by @cmackenzie1 in #1575
- fix: correct whitespace in delta protocol reader minimum version error message by @polynomialherder in #1576
- feat: add restore command in python binding by @loleek in #1529
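Vacuum only deletes files whose remove (tombstone) actions are older than the retention window. The Delta protocol records deletionTimestamp in milliseconds since the epoch, and the usual default window is 7 days (168 hours). Bulk delete (#1556) then issues those deletions in batches instead of one request per file. The sketch below works under those assumptions. The batch size of 1000 and all function names are hypothetical.

```python
from typing import Iterator, List, Tuple

MS_PER_HOUR = 60 * 60 * 1000


def expired_files(tombstones: List[Tuple[str, int]], now_ms: int,
                  retention_hours: int = 168) -> List[str]:
    """Paths whose deletionTimestamp (ms since epoch) falls before the retention cutoff."""
    cutoff = now_ms - retention_hours * MS_PER_HOUR
    return [path for path, deleted_at in tombstones if deleted_at < cutoff]


def batches(paths: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Group deletions so each request removes many objects at once."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


now = 1_700_000_000_000  # an arbitrary "current" time in ms
tombstones = [
    ("part-old.parquet", now - 200 * MS_PER_HOUR),  # older than 168 h: expired
    ("part-new.parquet", now - 24 * MS_PER_HOUR),   # inside the window: kept
]
print(expired_files(tombstones, now))                       # ['part-old.parquet']
print(list(batches([f"f{i}" for i in range(5)], size=2)))   # [['f0', 'f1'], ['f2', 'f3'], ['f4']]
```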
New Contributors
- @yjshen made their first contribution in #1560
- @polynomialherder made their first contribution in #1576
Full Changelog: rust-v0.13.0...rust-v0.14.0
python-v0.10.1
What's Changed
New features
- feat: handle larger z-order jobs with streaming output and spilling by @wjones127 in #1461
- feat: implement restore operation by @loleek in #1502
- feat!: bulk delete for vacuum by @Blajda in #1556
Fixes
- fix(python): match Field signatures by @guilhem-dvr in #1463
- fix: add sizeInBytes to _last_checkpoint and change size to # of actions by @cmackenzie1 in #1477
- fix: tiny typo in AggregatedStats by @haruband in #1516
- fix: handle nulls in file-level stats by @wjones127 in #1520
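Fix #1477 aligns the _last_checkpoint file with the Delta protocol. There, size is the number of actions stored in the checkpoint, and the optional sizeInBytes field is the checkpoint file's size in bytes. Below is a minimal sketch that serializes those fields. The function name and the example numbers are hypothetical.

```python
import json
from typing import Any, Dict, List


def last_checkpoint(version: int, actions: List[Dict[str, Any]], checkpoint_bytes: int) -> str:
    """Serialize _last_checkpoint: size counts actions, sizeInBytes counts bytes."""
    return json.dumps({
        "version": version,
        "size": len(actions),             # number of actions in the checkpoint
        "sizeInBytes": checkpoint_bytes,  # size of the checkpoint parquet file
    })


actions = [{"protocol": {}}, {"metaData": {}},
           {"add": {"path": "a.parquet"}}, {"add": {"path": "b.parquet"}}]
print(last_checkpoint(10, actions, checkpoint_bytes=18_432))
# {"version": 10, "size": 4, "sizeInBytes": 18432}
```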
Other
- docs: show data catalog options in Python API reference by @omkar-foss in #1532
- chore: fix mypy failure by @wjones127 in #1500
- chore: increment python version by @wjones127 in #1542
- ci: install newer rust for macos python release by @wjones127 in #1565
New Contributors
- @guilhem-dvr made their first contribution in #1463
- @haruband made their first contribution in #1516
- @omkar-foss made their first contribution in #1532
Full Changelog: python-v0.10.0...python-v0.10.1
rust-v0.13.0
Implemented enhancements:
- Add nested struct supports #1518
- Support FixedLenByteArray UUID statistics as a logical scalar #1483
- Exposing create_add in the API #1458
- Update features table on README #1404
- docs(python): show data catalog options in Python API reference #1347
- Add optimization to only list log files starting at a certain name #1252
- Support configuring parquet compression #1235
- parallel processing in Optimize command #1171
Fixed bugs:
- get_add_actions() MAX is not showing complete value #1534
- Can't get stats's minValues in add actions #1515
- Pyarrow is_null filter not working as expected after loading using deltalake #1496
- Can't write to table that uses generated columns #1495
- Json error: Binary is not supported by JSON when writing checkpoint files #1493
- _last_checkpoint size field is incorrect #1468
- Error when Z Ordering a larger dataset #1459
- Timestamp parsing issue #1455
- File options are ignored when writing delta #1444
- Slack Invite Link No Longer Valid #1425
- cleanup_metadata doesn't remove .checkpoint.parquet files #1420
- The test of reading the data from the blob storage located in Azurite container failed #1415
- The test of reading the data from the bucket located in Minio container failed #1408
- Datafusion: unreachable code reached when parsing statistics with missing columns #1374
- vacuum is very slow on Cloudflare R2 #1366
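When cleanup_metadata expires old log entries (issue #1420), it has to recognise checkpoint files as well as JSON commits. The sketch below accepts JSON commits plus both checkpoint forms from the Delta protocol. The single-part form is NNN.checkpoint.parquet. The multi-part form adds a 10-digit part number and a 10-digit part count. The pattern is illustrative rather than the one delta-rs uses. The function names, and the idea of a caller-chosen keep_from_version, are hypothetical.

```python
import re
from typing import List, Optional

LOG_FILE = re.compile(
    r"^(?P<version>\d{20})\."
    r"(?:json|checkpoint\.parquet|checkpoint\.\d{10}\.\d{10}\.parquet)$"
)


def log_version(name: str) -> Optional[int]:
    """Version encoded in a commit or checkpoint file name, or None for other files."""
    match = LOG_FILE.match(name)
    return int(match.group("version")) if match else None


def expired_log_files(names: List[str], keep_from_version: int) -> List[str]:
    """Commit and checkpoint files with a version below keep_from_version."""
    expired = []
    for name in names:
        version = log_version(name)
        if version is not None and version < keep_from_version:
            expired.append(name)
    return expired


names = [
    f"{9:020d}.json",
    f"{10:020d}.checkpoint.parquet",
    f"{10:020d}.json",
    f"{20:020d}.checkpoint.{1:010d}.{2:010d}.parquet",
    "_last_checkpoint",
]
print(expired_log_files(names, keep_from_version=20))  # the first three names
```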
Closed issues:
- Expose Compression Options or WriterProperties for writing to Delta #1469
- Support out-of-core Z-order using DataFusion #1460
- Expose Z-order in Python #1442
Merged pull requests:
- chore: fix the latest clippy warnings with the newer rustc's #1536 (rtyler)
- docs: show data catalog options in Python API reference #1532 (omkar-foss)
- fix: handle nulls in file-level stats #1520 (wjones127)
- feat: add nested struct supports #1519 (haruband)
- fix: tiny typo in AggregatedStats #1516 (haruband)
- refactor: unify with_predicate for delete ops #1512 (Blajda)
- chore: remove deprecated table functions #1511 (roeap)
- chore: update datafusion and related crates #1504 (roeap)
- feat: implement restore operation #1502 (loleek)
- chore: fix mypy failure #1500 (wjones127)
- fix: avoid writing statistics for binary columns to fix JSON error #1498 (ChewingGlass)
- feat(rust): expose WriterProperties method on RecordBatchWriter and DeltaWriter #1497 (theelderbeever)
- feat: add UUID statistics handling #1484 (atefsaw)
- feat: expose create_add to the public #1482 (atefsaw)
- fix: add sizeInBytes to _last_checkpoint and change size to # of actions #1477 (cmackenzie1)
- fix(python): match Field signatures #1463 (guilhem-dvr)
- feat: handle larger z-order jobs with streaming output and spilling #1461 (wjones127)
- chore: increment python version #1449 (wjones127)
- chore: upgrade to arrow 40 and datafusion 26 #1448 (rtyler)
- feat(python): expose z-order in Python #1443 (wjones127)
- ci: prune CI/CD pipelines #1433 (roeap)
- refactor: remove LoadCheckpointError and ApplyLogError #1432 (roeap)
- feat: update writers to include compression method in file name #1431 (Blajda)
- refactor: move checkpoint and errors into separate module #1430 (roeap)
- feat: add z-order optimize #1429 (wjones127)
- fix: casting when data to be written does not match table schema #1427 (Blajda)
- docs: update README.adoc to fix expired Slack link #1426 (dennyglee)
- chore: remove no-longer-necessary build.rs for Rust bindings #1424 (rtyler)
- chore: remove the delta-checkpoint lambda which I have moved to a new repo #1423 (rtyler)
- refactor: rewrite redundant_async_block #1422 (cmackenzie1)
- fix: update cleanup regex to include checkpoint.parquet files #1421 (cmackenzie1)
- docs: update features table in README #1414 (ognis1205)
- fix: get_prune_stats returns homogenous ArrayRef #1413 (cmackenzie1)
- feat: explicit python exceptions #1409 (roeap)
- feat: implement update operation #1390 (Blajda)
- feat: allow concurrent file compaction #1383 (wjones127)
python-v0.10.0: Z-order, faster optimize and vacuum
What's Changed
- feat(python): expose z-order in Python by @wjones127 in #1443
- feat: add z-order optimize by @wjones127 in #1429
- feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown by @ognis1205 in #1349
- feat: add datafusion storage catalog by @roeap in #1381
- feat: allow concurrent file compaction by @wjones127 in #1383
- feat: vacuum with concurrent requests by @wjones127 in #1382
- feat: more efficient parquet writer and more statistics by @wjones127 in #1397
- feat: explicit python exceptions by @roeap in #1409
- feat: update writers to include compression method in file name by @Blajda in #1431
- fix: include stats for all columns (#1223) by @mrjoe7 in #1342
- fix: add py.typed marker by @SchutteJan in #1350
- fix: allow user defined config keys by @roeap in #1365
- fix: add conversion for string for Field::TimestampMicros (#1372) by @cmackenzie1 in #1373
- perf: improve record batch partitioning by @roeap in #1396
- chore: type-check friendlier exports by @roeap in #1407
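Z-ordering (#1429, exposed to Python in #1443) sorts rows by a key that interleaves the bits of several columns. Rows that are close in every chosen column then tend to land in the same files. The sketch below shows the textbook form of that key for two non-negative integers, a Morton code. How delta-rs builds the key for arbitrary column types is not shown here, and the function name is hypothetical.

```python
def morton_code(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y: x takes even positions, y odd."""
    if x < 0 or y < 0:
        raise ValueError("this sketch only handles non-negative integers")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


points = [(3, 3), (0, 1), (2, 0), (1, 1), (0, 0), (1, 0)]
# Sorting by the Morton code gives the Z-shaped traversal used for clustering.
print(sorted(points, key=lambda p: morton_code(*p)))
# [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 3)]
```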
New Contributors
- @SchutteJan made their first contribution in #1350
- @cmackenzie1 made their first contribution in #1373
- @rahulj51 made their first contribution in #1377
Full Changelog: python-v0.9.0...python-v0.10.0
rust-v0.12.0
Boy howdy there's some great looking performance improvements in this…