Releases: Eventual-Inc/Daft
v0.2.12
Changes
👾 Bug Fixes
- [BUG] bugfix for empty partitions when writing out empty partitions @samster25 (#1814)
v0.2.11
Changes
✨ New Features
- [FEAT] Support Hive-Style Partitioned Writes for Tabular Writes @samster25 (#1794)
👾 Bug Fixes
- [BUG] Fix scheduler deadlock on concurrent broadcast joins. @clarkzinzow (#1812)
- [BUG] Fix type annotation on UDF @jaychia (#1807)
- [BUG] Materialize Dataframes created from file writes @colin-ho (#1785)
- [BUG] Materialize Dataframes created from in-memory data @colin-ho (#1780)
📖 Documentation
- [DOCS] Add warning during repartition to use into_partitions instead @jaychia (#1808)
- [BUG] Fix type annotation on UDF @jaychia (#1807)
- [DOCS] Update README.rst to remove beta disclaimer @jaychia (#1802)
- [CHORE] Update docs to reflect materialized Dataframes from writes and in-memory reads @colin-ho (#1795)
- [DOCS] Upgrade version of docs sphinx-book-theme dependency @jaychia (#1789)
- [DOCS] Fix notebooks to use new public parquet file @jaychia (#1788)
- [DOCS] Fix docs build for sphinxcontrib-applehelp versioning @jaychia (#1787)
- [DOCS] Update README.rst for broken links @jaychia (#1786)
- [CHORE] Update tutorials to use released version of Daft @jaychia (#1751)
🧰 Maintenance
- [CHORE] Update docs to reflect materialized Dataframes from writes and in-memory reads @colin-ho (#1795)
- [CHORE] Update tutorials to use released version of Daft @jaychia (#1751)
⬆️ Dependencies
- Bump actions/cache from 3 to 4 @dependabot (#1805)
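For readers unfamiliar with the Hive-style layout named in #1794, the following is a minimal, library-independent sketch of the directory convention: one `key=value` directory level per partition column. It does not use Daft's write API. The helper `hive_partition_path`, the `out` root, and the sample rows are hypothetical and exist only for illustration, and partition values are not escaped here.

```python
def hive_partition_path(root, partition_values):
    """Build a Hive-style directory path such as out/year=2024/country=US."""
    parts = [f"{key}={value}" for key, value in partition_values.items()]
    return "/".join([root.rstrip("/"), *parts])


# Hypothetical sample rows, partitioned by (year, country).
rows = [
    {"year": 2023, "country": "US", "amount": 10},
    {"year": 2023, "country": "SG", "amount": 7},
    {"year": 2024, "country": "US", "amount": 3},
]

# Group rows by their partition-column values; each group maps to one directory.
groups = {}
for row in rows:
    groups.setdefault((row["year"], row["country"]), []).append(row)

paths = sorted(
    hive_partition_path("out", {"year": year, "country": country})
    for (year, country) in groups
)
for path in paths:
    print(path)
```

Readers that understand this convention can recover the partition-column values from the directory names alone, which is why the layout is commonly used for partitioned tabular output.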
v0.2.10
Changes
✨ New Features
- [FEAT] Add getter for Struct and List expressions @kevinzwang (#1775)
- [FEAT] Iceberg Murmur3 Hash function @samster25 (#1778)
- [FEAT] Not_Null Expression @colin-ho (#1777)
- [FEAT] Add sample function for Dataframe @colin-ho (#1770)
🚀 Performance Improvements
- [PERF] Iceberg Truncate Transform @samster25 (#1783)
- [PERF] Iceberg Hash Bucket Transform @samster25 (#1779)
👾 Bug Fixes
- [BUG] Invalidate PartitionSpec when we run Explode on it @samster25 (#1772)
📖 Documentation
- [CHORE] Add sample to docs @colin-ho (#1781)
- [CHORE] Add not_null to docs @colin-ho (#1782)
- [FEAT] Add getter for Struct and List expressions @kevinzwang (#1775)
- [DOCS] Fix broken links on readme @jaychia (#1774)
- [DOCS] Add documentation for read_iceberg @jaychia (#1769)
- [DOCS] Documentation reorganization @jaychia (#1762)
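As background for the Iceberg truncate transform mentioned in #1783, here is a small sketch of the transform as the Apache Iceberg table specification describes it for integers and strings. This is an assumption-labeled illustration in plain Python, not Daft's implementation; the function names `truncate_int` and `truncate_str` are hypothetical.

```python
def truncate_int(value: int, width: int) -> int:
    """Round value down to the nearest multiple of width (width > 0)."""
    # Python's % yields a non-negative remainder for a positive width,
    # so negative inputs round toward negative infinity.
    return value - (value % width)


def truncate_str(value: str, length: int) -> str:
    """Keep at most the first `length` Unicode code points of value."""
    return value[:length]


print(truncate_int(1, 10))
print(truncate_int(-1, 10))
print(truncate_str("iceberg", 3))
```

Because the integer form always rounds down, all values that fall in the same width-sized range land in the same partition.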
v0.2.9
v0.2.8
Changes
✨ New Features
- [PERF] Iceberg Partition Pruning @samster25 (#1688)
- [FEAT] annotate ray tasks with name of instructions @samster25 (#1729)
🚀 Performance Improvements
- [PERF] Iceberg Partition Pruning @samster25 (#1688)
- [PERF] Speed up CSV Reader with SIMD and reduced allocations @samster25 (#1749)
- [PERF] Greatly speed up Variable Length Concat @samster25 (#1748)
- [PERF] Predicate Pushdown into Scan Operator @samster25 (#1730)
- [PERF] Json Predicate Pushdown while reading @samster25 (#1727)
- [PERF] Predicate Pushdown for CSV Reader @samster25 (#1724)
👾 Bug Fixes
- [BUG] Concat Fix when Variable Length Array is sliced @samster25 (#1750)
- [BUG] bugfix when cluster has no workers and key error happens when fetching num cores @samster25 (#1745)
- [BUG] Fix comparing date and timestamps @samster25 (#1735)
- [BUG] Apply the default IOConfig in daft.from_glob_path @jaychia (#1731)
- [BUG] [Hotfix] Fix limit pushdown test. @clarkzinzow (#1728)
📖 Documentation
- Revert "[DOCS] Add proper robots.txt and sitemap.xml to index only latest and stable" @jaychia (#1753)
- [DOCS] Add proper robots.txt and sitemap.xml to index only latest and stable @jaychia (#1752)
- [DOCS] Add documentation on memory @jaychia (#1736)
- [DOCS] Add anonymous io_config for notebook @jaychia (#1721)
🧰 Maintenance
- [CHORE] kernel override for notebook checker @samster25 (#1746)
- [CHORE] Clean up Repr for GlobScanOperator and Explain @samster25 (#1734)
- [CHORE] Generate S3 manifests @samster25 (#1732)
- [CHORE] update dev version to 0.2.0 dev @samster25 (#1723)
v0.2.7
Changes
✨ New Features
- [FEAT] Add ability to set global IOConfig @jaychia (#1710)
- [FEAT] [Join Optimizations] Add broadcast join. @clarkzinzow (#1706)
- [FEAT] Propagate configs to Ray remote functions @jaychia (#1707)
- [FEAT] [JSON Reader] Add native streaming + parallel JSON reader. @clarkzinzow (#1679)
🚀 Performance Improvements
- [PERF] Enable Predicates in Parquet Reader @samster25 (#1702)
👾 Bug Fixes
- [BUG] [Hotfix] [Join Optimization] Fix pre-partitioned check for larger side of join. @clarkzinzow (#1718)
- [BUG] Fix set_config logic so it can be called after call to set runner @jaychia (#1709)
- [BUG] Propagate URL download expressions max_connections to S3Config @jaychia (#1708)
v0.2.6
Changes
✨ New Features
- [FEAT] Add smart planning of ScanTasks starting with merging by filesizes @jaychia (#1692)
- [FEAT] Enable Comparison between timestamp / dates @samster25 (#1689)
- [FEAT] Enable MicroPartitions by default @jaychia (#1684)
- [FEAT] Temporal Literals for Date and Timestamp @samster25 (#1683)
- [FEAT] Partitioning exprs for Iceberg @samster25 (#1680)
👾 Bug Fixes
- [BUG] Use schema_hints as hints instead of definitive schema @colin-ho (#1636)
- [BUG] Allow for use of Ray jobs for benchmarking @jaychia (#1690)
- [BUG] fix off by 1 for retries for cred provider @samster25 (#1681)
🧰 Maintenance
- [CHORE] bump gcs and s3fs @samster25 (#1699)
- [CHORE] Add warmup step for remote tpch benchmarking @jaychia (#1691)
- [CHORE] drop s3 compat mode for gcs for anonymous mode @samster25 (#1682)
- [CHORE] Remove usage of credentials in workflows @jaychia (#1686)
- [CHORE] Iceberg Image Caching @samster25 (#1687)
- [CHORE] Bump Iceberg Version and V1 of caching @samster25 (#1685)
⬆️ Dependencies
- Bump globset from 0.4.13 to 0.4.14 @dependabot (#1694)
- Bump libc from 0.2.149 to 0.2.150 @dependabot (#1693)
- Bump google-github-actions/auth from 1 to 2 @dependabot (#1698)
v0.2.5
Changes
👾 Bug Fixes
- [BUG] Check queue state while waiting to place inside @samster25 (#1678)
- [BUG] Parametrize dataframe unit-tests with Parquet data @jaychia (#1610)
🧰 Maintenance
- [CHORE] Favor traversal over visitors @samster25 (#1677)
- [CHORE] Bring in TreeNode and Refactor Expression Traversal to use TreeNode @samster25 (#1676)
⬆️ Dependencies
- Bump indexmap from 2.0.2 to 2.1.0 @dependabot (#1669)
v0.2.4
Changes
✨ New Features
- [FEAT] show number of truncated columns @samster25 (#1673)
- [FEAT] add retries to s3 credential provider timeouts @samster25 (#1663)
- [FEAT] Dynamic Responsive Printing of Tables, Schema and Series @samster25 (#1662)
- [FEAT] Print the results of a df.show() to stdout if running in non-interactive mode @jaychia (#1655)
- [FEAT] 1606 - Adding hour expression in date util @suriya-ganesh (#1637)
- [FEAT] [CSV Reader] Bulk CSV reader + general CSV reader refactor @clarkzinzow (#1614)
- [FEAT] Use cached preview from df.collect() in df.show(). @clarkzinzow (#1651)
👾 Bug Fixes
- [BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports and validation of TableStatistics @jaychia (#1632)
- [BUG] favor char indices instead of slicing to deal with unicode @samster25 (#1664)
- [BUG] pass in pyarrow dtype manually into parquet read @samster25 (#1650)
- [CHORE] Fixed bug in ray version @dioptre (#1649)
🧰 Maintenance
- [CHORE] pin pandas for 3.8 @samster25 (#1661)
- [CHORE] pin ray to 2.7.1 if less than 3.8 @samster25 (#1657)
- [CHORE] enable refresh on tqdm total updates @samster25 (#1654)
⬆️ Dependencies
- Bump chrono-tz from 0.8.3 to 0.8.4 @dependabot (#1670)
- Bump pytest from 7.4.1 to 7.4.3 @dependabot (#1644)
- Bump pandas from 2.0.3 to 2.1.3 @dependabot (#1643)
- Bump azure-storage-blob from 12.17.0 to 12.19.0 @dependabot (#1645)
- Bump async-compression from 0.4.4 to 0.4.5 @dependabot (#1638)
- Bump serde_json from 1.0.107 to 1.0.108 @dependabot (#1639)
- Bump base64 from 0.21.4 to 0.21.5 @dependabot (#1640)
- Bump dyn-clone from 1.0.14 to 1.0.16 @dependabot (#1642)
v0.2.3
Changes
✨ New Features
- Enabling quote, comment and escape character @suriya-ganesh (#1582)
- [FEAT] Iceberg Scan Operator @samster25 (#1561)
- [FEAT] Enable Progress Bars for PyRunner and RayRunner @samster25 (#1609)
👾 Bug Fixes
- [BUG] Fix CSV roundtrip for decimals (actually an f64->decimal casting bug) @jaychia (#1626)
- [BUG] Filter out size-0 directory marker files during s3 globs @jaychia (#1629)
- [BUG] raise error if non valid parquet file (less than parquet footer size) @samster25 (#1628)
- [BUG] Fix parquet timestamp tz roundtrip inference @jaychia (#1625)
- [BUG] Roundtrip tests for CSVs and Parquet @jaychia (#1616)
- [BUG] Self-concat breaks with the RayRunner @jaychia (#1617)
- [BUG] Add better handling for case where glob of parquet files returns empty @jaychia (#1615)
- [BUG] enable fixed size binary ingest to daft binary @samster25 (#1612)
- [BUG] Manually specify region in tutorial read_json @jaychia (#1608)
- [BUG] remove f strings from logging @samster25 (#1611)
🧰 Maintenance
- [CHORE] Fix style lints from #1582 @jaychia (#1635)
- [CHORE] add ray client to deps @samster25 (#1631)
- [CHORE] update fsspecs (s3, gcs, adlfs) in lockstep @samster25 (#1620)
- [CHORE] update azure storage blobs to 0.17.0 @samster25 (#1622)
- [CHORE] delete old rule runners @samster25 (#1619)
- [CHORE] drop ray default dep to make room for Pydantic > 2.0 @samster25 (#1618)
⬆️ Dependencies
- Bump moonrepo/setup-rust from 0 to 1 @dependabot (#1237)
- Bump google-cloud-storage from 0.13.1 to 0.14.0 @dependabot (#1549)
- Bump async-compat from 0.2.2 to 0.2.3 @dependabot (#1567)