diff --git a/README.adoc b/README.adoc deleted file mode 100644 index 7fa57e8f61..0000000000 --- a/README.adoc +++ /dev/null @@ -1,104 +0,0 @@ -:toc: macro - -= delta-rs - -image:https://github.com/delta-io/delta-rs/workflows/build/badge.svg[Build Status,link=https://github.com/delta-io/delta-rs/actions] -image:https://img.shields.io/crates/v/deltalake.svg?style=flat-square[Crate,link=https://crates.io/crates/deltalake] -image:https://img.shields.io/badge/docs-rust-blue.svg?style=flat-square[Docs,link=https://docs.rs/deltalake] -image:https://img.shields.io/pypi/v/deltalake.svg?style=flat-square[Python binding,link=https://pypi.org/project/deltalake] -image:https://img.shields.io/pypi/dm/deltalake?style=flat-square[PyPI - Downloads,link=https://pypi.org/project/deltalake] -image:https://img.shields.io/badge/docs-python-blue.svg?style=flat-square[Docs,link=https://delta-io.github.io/delta-rs/python] - -image::logo.png[Delta-rs logo] -A native interface to -link:https://delta.io[Delta Lake]. - -toc::[] - -== About - -This library provides low level access to Delta tables in Rust, which can be -used with data processing frameworks like -link:https://github.com/apache/arrow-datafusion[datafusion], -link:https://github.com/apache/arrow-datafusion/tree/master/ballista[ballista], -link:https://github.com/pola-rs/polars[polars], -link:https://github.com/rajasekarv/vega[vega], etc. It also provides bindings to other higher level language link:https://delta-io.github.io/delta-rs/python/[Python]. 
- -=== Features - -**Supported backends:** - -* Local file system -* AWS S3 -* Azure Blob Storage / Azure Datalake Storage Gen2 -* Google Cloud Storage -* HDFS - -.Support features -|=== -| Operation/Feature | Rust | Python - -| Read table -| :heavy_check_mark: -| :heavy_check_mark: - -| Stream table update -| :heavy_check_mark: -| :heavy_check_mark: - -| Filter files with partitions -| :heavy_check_mark: -| :heavy_check_mark: - -| Vacuum (delete stale files) -| :heavy_check_mark: -| :heavy_check_mark: - -| History -| :heavy_check_mark: -| :heavy_check_mark: - -| Write transactions -| :heavy_check_mark: -| - -| Checkpoint creation -| :heavy_check_mark: -| :heavy_check_mark: - -| High-level file writer -| -| :heavy_check_mark: - -| Optimize -| :heavy_check_mark: -| :heavy_check_mark: - -|=== - - -== Get Involved - -Join link:https://go.delta.io/slack[#delta-rs in the Delta Lake Slack workspace] - -=== Development Meeting - -We have a standing development sync meeting for those that are interested. The meeting is held every two weeks at **9am PST** on Tuesday mornings. The direct meeting URL is shared in the Slack channel above :point_up: before the meeting. - -These meetings are also link:https://go.delta.io/youtube[streamed live via YouTube] if you just want to listen in. - -=== Development - -delta-rs requires the Rust compiler, which can be installed with the -link:https://rustup.rs/[rustup] -command. - -Running tests can be done with `cargo test` in the root directory, or one of the directories below: - -=== Rust - -The `rust/` directory contains core Rust APIs for accessing Delta Lake from Rust, or for higher-level language bindings. - -=== Python - -The `python/` directory contains the `deltalake` Python package built on top of delta-rs diff --git a/README.md b/README.md new file mode 100644 index 0000000000..ffaa426150 --- /dev/null +++ b/README.md @@ -0,0 +1,187 @@ +

+ + delta-rs logo + +

+

+ A native Rust library for Delta Lake, with bindings into Python +
+ Python docs + · + Rust docs + · + Report a bug + · + Request a feature + · + Roadmap +
+
+ + Deltalake + + + + + + Crate + + + Deltalake + + + Deltalake + + + #delta-rs in the Delta Lake Slack workspace + +

+ +The Delta Lake project aims to unlock the power of Delta Lake for as many users and projects as possible +by providing native low-level APIs aimed at developers and integrators, as well as a high-level operations +API that lets you query, inspect, and operate your Delta Lake with ease. + +| Source | Downloads | Installation Command | Docs | +| --------------------- | --------------------------------- | ----------------------- | --------------- | +| **[PyPI][pypi]** | [![Downloads][pypi-dl]][pypi] | `pip install deltalake` | [Docs][py-docs] | +| **[Crates.io][crates]** | [![Downloads][crates-dl]][crates] | `cargo add deltalake` | [Docs][rs-docs] | + +[pypi]: https://pypi.org/project/deltalake/ +[pypi-dl]: https://img.shields.io/pypi/dm/deltalake?style=flat-square&color=00ADD4 +[py-docs]: https://delta-io.github.io/delta-rs/python/ +[rs-docs]: https://docs.rs/deltalake/latest/deltalake/ +[crates]: https://crates.io/crates/deltalake +[crates-dl]: https://img.shields.io/crates/d/deltalake?color=F75101 + +## Table of contents + +- [Quick Start](#quick-start) +- [Get Involved](#get-involved) +- [Integrations](#integrations) +- [Features](#features) + +## Quick Start + +The `deltalake` library aims to adopt familiar patterns from other libraries in data processing, +so getting started should look familiar. 
+ +```py3 +from deltalake import DeltaTable, write_deltalake +import pandas as pd + +# write some data into a delta table +df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]}) +write_deltalake("./data/delta", df) + +# load data from delta table +dt = DeltaTable("./data/delta") +df2 = dt.to_pandas() + +# compare whole frames; `df == df2` is element-wise and cannot be used in an assert +assert df.equals(df2) +``` + +The same table can also be loaded using the core Rust crate: + +```rs +use deltalake::{open_table, DeltaTableError}; + +#[tokio::main] +async fn main() -> Result<(), DeltaTableError> { + // open the table written in python + let table = open_table("./data/delta").await?; + + // show all active files in the table + let files = table.get_files(); + // the file list has no Display impl, so print it with Debug formatting + println!("{files:?}"); + + Ok(()) +} +``` + +## Get Involved + +We encourage you to reach out, and are [committed](https://github.com/delta-io/delta-rs/blob/main/CODE_OF_CONDUCT.md) +to providing a welcoming community. + +- [Join us in our Slack workspace](https://go.delta.io/slack) +- [Report an issue](https://github.com/delta-io/delta-rs/issues/new?template=bug_report.md) +- Looking to contribute? See our [good first issues](https://github.com/delta-io/delta-rs/contribute). + +## Integrations + +Libraries and frameworks that interoperate with delta-rs, in alphabetical order. + +- [AWS SDK for Pandas](https://github.com/aws/aws-sdk-pandas) +- [ballista][ballista] +- [datafusion][datafusion] +- [Dask](https://github.com/dask-contrib/dask-deltatable) +- [datahub](https://datahubproject.io/) +- [DuckDB](https://duckdb.org/) +- [polars](https://www.pola.rs/) +- [Ray](https://github.com/delta-incubator/deltaray) + +## Features + +The following section outlines some core features like supported [storage backends](#cloud-integrations) +and [operations](#supported-operations) that can be performed against tables. The state of implementation +of features outlined in the Delta [protocol][protocol] is also [tracked](#protocol-support-level). 
+ +### Cloud Integrations + +| Storage | Rust | Python | Comment | +| -------------------- | :-------------------: | :-------------------: | ----------------------------------- | +| Local | ![done] | ![done] | | +| S3 - AWS | ![done] | ![done] | requires lock for concurrent writes | +| S3 - MinIO | ![done] | ![done] | requires lock for concurrent writes | +| S3 - R2 | ![done] | ![done] | requires lock for concurrent writes | +| Azure Blob | ![done] | ![done] | | +| Azure ADLS Gen2 | ![done] | ![done] | | +| Microsoft OneLake | [![open]][onelake-rs] | [![open]][onelake-rs] | | +| Google Cloud Storage | ![done] | ![done] | | + +### Supported Operations + +| Operation | Rust | Python | Description | +| --------------------- | :-----------------: | :-----------------: | ------------------------------------- | +| Create | ![done] | ![done] | Create a new table | +| Read | ![done] | ![done] | Read data from a table | +| Vacuum | ![done] | ![done] | Remove unused files and log entries | +| Delete - partitions | | ![done] | Delete a table partition | +| Delete - predicates | ![done] | | Delete data based on a predicate | +| Optimize - compaction | ![done] | ![done] | Harmonize the size of data files | +| Optimize - Z-order | ![done] | ![done] | Place similar data into the same file | +| Merge | [![open]][merge-rs] | [![open]][merge-py] | | +| FS check | ![done] | | Remove corrupted files from table | + +### Protocol Support Level + +| Writer Version | Requirement | Status | +| -------------- | --------------------------------------------- | :------------------: | +| Version 2 | Append Only Tables | [![open]][roadmap] | +| Version 2 | Column Invariants | ![done] | +| Version 3 | Enforce `delta.checkpoint.writeStatsAsJson` | [![open]][writer-rs] | +| Version 3 | Enforce `delta.checkpoint.writeStatsAsStruct` | [![open]][writer-rs] | +| Version 3 | CHECK constraints | [![open]][writer-rs] | +| Version 4 | Change Data Feed | | +| Version 4 | Generated Columns | | +| Version 5 | 
Column Mapping | | +| Version 6 | Identity Columns | | +| Version 7 | Table Features | | + +| Reader Version | Requirement | Status | +| -------------- | ----------------------------------- | ------ | +| Version 2 | Column Mapping | | +| Version 3 | Table Features (requires reader V7) | | + +[datafusion]: https://github.com/apache/arrow-datafusion +[ballista]: https://github.com/apache/arrow-ballista +[polars]: https://github.com/pola-rs/polars +[open]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueOpened.svg +[done]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueClosed.svg +[roadmap]: https://github.com/delta-io/delta-rs/issues/1128 +[merge-py]: https://github.com/delta-io/delta-rs/issues/1357 +[merge-rs]: https://github.com/delta-io/delta-rs/issues/850 +[writer-rs]: https://github.com/delta-io/delta-rs/issues/851 +[onelake-rs]: https://github.com/delta-io/delta-rs/issues/1418 +[protocol]: https://github.com/delta-io/delta/blob/master/PROTOCOL.md diff --git a/python/pyproject.toml b/python/pyproject.toml index e163544fc1..634b675434 100644 --- a/python/pyproject.toml +++ b/python/pyproject.toml @@ -11,7 +11,10 @@ requires-python = ">=3.7" keywords = ["deltalake", "delta", "datalake", "pandas", "arrow"] classifiers = [ "License :: OSI Approved :: Apache Software License", - "Programming Language :: Python :: 3 :: Only" + "Programming Language :: Python :: 3.8", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11" ] dependencies = [ "pyarrow>=8,<=12",