Delta-kernel-rs is an experimental Delta implementation focused on interoperability with a wide range of query engines. It currently only supports reads.
The Delta Kernel project is a Rust and C library for building Delta connectors that can read (and soon, write) Delta tables without needing to understand the Delta protocol details. This is the Rust/C equivalent of Java Delta Kernel.
Delta-kernel-rs is split into a few different crates:
- kernel: The actual core kernel crate
- acceptance: Acceptance tests that validate correctness via the Delta Acceptance Tests
- derive-macros: A crate for our derive-macros to live in
- ffi: Functionality that enables delta-kernel-rs to be used from C or C++. See the ffi directory for more information.
By default we build only the kernel and acceptance crates, which will also build derive-macros as a dependency.
To get started, install Rust via rustup, clone the repository, and then run:
cargo test
This will build the kernel, run all unit tests, fetch the Delta Acceptance Tests data and run the acceptance tests against it.
As it is a library, in general you will want to depend on delta-kernel-rs by adding it as a dependency to your Cargo.toml. For example:
delta_kernel = "0.1"
We intend to follow Semantic Versioning. However, in the 0.x line, the APIs are still unstable. We therefore may break APIs within minor releases (that is, 0.1 -> 0.2), but we will not break APIs in patch releases (0.1.0 -> 0.1.1).
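The versioning policy above can be written down as a small check. The following is a minimal sketch using only the standard library; the names `Version`, `parse`, and `may_break` are hypothetical and are not part of delta-kernel-rs. Note that a Cargo requirement such as `"0.1"` is a caret requirement, so it accepts 0.1.x patch releases but not 0.2.0, which lines up with the policy.

```rust
/// A minimal (major, minor, patch) version triple.
/// Hypothetical helper type for illustration; not part of delta-kernel-rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Parses a plain "MAJOR.MINOR.PATCH" string; returns None for anything else
/// (pre-release and build metadata are deliberately not handled here).
pub fn parse(s: &str) -> Option<Version> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version { major, minor, patch })
}

/// Returns true if moving from `from` to `to` is allowed to break APIs under
/// the policy described above: any major change may break, and within the
/// 0.x line a minor change may break too. Patch-only changes never break.
pub fn may_break(from: Version, to: Version) -> bool {
    if from.major != to.major {
        return true;
    }
    from.major == 0 && from.minor != to.minor
}
```

For example, `may_break(parse("0.1.0")?, parse("0.2.0")?)` is true, while the same call for 0.1.0 and 0.1.1 is false.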
- API Docs
- architecture.md document describing the kernel architecture (currently WIP)
There are some example programs showing how delta-kernel-rs can be used to interact with Delta tables. They live in the kernel/examples directory.
delta-kernel-rs is still under heavy development but follows conventions adopted by most Rust projects.
There are a few key concepts that will help in understanding kernel:
- The Engine trait encapsulates all the functionality an engine or connector needs to provide to the Delta Kernel in order to read the Delta table.
- The DefaultEngine is our default implementation of the above trait. It lives in engine/default, and provides a reference implementation for all Engine functionality. DefaultEngine uses arrow as its in-memory data format.
- A Scan is the entrypoint for reading data from a table.
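To make the division of labor between these concepts concrete, here is a toy sketch of the pattern: a trait the core calls into for I/O, one reference implementation of it, and a scan that reads through whichever engine it is handed. Everything here (`ToyEngine`, `InMemoryEngine`, `ToyScan`, and their methods) is hypothetical and invented for illustration. It is not the delta_kernel API, whose real traits are considerably richer; see the API docs for those.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an engine trait: the core asks the engine to
/// perform I/O on its behalf. This is NOT the real delta_kernel `Engine`.
pub trait ToyEngine {
    /// Reads every row of the file at `path`.
    fn read_file(&self, path: &str) -> Result<Vec<String>, String>;
}

/// Hypothetical "default" engine, backed by an in-memory map instead of
/// real storage so the sketch stays self-contained.
pub struct InMemoryEngine {
    files: HashMap<String, Vec<String>>,
}

impl InMemoryEngine {
    pub fn new() -> Self {
        InMemoryEngine { files: HashMap::new() }
    }

    /// Registers a file and its rows, returning the engine for chaining.
    pub fn with_file(mut self, path: &str, rows: &[&str]) -> Self {
        let rows = rows.iter().map(|r| r.to_string()).collect();
        self.files.insert(path.to_string(), rows);
        self
    }
}

impl ToyEngine for InMemoryEngine {
    fn read_file(&self, path: &str) -> Result<Vec<String>, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("no such file: {}", path))
    }
}

/// Hypothetical scan: knows which data files make up the table and reads
/// them through the engine it is given, without doing any I/O itself.
pub struct ToyScan {
    files: Vec<String>,
}

impl ToyScan {
    pub fn new(files: Vec<String>) -> Self {
        ToyScan { files }
    }

    /// Reads all files in order and concatenates their rows.
    pub fn execute(&self, engine: &dyn ToyEngine) -> Result<Vec<String>, String> {
        let mut rows = Vec::new();
        for path in &self.files {
            rows.extend(engine.read_file(path)?);
        }
        Ok(rows)
    }
}
```

The point of the shape is that `ToyScan` never touches storage directly, so a connector can swap in its own engine without changing the scan logic.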
Some design principles which should be considered:
- async should live only in the Engine implementation. The core kernel does not use async at all. We do not wish to impose the need for an entire async runtime on an engine or connector. The DefaultEngine does use async quite heavily. It doesn't depend on a particular runtime, however, and implementations could provide an "executor" based on tokio, smol, async-std, or whatever might be needed. Currently only a tokio-based executor is provided.
- Minimal Table API. The kernel intentionally exposes the concept of immutable versions of tables through the snapshot API. This encourages users to think about the Delta table state more accurately.
- Prefer builder-style APIs over object-oriented ones.
- A "simple" set of default features, enabled to provide the basic functionality with the fewest dependencies possible. More complex optimizations or APIs go behind feature flags.
- API conventions that make it clear which operations involve I/O, e.g. fetch or retrieve type verbiage in method signatures.
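Two of these principles, immutable snapshots and builder-style APIs, are easier to see in code. The sketch below is a toy model under stated assumptions: `ToyTable`, `ToySnapshot`, `ToyScanBuilder`, and `ToyScanPlan` are hypothetical names invented here, and the "table" is just a list of files added per version. It does not reflect the real kernel types or method names.

```rust
/// Hypothetical immutable view of a table at one version.
/// Fields are private, so a snapshot cannot change once it is created.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySnapshot {
    version: usize,
    files: Vec<String>,
}

impl ToySnapshot {
    pub fn version(&self) -> usize {
        self.version
    }

    pub fn files(&self) -> &[String] {
        &self.files
    }
}

/// Hypothetical table: `commits[v]` lists the files added at version `v`.
pub struct ToyTable {
    commits: Vec<Vec<String>>,
}

impl ToyTable {
    pub fn new(commits: Vec<Vec<String>>) -> Self {
        ToyTable { commits }
    }

    /// Returns the table state at `version`, or None if it does not exist.
    pub fn snapshot(&self, version: usize) -> Option<ToySnapshot> {
        if version >= self.commits.len() {
            return None;
        }
        // State at a version is everything added up to and including it.
        let files = self.commits[..=version].iter().flatten().cloned().collect();
        Some(ToySnapshot { version, files })
    }
}

/// The finished, read-only description of what a scan should do.
#[derive(Debug, PartialEq)]
pub struct ToyScanPlan {
    pub version: usize,
    pub files: Vec<String>,
    pub columns: Option<Vec<String>>,
    pub limit: Option<usize>,
}

/// Builder: optional settings are chained, then `build` consumes the builder.
pub struct ToyScanBuilder {
    snapshot: ToySnapshot,
    columns: Option<Vec<String>>,
    limit: Option<usize>,
}

impl ToyScanBuilder {
    pub fn new(snapshot: ToySnapshot) -> Self {
        ToyScanBuilder { snapshot, columns: None, limit: None }
    }

    pub fn with_columns(mut self, columns: &[&str]) -> Self {
        self.columns = Some(columns.iter().map(|c| c.to_string()).collect());
        self
    }

    pub fn with_limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }

    pub fn build(self) -> ToyScanPlan {
        ToyScanPlan {
            version: self.snapshot.version,
            files: self.snapshot.files,
            columns: self.columns,
            limit: self.limit,
        }
    }
}
```

A caller would write something like `ToyScanBuilder::new(table.snapshot(1)?).with_columns(&["id"]).build()`. Because each snapshot is pinned to a version, later commits to the table cannot change what an already-built plan reads.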
- When developing, rust-analyzer is your friend: rustup component add rust-analyzer
- If using emacs, both eglot and lsp-mode provide excellent integration with rust-analyzer. rustic is a nice mode as well.
- When also developing in vscode, it's sometimes convenient to configure rust-analyzer in .vscode/settings.json.
{
  "editor.formatOnSave": true,
  "rust-analyzer.cargo.features": ["default-engine", "acceptance"]
}
- The crate's documentation can be easily reviewed with:
cargo doc --open