
feat: default logstore implementation #1742

Merged
merged 10 commits into delta-io:main from default-logstore-implementation on Nov 9, 2023

Conversation

@dispanser (Contributor) commented on Oct 19, 2023

Description

Introduce a LogStore abstraction to channel all log store reads and writes through a single place. This is supposed to allow implementations with more sophisticated locking mechanisms that do not rely on atomic rename semantics for the underlying object store.

This does not change any functionality - it reorganizes read operations and commits on the delta commit log to be funneled through the respective methods of LogStore.
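
For illustration only, here is a minimal sketch of what such a trait could look like; the method names and signatures below are assumptions for the sake of the example, not necessarily the exact trait introduced by this PR:

```rust
use std::sync::Arc;

use bytes::Bytes;
use object_store::{path::Path, ObjectStore};

/// Stand-in result alias for the crate's own error handling.
type LogStoreResult<T> = Result<T, Box<dyn std::error::Error + Send + Sync>>;

/// All reads from and writes to `_delta_log` go through this trait, so that
/// alternative implementations (e.g. DynamoDb-coordinated commits on S3) can
/// replace the default rename-based commit without touching callers.
#[async_trait::async_trait]
pub trait LogStore: Send + Sync {
    /// Read the content of `_delta_log/<version>.json` for an existing commit.
    async fn read_commit_entry(&self, version: i64) -> LogStoreResult<Bytes>;

    /// Atomically publish a finished commit for `version`; must fail if that
    /// version already exists, which is what prevents lost updates.
    async fn write_commit_entry(&self, version: i64, tmp_commit: &Path) -> LogStoreResult<()>;

    /// Highest committed version currently visible in the log.
    async fn get_latest_version(&self, current_version: i64) -> LogStoreResult<i64>;

    /// Access to the underlying store for non-log data (data files, checkpoints).
    fn object_store(&self) -> Arc<dyn ObjectStore>;
}
```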

Rationale

The goal is to align the implementation of multi-cluster writes for Delta Lake on S3 with the one provided by the original delta library, enabling multi-cluster writes with some writers using the Spark / Delta library and other writers using delta-rs. For an overview of how it's done in delta, please see:

  1. Delta [blog post](https://delta.io/blog/2022-05-18-multi-cluster-writes-to-delta-lake-storage-in-s3/) (high-level concept)
  2. Associated Databricks [design doc](https://docs.google.com/document/d/1Gs4ZsTH19lMxth4BSdwlWjUNR-XhKHicDvBjd2RqNd8/edit#heading=h.mjjuxw9mcz9h) (detailed read)
  3. [S3DynamoDbLogStore.java](https://github.com/delta-io/delta/blob/master/storage-s3-dynamodb/src/main/java/io/delta/storage/S3DynamoDBLogStore.java) (content warning: Java code behind this link)

This approach requires readers of a delta table to "recover" unfinished commits from writers - as a result, reading and writing are combined in a single interface, which in this PR is modeled after [LogStore.java](https://github.com/delta-io/delta/blob/master/storage/src/main/java/io/delta/storage/LogStore.java). Currently in delta-rs, the read path for commits is implemented directly in DeltaTable, and there's no mechanism to implement storage-specific behavior like interacting with DynamoDb.
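
To make the "recovery" idea concrete, here is a rough sketch of how a DynamoDb-backed read path could finish another writer's incomplete commit before reading it. The `PendingCommit` type and the elided DynamoDb calls are hypothetical stand-ins for illustration, not the actual S3DynamoDBLogStore logic:

```rust
use bytes::Bytes;
use object_store::{path::Path, ObjectStore};

/// What a writer records externally (e.g. in DynamoDb) before copying its temp
/// file to the final `_delta_log` location. Purely illustrative.
pub struct PendingCommit {
    pub version: i64,
    pub temp_path: Path,
    pub complete: bool,
}

/// Read commit `version`, first finishing a commit that another writer started
/// but did not complete. The DynamoDb lookup/update itself is elided and stands
/// behind the `pending` argument.
pub async fn read_commit_with_recovery(
    store: &dyn ObjectStore,
    version: i64,
    pending: Option<&PendingCommit>,
) -> object_store::Result<Bytes> {
    let final_path = Path::from(format!("_delta_log/{version:020}.json"));

    if let Some(p) = pending.filter(|p| !p.complete && p.version == version) {
        // A writer registered this commit but crashed before the final copy:
        // copy temp -> final on its behalf (idempotent: same bytes, same target).
        let data = store.get(&p.temp_path).await?.bytes().await?;
        store.put(&final_path, data.into()).await?;
        // ... the external entry would be marked complete here ...
    }

    // After recovery, the commit file exists iff the version was ever committed.
    store.get(&final_path).await?.bytes().await
}
```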

@github-actions bot added the binding/python, binding/rust, and rust labels on Oct 19, 2023
@github-actions (bot) commented:

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@dispanser (Contributor, Author) commented:

@rtyler @wjones127

I'm not yet happy with how it turned out - opening as a draft PR to allow others to weigh in on the approach.

In particular, the abstraction / separation between DeltaTable, LogStore,
and DeltaObjectStore doesn't feel sound yet. I think that the LogStore
could potentially be the only place that ever reads from and writes to
_delta_log, but I'm not sure how this would interact with other planned
changes (delta kernel, ...), and it would also delay the short-term goal of
aligning the multi-cluster write approaches with the JVM implementation.

Because most operations (see DeltaOps) do not operate on a DeltaTable directly
but still require commit semantics, the changes spread over many different
places without much benefit.

I'm thinking that maybe the functionality of DeltaObjectStore, which is a
relatively thin wrapper around the actual object store, could be folded into
the LogStore implementation directly, which would spare us all this
indirection and the .object_store() calls everywhere.

Happy to hear your thoughts!

@roeap (Collaborator) commented on Oct 19, 2023

I'm thinking that maybe the functionality of DeltaObjectStore, which is a
relatively thin wrapper around the actual object store, could be folded into
the LogStore implementation directly, which would spare us all this
indirection and the .object_store() calls everywhere.

Agreed - one of the things we were discussing, and are moving towards (admittedly somewhat slowly), is to get rid of all the specific object store related implementations we have in this crate. To this end we recently removed the custom local storage.

There are a few operations which I believe require additional APIs - like VACUUM or FSCK - and these may yet need access to the underlying object store. Also, for the Datafusion integration we need to register a store that exposes object store APIs directly. All in all, though, being able to properly interop with Spark / Databricks would be great!

These are just some initial thoughts, and I will give this a more thorough review soon!

Thanks for this great work!

@rtyler changed the title from "Default logstore implementation" to "feature: default logstore implementation" on Oct 19, 2023
@dispanser force-pushed the default-logstore-implementation branch from db65ba9 to 2bd1ef8 on October 20, 2023 04:14
@dispanser changed the title from "feature: default logstore implementation" to "feat: default logstore implementation" on Oct 20, 2023
@dispanser force-pushed the default-logstore-implementation branch 4 times, most recently from 6a0d7b1 to b0d0111, on October 20, 2023 19:40
@dispanser force-pushed the default-logstore-implementation branch 2 times, most recently from 15cc211 to 7f34f66, on October 27, 2023 17:51
@dispanser (Contributor, Author) commented:

@roeap : I would appreciate your opinion on how to move this forward.

One option I mentioned above is to merge all DeltaObjectStore functionality right into LogStore, so LogStore would effectively replace DeltaObjectStore's role.
This could include impl ObjectStore for LogStore, as it's currently done for DeltaObjectStore, even though I feel these two things sit at different levels of abstraction:

  • ObjectStore: do stuff with bytes (low-level primitives to read / write / list)
  • LogStore: do stuff with the delta log: actions, versions, time travel, commit, ...

It's difficult for me to gauge how this impacts other ongoing lines of work, like the crate splitting or the kernel work, so I'd rather resolve this quickly before too many additional conflicts pile up.

I'm not 100% sure on the datafusion integration parts you mention - I had a quick look at register_store(...) in the datafusion module - couldn't we just pass in the actual object store, instead of DeltaObjectStore?

@roeap (Collaborator) commented on Nov 5, 2023

@dispanser - sorry for the delay.

One option I mentioned above is to merge all DeltaObjectStore functionality right into LogStore, so LogStore would effectively replace DeltaObjectStore's role.

I believe this is the way to go. IIRC we have very little additional functionality added - i.e. just generating a unique URL for the specific object store we use, as well as a simple check whether a location is a delta table (_delta_log exists).

The main question, I think, is how we handle the Datafusion integration. The reason we need to create a unique store for datafusion is that we always assume the object store's root points at the root location of the delta table. This behaviour actually blocks us from having full delta protocol support, which allows relative paths as well as URLs.

Within the kernel migrations we are also looking to migrate to URLs in all external APIs, instead of the object store's Path (which is always relative to the respective store's root).

I guess eventually we want to internally use something like datafusion's registry to manage multiple stores if needed.

I'm not 100% sure on the datafusion integration parts you mention - I had a quick look at register_store(...) in the datafusion module - couldn't we just pass in the actual object store, instead of DeltaObjectStore?

Given the above, the answer is yes - we just need to register it under our custom unique URL for now to handle the Paths we provide.
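
For illustration, a minimal sketch of that registration, assuming a DataFusion version where `RuntimeEnv::register_object_store` accepts a `Url`; the scheme, host, and in-memory store below are made up for the example:

```rust
use std::sync::Arc;

use datafusion::prelude::SessionContext;
use object_store::memory::InMemory;
use url::Url;

/// Register a plain ObjectStore (no DeltaObjectStore wrapper) under a unique,
/// table-scoped URL, so that the relative Paths handed to DataFusion resolve
/// against the table root.
fn register_table_store(ctx: &SessionContext) -> Result<(), Box<dyn std::error::Error>> {
    let store = Arc::new(InMemory::new()); // stand-in for the table's real store
    let url = Url::parse("delta-rs://my-unique-table-id")?;
    ctx.runtime_env().register_object_store(&url, store);
    Ok(())
}
```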

@dispanser force-pushed the default-logstore-implementation branch from ccf67b3 to 112f7ea on November 6, 2023 10:49
@dispanser force-pushed the default-logstore-implementation branch from 112f7ea to 394ae2b on November 6, 2023 10:52
@github-actions bot removed the binding/rust and rust labels on Nov 6, 2023
@dispanser force-pushed the default-logstore-implementation branch from 394ae2b to 37706cc on November 6, 2023 14:29
@dispanser force-pushed the default-logstore-implementation branch from 69f9f3d to 711c031 on November 7, 2023 07:46
@dispanser marked this pull request as ready for review on November 7, 2023 16:44
@dispanser (Contributor, Author) commented:

@roeap , I managed to remove all of DeltaObjectStore, and things still seem to work (according to our test suite, at least). I believe that this is now in a state where a review is useful.

FWIW, I opted for not doing impl ObjectStore for LogStore, but instead to pass down the inner ObjectStore for raw storage and datafusion operations. This is not set in stone, but I found it more sensible to keep the distinction between low-level abstractions (ObjectStore) and higher-level concepts, which is what LogStore is aiming to become :).
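
As a rough sketch of that accessor-based approach (struct fields and method body are assumptions for illustration, not necessarily what the PR's default_logstore.rs ends up with):

```rust
use std::sync::Arc;

use object_store::ObjectStore;

/// Sketch of the accessor approach: the log store is *not* itself an ObjectStore.
/// Callers that genuinely need byte-level access (datafusion registration, raw
/// file reads for VACUUM and friends) ask for the inner store explicitly.
pub struct DefaultLogStore {
    storage: Arc<dyn ObjectStore>,
    // ... log-specific state (table root URL, storage options, ...) ...
}

impl DefaultLogStore {
    /// Hand out the low-level store; everything commit-log related stays behind
    /// the higher-level LogStore methods instead.
    pub fn object_store(&self) -> Arc<dyn ObjectStore> {
        self.storage.clone()
    }
}
```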

@roeap (Collaborator) commented on Nov 7, 2023

@dispanser - awesome work, excited to get this merged, will get reviewing right away.

I opted for not doing impl ObjectStore for LogStore, but instead pass down the inner ObjectStore for raw storage and datafusion operations.

I believe that was the right thing to do. The actual solution to this is also just to move to URLs rather than object store paths in our external handling, including datafusion. Once that is done, we no longer need a dedicated store and can also get rid of the extra stuff on LogStore.

As it so happens, I have a PR in preparation that should at least get us a little bit in that direction 😄.

@roeap (Collaborator) previously approved these changes on Nov 7, 2023 and left a review comment:

Looking great - I left some minor nit comments, but overall I think this is ready to go.

One more merge with main, and if you want to address some of the comments, then we can merge.

Thanks for sticking with it; I really think this is a great improvement to the library.

Review threads (outdated, resolved):
  • crates/deltalake-core/src/operations/mod.rs
  • crates/deltalake-core/src/logstore/default_logstore.rs
  • crates/deltalake-core/src/logstore/mod.rs (3 threads)
@roeap (Collaborator) commented on Nov 8, 2023

@dispanser - the latest build failures are related to a new release of dynamo-db lock, so we have to get the fix onto main first.

Use `ObjectStore::delete_stream()` instead, which can utilize
batch-delete operations from underlying cloud store APIs and is
generally smarter (e.g., by emitting concurrent requests)
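
For reference, a small usage sketch of that API from the `object_store` crate; the helper function and its wiring are illustrative only:

```rust
use futures::{stream, StreamExt, TryStreamExt};
use object_store::{path::Path, ObjectStore};

/// Delete a batch of paths via delete_stream, letting the store batch and/or
/// parallelize requests instead of awaiting one delete call per file.
async fn delete_all(store: &dyn ObjectStore, paths: Vec<Path>) -> object_store::Result<()> {
    let locations = stream::iter(paths.into_iter().map(Ok::<Path, object_store::Error>)).boxed();
    store
        .delete_stream(locations)
        .try_collect::<Vec<Path>>() // successfully deleted paths are yielded back
        .await?;
    Ok(())
}
```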
@dispanser (Contributor, Author) commented:

@dispanser - the latest build failures are related to a new release of dynamo-db lock, so we have to get the fix onto main first.

I'll have a look into these, thanks for pointing them out.

The new version of this crate properly sets a lease duration such that the locks
can actually expire
@rtyler added the enhancement and binding/rust labels on Nov 8, 2023
@github-actions bot removed the binding/rust label on Nov 8, 2023
@dispanser requested a review from roeap on November 8, 2023 19:24
@roeap added the binding/rust label on Nov 9, 2023
@roeap merged commit 9470678 into delta-io:main on Nov 9, 2023 (24 checks passed)
Jan-Schweizer pushed a commit to Jan-Schweizer/delta-rs that referenced this pull request Nov 9, 2023
ryanaston pushed a commit to segmentio/delta-rs that referenced this pull request Nov 9, 2023
Labels: binding/python, binding/rust, enhancement