feat: integrate with object_store / datafusion APIs #703
Conversation
@houqp @wjones127 - latest datafusion is already using the However since #696 has also been raised, I was wondering if we should start to integrate with the Here I could either switch the focus of this PR - or maybe skip datafusion tests on Windows and do a follow up. Of course I could also go into handling Windows paths, but right now this feels very temporary ... Thoughts?
If it seems feasible, that seems like it's likely a good idea. I've been thinking trying to integrate
yeah +1 on incrementally migrating to object_store_rs
This looks good, except for two small things that I noticed.
This is a good start to integrating the object store.
std::fs::create_dir_all(path).unwrap();
std::fs::remove_dir_all(path).unwrap();
std::fs::create_dir_all(path).unwrap();
What is happening here?
Good question :D! Because `Path` always canonicalizes local paths, they must exist - but I guess one `create_dir_all` is enough...
Turns out we needed the first `create_dir_all` - `remove_dir_all` will panic if the folders do not exist (i.e. in CI), but we still need empty folders, thus the second one. Added it back since it is "just" test code, but the proper solution would be to clean up directly after the tests.
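The create / remove / create sequence discussed in this thread could be folded into one small test helper. The following is only a sketch using the standard library; the name `ensure_empty_dir` is hypothetical and not part of this PR, and it returns an error instead of panicking via `unwrap()`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical test helper (not from the PR): leave `path` as an existing, empty directory.
fn ensure_empty_dir(path: &Path) -> io::Result<()> {
    // Creating first guarantees that `remove_dir_all` has something to remove;
    // it returns an error if the directory is missing (e.g. on a fresh CI checkout).
    fs::create_dir_all(path)?;
    // Drop any leftovers from earlier test runs.
    fs::remove_dir_all(path)?;
    // Recreate the now-empty directory so that it can be canonicalized.
    fs::create_dir_all(path)
}
```

A test would call `ensure_empty_dir(Path::new(...))` once before creating a table. Cleaning up directly after each test, as suggested above, would still be the more robust option.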
rust/src/object_store.rs
Outdated
/// Move an object from one path to another in the same object store.
///
/// By default, this is implemented as a copy and then delete source. It may not
/// check when deleting source that it was the same object that was originally copied.
///
/// If there exists an object at the destination, it will be overwritten.
async fn rename(&self, from: &Path, to: &Path) -> ObjectStoreResult<()> {
    self.copy(from, to).await?;
    self.delete(from).await
}
I think you can skip including this and use the default implementation.
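To illustrate the suggestion: a trait method with a default body does not need to be repeated in the implementing type. The sketch below uses a simplified, synchronous stand-in trait; `SimpleStore` and `MemoryStore` are hypothetical names, and the real `ObjectStore` trait is async with its own `Path` and error types.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for an object store trait (hypothetical).
trait SimpleStore {
    fn copy(&mut self, from: &str, to: &str) -> Result<(), String>;
    fn delete(&mut self, location: &str) -> Result<(), String>;

    /// Default implementation: copy, then delete the source.
    /// An existing object at `to` is overwritten.
    fn rename(&mut self, from: &str, to: &str) -> Result<(), String> {
        self.copy(from, to)?;
        self.delete(from)
    }
}

/// In-memory store that implements only the required methods.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl SimpleStore for MemoryStore {
    fn copy(&mut self, from: &str, to: &str) -> Result<(), String> {
        let data = self
            .objects
            .get(from)
            .cloned()
            .ok_or_else(|| format!("not found: {}", from))?;
        self.objects.insert(to.to_string(), data);
        Ok(())
    }

    fn delete(&mut self, location: &str) -> Result<(), String> {
        self.objects
            .remove(location)
            .map(|_| ())
            .ok_or_else(|| format!("not found: {}", location))
    }
    // `rename` is intentionally not overridden; the trait default applies.
}
```

Calling `rename` on a `MemoryStore` dispatches to the trait's default body, so an override is only worth keeping when a backend can do better than copy-then-delete, for example an atomic rename.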
Description
Using the latest arrow (18) and datafusion (10) dependencies.
supersedes #666
part of #610

This PR moves most file path handling to the `Path` abstraction from `object_store`. We do so by moving our internal `StorageBackend` behind a new `DeltaObjectStore`, which implements `ObjectStore`. The delta store exposes all paths as relative to the table root, consistent with how they are treated in the log.

To integrate with datafusion, a table-specific store is registered on the runtime environment. Datafusion required `get_range` to be defined, which I did for local using the new store. This was just to get the datafusion tests to pass - this needs to be improved in a follow up :).

One major change implied by `Path` is that, for the local file system, the table root folder now has to exist prior to creating the table, but it can be empty.

The most annoying aspect was handling Windows paths for local, but I tried to keep all special handling contained in `DeltaObjectStore` and suspect most of it will disappear once we move the storage backends to object_store.

Last but not least, I found that the Azure paths are somewhat of an oddity compared to the others - will open a follow up for that :).
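As a rough illustration of the ideas in this description (table-relative paths, a root folder that must already exist, and a byte-range read on the local file system), here is a standard-library-only sketch. `RootedLocalStore` and its methods are hypothetical names and do not reflect the actual `DeltaObjectStore` code; the sketch also does no protection against `..` segments in the location.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a table-scoped local store (not the PR's `DeltaObjectStore`).
struct RootedLocalStore {
    root: PathBuf,
}

impl RootedLocalStore {
    /// Fails if `root` does not exist, because `canonicalize` requires an existing path.
    fn try_new(root: &Path) -> io::Result<Self> {
        Ok(Self { root: fs::canonicalize(root)? })
    }

    /// Resolve a '/'-separated, table-relative location against the root.
    fn full_path(&self, location: &str) -> PathBuf {
        location
            .split('/')
            .filter(|part| !part.is_empty())
            .fold(self.root.clone(), |path, part| path.join(part))
    }

    /// Read only the bytes in `range` from the object at `location`.
    fn get_range(&self, location: &str, range: Range<u64>) -> io::Result<Vec<u8>> {
        let mut file = File::open(self.full_path(location))?;
        file.seek(SeekFrom::Start(range.start))?;
        let len = range.end.saturating_sub(range.start) as usize;
        let mut buf = vec![0u8; len];
        // `read_exact` errors if the file ends before the requested range does.
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

Joining the location segment by segment keeps the '/' separator used in the log independent of the platform separator, which is one place where Windows-specific handling tends to show up.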
@Blajda - this should also remove the delete issues with vacuum. I activated them on Windows and it seems to work.
cc @Tom-Newton
Related Issue(s)
closes #671
Documentation