Fix VACUUM by using the URI of the DeltaTable when filtering the files to delete #551
LGTM. I think this is likely going to break for the filesystem backend, since it supports two different schemes: `table/path` and `file://table/path`. I think we should either remove the `file://` scheme so we have a one-to-one mapping between backends and schemes, or change the behavior of `list_objs` to return paths without schemes. The `list_objs` behavior was changed in #518; perhaps a better way to fix that problem is to change the `head_obj` call to return a path without a scheme.
also cc @mosyp
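To make the two-scheme concern concrete, here is a minimal sketch of normalizing local URIs to one form. The helper name `strip_file_scheme` is hypothetical and is not part of delta-rs; it only illustrates the "paths without schemes" option for the filesystem backend.

```rust
/// Hypothetical helper (an assumption for illustration, not a delta-rs API).
/// Maps both `table/path` and `file://table/path` to the scheme-less form,
/// so the filesystem backend sees a single representation.
fn strip_file_scheme(uri: &str) -> &str {
    // `strip_prefix` returns `None` when the prefix is absent, in which
    // case the input is returned unchanged.
    uri.strip_prefix("file://").unwrap_or(uri)
}
```

With this in place, `strip_file_scheme("file://table/path")` and `strip_file_scheme("table/path")` both yield `table/path`, while URIs for other backends such as `s3://bucket/table` pass through untouched.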
LGTM. I don't expect this to impact kafka-delta-ingest, as we don't use it there; maybe that's why it wasn't caught with #518.
  // obj_meta.path is not a URI. For example, for S3 objects, obj_meta.path is just the
  // object key without `s3://` and bucket name.
- let rel_path = extract_rel_path(&table_path, &obj_meta.path)?;
+ let rel_path = extract_rel_path(&self.table_uri, &obj_meta.path)?;
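The effect of the change above can be sketched with a stand-in for `extract_rel_path`. The function below is hypothetical: its name, signature, and `String` error type are assumptions for illustration, and the real delta-rs helper may differ. It also assumes that listed paths carry the scheme and bucket, as the PR description suggests.

```rust
/// Hypothetical stand-in for `extract_rel_path` (assumed behavior, not the
/// delta-rs implementation). Returns `path` relative to `parent`, or an
/// error when `path` does not live under `parent`.
fn rel_path_sketch<'a>(parent: &str, path: &'a str) -> Result<&'a str, String> {
    // Ignore a trailing slash on the parent so both spellings behave alike.
    let parent = parent.trim_end_matches('/');
    match path
        .strip_prefix(parent)
        // Require a separator so `table` does not match `table2/...`.
        .and_then(|rest| rest.strip_prefix('/'))
    {
        Some(rel) => Ok(rel),
        None => Err(format!("`{}` is not under `{}`", path, parent)),
    }
}
```

Under these assumptions, a full table URI such as `s3://bucket/table` is a prefix of the listed object `s3://bucket/table/part-0.parquet`, so the relative path resolves. A scheme-less parent such as `table` is not a prefix of that URI, so the lookup fails.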
Ah, I got it: this comment is obsolete now with the #518 changes.
If we go with adding the scheme everywhere, then we should definitely remove this line: delta-rs/rust/src/storage/mod.rs Line 233 in 2fa81c6
Otherwise vacuum will not work for table URIs like
Description
`obj_meta.path` is the absolute path of the object on AWS S3, so we can use the `table_uri` directly when comparing the listed files with the valid files of the DeltaTable.
Related Issue(s)
Tested: