Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix VACUUM by using the URI of the DeltaTable when filtering the files to delete #551

Merged
merged 1 commit into from
Feb 4, 2022

Conversation

fvaleye
Copy link
Collaborator

@fvaleye fvaleye commented Jan 26, 2022

Description

  • The obj.meta is the absolute path of the object on AWS S3 so we could use directly the table_uri for comparing the listed files with the valid files of the DeltaTable.

Related Issue(s)

Tested:

  • AWS S3
  • Local

@fvaleye fvaleye added the binding/rust Issues for the Rust crate label Jan 26, 2022
@fvaleye fvaleye requested a review from houqp January 26, 2022 13:31
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think this is likely going to break for filesystem backend since it supports two different schemes: table/path and file://table/path. I think we should either remove the file:// scheme so we have one to one mapping between backends and schemes. Or we should change the behavior of list_objs to return path without schemes. The list_obj behavior was changed in #518, perhaps a better way to fix that problem is change head_obj call to return a path without scheme.

@houqp
Copy link
Member

houqp commented Feb 3, 2022

also cc @mosyp

@houqp houqp requested review from xianwill and mosyp February 3, 2022 22:22
@fvaleye
Copy link
Collaborator Author

fvaleye commented Feb 4, 2022

Thanks for your review @houqp 🙏

I will not merge the PR and wait for @mosyp review. I don't want to break anything with the different backend storage implementations.

Copy link
Contributor

@mosyp mosyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I don't expect this to impact kafka-delta-ingest as we don't use it there, maybe that's why it wasn't caught with #518

// obj_meta.path is not a URI. For example, for S3 objects, obj_meta.path is just the
// object key without `s3://` and bucket name.
let rel_path = extract_rel_path(&table_path, &obj_meta.path)?;
let rel_path = extract_rel_path(&self.table_uri, &obj_meta.path)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I got it, this comment is obsolete now with #518 changes

@fvaleye
Copy link
Collaborator Author

fvaleye commented Feb 4, 2022

Thanks again @houqp and @mosyp

I merge it as it's ok and keep an eye out if any issues come up.

@fvaleye fvaleye merged commit 0b50018 into delta-io:main Feb 4, 2022
@houqp
Copy link
Member

houqp commented Feb 5, 2022

If we go with adding scheme everywhere, then we should definitely remove this line:

"file" => Ok(Uri::LocalPath(parts[1])),

Otherwise vacuum will not be working for table uris like file://...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VACUUM operation failed with a S3 path
3 participants