Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API, Core: Add data file reference to DeleteFile #11443

Merged
merged 2 commits into from
Nov 2, 2024

Conversation

aokolnychyi
Copy link
Contributor

@aokolnychyi aokolnychyi commented Nov 1, 2024

This PR adds referenced_data_file to our delete metadata as per the design doc for improving position deletes.

This PR cannot be merged until the vote is closed. This work is part of #11122.

@aokolnychyi aokolnychyi changed the title API, Core: Add data file reference to DeleteFile [WIP] API, Core: Add data file reference to DeleteFile Nov 1, 2024
@@ -448,6 +449,12 @@ public Object get(int pos) {
return wrapped.equalityFieldIds();
case 15:
return wrapped.sortOrderId();
case 16:
if (wrapped instanceof DeleteFile) {
Copy link
Contributor Author

@aokolnychyi aokolnychyi Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of this instanceof given that it is called in a tight loop. We need it as IndexedDataFile is used for both data and deletes. At the same time, I am not sure it is reasonable to add IndexedDeleteFile for the sake of handling 3 different fields.

Any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would define referencedDataFile on ContentFile and have DataFile always return null. That's how we handle content in v1/v2. DataFile returns a constant value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid unnecessary methods in DataFile. The same goes for content offset and size, which will only be applicable to deletes. Let me think.

Copy link
Contributor Author

@aokolnychyi aokolnychyi Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better not to expose these methods on DataFile:

  • Avoids reliance on validation outside of this block to ensure these fields are null for data files. We had bugs before that allowed persisting equality field IDs for data files.
  • Avoids unnecessary methods that can be invoked but have no meaning, which we have to explain.

@@ -339,6 +344,9 @@ protected <T> void internalSet(int pos, T value) {
this.sortOrderId = (Integer) value;
return;
case 17:
this.referencedDataFile = value != null ? value.toString() : null;
return;
case 18:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the ordinal of fileOrdinal. This normally doesn't matter as we read with a projection. We also changed it a couple of times before. Anything I missed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as both reads and writes are updated, it should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming!

@aokolnychyi aokolnychyi changed the title [WIP] API, Core: Add data file reference to DeleteFile API, Core: Add data file reference to DeleteFile Nov 2, 2024
@aokolnychyi aokolnychyi merged commit d9b9768 into apache:main Nov 2, 2024
50 checks passed
@aokolnychyi
Copy link
Contributor Author

Thanks for reviewing, @rdblue @amogh-jahagirdar!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants