Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add equality delete writer #703

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Nov 19, 2024

This PR fix tests and refines the interface based on #372. Thanks, @Dysprosium0626 again for this great job! The fix is included in the extra commit. Feel free for any suggestions. @Xuanwo @liurenjie1024 @sdd @Fokko

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZENOTME for this pr, generally LGTM! There are some structs I don't understand quite well, please add comment.


/// A writer write data
pub struct EqualityDeleteFileWriter<B: FileWriterBuilder> {
inner_writer: Option<B::R>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a None value makes sense? It seems that None always trigger error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn close(self) of file writer need the owenership. So we need to using option here is to take out the writer when we call close. Same thing happened in data file writer.

let writer = self.inner_writer.take().unwrap();

Maybe workaround is make fn close(self) of file writer to fn close(&mut self), but same thing will happen in file writer. We need to wrap parquet writer in Option. But the benefit is we can hide it in lower layer.

let metadata = self.writer.close().await.map_err(|err| {


/// Config for `EqualityDeleteWriter`.
pub struct EqualityDeleteWriterConfig {
equality_ids: Vec<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this field id? Please add some comments here.


/// Help to project specific field from `RecordBatch`` according to the column id.
#[derive(Clone)]
pub struct ArrowFieldProjector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite confusing what's the meaning of each field, please add more comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we actually need to expose this struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we actually need to expose this struct?

Yes, this struct can help the user to get the projected schema from the original schema. Our lowel writer API requires user to provide the final schema in file writer, see:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants