-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add equality delete writer #703
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ZENOTME for this pr, generally LGTM! There are some structs I don't understand quite well, please add comment.
|
||
/// A writer write data | ||
pub struct EqualityDeleteFileWriter<B: FileWriterBuilder> { | ||
inner_writer: Option<B::R>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a None
value makes sense? It seems that None
always trigger error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn close(self)
of file writer need the owenership. So we need to using option here is to take out the writer when we call close. Same thing happened in data file writer.
let writer = self.inner_writer.take().unwrap(); |
Maybe workaround is make fn close(self)
of file writer to fn close(&mut self)
, but same thing will happen in file writer. We need to wrap parquet writer in Option. But the benefit is we can hide it in lower layer.
let metadata = self.writer.close().await.map_err(|err| { |
|
||
/// Config for `EqualityDeleteWriter`. | ||
pub struct EqualityDeleteWriterConfig { | ||
equality_ids: Vec<usize>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field id? Please add some comments here.
|
||
/// Help to project specific field from `RecordBatch`` according to the column id. | ||
#[derive(Clone)] | ||
pub struct ArrowFieldProjector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite confusing what's the meaning of each field, please add more comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we actually need to expose this struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we actually need to expose this struct?
Yes, this struct can help the user to get the projected schema from the original schema. Our lowel writer API requires user to provide the final schema in file writer, see:
Arc::new(delete_schema), |
This PR fix tests and refines the interface based on #372. Thanks, @Dysprosium0626 again for this great job! The fix is included in the extra commit. Feel free for any suggestions. @Xuanwo @liurenjie1024 @sdd @Fokko