-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce RowLayout to represent rows for different purposes #2261
Conversation
use std::sync::Arc; | ||
|
||
const UTF8_DEFAULT_SIZE: usize = 20; | ||
const BINARY_DEFAULT_SIZE: usize = 100; | ||
|
||
#[derive(Copy, Clone, Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main changes are below. Other files changes are almost mechanical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally familiar with this code but the changes look reasonable to me
I plan to review this PR first thing tomorrow morning US eastern time (~ 6AM or so) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me -- I didn't see any tests for the new WordAligned
format -- I think we should add some and I suggested one possible way
Thanks @yjshen
use std::sync::Arc; | ||
|
||
const UTF8_DEFAULT_SIZE: usize = 20; | ||
const BINARY_DEFAULT_SIZE: usize = 100; | ||
|
||
#[derive(Copy, Clone, Debug)] | ||
/// Type of a RowLayout | ||
pub enum RowType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -85,52 +85,36 @@ macro_rules! fn_get_idx_opt { | |||
|
|||
/// Read the tuple `data[base_offset..]` we are currently pointing to | |||
pub struct RowReader<'a> { | |||
/// Layout on how to read each field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this is nice
@@ -96,51 +96,34 @@ macro_rules! fn_set_idx { | |||
|
|||
/// Reusable row writer backed by Vec<u8> | |||
pub struct RowWriter { | |||
/// Layout on how to write each field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ for reduced repetition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the changes commit by commit - LGTM -- thanks @yjshen
BooleanArray, | ||
Boolean, | ||
vec![Some(true), Some(false), None, Some(true), None], | ||
WordAligned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌 very nice
offsets.push(offset); | ||
offset += 8; // a 8-bytes word for each field | ||
assert!(!matches!(f.data_type(), DataType::Decimal(_, _))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
"plan: {}", | ||
DisplayableExecutionPlan::with_metrics(plan).one_line() | ||
); | ||
assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change appears to be intermittently failing
https://github.com/apache/arrow-datafusion/runs/6100177086?check_suite_focus=true
https://github.com/apache/arrow-datafusion/runs/6100277301?check_suite_focus=true
https://github.com/apache/arrow-datafusion/runs/6100969854?check_suite_focus=true
Edit: This would appear to occur if the operator is too fast to record time passing, I'll get a PR up to fix
Which issue does this PR close?
The second part of #2188.
Rationale for this change
To support an 8-byte aligned row layout for grouping states of hash aggregation.
What changes are included in this PR?
Enable the reading and writing raw-bytes rows with two possible layouts.
Are there any user-facing changes?
An API change might have no effects since it's on the optional feature
row
.