-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove macro in iter_to_array for List #8414
Conversation
#[test] | ||
fn test_nested_lists() { | ||
// Define inner list scalars | ||
let a1 = ListArray::from_iter_primitive::<Int32Type, _, _>(vec![Some(vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test includes array that length > 1, which violate the ScalarValue::List constraint. So I rewrite the test.
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
20b0bcc
to
c4b5762
Compare
datafusion/common/src/scalar.rs
Outdated
let scalars = build_list::<i64>(vec![ | ||
vec![Some(1), Some(2), Some(3)], | ||
vec![Some(4), Some(5)], | ||
]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add tests that contain null?
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is looking like a great start
) | ||
} | ||
} | ||
fn build_list_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍 Amazing @jayzhan211
I believe as written, this function will always return ListArray
(never LargeListArray
) so I think this probably needs to be templated on <O: OffsetSizeTrait>
-- the way @Weijun-H has done in #8322 for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this fn handle large list too, otherwise the large list test should fail 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right
fn build_list_array( | ||
scalars: impl IntoIterator<Item = ScalarValue>, | ||
) -> Result<ArrayRef> { | ||
let arrays = scalars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to make a 1 element array for each ScalarValue?
I wonder if you could make the values array directly like
let values = ScalarValue::iter_to_array(scalars);
// and then make a single offset of 0
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After iter_to_array, the values is not guaranteed to be single element, so I don't think it works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the result of iter_to_array is not guaranteed to be single array like what ScalarValue is, so recursive calls in this function does not make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are saying -- sorry I was confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- I took another look and this is a really nice change. Thank you very much 🙏
) | ||
} | ||
} | ||
fn build_list_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right
fn build_list_array( | ||
scalars: impl IntoIterator<Item = ScalarValue>, | ||
) -> Result<ArrayRef> { | ||
let arrays = scalars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are saying -- sorry I was confused.
I also did some ad hoc performance testing and this change did not change the performance for count distinct queries for me:
And I verified (by putting a println in ) that this build array was getting called |
* introduce build_array_list_primitive Signed-off-by: jayzhan211 <[email protected]> * introduce large list Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * add null Signed-off-by: jayzhan211 <[email protected]> * clippy Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
Which issue does this PR close?
Ref #7988
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?