Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove macro in iter_to_array for List #8414

Merged
merged 5 commits into from
Dec 6, 2023
Merged

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Ref #7988

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

#[test]
fn test_nested_lists() {
// Define inner list scalars
let a1 = ListArray::from_iter_primitive::<Int32Type, _, _>(vec![Some(vec![
Copy link
Contributor Author

@jayzhan211 jayzhan211 Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test includes array that length > 1, which violate the ScalarValue::List constraint. So I rewrite the test.

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Comment on lines 3220 to 3223
let scalars = build_list::<i64>(vec![
vec![Some(1), Some(2), Some(3)],
vec![Some(4), Some(5)],
]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add tests that contain null?

This comment was marked as resolved.

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is looking like a great start

)
}
}
fn build_list_array(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 Amazing @jayzhan211

I believe as written, this function will always return ListArray (never LargeListArray) so I think this probably needs to be templated on <O: OffsetSizeTrait> -- the way @Weijun-H has done in #8322 for example)

Copy link
Contributor Author

@jayzhan211 jayzhan211 Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fn handle large list too, otherwise the large list test should fail 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right

fn build_list_array(
scalars: impl IntoIterator<Item = ScalarValue>,
) -> Result<ArrayRef> {
let arrays = scalars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to make a 1 element array for each ScalarValue?

I wonder if you could make the values array directly like

let values = ScalarValue::iter_to_array(scalars);
// and then make a single offset of 0

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After iter_to_array, the values is not guaranteed to be single element, so I don't think it works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the result of iter_to_array is not guaranteed to be single array like what ScalarValue is, so recursive calls in this function does not make sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying -- sorry I was confused.

@jayzhan211 jayzhan211 requested a review from alamb December 6, 2023 00:08
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- I took another look and this is a really nice change. Thank you very much 🙏

)
}
}
fn build_list_array(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right

fn build_list_array(
scalars: impl IntoIterator<Item = ScalarValue>,
) -> Result<ArrayRef> {
let arrays = scalars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying -- sorry I was confused.

@alamb
Copy link
Contributor

alamb commented Dec 6, 2023

I also did some ad hoc performance testing and this change did not change the performance for count distinct queries for me:

andrewlamb@Andrews-MacBook-Pro:~/Downloads$ /Users/andrewlamb/Software/arrow-datafusion/datafusion-cli/target/release/datafusion-cli -c 'select count(distinct trace_id), count(distinct "span.kind") from traces group by "service.name";'
DataFusion CLI v33.0.0
+---------------------------------+----------------------------------+
| COUNT(DISTINCT traces.trace_id) | COUNT(DISTINCT traces.span.kind) |
+---------------------------------+----------------------------------+
| 30714                           | 1                                |
| 30678                           | 1                                |
| 30714                           | 1                                |
| 30678                           | 1                                |
| 30673                           | 1                                |
| 3632710                         | 2                                |
+---------------------------------+----------------------------------+
6 rows in set. Query took 1.771 seconds.

And I verified (by putting a println in ) that this build array was getting called

@alamb alamb merged commit 107791a into apache:main Dec 6, 2023
22 checks passed
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
* introduce build_array_list_primitive

Signed-off-by: jayzhan211 <[email protected]>

* introduce large list

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add null

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants