Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of array_intersect #8081

Merged
merged 2 commits into from
Nov 11, 2023
Merged

Conversation

Veeupup
Copy link
Contributor

@Veeupup Veeupup commented Nov 7, 2023

Which issue does this PR close?

Closes #6978

Rationale for this change

Using RowConverter to implement array_intersect, no duplicate code for different data types,

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate labels Nov 7, 2023
@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 7, 2023

hi @edmondop @jayzhan211 , I have a draft implementation for array_intersect, I use a hashmap to store the first array values and find values that appear in the hashmap. And it works just fine and looks like a general approach?

But I encounter a problem for Float32 and Float64, we can not just use HashMap<f32/f64> because std::hash::Hash is not implemented for float types, so I want to fix it by introducing ordered-float to make it hashable, is it too heavy for this case ?
Or do you have any better ideas to make it? Very happy to see your advice !

@tustvold
Copy link
Contributor

tustvold commented Nov 7, 2023

Perhaps you could use https://docs.rs/arrow-row/latest/arrow_row/? This would have the added benefit of supporting more complex types, e.g. lists of lists, and not adding additional codegen (#7988). The notes I wrote on #6981 (comment) might be helpful for this

@Veeupup Veeupup changed the title Initial Implementation of array_intersect Implementation of array_intersect Nov 8, 2023
@Veeupup Veeupup marked this pull request as ready for review November 8, 2023 16:38
datafusion/expr/src/built_in_function.rs Outdated Show resolved Hide resolved
datafusion/physical-expr/src/array_expressions.rs Outdated Show resolved Hide resolved
datafusion/physical-expr/src/array_expressions.rs Outdated Show resolved Hide resolved
@Veeupup Veeupup force-pushed the array_interact branch 2 times, most recently from 6b7198f to f12525c Compare November 9, 2023 12:42
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roughly LGTM

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 9, 2023

@alamb PTAL : )

@xudong963
Copy link
Member

@Veeupup Triggered the ci, please fix it.

Signed-off-by: veeupup <[email protected]>

x
@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 10, 2023

@xudong963 hi try fixing ci locally, maybe you can help me trigger it again or review it? : )

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice implementation to me -- thank you @Veeupup @Dandandan @tustvold @xudong963 and @Dandandan -- what a team effort ❤️

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, welcome to arrow-datafusion

@xudong963 xudong963 merged commit 8966dc0 into apache:main Nov 11, 2023
23 checks passed
@haohuaijin haohuaijin mentioned this pull request Nov 11, 2023
@Veeupup Veeupup deleted the array_interact branch November 21, 2023 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement array_intersect function
6 participants