Support array_distinct function. #8268
Why not deduplicate the array in a columnar way? `arr` has only one column, so using the row format requires extra encoding and decoding.
It would be great to deduplicate the array without the row converter, but I don't think we can do that without downcasting to the exact array type first. Is there a recommended way?
I don't have an idea other than downcasting `arr` either; I was just wondering whether downcasting to the exact array type is worth it.
Downcasting to the exact array type can result in faster code in many cases, as the Rust compiler can generate specialized implementations for each type. However, there are a lot of `DataType`s, including nested ones like Dict, List, Struct, etc., so making specialized implementations often requires a lot of work.
The row converter handles all the types internally.
What we have typically done in the past with DataFusion is to use non-type-specific code like `RowConverter` for the general case, and then, if we find a particular use case needs faster performance, we make special implementations. For example, we do so for grouping by single primitive columns (`GROUP BY int32`).
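To make the trade-off above concrete, here is a minimal std-only sketch of the two approaches. It does not use the arrow or DataFusion APIs: `ListColumn`, `distinct_via_encoding`, and `distinct_specialized` are hypothetical names, and the `encode` closure only stands in for the role a row format plays. The sketch keeps the first occurrence of each value within a row, which is an assumption of this example rather than a statement about the PR.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for a list column: flattened values plus offsets,
/// where row `i` covers `values[offsets[i]..offsets[i + 1]]`.
pub struct ListColumn<T> {
    pub values: Vec<T>,
    pub offsets: Vec<usize>,
}

/// Type-agnostic path: every element is first encoded to bytes (the extra
/// step a row format adds), then deduplicated on those bytes.
pub fn distinct_via_encoding<T: Clone>(
    col: &ListColumn<T>,
    encode: impl Fn(&T) -> Vec<u8>,
) -> ListColumn<T> {
    let mut values = Vec::new();
    let mut offsets = vec![0];
    for w in col.offsets.windows(2) {
        // A fresh set per row: duplicates are only removed within one list.
        let mut seen: HashSet<Vec<u8>> = HashSet::new();
        for v in &col.values[w[0]..w[1]] {
            if seen.insert(encode(v)) {
                values.push(v.clone());
            }
        }
        offsets.push(values.len());
    }
    ListColumn { values, offsets }
}

/// Specialized path: the element type is known at compile time, so values
/// are hashed directly with no intermediate encoding.
pub fn distinct_specialized<T: Copy + Eq + Hash>(col: &ListColumn<T>) -> ListColumn<T> {
    let mut values = Vec::new();
    let mut offsets = vec![0];
    for w in col.offsets.windows(2) {
        let mut seen = HashSet::new();
        for &v in &col.values[w[0]..w[1]] {
            if seen.insert(v) {
                values.push(v);
            }
        }
        offsets.push(values.len());
    }
    ListColumn { values, offsets }
}
```

The encoding path works for any element type that can be turned into bytes, at the cost of an allocation per element here; the specialized path avoids that cost but has to be written (or instantiated) for each supported type.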
nit: put `let array = as_list_array(&args[0])?;` in `general_array_distinct`
Sorry, I don't get your point. `LargeList` differs from `List`, so I think it may be better to handle it before the generic function?
I mean change the function signature:
Then cast the array inside the function.
I think you are referring to something like `general_array_has_dispatch`
I think we can merge this PR as is and then add support for `LargeList` (using the `OffsetSize` trait) as a follow-on PR.
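As a rough illustration of how one generic function could serve both offset widths, here is a std-only sketch. The `Offset` trait and `distinct_per_row` are hypothetical and only mimic the idea of being generic over the offset type (`i32` for `List`, `i64` for `LargeList`); they are not the arrow `OffsetSize` trait or the code from this PR, and the element type is fixed to `i64` for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an offset-width trait:
/// `i32` plays the role of List offsets, `i64` of LargeList offsets.
pub trait Offset: Copy {
    fn to_usize(self) -> usize;
    fn from_usize(n: usize) -> Self;
}

impl Offset for i32 {
    fn to_usize(self) -> usize { self as usize }
    fn from_usize(n: usize) -> Self { n as i32 }
}

impl Offset for i64 {
    fn to_usize(self) -> usize { self as usize }
    fn from_usize(n: usize) -> Self { n as i64 }
}

/// Removes duplicates within each list row, keeping first occurrences.
/// The body is written once and instantiated for either offset width.
pub fn distinct_per_row<O: Offset>(values: &[i64], offsets: &[O]) -> (Vec<i64>, Vec<O>) {
    let mut out_values = Vec::new();
    let mut out_offsets = vec![O::from_usize(0)];
    for w in offsets.windows(2) {
        let mut seen = HashSet::new();
        for &v in &values[w[0].to_usize()..w[1].to_usize()] {
            if seen.insert(v) {
                out_values.push(v);
            }
        }
        // The new offset is the running length of the deduplicated values.
        out_offsets.push(O::from_usize(out_values.len()));
    }
    (out_values, out_offsets)
}
```

Calling `distinct_per_row::<i32>` or `distinct_per_row::<i64>` picks the offset width at the call site, so the dispatch on list type can stay in a thin wrapper while the deduplication logic is shared.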