-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: impl the basic string_agg
function
#8148
Conversation
a | ||
|
||
query TTTT | ||
SELECT STRING_AGG('a',','), STRING_AGG('a', NULL), STRING_AGG(NULL, ','), STRING_AGG(NULL, NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In postgresql, select string_agg('a', NULL)
return a
but in duckdb, select string_agg('a', NULL)
return NULL
I implemented it according to postgresql.
@@ -2987,3 +2987,66 @@ NULL NULL 1 NULL 3 6 0 0 0 | |||
NULL NULL 1 NULL 5 15 0 0 0 | |||
3 0 2 1 5.5 16.5 0.5 4.5 1.5 | |||
3 0 3 1 6 18 2 18 6 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ported these tests over from duckdb.
cc @universalmind303 -- do you have time to help review this PR? |
merge main
Thank you! The implementation looks good to me, one minor suggestion is to add some empty string cases to sqllogictest like Also adopting test cases from mature systems like Postgres/Duckdb is a really good idea 👍🏼 , we should encourage contributors to do so when adding new functions |
Thanks for this example @2010YOUY01 , I found a bug in handling empty strings and nulls, which has now been fixed and added additional tests. |
the ci break maybe related to #8169 |
@haohuaijin, indeed seems like it. We will fix promptly |
@ozankabak I've submitted a PR to try to fix it #8186 |
Thank you for the hotfix, for some reason we didn't realize the tweak needed to accommodate #8034 |
I will approve the hotfix once CI passes |
Co-authored-by: universalmind303 <[email protected]>
Co-authored-by: universalmind303 <[email protected]>
PTAL @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @haohuaijin this looks good to me -- I have a few small suggestions, but overall I think it looks good.
One thing I thought of was is I think this function will be non deterministic when aggregating without an ORDER BY
-- specifically since the aggregate can see the inputs in any order when doing a multi-phase repartitioned group by.
I don't think we need to fix this now, but we should probably file a ticket to fix it
Thanks again
|
||
impl Accumulator for StringAggAccumulator { | ||
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> { | ||
let string_array: Vec<_> = as_generic_string_array::<i64>(&values[0])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using i64
means this will always take LargeStringArray
, right? Shouldn't this use i32 for StringArray
and i64
for LargeStringArray? However, the tests seem to cover this so this one looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do some cast like in sum i32
-> i64
, so when we input a StringArray
, cast it to a LargeStringArray
.
SELECT 'text1'::varchar(1000) as my_column union all | ||
SELECT 'text1'::varchar(1000) as my_column union all | ||
SELECT 'text1'::varchar(1000) as my_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT 'text1'::varchar(1000) as my_column union all | |
SELECT 'text1'::varchar(1000) as my_column union all | |
SELECT 'text1'::varchar(1000) as my_column | |
SELECT 'text1'::varchar(1000) as my_column union all | |
SELECT 'text2'::varchar(1000) as my_column union all | |
SELECT 'text3'::varchar(1000) as my_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with out order by
inside string_agg
, the result is non deterministic, so I keep text1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense -- thank you
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
track in #8260 |
Thanks @haohuaijin @universalmind303 and @2010YOUY01 |
Which issue does this PR close?
part of #7910
Rationale for this change
add a new aggregate function
string_agg
What changes are included in this PR?
Implement the basic
string_agg
function, which currently does not supportdistinct
andorder by
clauses.Support for
distinct
andorder by
will be added in future pr.Are these changes tested?
Are there any user-facing changes?