-
Couldn't load subscription status.
- Fork 1.7k
feat: impl the basic string_agg function
#8148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| a | ||
|
|
||
| query TTTT | ||
| SELECT STRING_AGG('a',','), STRING_AGG('a', NULL), STRING_AGG(NULL, ','), STRING_AGG(NULL, NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In postgresql, select string_agg('a', NULL) return a
but in duckdb, select string_agg('a', NULL) return NULL
I implemented it according to postgresql.
| NULL NULL 1 NULL 5 15 0 0 0 | ||
| 3 0 2 1 5.5 16.5 0.5 4.5 1.5 | ||
| 3 0 3 1 6 18 2 18 6 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ported these tests over from duckdb.
|
cc @universalmind303 -- do you have time to help review this PR? |
merge main
|
Thank you! The implementation looks good to me, one minor suggestion is to add some empty string cases to sqllogictest like Also adopting test cases from mature systems like Postgres/Duckdb is a really good idea 👍🏼 , we should encourage contributors to do so when adding new functions |
Thanks for this example @2010YOUY01 , I found a bug in handling empty strings and nulls, which has now been fixed and added additional tests. |
|
the ci break maybe related to #8169 |
|
@haohuaijin, indeed seems like it. We will fix promptly |
|
@ozankabak I've submitted a PR to try to fix it #8186 |
|
Thank you for the hotfix, for some reason we didn't realize the tweak needed to accommodate #8034 |
|
I will approve the hotfix once CI passes |
Co-authored-by: universalmind303 <[email protected]>
Co-authored-by: universalmind303 <[email protected]>
|
PTAL @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @haohuaijin this looks good to me -- I have a few small suggestions, but overall I think it looks good.
One thing I thought of was is I think this function will be non deterministic when aggregating without an ORDER BY -- specifically since the aggregate can see the inputs in any order when doing a multi-phase repartitioned group by.
I don't think we need to fix this now, but we should probably file a ticket to fix it
Thanks again
|
|
||
| impl Accumulator for StringAggAccumulator { | ||
| fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> { | ||
| let string_array: Vec<_> = as_generic_string_array::<i64>(&values[0])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using i64 means this will always take LargeStringArray, right? Shouldn't this use i32 for StringArray and i64 for LargeStringArray? However, the tests seem to cover this so this one looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do some cast like in sum i32 -> i64, so when we input a StringArray, cast it to a LargeStringArray.
| SELECT 'text1'::varchar(1000) as my_column union all | ||
| SELECT 'text1'::varchar(1000) as my_column union all | ||
| SELECT 'text1'::varchar(1000) as my_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SELECT 'text1'::varchar(1000) as my_column union all | |
| SELECT 'text1'::varchar(1000) as my_column union all | |
| SELECT 'text1'::varchar(1000) as my_column | |
| SELECT 'text1'::varchar(1000) as my_column union all | |
| SELECT 'text2'::varchar(1000) as my_column union all | |
| SELECT 'text3'::varchar(1000) as my_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with out order by inside string_agg, the result is non deterministic, so I keep text1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense -- thank you
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
track in #8260 |
|
Thanks @haohuaijin @universalmind303 and @2010YOUY01 |
Which issue does this PR close?
part of #7910
Rationale for this change
add a new aggregate function
string_aggWhat changes are included in this PR?
Implement the basic
string_aggfunction, which currently does not supportdistinctandorder byclauses.Support for
distinctandorder bywill be added in future pr.Are these changes tested?
Are there any user-facing changes?