feat(udf): add support for builtin aggregate UDFs #7134
Overall this seems pretty good.
One thing I thought about while reviewing this is that maybe grouping functionality by UDF implementation (builtin, pyarrow, ...) rather than kind (scalar, agg, ...) would make more sense? So it'd be `udf.builtin.scalar` instead of `udf.scalar.builtin`?
Most/all backends should be able to support `udf.builtin` for exposing builtin functionality, but the other UDF types would be rarer to support. If a user wants to expose a builtin function, directing them to `udf.builtin.*` rather than `udf.*.builtin` might make things clearer since that functionality is all grouped together?
No need to deal with this now/here if we want to make this change, but it came up while reviewing it so I thought I'd mention it.
Yeah ... I went back and forth on this a bit when building the API. I don't have a super strong opinion, but my thinking is based on how people talk about the separate kinds of UDFs in other systems. The first point of departure in my experience has always been the shape of the output: scalar, aggregate, or tabular, and so that was the first level of namespacing I chose. I think, but am not entirely sure, that this makes implementation a bit cleaner because there's more overlap in behavior in the scalar category than in the builtin category. That's not really based on thorough analysis of anything though, just my intuition. I think when I am reading UDF signatures the first thing I want to know is where I can use it, and
That's a good point - I think for reading existing UDFs, scalar/agg/tabular is more critical, but for writing, whether it's builtin/pyarrow/pandas feels more critical. I also don't have strong thoughts on this; what we have now seems fine, and we can always reorg later if a different layout becomes clearly better.
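To make the two layouts discussed above concrete, here is a toy sketch in plain Python. It is not how ibis implements its `udf` namespace; `make_udf_decorator`, `kind_first`, and `impl_first` are hypothetical names used only for illustration. It shows that both spellings can point at the same underlying decorator, so a later reorganization would mostly be a matter of naming.

```python
from types import SimpleNamespace

# Illustrative categories only; taken from the names used in this thread.
KINDS = ("scalar", "agg")
IMPLS = ("builtin", "pyarrow", "pandas")


def make_udf_decorator(kind, impl):
    """Return a decorator that tags a function with its UDF kind and implementation."""

    def decorator(fn):
        fn.udf_kind = kind
        fn.udf_impl = impl
        return fn

    return decorator


# One decorator per (kind, implementation) pair, shared by both layouts.
decorators = {(k, i): make_udf_decorator(k, i) for k in KINDS for i in IMPLS}

# Kind-first layout: kind_first.scalar.builtin, kind_first.agg.builtin, ...
kind_first = SimpleNamespace(
    **{k: SimpleNamespace(**{i: decorators[k, i] for i in IMPLS}) for k in KINDS}
)

# Implementation-first layout: impl_first.builtin.scalar, impl_first.builtin.agg, ...
impl_first = SimpleNamespace(
    **{i: SimpleNamespace(**{k: decorators[k, i] for k in KINDS}) for i in IMPLS}
)


@kind_first.agg.builtin
def approx_median(x):
    """Stub: spelled kind-first."""


@impl_first.builtin.agg
def approx_count(x):
    """Stub: spelled implementation-first."""
```

Either spelling resolves to the same object here, so the choice only affects what a reader or writer sees first: the output shape or the implementation.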
Excellent, excited to use this new functionality
]
)
)
expr
Nice example
Adds support for builtin user-defined aggregate functions. This came up as a use case during a blog post I am authoring on the expanded array functionality for BigQuery that is landing in ibis 7.0.0. The blog post no longer uses this feature, but it's a useful one regardless.
Depends on #7122.
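As a rough illustration of what a "builtin" aggregate UDF means conceptually, the sketch below uses a toy decorator that turns a Python stub into a builder for a SQL aggregate call, leaving the actual computation to the backend. This is a plain-Python assumption for illustration, not the API added in this PR; `builtin_agg` and `approx_count_distinct` are hypothetical names, and the real implementation builds ibis expressions rather than strings.

```python
import inspect


def builtin_agg(fn):
    """Toy decorator: map a Python stub onto a SQL aggregate call of the same name."""
    params = list(inspect.signature(fn).parameters)

    def build(*columns):
        # Check arity against the stub's signature; the stub body is never executed.
        if len(columns) != len(params):
            raise TypeError(f"{fn.__name__} expects {len(params)} argument(s)")
        return f"{fn.__name__.upper()}({', '.join(columns)})"

    build.__name__ = fn.__name__
    return build


@builtin_agg
def approx_count_distinct(x):
    """Stub only: the backend is assumed to provide this function."""


# Build a query string that calls the backend's own aggregate function.
sql = f"SELECT {approx_count_distinct('user_id')} FROM events"
```

The point of the stub is that its name and signature are the only things the decorator needs; the function body stays empty because the backend already knows how to evaluate the aggregate.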