feat(udf): add support for builtin aggregate UDFs #7134

cpcloud · 2023-09-12T12:18:10Z

Adds support for builtin user-defined aggregate functions. This came up as a use case during a blog post I am authoring on the expanded array functionality for BigQuery that is landing in ibis 7.0.0. The blog post no longer uses this feature, but it's a useful one regardless.

Depends on #7122.

jcrist

Overall this seems pretty good.

One thing I thought about while reviewing this is that maybe grouping functionality by UDF implementation (builtin, pyarrow, ...) rather than kind (scalar, agg, ...) would make more sense? So it'd be udf.builtin.scalar instead of udf.scalar.builtin?

Most/all backends should be able to support udf.builtin for exposing builtin functionality, but the other UDF types would be rarer to support. If a user wants to expose a builtin function, directing them to udf.builtin.* rather than udf.*.builtin might make things clearer, since that functionality is all grouped together?

No need to deal with this now/here if we want to make this change, but it came up while reviewing it so I thought I'd mention it.
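To make the two layouts concrete, here is a toy sketch in plain Python. It does not use ibis at all: `make_decorator`, `udf_by_kind`, and `udf_by_impl` are hypothetical names that only illustrate how the same decorator could be reached through either namespace ordering.

```python
from types import SimpleNamespace

KINDS = ("scalar", "agg")
IMPLS = ("builtin", "pyarrow")


def make_decorator(kind, impl):
    """Return a toy decorator that tags a function with its UDF kind and implementation."""
    def decorator(fn):
        fn.udf_kind = kind
        fn.udf_impl = impl
        return fn
    return decorator


# Kind-first layout: udf.<kind>.<impl>, e.g. udf.agg.builtin
udf_by_kind = SimpleNamespace(**{
    kind: SimpleNamespace(**{impl: make_decorator(kind, impl) for impl in IMPLS})
    for kind in KINDS
})

# Implementation-first layout: udf.<impl>.<kind>, e.g. udf.builtin.agg
udf_by_impl = SimpleNamespace(**{
    impl: SimpleNamespace(**{kind: make_decorator(kind, impl) for kind in KINDS})
    for impl in IMPLS
})


@udf_by_kind.agg.builtin
def kurtosis(x): ...


@udf_by_impl.builtin.agg
def skewness(x): ...
```

Both spellings resolve to the same underlying decorator, so the choice is about discoverability rather than behavior.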

ibis/expr/operations/udf.py

docs/how-to/extending/builtin.qmd

cpcloud · 2023-09-14T16:51:57Z

> One thing I thought about while reviewing this is that maybe grouping functionality by UDF implementation (builtin, pyarrow, ...) rather than kind (scalar, agg, ...) would make more sense? So it'd be udf.builtin.scalar instead of udf.scalar.builtin?

Yeah ...

I went back and forth on this a bit when building the API.

I don't have a super strong opinion, but my thinking is based on how people talk about the separate kinds of UDFs in other systems.

The first point of departure in my experience has always been the shape of the output: scalar, aggregate, or tabular, and so that was the first level of namespacing I chose.

I think, but am not entirely sure, that this makes implementation a bit cleaner because there's more overlap in behavior within the scalar category than within the builtin category. That's not really based on thorough analysis of anything though, just my intuition.

I think when I am reading UDF signatures the first thing I want to know is where I can use it, and scalar/agg/tabular is a stronger determinant of where I can call it than whether the thing is builtin or not.

jcrist · 2023-09-14T17:27:54Z

> I think when I am reading UDF signatures the first thing I want to know is where I can use it, and scalar/agg/tabular is a stronger determinant of where I can call it than whether the thing is builtin or not.

That's a good point. I think for reading existing UDFs scalar/agg/tabular is more critical, but for writing them, whether it's builtin/pyarrow/pandas feels more critical. I also don't have strong thoughts on this. What we have now seems fine, and we can always reorg later if a different layout becomes clearly better.

jcrist

Excellent, excited to use this new functionality

jcrist · 2023-09-14T17:29:06Z

docs/how-to/extending/builtin.qmd

+      ]
+   )
+)
+expr


Nice example

cpcloud added this to the 7.0 milestone Sep 12, 2023

cpcloud added the feature (Features or general enhancements), udf (Issues related to user-defined functions), and experimental (Experimental features) labels Sep 12, 2023

cpcloud changed the title from "builtin udafs" to "feat(udf): add support for builtin aggregate UDFs" Sep 12, 2023

cpcloud mentioned this pull request Sep 12, 2023

feat: allow specifying Optional and/or T | None in UDFs #7133

Closed

1 task

cpcloud force-pushed the builtin-udafs branch from 6387b34 to 91b4ca2 September 12, 2023 17:33

cpcloud mentioned this pull request Sep 12, 2023

docs(blog): add bigquery arrays 7.0.0 blog post #7136

Merged

cpcloud force-pushed the builtin-udafs branch from 91b4ca2 to 6c0ce35 September 12, 2023 17:37

cpcloud added the sql (Backends that generate SQL) label Sep 12, 2023

cpcloud force-pushed the builtin-udafs branch 2 times, most recently from cdda71e to 1398acd September 13, 2023 10:44

cpcloud marked this pull request as ready for review September 13, 2023 10:46

cpcloud force-pushed the builtin-udafs branch 9 times, most recently from 79d4668 to 4158ccc September 13, 2023 17:31

cpcloud requested a review from jcrist September 14, 2023 11:59

jcrist reviewed Sep 14, 2023


cpcloud force-pushed the builtin-udafs branch from 4158ccc to 1e0b6a1 September 14, 2023 16:46

cpcloud force-pushed the builtin-udafs branch 2 times, most recently from d95bfb6 to 6f3f4af September 14, 2023 17:26

jcrist approved these changes Sep 14, 2023



cpcloud added 6 commits September 14, 2023 14:03

feat(udf): add support for builtin aggregate UDFs

cedcf89

test(sqlalchemy): add tests for builtin aggs

c9e0ef2

feat(sqlalchemy): support builtin aggregate functions

1d67172

feat(clickhouse): implement builtin agg functions

c5b0e84

feat(datafusion): implement builtin agg functions

46741f8

feat(polars): implement support for builtin aggregate udfs

823befc

cpcloud force-pushed the builtin-udafs branch from 6f3f4af to 823befc September 14, 2023 18:04

cpcloud enabled auto-merge (rebase) September 14, 2023 18:18

cpcloud merged commit c383f62 into ibis-project:master Sep 14, 2023

cpcloud deleted the builtin-udafs branch September 14, 2023 18:25
