SQL Logical Plan Drops ordering within UDAFs #7531

Open
jacksonrnewhouse opened this issue Sep 12, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@jacksonrnewhouse

Describe the bug

SQL supports ordering within an aggregate function, e.g. SUM(a ORDER BY b DESC). DataFusion's logical planner correctly preserves this ordering for built-in aggregates such as sum(), max(), and min(). For UDAFs, however, the ORDER BY clause seems to be dropped from the logical plan.
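To illustrate why the in-aggregate ordering matters, here is a minimal std-only Rust sketch. It does not use DataFusion: `Row`, `first_a`, and `first_a_order_by_b_desc` are hypothetical names invented for this illustration, and the aggregate is a toy "first value" chosen because its result depends on input order.

```rust
/// Toy row standing in for a table t(a, b). Hypothetical type, not a DataFusion one.
#[derive(Clone, Copy, Debug)]
struct Row {
    a: i64,
    b: i64,
}

/// An order-sensitive aggregate: returns the first `a` it is fed.
/// This is what you get if the ORDER BY inside the aggregate is ignored.
fn first_a(rows: &[Row]) -> Option<i64> {
    rows.first().map(|r| r.a)
}

/// The same aggregate with an in-aggregate ordering applied first,
/// roughly modelling `first_a(a ORDER BY b DESC)`.
fn first_a_order_by_b_desc(rows: &[Row]) -> Option<i64> {
    let mut sorted = rows.to_vec();
    // Sort descending on b before feeding rows to the aggregate.
    sorted.sort_by(|x, y| y.b.cmp(&x.b));
    first_a(&sorted)
}
```

With rows (a=1, b=10), (a=2, b=30), (a=3, b=20), the unordered version returns the first row's `a`, while the ordered version returns the `a` of the row with the largest `b`. An order-sensitive UDAF whose ORDER BY is dropped during planning could silently produce the former.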

I've written up an example here: https://github.com/ArroyoSystems/arrow-datafusion/blob/92e92b9e971bc407b39de2ce8c9bc793355168f7/datafusion-examples/examples/simple_udaf.rs#L174.

The output is

built-in aggregate Logical plan, has ORDER BY:
Projection: SUM(t.a) ORDER BY [t.a DESC NULLS FIRST]
  Aggregate: groupBy=[[]], aggr=[[SUM(t.a) ORDER BY [t.a DESC NULLS FIRST]]]
    TableScan: t
UDAF Logical plan, missing ORDER BY:
Projection: geo_mean(t.a)
  Aggregate: groupBy=[[]], aggr=[[geo_mean(t.a)]]
    TableScan: t

To Reproduce

Register a UDAF, write a query with an ORDER BY inside the UDAF call, and inspect the logical plan.

Expected behavior

The UDAF expression should have a non-empty order_by clause.
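The expected property can be sketched with a toy model, assuming nothing about DataFusion's real types. `ToyAggregate` and `keeps_ordering` are hypothetical names made up for this sketch, and the `Display` format only imitates the shape of the built-in aggregate line in the plan output above.

```rust
use std::fmt;

/// Hypothetical stand-in for a planned aggregate expression.
/// This is NOT DataFusion's `Expr`; it only models the fields discussed here.
struct ToyAggregate {
    name: String,
    arg: String,
    /// Sort keys, already rendered as text, e.g. "t.a DESC NULLS FIRST".
    order_by: Vec<String>,
}

impl fmt::Display for ToyAggregate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}({})", self.name, self.arg)?;
        // Only print the ORDER BY suffix when ordering survived planning.
        if !self.order_by.is_empty() {
            write!(f, " ORDER BY [{}]", self.order_by.join(", "))?;
        }
        Ok(())
    }
}

/// The property this issue asks for: the ordering is still present.
fn keeps_ordering(expr: &ToyAggregate) -> bool {
    !expr.order_by.is_empty()
}
```

In this model, a UDAF expression with its ordering intact would render like the built-in case, with an ORDER BY suffix, and a regression test would assert something equivalent to `keeps_ordering` on the real planned expression.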

Additional context

No response

@jacksonrnewhouse jacksonrnewhouse added the bug Something isn't working label Sep 12, 2023
@alamb
Contributor

alamb commented Sep 12, 2023

Thank you for the report @jacksonrnewhouse. I hope to investigate more fully at some point. Any additional debugging insight you have would be most helpful.

@jacksonrnewhouse
Author

Okay, I think I've found where this is happening and come up with a patch at https://github.com/ArroyoSystems/arrow-datafusion/pull/new/bugfix_7531. Where should I be writing a test for this?

@alamb
Contributor

alamb commented Sep 13, 2023

Nice find @jacksonrnewhouse !

> Where should I be writing a test for this?

I would suggest somewhere in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_aggregates.rs

@alamb
Contributor

alamb commented Aug 5, 2024

I think we may be close to / done fixing this as part of #8708
