Support "WindowFunction" with DuckDB backend #70

Open
pthatte1-bb opened this issue Aug 9, 2024 · 1 comment

pthatte1-bb commented Aug 9, 2024

Spark queries that use window functions fail in the gateway with the error "window expression type not supported".

Snippet to recreate error:

from pyspark.sql import Window
from pyspark.sql.functions import col, rank

(
    get_customer_database(spark_session)
    .withColumn(
        "rank",
        rank().over(Window.partitionBy(col("c_mktsegment")).orderBy(col("c_custkey"))))
    .collect()
)

AFAIK, DuckDB does support window functions. Here is a runnable DuckDB SQL version of the query above:

SELECT
    row_number() OVER (PARTITION BY c_mktsegment ORDER BY c_custkey),
    *
FROM
    read_parquet('{parquet_path}')
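
One detail worth noting: the Spark snippet calls rank() while the SQL version calls row_number(). The two agree as long as the ORDER BY key has no ties within a partition, which is presumably the case for c_custkey. The following is only a plain-Python sketch of those semantics, not how Spark, DuckDB, or the gateway implement them; the helper name `window_numbers` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def window_numbers(rows, partition_key, order_key):
    """Return (row, row_number, rank) triples, computed per partition.

    Hypothetical helper: mimics ROW_NUMBER() and RANK() with
    PARTITION BY partition_key ORDER BY order_key.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    out = []
    for part in partitions.values():
        part.sort(key=lambda r: r[order_key])
        rank = 0
        prev = object()  # sentinel that never equals a real key value
        for row_number, row in enumerate(part, start=1):
            # rank only advances when the ordering key changes (ties share a rank)
            if row[order_key] != prev:
                rank = row_number
                prev = row[order_key]
            out.append((row, row_number, rank))
    return out


# Made-up sample data using the column names from the issue.
customers = [
    {"c_custkey": 3, "c_mktsegment": "BUILDING"},
    {"c_custkey": 1, "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_mktsegment": "MACHINERY"},
]
result = window_numbers(customers, "c_mktsegment", "c_custkey")
```

With unique keys, row_number and rank are identical in every partition; they diverge only when two rows in the same partition share an ordering key.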

The Substrait spec also supports window functions: https://substrait.io/expressions/window_functions/

But it looks like the DuckDB Substrait extension doesn't yet support producing Substrait plans for window functions either. Calling get_substrait_json(<window-fn-query>) with the query above fails with the error "duckdb.duckdb.InternalException: INTERNAL Error: WINDOW".
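
A minimal sketch for reproducing this outside the gateway is below. It assumes a `duckdb` Python package version whose connection object exposes `get_substrait_json` once the `substrait` extension is loaded, and that the extension can be installed (which may need network access). The in-memory `customer` table, its two rows, and the helper name `substrait_json_or_error` are made up for illustration; the original report reads from a parquet file instead.

```python
# The window query from the issue, pointed at a hypothetical in-memory table.
WINDOW_QUERY = """
SELECT row_number() OVER (PARTITION BY c_mktsegment ORDER BY c_custkey), *
FROM customer
"""


def substrait_json_or_error(query=WINDOW_QUERY):
    """Return ("ok", plan_json) or ("error", message); never raises."""
    try:
        import duckdb  # third-party: pip install duckdb

        con = duckdb.connect()
        con.install_extension("substrait")  # assumption: extension is installable
        con.load_extension("substrait")
        # Made-up stand-in data with the column names used in the issue.
        con.execute(
            "CREATE TABLE customer AS SELECT * FROM "
            "(VALUES (1, 'BUILDING'), (2, 'MACHINERY')) "
            "AS t(c_custkey, c_mktsegment)"
        )
        plan = con.get_substrait_json(query).fetchone()[0]
        return ("ok", plan)
    except Exception as exc:  # report any failure instead of raising
        return ("error", f"{type(exc).__name__}: {exc}")
```

On an affected version, calling `substrait_json_or_error()` would be expected to return an "error" status carrying the InternalException quoted above; an "ok" status would suggest the extension has since gained window function support.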

EpsilonPrime commented

Gateway support for row_number was added in #82. It currently works with DataFusion and should start working automatically once DuckDB has window function support, at which point we'll update the tests. I'll leave this issue open until then to make sure.
