Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: a CountStar operation / expression that isn't tied to a particular table/relation/group-by #8698

Open
1 task done
tswast opened this issue Mar 19, 2024 · 3 comments
Assignees
Labels
feature Features or general enhancements

Comments

@tswast
Copy link
Collaborator

tswast commented Mar 19, 2024

Is your feature request related to a problem?

When specifying COUNT(*) / COUNT(1) to get the size of a table expression, I actually need to have a table expression available.

Context: In BigQuery DataFrames, we defer the creation of ibis table expressions as long as possible to allow for some expression rewrites to avoid too many joins/subqueries. For example, we rewrite some joins on the DataFrame/Series index as projections, as in these cases pandas often uses row identity to join the rows instead of doing a full join on a possibly non-unique index.

Describe the solution you'd like

I'd like to be able to specify COUNT(*) / COUNT(1) without a table expression, similar to ibis.row_number().

In the meantime, I am using this workaround:

import ibis

print(ibis.__version__)

bq = ibis.bigquery.connect()
table = bq.table("usa_1910_2013", schema="usa_names", database="bigquery-public-data")

@ibis.udf.agg.builtin
def count(value: int) -> int:
    """Count of a scalar."""

print(table.aggregate(total_rows=count(1)).compile())
print(table.aggregate(total_rows=count(1)).limit(1).execute())
print(table.group_by(table.gender).aggregate(total_rows=count(1)).compile())
print(table.group_by(table.gender).aggregate(total_rows=count(1)).execute())

This works fine for SQL engines, but we are hoping to use a local engine like polars at some point in future, in which case this may not work.

What version of ibis are you running?

8.0.0

What backend(s) are you using, if any?

BigQuery

Code of Conduct

  • I agree to follow this project's Code of Conduct
@tswast tswast added the feature Features or general enhancements label Mar 19, 2024
@kszucs kszucs self-assigned this Mar 25, 2024
@kszucs
Copy link
Member

kszucs commented Mar 25, 2024

Thanks for the issue report! This is something I wanted to do since it requires some special handling of these operations internally as well.

@cpcloud
Copy link
Member

cpcloud commented Mar 25, 2024

There are problems with joins in dealing with an expression that isn't ultimately bound to a table. An expression that isn't bound to a table works fine if there's just a single table, but how this would be handled with joins is unclear.

What does count(1) mean when computing it as a projection from a join? How about if I combine it with another expression?

I don't think we'd want to have a special case just for count(*)/count(1), but we also shouldn't give users the ability to create ambiguous expressions.

@kszucs
Copy link
Member

kszucs commented Mar 25, 2024

Further thinking about it erasing the table information would mean that t.count() has no table associated with it so t.count().execute() wouldn't be able to construct an aggregation since t.count() would return with an unbound CountStar() node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Status: backlog
Development

No branches or pull requests

3 participants