Is your feature request related to a problem?
When specifying COUNT(*) / COUNT(1) to get the size of a table expression, I actually need to have a table expression available.
Context: In BigQuery DataFrames, we defer the creation of ibis table expressions as long as possible to allow for some expression rewrites to avoid too many joins/subqueries. For example, we rewrite some joins on the DataFrame/Series index as projections, as in these cases pandas often uses row identity to join the rows instead of doing a full join on a possibly non-unique index.
Describe the solution you'd like
I'd like to be able to specify COUNT(*) / COUNT(1) without a table expression, similar to ibis.row_number().
In the meantime, I am using this workaround:
    import ibis

    print(ibis.__version__)

    bq = ibis.bigquery.connect()
    table = bq.table("usa_1910_2013", schema="usa_names", database="bigquery-public-data")


    # Declare COUNT as a builtin aggregate so it can be called on a scalar literal.
    @ibis.udf.agg.builtin
    def count(value: int) -> int:
        """Count of a scalar."""


    print(table.aggregate(total_rows=count(1)).compile())
    print(table.aggregate(total_rows=count(1)).limit(1).execute())
    print(table.group_by(table.gender).aggregate(total_rows=count(1)).compile())
    print(table.group_by(table.gender).aggregate(total_rows=count(1)).execute())
This works fine for SQL engines, but we are hoping to use a local engine like polars at some point in the future, in which case this may not work.
What version of ibis are you running?
8.0.0
What backend(s) are you using, if any?
BigQuery
Code of Conduct
I agree to follow this project's Code of Conduct
There are problems with joins in dealing with an expression that isn't ultimately bound to a table. An expression that isn't bound to a table works fine if there's just a single table, but how this would be handled with joins is unclear.
What does count(1) mean when computing it as a projection from a join? How about if I combine it with another expression?
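To make the ambiguity concrete, here is a small pure-Python sketch. It is not ibis code: the two toy tables and the `inner_join` helper are made up for illustration. It only shows that "the row count" has at least three plausible answers once a join is involved.

```python
# Toy tables as lists of dicts (hypothetical data, not from the issue).
left = [{"id": 1}, {"id": 2}, {"id": 2}]
right = [{"id": 2, "v": "a"}, {"id": 2, "v": "b"}]


def inner_join(lhs, rhs, key):
    """Naive nested-loop inner join on a single key column."""
    return [{**a, **b} for a in lhs for b in rhs if a[key] == b[key]]


joined = inner_join(left, right, "id")

# An unbound count(1) gives no hint which of these the user meant.
candidates = {
    "left": len(left),
    "right": len(right),
    "joined": len(joined),
}
print(candidates)
```

A count that is bound to a specific table expression picks exactly one of these; an unbound one would need some extra rule to choose.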
I don't think we'd want to have a special case just for count(*)/count(1), but we also shouldn't give users the ability to create ambiguous expressions.
Thinking about it further: erasing the table information would mean that t.count() has no table associated with it, so t.count().execute() wouldn't be able to construct an aggregation, since t.count() would return an unbound CountStar() node.
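A minimal toy model of that point, assuming nothing about how ibis represents expressions internally. The `Table` and `CountStar` classes below are hypothetical stand-ins (only the name `CountStar` is borrowed from the comment above); the sketch just shows that a count node with no table has nothing to aggregate over until something binds it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for a table expression."""
    name: str
    rows: Tuple[dict, ...]


@dataclass(frozen=True)
class CountStar:
    """Hypothetical count node; table=None models an unbound node."""
    table: Optional[Table] = None

    def bind(self, table: Table) -> "CountStar":
        # Return a new node attached to a concrete table.
        return CountStar(table)

    def execute(self) -> int:
        if self.table is None:
            raise ValueError("unbound count: no table to aggregate over")
        return len(self.table.rows)


t = Table("t", ({"x": 1}, {"x": 2}, {"x": 3}))

unbound = CountStar()
try:
    unbound.execute()
    unbound_error = None
except ValueError as exc:
    unbound_error = str(exc)

bound_result = unbound.bind(t).execute()
print(unbound_error, bound_result)
```

In this model, supporting an unbound count means deciding where `bind` gets called, which is the part that is unclear for joins.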