Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: regression with percent_rank compilation in 7.x (Deferred related?) #7631

Closed
1 task done
tswast opened this issue Nov 28, 2023 · 3 comments · Fixed by #7790
Closed
1 task done

bug: regression with percent_rank compilation in 7.x (Deferred related?) #7631

tswast opened this issue Nov 28, 2023 · 3 comments · Fixed by #7790
Assignees
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis regression Issues related to things that used to work but don't anymore
Milestone

Comments

@tswast
Copy link
Collaborator

tswast commented Nov 28, 2023

What happened?

percent_rank is working fine in 6.x, but there seems to have been a reversion in 7.x. Here's the working version in 6.1.0:

import ibis
import google.cloud.bigquery

ibis.__version__

Out[2]: '6.1.0'

bq = ibis.bigquery.connect(project_id="swast-scratch")
bqclient = google.cloud.bigquery.Client(project="swast-scratch")

bqclient.load_table_from_json(
    [{"col": 1}, {"col": 2}, {"col": 3}, {"col": 4}, {"col": 5}],
    "swast-scratch.my_dataset.small_ints",
    job_config=google.cloud.bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE"),
).result()

Out[8]: LoadJob<project=swast-scratch, location=US, id=09379d8f-d04c-4f8b-9acd-2d3b1655c914>

table = bq.table("swast-scratch.my_dataset.small_ints")
table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull()))


Out[14]: 
0    0.00
1    0.25
2    0.50
3    0.75
4    1.00
Name: PercentRank(col), dtype: float64

table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull())).compile()

Out[17]: 'SELECT percent_rank() OVER (PARTITION BY t0.`col` IS NOT NULL ORDER BY t0.`col` ASC) AS `PercentRank_col`\nFROM `swast-scratch.my_dataset.small_ints` t0

What version of ibis are you using?

7.1.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

In [1]: import ibis
   ...: import google.cloud.bigquery
   ...: 
   ...: ibis.__version__
   ...: 
Out[1]: '7.1.0'

In [2]: 
   ...: bq = ibis.bigquery.connect(project_id="swast-scratch")
   ...: bqclient = google.cloud.bigquery.Client(project="swast-scratch")
   ...: 
   ...: bqclient.load_table_from_json(
   ...:     [{"col": 1}, {"col": 2}, {"col": 3}, {"col": 4}, {"col": 5}],
   ...:     "swast-scratch.my_dataset.small_ints",
   ...:     job_config=google.cloud.bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE"),
   ...: ).result()
   ...: 
Out[2]: LoadJob<project=swast-scratch, location=US, id=407e1b98-a841-49cb-a704-9e20b95d593f>

In [3]: table = bq.table("swast-scratch.my_dataset.small_ints")
   ...: table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull())).to_pandas()
   ...: 
Out[3]: bind(_).to_pandas()

In [4]: table = bq.table("swast-scratch.my_dataset.small_ints")
   ...: table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull()))
Out[4]: bind(_)

In [5]: table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull())).compile()
Out[5]: bind(_).compile()

In [6]: table[table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull()))].compile()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 table[table["col"].percent_rank().over(ibis.window(group_by=table["col"].notnull()))].compile()

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/expr/types/core.py:350, in Expr.compile(self, limit, timecontext, params)
    326 def compile(
    327     self,
    328     limit: int | None = None,
    329     timecontext: TimeContext | None = None,
    330     params: Mapping[ir.Value, Any] | None = None,
    331 ):
    332     """Compile to an execution target.
    333 
    334     Parameters
   (...)
    348         Mapping of scalar parameter expressions to value
    349     """
--> 350     return self._find_backend().compile(
    351         self, limit=limit, timecontext=timecontext, params=params
    352     )

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/bigquery/__init__.py:620, in Backend.compile(self, expr, limit, params, **_)
    601 """Compile an Ibis expression.
    602 
    603 Parameters
   (...)
    617     backend.
    618 """
    619 self._define_udf_translation_rules(expr)
--> 620 sql = self.compiler.to_ast_ensure_limit(expr, limit, params=params).compile()
    622 return ";\n\n".join(
    623     # convert unnest function calls to explode
    624     query.transform(_anonymous_unnest_to_explode)
   (...)
    634     for query in sg.parse(sql, read=self.name)
    635 )

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/base.py:37, in QueryAST.compile(self)
     35 def compile(self):
     36     compiled_setup_queries = [q.compile() for q in self.setup_queries]
---> 37     compiled_queries = [q.compile() for q in self.queries]
     38     compiled_teardown_queries = [q.compile() for q in self.teardown_queries]
     39     return self.context.collapse(
     40         list(
     41             chain(
   (...)
     46         )
     47     )

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/base.py:37, in <listcomp>(.0)
     35 def compile(self):
     36     compiled_setup_queries = [q.compile() for q in self.setup_queries]
---> 37     compiled_queries = [q.compile() for q in self.queries]
     38     compiled_teardown_queries = [q.compile() for q in self.teardown_queries]
     39     return self.context.collapse(
     40         list(
     41             chain(
   (...)
     46         )
     47     )

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/query_builder.py:269, in Select.compile(self)
    266 with_frag = self.format_subqueries()
    268 # SELECT
--> 269 select_frag = self.format_select_set()
    271 # FROM, JOIN, UNION
    272 from_frag = self.format_table_set()

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/query_builder.py:324, in Select.format_select_set(self)
    322 for node in self.select_set:
    323     if isinstance(node, ops.Value):
--> 324         expr_str = self._translate(node, named=True, permit_subquery=True)
    325     elif isinstance(node, ops.TableNode):
    326         alias = context.get_ref(node)

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/query_builder.py:239, in Select._translate(self, expr, named, permit_subquery, within_where)
    231 def _translate(self, expr, named=False, permit_subquery=False, within_where=False):
    232     translator = self.translator_class(
    233         expr,
    234         context=self.context,
   (...)
    237         within_where=within_where,
    238     )
--> 239     return translator.get_result()

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/translator.py:224, in ExprTranslator.get_result(self)
    222 def get_result(self):
    223     """Compile SQL expression into a string."""
--> 224     translated = self.translate(self.node)
    225     if self._needs_name(self.node):
    226         # TODO: this could fail in various ways
    227         translated = self.name(translated, self.node.name)

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/translator.py:256, in ExprTranslator.translate(self, op)
    254 elif type(op) in self._registry:
    255     formatter = self._registry[type(op)]
--> 256     return formatter(self, op)
    257 else:
    258     raise com.OperationNotDefinedError(f"No translation rule for {type(op)}")

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/registry/main.py:23, in alias(translator, op)
     20 def alias(translator, op):
     21     # just compile the underlying argument because the naming is handled
     22     # by the translator for the top level expression
---> 23     return translator.translate(op.arg)

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/compiler/translator.py:256, in ExprTranslator.translate(self, op)
    254 elif type(op) in self._registry:
    255     formatter = self._registry[type(op)]
--> 256     return formatter(self, op)
    257 else:
    258     raise com.OperationNotDefinedError(f"No translation rule for {type(op)}")

File ~/envs/bigframes/lib/python3.10/site-packages/ibis/backends/base/sql/registry/window.py:109, in window(translator, op)
    107 frame = op.frame
    108 if isinstance(func, translator._require_order_by) and not frame.order_by:
--> 109     frame = frame.copy(order_by=(func.arg,))
    111 # Time ranges need to be converted to microseconds.
    112 if isinstance(frame, ops.RangeWindowFrame):

AttributeError: 'PercentRank' object has no attribute 'arg'

Code of Conduct

  • I agree to follow this project's Code of Conduct
@tswast tswast added the bug Incorrect behavior inside of ibis label Nov 28, 2023
@cpcloud
Copy link
Member

cpcloud commented Nov 28, 2023

Thanks for the report! This does look like a regression.

@cpcloud cpcloud added the bigquery The BigQuery backend label Nov 28, 2023
@cpcloud cpcloud added this to the 7.2 milestone Nov 28, 2023
@lostmygithubaccount lostmygithubaccount moved this from backlog to todo in Ibis planning and roadmap Dec 11, 2023
@tswast
Copy link
Collaborator Author

tswast commented Dec 15, 2023

I don't fully understand it, but #7327 is the PR that removed the arg attribute from PercentRank.

@tswast
Copy link
Collaborator Author

tswast commented Dec 15, 2023

I figured it out. After #7327, calling percent_rank() requires a window with an explicit order_by. The error message was not very helpful, though. It wasn't until I manually reverted some of those changes that I could get ibis to generate some invalid SQL and debug based on the BigQuery error.

E google.api_core.exceptions.BadRequest: 400 Window ORDER BY is required for analytic function percent_rank at [13:32]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis regression Issues related to things that used to work but don't anymore
Projects
Archived in project
3 participants