-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deps: migrate to ibis-framework >= "7.1.0"
#53
Conversation
This should unlock some bug fixes as well as potential `UNNEST` support in a future change.
We're getting a few failures, such as
I asked some friends over on the ibis project, and they shared that Deferred objects aren't bound to a table expression. Also, |
Marking as Draft again until we have a greater need to upgrade. |
ibis-framework >= "7.0.0"
ibis-framework >= "7.1.0"
qcut support currently blocked by what I believe to be a regression in ibis related to those Deferred objects: ibis-project/ibis#7631 Will investigate if there's a way to work around it. Seems we are ending up in this code path somehow: https://github.com/ibis-project/ibis/blob/d6a2f0922ea92b0e03a0431bf9011e7103565061/ibis/expr/types/generic.py#L760 |
@@ -62,7 +63,16 @@ def __init__( | |||
self._columns = tuple(columns) | |||
# To allow for more efficient lookup by column name, create a | |||
# dictionary mapping names to column values. | |||
self._column_names = {column.get_name(): column for column in self._columns} | |||
self._column_names = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently you are adding support for Deferred
objects. Currently we have a couple of objects interpretable as a value expression besides the expressions themselves, e.g. column names or selector in addition to deferred objects. The issue that ibis doesn't provide a clear, concise and backward compatible API function to bind those objects to a table. We have various protected functions and methods used in a hardly followable fashion.
Instead of providing a stable API for deferred, wouldn't it be better to have a stable function which convert any sort of ibis objects to an expression (given a table) if possible, something like ibis.bind(obj, table)
?
Asking because we have something similar under preparation, but could be a good idea to have a public variant of that: https://github.com/ibis-project/ibis/pull/7580/files#diff-4400549b2eaa4ca3a3107f13fa78210e37277659acfe2a1eb4129cfbfaa66531R105-R134
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of providing a stable API for deferred, wouldn't it be better to have a stable function which convert any sort of ibis objects to an expression (given a table) if possible, something like ibis.bind(obj, table)?
Yes, we'd definitely use something like that here.
Honestly, I'm not 100% how we are ending up with deferred objects in the first place. It seems to be with how we're building some of our windowed functions, but in all of the cases I know of there is an underlying table. I think we may just be hitting this bug: ibis-project/ibis#7631
@@ -516,17 +516,14 @@ def remote_function_node( | |||
"""Creates an Ibis node representing a remote function call.""" | |||
|
|||
fields = { | |||
name: rlz.value(type_) if type_ else rlz.any | |||
name: rlz.ValueOf(None if type_ == "ANY TYPE" else type_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we are using almost the same code internally, possibly it could worth to expose it in a more developer friendly way after we understand the actual requirements. We can open an issue on our side to track this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I was hoping to upstream this sooner, but now that I see https://ibis-project.org/reference/scalar-udfs#ibis.expr.operations.udf.scalar.builtin maybe I can migrate to that? I'll try that in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started on this here: #277
I found a few issues already with UDF in the BQ backend. I'll get those documented soon.
from ibis.expr.operations.analytic import Analytic | ||
import ibis.expr.datatypes as dt | ||
import ibis.expr.operations.analytic as ibis_ops_analytic | ||
import ibis.expr.operations.core as ibis_ops_core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to just simply use import ibis.expr.operations as ops
since all op types are exposed there.
"""Retrieve the first element.""" | ||
|
||
arg = rlz.column(rlz.any) | ||
output_dtype = rlz.dtype_like("arg") | ||
arg: ibis_ops_core.Column[dt.Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just simply use ops.Column
without the parameter, it means the same thing as ops.Column[dt.Any]
|
||
|
||
class LastNonNullValue(Analytic): | ||
class LastNonNullValue(ibis_ops_analytic.Analytic): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have been breaking several internal parts of the codebase (sorry for that), maintaining custom operations this way is probably inconvenient. I wanted to have a documented and stable way of creating custom nodes for a long time now, just didn't have enough usage examples which we could draft a desired API from. This might be another topic we should discuss in our issue tracker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Filed ibis-project/ibis#7767
I don't even know that it needs to be a stable interface for custom ops. Maybe just an escape hatch of some sort for raw SQL that's just a Value, not a full table expression like the current Backend.sql() escape hatch. That said, such an escape hatch is going to look a lot like a custom op, since ideally we'd propagate type info somehow so that expressions can continue to be built on it.
This should unlock some bug fixes as well as potential
UNNEST
support in a future change.Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Towards internal issue 301459557 🦕