deps: migrate to `ibis-framework >= "7.1.0"` #53

tswast · 2023-09-22T16:06:48Z

This should unlock some bug fixes as well as potential UNNEST support in a future change.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Towards internal issue 301459557 🦕

This should unlock some bug fixes as well as potential `UNNEST` support in a future change.

…lease

tswast · 2023-11-14T20:40:47Z

We're getting a few failures, such as

____________________________ test_df_nlargest[all] _____________________________
[gw0] linux -- Python 3.11.1 /tmpfs/src/github/python-bigquery-dataframes/.nox/system_prerelease/bin/python

scalars_df_index =           bool_col                                          bytes_col  \
rowindex                                     ....999999+00:00  
8                         T             <NA>                              <NA>  

[9 rows x 13 columns]
scalars_pandas_df_index =           bool_col  ...                     timestamp_col
rowindex            ...                                  
0 ... ...  2038-01-19 03:14:17.999999+00:00
8            False  ...                              <NA>

[9 rows x 13 columns]
keep = 'all'

    @pytest.mark.parametrize(
        ("keep",),
        [
            ("first",),
            ("last",),
            ("all",),
        ],
    )
    def test_df_nlargest(scalars_df_index, scalars_pandas_df_index, keep):
>       bf_result = scalars_df_index.nlargest(
            3, ["bool_col", "int64_too"], keep=keep
        ).to_pandas()

[tests/system/small/test_dataframe.py:156](https://cs.corp.google.com/piper///depot/google3/tests/system/small/test_dataframe.py?l=156): 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[bigframes/dataframe.py:929](https://cs.corp.google.com/piper///depot/google3/bigframes/dataframe.py?l=929): in nlargest
    return DataFrame(block_ops.nlargest(self._block, n, column_ids, keep=keep))
[bigframes/core/block_transforms.py:548](https://cs.corp.google.com/piper///depot/google3/bigframes/core/block_transforms.py?l=548): in nlargest
    block, counter = block.apply_window_op(
[bigframes/core/blocks.py:768](https://cs.corp.google.com/piper///depot/google3/bigframes/core/blocks.py?l=768): in apply_window_op
    block = Block(
[bigframes/core/blocks.py:112](https://cs.corp.google.com/piper///depot/google3/bigframes/core/blocks.py?l=112): in __init__
    self._expr = self._normalize_expression(expr, self._index_columns)
[bigframes/core/blocks.py:1093](https://cs.corp.google.com/piper///depot/google3/bigframes/core/blocks.py?l=1093): in _normalize_expression
    col_id for col_id in expr.column_ids if col_id not in index_columns
[bigframes/core/__init__.py:82](https://cs.corp.google.com/piper///depot/google3/bigframes/core/__init__.py?l=82): in column_ids
    return self._compile_ordered().column_ids
[bigframes/core/__init__.py:95](https://cs.corp.google.com/piper///depot/google3/bigframes/core/__init__.py?l=95): in _compile_ordered
    return compiler.compile_ordered(self.node)
[bigframes/core/compile/compiler.py:33](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=33): in compile_ordered
    return typing.cast(compiled.OrderedIR, compile_node(node, True))
[bigframes/core/compile/compiler.py:45](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=45): in compile_node
    return _compile_node(node, ordered)
/[usr/local/lib/python3.11/functools.py:909](https://cs.corp.google.com/piper///depot/google3/usr/local/lib/python3.11/functools.py?l=909): in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
[bigframes/core/compile/compiler.py:192](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiler.py?l=192): in compile_window
    result = compile_ordered(node.child).project_window_op(
[bigframes/core/compile/compiled.py:868](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=868): in project_window_op
    result = self._set_or_replace_by_id(output_name or column_name, window_op)
[bigframes/core/compile/compiled.py:1185](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=1185): in _set_or_replace_by_id
    return builder.build()
[bigframes/core/compile/compiled.py:1344](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=1344): in build
    return OrderedIR(
[bigframes/core/compile/compiled.py:632](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=632): in __init__
    super().__init__(table, columns, predicates)
[bigframes/core/compile/compiled.py:65](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=65): in __init__
    self._column_names = {column.get_name(): column for column in self._columns}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <tuple_iterator object at 0x7f117affa7d0>

>   self._column_names = {column.get_name(): column for column in self._columns}
E   TypeError: unhashable type: 'Deferred'

[bigframes/core/compile/compiled.py:65](https://cs.corp.google.com/piper///depot/google3/bigframes/core/compile/compiled.py?l=65): TypeError

I asked some friends over on the ibis project, and they shared that Deferred objects aren't bound to a table expression. Also, op() is hashable but the expression might not be. The "more future-proof way is to resolve the deferred through the table’s __getitem__" that is to say "table[deferred].op()"

tswast · 2023-11-14T22:43:12Z

Marking as Draft again until we have a greater need to upgrade.

…lease

tswast · 2023-11-28T18:42:53Z

qcut support currently blocked by what I believe to be a regression in ibis related to those Deferred objects: ibis-project/ibis#7631

Will investigate if there's a way to work around it. Seems we are ending up in this code path somehow: https://github.com/ibis-project/ibis/blob/d6a2f0922ea92b0e03a0431bf9011e7103565061/ibis/expr/types/generic.py#L760

kszucs · 2023-12-14T20:11:25Z

bigframes/core/compile/compiled.py

@@ -62,7 +63,16 @@ def __init__(
        self._columns = tuple(columns)
        # To allow for more efficient lookup by column name, create a
        # dictionary mapping names to column values.
-        self._column_names = {column.get_name(): column for column in self._columns}
+        self._column_names = {


Apparently you are adding support for Deferred objects. Currently we have a couple of objects interpretable as a value expression besides the expressions themselves, e.g. column names or selector in addition to deferred objects. The issue that ibis doesn't provide a clear, concise and backward compatible API function to bind those objects to a table. We have various protected functions and methods used in a hardly followable fashion.

Instead of providing a stable API for deferred, wouldn't it be better to have a stable function which convert any sort of ibis objects to an expression (given a table) if possible, something like ibis.bind(obj, table)?

Asking because we have something similar under preparation, but could be a good idea to have a public variant of that: https://github.com/ibis-project/ibis/pull/7580/files#diff-4400549b2eaa4ca3a3107f13fa78210e37277659acfe2a1eb4129cfbfaa66531R105-R134

Instead of providing a stable API for deferred, wouldn't it be better to have a stable function which convert any sort of ibis objects to an expression (given a table) if possible, something like ibis.bind(obj, table)?

Yes, we'd definitely use something like that here.

Honestly, I'm not 100% how we are ending up with deferred objects in the first place. It seems to be with how we're building some of our windowed functions, but in all of the cases I know of there is an underlying table. I think we may just be hitting this bug: ibis-project/ibis#7631

kszucs · 2023-12-14T20:13:14Z

bigframes/remote_function.py

@@ -516,17 +516,14 @@ def remote_function_node(
    """Creates an Ibis node representing a remote function call."""

    fields = {
-        name: rlz.value(type_) if type_ else rlz.any
+        name: rlz.ValueOf(None if type_ == "ANY TYPE" else type_)


Actually we are using almost the same code internally, possibly it could worth to expose it in a more developer friendly way after we understand the actual requirements. We can open an issue on our side to track this.

True. I was hoping to upstream this sooner, but now that I see https://ibis-project.org/reference/scalar-udfs#ibis.expr.operations.udf.scalar.builtin maybe I can migrate to that? I'll try that in a separate PR.

Started on this here: #277

I found a few issues already with UDF in the BQ backend. I'll get those documented soon.

kszucs · 2023-12-14T20:14:00Z

third_party/bigframes_vendored/ibis/expr/operations/analytic.py

-from ibis.expr.operations.analytic import Analytic
+import ibis.expr.datatypes as dt
+import ibis.expr.operations.analytic as ibis_ops_analytic
+import ibis.expr.operations.core as ibis_ops_core


I suggest to just simply use import ibis.expr.operations as ops since all op types are exposed there.

kszucs · 2023-12-14T20:14:39Z

third_party/bigframes_vendored/ibis/expr/operations/analytic.py

    """Retrieve the first element."""

-    arg = rlz.column(rlz.any)
-    output_dtype = rlz.dtype_like("arg")
+    arg: ibis_ops_core.Column[dt.Any]


You can just simply use ops.Column without the parameter, it means the same thing as ops.Column[dt.Any]

kszucs · 2023-12-14T20:17:50Z

third_party/bigframes_vendored/ibis/expr/operations/analytic.py



-class LastNonNullValue(Analytic):
+class LastNonNullValue(ibis_ops_analytic.Analytic):


Since we have been breaking several internal parts of the codebase (sorry for that), maintaining custom operations this way is probably inconvenient. I wanted to have a documented and stable way of creating custom nodes for a long time now, just didn't have enough usage examples which we could draft a desired API from. This might be another topic we should discuss in our issue tracker.

Good call. Filed ibis-project/ibis#7767

I don't even know that it needs to be a stable interface for custom ops. Maybe just an escape hatch of some sort for raw SQL that's just a Value, not a full table expression like the current Backend.sql() escape hatch. That said, such an escape hatch is going to look a lot like a custom op, since ideally we'd propagate type info somehow so that expressions can continue to be built on it.

…lease

deps: migrate to ibis-framework >= "7.0.0"

8929fef

This should unlock some bug fixes as well as potential `UNNEST` support in a future change.

product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Sep 22, 2023

tswast added 2 commits September 22, 2023 16:22

use dtype instead of output_dtype in custom ops

a4e995d

adjust type annotations

8fd8fd4

product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Sep 22, 2023

tswast mentioned this pull request Sep 22, 2023

feat(bigquery): unnest redux ibis-project/ibis#7157

Merged

tswast added 6 commits October 13, 2023 16:27

Merge remote-tracking branch 'origin/main' into b301459557-ibis-prere…

ba3c5c4

…lease

Merge branch 'main' into b301459557-ibis-prerelease

f1ce176

Update noxfile.py

14fb7c8

Merge branch 'main' into b301459557-ibis-prerelease

ea8d3e5

Merge remote-tracking branch 'origin/main' into b301459557-ibis-prere…

191df69

…lease

update type annotations

b182453

tswast marked this pull request as ready for review November 13, 2023 22:32

tswast requested review from a team as code owners November 13, 2023 22:32

tswast requested a review from TrevorBergeron November 13, 2023 22:32

tswast marked this pull request as draft November 14, 2023 22:42

tswast added 2 commits November 27, 2023 20:04

Merge remote-tracking branch 'origin/main' into b301459557-ibis-prere…

174e735

…lease

fix for deferred values

0ab94cc

tswast marked this pull request as ready for review November 27, 2023 20:19

ashleyxuu approved these changes Nov 27, 2023

View reviewed changes

fix prerelease

4931923

TrevorBergeron approved these changes Nov 27, 2023

View reviewed changes

tswast added 3 commits November 27, 2023 20:42

minimum 7.1.0

9df0816

mypy

71f4889

revert presubmit changes

6c64ec5

tswast changed the title ~~deps: migrate to ibis-framework >= "7.0.0"~~ deps: migrate to ibis-framework >= "7.1.0" Nov 27, 2023

fix minimum sqlglot

e37571c

tswast mentioned this pull request Nov 28, 2023

feat: publicly export Deferred ibis-project/ibis#7613

Closed

1 task

fix custom op

fad36a8

parthea mentioned this pull request Dec 6, 2023

feat: Add support for Python 3.12 #231

Merged

kszucs reviewed Dec 14, 2023

View reviewed changes

tswast added 2 commits December 14, 2023 23:13

Merge remote-tracking branch 'origin/main' into b301459557-ibis-prere…

5770502

…lease

hack InMemoryTable formatter back in

c318b18

tswast mentioned this pull request Dec 14, 2023

feat: ibis.udf.analytic.builtin and ibis.udf.table.builtin to allow for more flexibility with engine-specific functions ibis-project/ibis#7767

Open

1 task

tswast added 3 commits December 15, 2023 16:08

use ops module to avoid breaking changes if ops move around

d3304b2

workaround nullscalar issue

33bd2e0

update usage of percent_rank to explicitly order by the value

c373dc0

product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Dec 15, 2023

tswast added automerge Merge the pull request once unit tests and other checks pass. owlbot:run Add this label to trigger the Owlbot post processor. labels Dec 15, 2023

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 15, 2023

tswast added 2 commits December 15, 2023 18:26

disable ibis prerelease tests for now

41364b5

fix unit_prerelease

7a8784c

tswast mentioned this pull request Dec 15, 2023

deps: update ibis to version 8.0.0 and refactor remote_function to use ibis UDF method #277

Merged

4 tasks

tswast merged commit 9798a2b into main Dec 15, 2023
15 checks passed

tswast deleted the b301459557-ibis-prerelease branch December 15, 2023 20:42

gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Dec 15, 2023

release-please bot mentioned this pull request Dec 15, 2023

chore(main): release 0.18.0 #279

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deps: migrate to `ibis-framework >= "7.1.0"` #53

deps: migrate to `ibis-framework >= "7.1.0"` #53

tswast commented Sep 22, 2023

tswast commented Nov 14, 2023

tswast commented Nov 14, 2023

tswast commented Nov 28, 2023

kszucs Dec 14, 2023

tswast Dec 15, 2023

kszucs Dec 14, 2023

tswast Dec 15, 2023

tswast Dec 15, 2023

kszucs Dec 14, 2023

kszucs Dec 14, 2023

kszucs Dec 14, 2023

tswast Dec 14, 2023



		class LastNonNullValue(Analytic):
		class LastNonNullValue(ibis_ops_analytic.Analytic):

deps: migrate to ibis-framework >= "7.1.0" #53

deps: migrate to ibis-framework >= "7.1.0" #53

Conversation

tswast commented Sep 22, 2023

tswast commented Nov 14, 2023

tswast commented Nov 14, 2023

tswast commented Nov 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deps: migrate to `ibis-framework >= "7.1.0"` #53

deps: migrate to `ibis-framework >= "7.1.0"` #53