feat: Allow join-free alignment of analytic expressions #1168

Merged: 6 commits merged into main on Nov 26, 2024
Conversation

TrevorBergeron (Contributor):

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@TrevorBergeron TrevorBergeron requested review from a team as code owners November 22, 2024 00:58
@product-auto-label product-auto-label bot added the `size: l` (Pull request size is large) and `api: bigquery` (Issues related to the googleapis/python-bigquery-dataframes API) labels on Nov 22, 2024
    def replace_child(
        self, new_child: BigFrameNode, validate: bool = False
    ) -> UnaryNode:
        new_self = replace(self, child=new_child)  # type: ignore
Collaborator:

I had to open up the file to figure out this is from dataclasses. Per https://google.github.io/styleguide/pyguide.html#s2.2-imports please import dataclasses, not individual functions / classes from it. dataclasses is not one of the allowed exceptions.

Contributor (Author):

fixed this import
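For illustration, a minimal sketch of the style-guide-compliant form, importing the module rather than `replace` itself (the `BigFrameNode` and `UnaryNode` shown here are simplified stand-ins, not the real bigframes classes):

```python
import dataclasses


class BigFrameNode:
    """Simplified stand-in base class for this sketch."""


@dataclasses.dataclass(frozen=True)
class UnaryNode(BigFrameNode):
    child: BigFrameNode

    def replace_child(self, new_child: BigFrameNode) -> "UnaryNode":
        # The module-qualified call makes the dataclasses origin obvious at the call site.
        return dataclasses.replace(self, child=new_child)
```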

        self, new_child: BigFrameNode, validate: bool = False
    ) -> UnaryNode:
        new_self = replace(self, child=new_child)  # type: ignore
        if validate:
Collaborator:

I'm curious, in what cases is it necessary to have a validate argument instead of a public validate() method? If it's for method chaining, we could have validate() return self.

Contributor (Author):

Eh, I think I'll remove that; it's not being used, and yeah, validations could be run as a separate invocation.
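A rough sketch of the chaining alternative mentioned above, with `validate()` returning `self` (hypothetical class, not the bigframes API):

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass(frozen=True)
class Node:
    name: str

    def validate(self) -> Node:
        # Raise on invalid state; return self so the call can be chained.
        if not self.name:
            raise ValueError("node must have a name")
        return self

    def with_name(self, new_name: str) -> Node:
        return dataclasses.replace(self, name=new_name)


# Validation runs as a separate, chainable invocation rather than a flag.
node = Node("a").with_name("b").validate()
```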

@@ -449,14 +449,40 @@ def relational_join(
        )
        return ArrayValue(join_node), (l_mapping, r_mapping)

-    def try_align_as_projection(
+    def try_new_row_join(
Collaborator:

Maybe just try_row_join since we plan on this being the only one early next year?

Contributor (Author):

done

-            genned_ids.append(attempted_id)
-            i = i + 1
-        return genned_ids
+        return [ids.ColumnId.unique().name for _ in range(n)]
Collaborator:

Does this make the column IDs less deterministic than the previous logic?

Contributor (Author):

Not any less deterministic than we are now. If we want an isomorphism between query structure and output syntax, there is a bit more work that needs to be done to the system, which I think basically amounts to late binding identifiers serially through the tree.
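For context, a toy contrast of the two id schemes being discussed: collision-free ids drawn from a process-global counter versus ids bound serially, so that tree shape alone determines the rendered names (illustrative only, not the bigframes internals):

```python
import itertools

_global_counter = itertools.count()


def unique_ids(n: int) -> list[str]:
    # Collision-free, but the names depend on how many ids were generated
    # earlier in the session, so rendered SQL is not reproducible on its own.
    return [f"uid_{next(_global_counter)}" for _ in range(n)]


def serial_ids(n: int, start: int = 0) -> list[str]:
    # Late-bound serial numbering: the same tree shape always yields the same
    # names, giving an isomorphism between query structure and output syntax.
    return [f"col_{i}" for i in range(start, start + n)]


print(unique_ids(3))           # varies with prior calls in the process
print(serial_ids(3, start=5))  # ['col_5', 'col_6', 'col_7'] every time
```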

@@ -2693,7 +2697,35 @@ def is_uniquely_named(self: BlockIndexProperties):
    return len(set(self.names)) == len(self.names)


def try_row_join(
def try_new_row_join(
Collaborator:

Ditto, re try_row_join here, but perhaps in a separate PR, since I see that it would make it harder to identify the ones that should be replaced with try_legacy_row_join if we did that.

) -> Optional[bigframes.core.nodes.BigFrameNode]:
    """Joins the two nodes"""
    if l_node.projection_base != r_node.projection_base:
        return None
Collaborator:

Since there are `return None`s, should this be try_join_as_projection?
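(The `try_` prefix here follows the common convention of returning `None` when the rewrite does not apply and letting the caller fall back. A hypothetical caller, sketched with stand-in classes rather than the real bigframes types, might look like:)

```python
from typing import Optional


class Node:
    """Stand-in for BigFrameNode in this sketch."""


def try_row_join(l_node: Node, r_node: Node) -> Optional[Node]:
    # Returns the aligned node, or None when join-free alignment is not
    # possible (e.g. the two nodes do not share a projection base).
    return None  # placeholder body for the sketch


def join(l_node: Node, r_node: Node) -> Node:
    aligned = try_row_join(l_node, r_node)
    if aligned is not None:
        return aligned
    # Fall back to a true relational join when alignment is not possible.
    return Node()
```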

        return None
    # check join key
    for l_key, r_key in join_keys:
        # Caller is block, so they still work with raw strings rather than ids
Collaborator:

Should we fix that? e.g. make block use ColumnId?

Contributor (Author):

Yes, we should do that refactor, but it's a bit of an undertaking. For now, block uses string ids, and ArrayValue is responsible for wrapping them up as ColumnIds.
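A minimal sketch of the boundary described above, where the block layer passes raw strings and the ArrayValue layer wraps them into typed ids (the class shape is simplified, not the real bigframes code):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ColumnId:
    """Simplified typed identifier used below the block layer."""
    name: str


def wrap_join_keys(join_keys: list[tuple[str, str]]) -> list[tuple[ColumnId, ColumnId]]:
    # The block layer still speaks in raw string ids; wrap them at the
    # ArrayValue boundary so everything beneath operates on typed ColumnIds.
    return [(ColumnId(l_key), ColumnId(r_key)) for l_key, r_key in join_keys]


keys = wrap_join_keys([("index_0", "index_0"), ("col_a", "col_b")])
```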

import bigframes.core.window_spec
import bigframes.operations.aggregations

ADDITIVE_NODES = (
Collaborator:

Would be helpful to have some comments about what these node types have in common. How would I decide if a node type should be added to this list?

Contributor (Author):

Added a comment explaining.
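A sketch of the kind of note that explains it, under the assumption that "additive" means column-adding, row-preserving nodes (the node names below are stand-ins, not necessarily the actual tuple members):

```python
class ProjectionNode: ...      # adds derived scalar columns
class WindowOpNode: ...        # adds windowed / analytic columns
class PromoteOffsetsNode: ...  # adds a row-offset column

# "Additive" nodes only append columns on top of their child: they never
# filter, reorder, or aggregate rows. Because row identity is preserved, two
# trees that differ only by additive nodes over the same projection base can
# be aligned column-wise without emitting a join. Rule of thumb for extending
# the list: if the child produces N rows, the node must produce exactly those
# N rows, plus extra columns.
ADDITIVE_NODES = (ProjectionNode, WindowOpNode, PromoteOffsetsNode)
```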



def pull_up_selection(
    node: bigframes.core.nodes.BigFrameNode, rename_vars: bool = False
Collaborator:

What's the purpose of the rename_vars parameter? A docstring would be helpful.

Contributor (Author):

Added a docstring. It's intended to make sure that when we combine columns from the two sides, we don't get conflicts.
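Based only on the reply above, the docstring presumably covers something like the following (signature copied from the diff; the wording and body here are a guess, not the actual docstring):

```python
def pull_up_selection(
    node: "bigframes.core.nodes.BigFrameNode", rename_vars: bool = False
):
    """Pull the selection (column renames/reorders) to the top of the subtree.

    Args:
        node: Root of the subtree to rewrite.
        rename_vars: If True, assign fresh column ids to the pulled-up
            selection so that, when columns from the two sides of an
            alignment are combined, their ids cannot collide.
    """
    ...  # sketch only; see the bigframes rewrite module for the real logic
```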

        (bigframes.core.expression.DerefOp(field.id), field.id)
        for field in node.fields
    )
    assert isinstance(node, (bigframes.core.nodes.SelectionNode, *ADDITIVE_NODES))
Collaborator:

Am I remembering correctly that SelectionNode represents a WHERE clause, based on Selection from relational algebra terminology?

Given the high-level goal that the implicit joiner no longer handles these, I'm a little confused about why we need to rewrite them. Is it so that they are all consolidated, so that the nodes we can combine appear together in the tree? I would appreciate a bit more info.

Contributor (Author):

It's actually poorly named. SelectionNode just renames/reorders existing columns, without any scalar transforms.
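In other words, a SelectionNode behaves like a pure column projection; a toy model of that behavior (not the actual node implementation):

```python
def apply_selection(row: dict, selection: list[tuple[str, str]]) -> dict:
    # selection is a list of (source_column, output_name) pairs: columns are
    # renamed and reordered, but no scalar expressions are evaluated.
    return {out_name: row[src] for src, out_name in selection}


row = {"a": 1, "b": 2, "c": 3}
# Keep c and a, renaming a to "x"; no values are computed or transformed.
print(apply_selection(row, [("c", "c"), ("a", "x")]))  # {'c': 3, 'x': 1}
```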

@product-auto-label product-auto-label bot added the `size: xl` (Pull request size is extra large) label and removed the `size: l` (Pull request size is large) label on Nov 23, 2024
@tswast tswast merged commit daef4f0 into main Nov 26, 2024
23 checks passed
@tswast tswast deleted the align_analytic branch November 26, 2024 23:13