bug: maximum recursion depth with join operation #7124

nehanene15 · 2023-09-11T19:40:57Z

What happened?

We're seeing a RecursionError: maximum recursion depth exceeded while calling a Python object when running a JOIN: source_difference = source.join(differences, join_keys, how="outer")
Both 'source' and 'differences' are pandas.Table()s with many columns (~120).

We don't hit this error with smaller, less wide tables. I've provided an abridged version of the stack trace below. It looks like there is a cyclical portion of the code when testing whether the left and right tables have a common parent expr here.

Trying to understand if this is a Python limitation due to how wide the table is, or an Ibis bug. Appreciate the help!

What version of ibis are you using?

5.1.0

What backend(s) are you using, if any?

Pandas

Relevant log output

Traceback (most recent call last):
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 210, in __cached_equals__
    result = self.__cache__[key]
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/caching.py", line 46, in __getitem__
    value, _ = self._data[identifiers]
KeyError: (139856786236240, 139856788868000)

During handling of the above exception, another exception occurred:
...
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 210, in __cached_equals__
    result = self.__cache__[key]
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/caching.py", line 46, in __getitem__
    value, _ = self._data[identifiers]
KeyError: (139856787188608, 139856789577136)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/x/home/user/new_fl/dvt4/bin/data-validation", line 11, in <module>
    load_entry_point('google-pso-data-validator==4.1.0', 'console_scripts', 'data-validation')()
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 581, in main
    run_validation_configs(args)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 551, in run_validation_configs
    config_runner(args)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 304, in config_runner
    run_validations(args, config_managers)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 478, in run_validations
    run_validation(
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 461, in run_validation
    validator.execute()
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/data_validation.py", line 96, in execute
    result_df = self._execute_validation(
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/data_validation.py", line 314, in _execute_validation
    result_df = combiner.generate_report(
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/combiner.py", line 83, in generate_report
    joined = _join_pivots(source_pivot, target_pivot, differences_pivot, join_on_fields)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/combiner.py", line 317, in _join_pivots
    source_difference = source.join(differences, join_keys, how="outer")[
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/expr/types/relations.py", line 2497, in join
    expr = klass(left, right, predicates).to_expr()
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 25, in __call__
    return cls.__create__(*args, **kwargs)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 99, in __create__
    return super().__create__(**kwargs)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 33, in __create__
    return type.__call__(cls, *args, **kwargs)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/expr/operations/relations.py", line 178, in __init__
    if left.equals(right):
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/expr/operations/core.py", line 24, in equals
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 212, in __cached_equals__
    result = self.__equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 239, in __equals__
    return self.__args__ == other.__args__
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 212, in __cached_equals__
    result = self.__equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 239, in __equals__
    return self.__args__ == other.__args__
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 212, in __cached_equals__
    result = self.__equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 239, in __equals__
    return self.__args__ == other.__args__
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 210, in __cached_equals__
    result = self.__cache__[key]
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/caching.py", line 45, in __getitem__
    identifiers = tuple(id(item) for item in key)
RecursionError: maximum recursion depth exceeded while calling a Python object

cpcloud · 2023-09-11T19:49:36Z

Thanks for the report!

Can you show the code that produces the exception?

It'll be easier to write a regression test if we can get the exact case that raises the exception.

nehanene15 · 2023-09-11T19:54:38Z

This is the line that generates the exception: https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/47154b4139bf22358359c53fe25bbce44745589f/data_validation/combiner.py#L321

And this is where Pandas tables are instantiated before it gets to the combiner.py: https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/47154b4139bf22358359c53fe25bbce44745589f/data_validation/data_validation.py#L339

For context, the source and target are SQL query results and the combiner.generate_report() aims to find the differences between the two results, if any, for data validation.

cpcloud · 2023-09-11T20:03:32Z

Any chance you could dump a parquet file of the left and right tables somewhere?

nehanene15 · 2023-09-11T21:03:09Z

GitHub won't allow me to upload Parquet, but I'll upload the CSVs of the source and differences. We're running source_difference = source.join(differences, join_keys, how="outer")

source_pivot.csv
differences_pivot.csv

kszucs · 2023-09-11T21:28:08Z

Hi @nehanene15!

Could you please try out with the following patch applied (I assume you are using ibis 6.2):

diff --git a/ibis/common/grounds.py b/ibis/common/grounds.py
index 394bc4ccf..6f5bcd184 100644
--- a/ibis/common/grounds.py
+++ b/ibis/common/grounds.py
@@ -203,6 +203,8 @@ class Comparable(Base):
         if type(self) is not type(other):
             return False

+        return self.__equals__(other)
+
         # reduce space required for commutative operation
         if id(self) < id(other):
             key = (self, other)

This way we turn off an optimization which maintains a global cache for operation node equality checks. If it keeps failing, we could get a clearer traceback.
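
For readers unfamiliar with the optimization being switched off here, a rough sketch of an identity-keyed equality cache is given below. It is an illustration under assumptions, not the code in ibis/common/grounds.py: `EqualityCache`, `lookup`, and `compute` are hypothetical names. The pair is ordered by `id()` so that `a == b` and `b == a` share one entry, mirroring the "commutative operation" comment in the patch context.

```python
class EqualityCache:
    """Toy pairwise-equality cache keyed on object identity (hypothetical sketch)."""

    def __init__(self):
        self._data = {}

    def lookup(self, a, b, compute):
        # Equality is commutative, so order the pair by id() and store one entry.
        key = (id(a), id(b)) if id(a) < id(b) else (id(b), id(a))
        try:
            return self._data[key][0]
        except KeyError:
            # Cache miss: do the real (possibly deeply recursive) comparison.
            result = compute(a, b)
            # Keep references to a and b so their ids cannot be reused while cached.
            self._data[key] = (result, (a, b))
            return result
```

A cache like this only skips repeated comparisons of the same pair. The first comparison of two deep trees still recurses all the way down, so disabling it changes the shape of the traceback rather than the depth of the recursion.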

kszucs · 2023-09-11T21:41:28Z

Another option would be to pickle the left and right arguments in case of a recursion error using the following snippet:

diff --git a/ibis/expr/operations/relations.py b/ibis/expr/operations/relations.py
index c444a7d88..a12ded3b0 100644
--- a/ibis/expr/operations/relations.py
+++ b/ibis/expr/operations/relations.py
@@ -220,6 +220,15 @@ class Join(TableNode):
             for pred in util.promote_list(predicates)
         ]

+        try:
+            left.equals(right)
+        except RecursionError:
+            import pickle
+            with open('left.pickle', 'wb') as fp:
+                pickle.dump(left, fp)
+            with open('right.pickle', 'wb') as fp:
+                pickle.dump(right, fp)
+
         if left.equals(right):
             # GH #667: If left and right table have a common parent expression,
             # e.g. they have different filters, we need to add a self-reference

Then I could try to load the two objects to reproduce the error.

cpcloud · 2023-09-12T14:03:34Z

@nehanene15 Can you show the value of join_keys?

cpcloud · 2023-09-12T14:06:55Z

@kszucs The bug report says @nehanene15 is using 5.1.0, if that helps debug at all.

I am looking into it as well, to see if it may have been fixed already in master.

cpcloud · 2023-09-12T14:15:13Z

I am not able to get the following test to fail on 5.1.0 or master:

import ibis
import pandas as pd


def test_large_join():
    source = pd.read_csv(
        "https://github.com/ibis-project/ibis/files/12580336/source_pivot.csv",
        index_col=0,
    )
    diffs = pd.read_csv(
        "https://github.com/ibis-project/ibis/files/12580340/differences_pivot.csv",
        index_col=0,
    )
    con = ibis.pandas.connect({"source": source, "diffs": diffs})
    source = con.tables.source
    diffs = con.tables.diffs

    join_keys = set(source.columns) & set(diffs.columns)
    join = source.join(diffs, join_keys, how="outer").select(
        [source[key] for key in join_keys]
        + [
            source["validation_type"],
            source["aggregation_type"],
            source["table_name"],
            source["column_name"],
            source["primary_keys"],
            source["num_random_rows"],
            source["agg_value"],
            diffs["difference"],
            diffs["pct_difference"],
            diffs["pct_threshold"],
            diffs["validation_status"],
        ],
    )
    df = join.execute()
    assert not df.empty

@nehanene15 Any ideas?

nehanene15 · 2023-09-12T14:24:48Z

Hmm.. the join_keys value is ('validation_name',).

@kszucs When I try the patch, I get a similar cyclical error:

  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 203, in __cached_equals__
    return self.__equals__(other)
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 241, in __equals__
    return self.__args__ == other.__args__
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 203, in __cached_equals__
    return self.__equals__(other)
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 241, in __equals__
    return self.__args__ == other.__args__
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 203, in __cached_equals__
    return self.__equals__(other)
  File "/Users/nehanene/Projects/professional-services-data-validator/env/lib/python3.8/site-packages/ibis/common/grounds.py", line 241, in __equals__
    return self.__args__ == other.__args__
RecursionError: maximum recursion depth exceeded in comparison

nehanene15 · 2023-09-12T14:39:54Z

When I print(source), I get a large Ibis expression like below which might be the issue. I might need to try executing the expression before doing the source.join()

r356 := Selection[r355]
  selections:
    validation_name:  r355.validation_name
    validation_type:  r355.validation_type
    aggregation_type: r355.aggregation_type
    table_name:       r355.table_name
    column_name:      r355.column_name
    primary_keys:     r355.primary_keys
    num_random_rows:  r355.num_random_rows
    agg_value:        r355.agg_value

r357 := Union[r356, r10, distinct=False]

r358 := Selection[r357]
  selections:
    validation_name:  r357.validation_name
    validation_type:  r357.validation_type
    aggregation_type: r357.aggregation_type
    table_name:       r357.table_name
    column_name:      r357.column_name
    primary_keys:     r357.primary_keys
    num_random_rows:  r357.num_random_rows
    agg_value:        r357.agg_value

r359 := Union[r358, r9, distinct=False]

r360 := Selection[r359]
  selections:
    validation_name:  r359.validation_name
    validation_type:  r359.validation_type
    aggregation_type: r359.aggregation_type
    table_name:       r359.table_name
    column_name:      r359.column_name
    primary_keys:     r359.primary_keys
    num_random_rows:  r359.num_random_rows
    agg_value:        r359.agg_value

r361 := Union[r360, r8, distinct=False]

r362 := Selection[r361]
  selections:
    validation_name:  r361.validation_name
    validation_type:  r361.validation_type
    aggregation_type: r361.aggregation_type
    table_name:       r361.table_name
    column_name:      r361.column_name
    primary_keys:     r361.primary_keys
    num_random_rows:  r361.num_random_rows
    agg_value:        r361.agg_value

r363 := Union[r362, r7, distinct=False]

r364 := Selection[r363]
  selections:
    validation_name:  r363.validation_name
    validation_type:  r363.validation_type
    aggregation_type: r363.aggregation_type
    table_name:       r363.table_name
    column_name:      r363.column_name
    primary_keys:     r363.primary_keys
    num_random_rows:  r363.num_random_rows
    agg_value:        r363.agg_value

r365 := Union[r364, r6, distinct=False]

r366 := Selection[r365]
  selections:
    validation_name:  r365.validation_name
    validation_type:  r365.validation_type
    aggregation_type: r365.aggregation_type
    table_name:       r365.table_name
    column_name:      r365.column_name
    primary_keys:     r365.primary_keys
    num_random_rows:  r365.num_random_rows
    agg_value:        r365.agg_value

r367 := Union[r366, r5, distinct=False]

r368 := Selection[r367]
  selections:
    validation_name:  r367.validation_name
    validation_type:  r367.validation_type
    aggregation_type: r367.aggregation_type
    table_name:       r367.table_name
    column_name:      r367.column_name
    primary_keys:     r367.primary_keys
    num_random_rows:  r367.num_random_rows
    agg_value:        r367.agg_value

r369 := Union[r368, r4, distinct=False]

Selection[r369]
  selections:
    validation_name:  r369.validation_name
    validation_type:  r369.validation_type
    aggregation_type: r369.aggregation_type
    table_name:       r369.table_name
    column_name:      r369.column_name
    primary_keys:     r369.primary_keys
    num_random_rows:  r369.num_random_rows
    agg_value:        r369.agg_value

cpcloud · 2023-09-12T14:45:20Z

Ah, yeah it looks like there's around 370 tables in the mix there. There's nothing in principle preventing that, but it seems like it's related.

If you can pickle the unbound expression and dump that somewhere then we can probably reproduce it.

In the meantime, I will try to construct a big union of tables to see if I can reproduce this.
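
To illustrate why the number of chained tables matters, here is a minimal sketch, assuming a node type whose equality compares argument tuples the way the `self.__args__ == other.__args__` frames in the traceback do. `Node`, `chained_union`, `equals_or_overflows`, and `DEEP` are hypothetical names, not ibis classes. Each extra union adds one level of nesting, and each level costs at least one Python frame during comparison.

```python
import sys


class Node:
    """Hypothetical stand-in for an expression node; not ibis's actual class."""

    def __init__(self, *args):
        self.args = args

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        # Comparing the argument tuples recurses into each child's __eq__.
        return self.args == other.args


def chained_union(n):
    """Build a left-deep chain ((t0 union t1) union t2) ... without recursion."""
    node = Node("t0")
    for i in range(1, n):
        node = Node(node, Node(f"t{i}"))
    return node


def equals_or_overflows(n):
    """Compare two structurally identical chains of n tables."""
    left, right = chained_union(n), chained_union(n)
    try:
        return left == right
    except RecursionError:
        return "RecursionError"


# A chain deeper than the interpreter's recursion limit cannot be compared.
DEEP = 2 * sys.getrecursionlimit()
```

With a short chain such as `equals_or_overflows(10)` the comparison succeeds, while `equals_or_overflows(DEEP)` ends in a `RecursionError`, which matches the pattern of failing only on large expressions.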

cpcloud · 2023-09-12T14:49:48Z

Ok, I can reproduce it with this

import ibis
import pandas as pd


def test_large_join():
    source = pd.read_csv(
        "https://github.com/ibis-project/ibis/files/12580336/source_pivot.csv",
        index_col=0,
    )
    diffs = pd.read_csv(
        "https://github.com/ibis-project/ibis/files/12580340/differences_pivot.csv",
        index_col=0,
    )
    con = ibis.pandas.connect({"source": source, "diffs": diffs})
    n = 200
    source = ibis.union(*[con.tables.source for _ in range(n)])
    diffs = ibis.union(*[con.tables.diffs for _ in range(n)])

    join_keys = set(source.columns) & set(diffs.columns)
    join = source.join(diffs, join_keys, how="outer").select(
        [source[key] for key in join_keys]
        + [
            source["validation_type"],
            source["aggregation_type"],
            source["table_name"],
            source["column_name"],
            source["primary_keys"],
            source["num_random_rows"],
            source["agg_value"],
            diffs["difference"],
            diffs["pct_difference"],
            diffs["pct_threshold"],
            diffs["validation_status"],
        ],
    )
    df = join.execute()
    assert not df.empty

nehanene15 · 2023-09-12T14:52:55Z

It works if I execute the large expr before doing the join!

In this case differences_pivot and source_pivot are the large expressions with around 495 tables in the mix.
Working code:

differences_df = client.execute(differences_pivot)
source_df = client.execute(source_pivot)

# Register the executed DataFrames (not the original large expressions).
con = ibis.pandas.connect({"source": source_df, "differences": differences_df, "target": target})
source = con.tables.source
differences = con.tables.differences

source_difference = source.join(differences, join_keys, how="outer")[
        [source[field] for field in join_keys]
        + [
            source["validation_type"],
            source["aggregation_type"],
            source["table_name"],
            source["column_name"],
            source["primary_keys"],
            source["num_random_rows"],
            source["agg_value"],
            differences["difference"],
            differences["pct_difference"],
            differences["pct_threshold"],
            differences["validation_status"],
        ]
    ]

cpcloud · 2023-09-12T14:52:58Z

It's failing for the same reason in that test, but at a slightly different location (the execute call)

nehanene15 · 2023-09-12T14:56:56Z

I see. Seems like it's best practice to execute the Ibis expr beforehand to avoid the 300+ table union/join so I'll update our code to reflect that if you agree.

cpcloud · 2023-09-12T14:57:48Z

@kszucs I suspect we can construct a failing example without joins.

I suspect that there may be some properties we should turn into attributes in a few places to avoid huge traversals, for example the schema attribute of Unions.

We can probably also avoid storing a huge tree for set operations

cpcloud · 2023-09-12T14:59:38Z

@nehanene15 I think for your case it's a viable workaround, but I don't think it's best practice 😄, I think it's a bug in ibis that we will try to address.

kszucs · 2023-09-12T16:24:34Z

I suspect that there may be some properties we should turn into attributes in a few places to avoid huge traversals, for example the schema attribute of Unions.

I was thinking the same. I'm not sure how we could prevent call stacks like this, but we can certainly "delay" their occurrence.
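
A small sketch of the property-versus-attribute idea, under stated assumptions: `Leaf`, `LazyUnion`, `EagerUnion`, `build`, and `N` are hypothetical names, and taking the schema from the left operand is a simplification rather than how ibis resolves a union's schema. The lazy variant walks the whole chain on every access, while the eager variant resolves the value once at construction time from an operand that already holds it as a plain attribute.

```python
import sys


class Leaf:
    """Hypothetical base table with a fixed schema."""

    def __init__(self, schema):
        self.schema = schema


class LazyUnion:
    def __init__(self, left, right):
        self.left, self.right = left, right

    @property
    def schema(self):
        # Every access recurses down the entire chain of left operands.
        return self.left.schema


class EagerUnion:
    def __init__(self, left, right):
        self.left, self.right = left, right
        # Resolved once; left.schema is already a plain attribute here.
        self.schema = left.schema


def build(cls, n):
    """Chain n unions iteratively on top of a leaf table."""
    columns = ("validation_name", "agg_value")
    node = Leaf(columns)
    for _ in range(n):
        node = cls(node, Leaf(columns))
    return node


# Deeper than the recursion limit, so the lazy property cannot be resolved.
N = 2 * sys.getrecursionlimit()
```

Here `build(EagerUnion, N).schema` returns immediately, whereas `build(LazyUnion, N).schema` raises a `RecursionError`. Other recursive traversals of the tree would still be deep, which fits the remark about delaying rather than preventing the problem.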

cpcloud · 2023-09-12T16:59:22Z

Another thing that may help decrease call stack size is changing the representation of SetOp to be variadic.
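
To make the variadic idea concrete, here is a hedged sketch comparing tree height for the two representations. `Table`, `BinaryUnion`, `VariadicUnion`, and `depth` are hypothetical names for illustration and do not reflect the actual SetOp classes in ibis.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class BinaryUnion:
    left: object
    right: object


@dataclass(frozen=True)
class VariadicUnion:
    tables: Tuple[object, ...]


def depth(node):
    """Height of the tree, computed with an explicit stack (no recursion)."""
    best, stack = 0, [(node, 1)]
    while stack:
        current, level = stack.pop()
        best = max(best, level)
        if isinstance(current, BinaryUnion):
            stack.append((current.left, level + 1))
            stack.append((current.right, level + 1))
        elif isinstance(current, VariadicUnion):
            stack.extend((child, level + 1) for child in current.tables)
    return best


tables = [Table(f"t{i}") for i in range(200)]
nested = reduce(BinaryUnion, tables)   # ((t0, t1), t2), ... one level per table
flat = VariadicUnion(tuple(tables))    # a single node holding all 200 tables
```

In this sketch the nested form grows one level per table while the flat form stays at a constant height, so any recursive traversal of the flat form needs far fewer stack frames.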

cpcloud · 2023-09-13T21:45:50Z

@nehanene15 Can you try your code against #7148? That should give you some breathing room for huge unions, though see the PR description (points 3 and 4) that might explain any new issues that look similar 😅

nehanene15 · 2023-09-14T20:00:06Z

@cpcloud This definitely gives more wiggle room in addition to executing the ibis expr before the joins.

cpcloud · 2023-09-15T14:31:36Z

@nehanene15 You should have plenty of room for those big unions now :)

If anything else pops up don't hesitate to open another issue!

nehanene15 added the bug (Incorrect behavior inside of ibis) label Sep 11, 2023

cpcloud mentioned this issue Sep 13, 2023

perf(ops): allow larger table sets in expressions #7148

Merged

cpcloud closed this as completed in #7148 Sep 15, 2023

nehanene15 mentioned this issue Feb 6, 2024

fix: Increase upper limit on number of columns that can be validated GoogleCloudPlatform/professional-services-data-validator#1090

Merged

cpcloud mentioned this issue Feb 12, 2024

Large unions cannot be constructed tobymao/sqlglot#2961

Closed

jitingxu1 mentioned this issue May 14, 2024

perf(api): rewrite union and intersection construction to support more operands #9194

Merged
