Clean up groupby to use the anchor. #1610

ueshin · 2020-06-23T23:16:22Z

Now that a modified Series has a different anchor from the DataFrame it was originally derived from, we no longer need to check Spark column equality in groupby.

This also fixes a bug that occurs when the operation needs compute.ops_on_diff_frames and the first group key comes from the same DataFrame but has been modified.

E.g.,

>>> ks.options.compute.ops_on_diff_frames = True
>>> kdf1 = ks.DataFrame({"C": [0.362, 0.227, 1.267, -0.562], "B": [1, 2, 3, 4]})
>>> kdf2 = ks.DataFrame({"A": [1, 1, 2, 2]})
>>> kdf1.groupby([kdf1.C + 1, kdf2.A]).sum()
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute '_internal'
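
To illustrate the anchor idea behind this cleanup, here is a minimal pure-Python sketch. It is a toy model, not the Koalas implementation: ToyFrame, ToySeries, and needs_join are hypothetical names invented for illustration. The assumption is that an unmodified column keeps the DataFrame it came from as its anchor, a modified Series gets a new anchor, and groupby can therefore compare anchors by identity instead of comparing Spark columns.

```python
# Toy model only: ToyFrame, ToySeries and needs_join are hypothetical
# stand-ins for illustration, not Koalas classes or functions.


class ToyFrame:
    def __init__(self, data):
        self.data = data  # column name -> list of values

    def __getitem__(self, name):
        # An unmodified column is anchored to the frame it was taken from.
        return ToySeries(name, self.data[name], anchor=self)


class ToySeries:
    def __init__(self, name, values, anchor=None):
        self.name = name
        self.values = values
        # A Series created without an anchor gets a fresh single-column
        # frame of its own, so it no longer shares the original anchor.
        self.anchor = anchor if anchor is not None else ToyFrame({name: values})

    def __add__(self, other):
        # Modifying the Series produces a new Series with a new anchor.
        return ToySeries(self.name, [v + other for v in self.values])


def needs_join(frame, key):
    """Return True if the group key is not anchored to `frame`.

    Assumption: such a key has to be treated like a key from a
    different frame.
    """
    return key.anchor is not frame


kdf1 = ToyFrame({"C": [0.362, 0.227, 1.267, -0.562], "B": [1, 2, 3, 4]})
kdf2 = ToyFrame({"A": [1, 1, 2, 2]})

keys = [kdf1["C"], kdf1["C"] + 1, kdf2["A"]]
flags = [needs_join(kdf1, key) for key in keys]
print(flags)  # [False, True, True]
```

In this sketch the modified key kdf1["C"] + 1 is handled the same way as kdf2["A"]: both fail the identity check against kdf1, so no column-level comparison is needed to tell them apart from an untouched column.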

ueshin · 2020-06-24T01:35:19Z

Let me merge this now. Please feel free to comment on this.

HyukjinKwon · 2020-07-01T10:30:17Z

Sorry for the late review, @ueshin. LGTM!

Clean up groupby to use the anchor.

04f30dd

ueshin requested a review from HyukjinKwon June 23, 2020 23:16

Fix.

203a12a

ueshin merged commit 58d0212 into databricks:master Jun 24, 2020

ueshin deleted the groupby branch June 24, 2020 01:35
