Clean up groupby to use the anchor. #1610

Merged (2 commits) on Jun 24, 2020

ueshin (Collaborator) commented Jun 23, 2020

Now that a modified Series has a different anchor from the DataFrame it was originally derived from, we no longer need to check the equality of Spark columns in groupby.

This also fixes a bug that occurs when the operation needs compute.ops_on_diff_frames and the first group key comes from the same DataFrame but has been modified.

E.g.,

>>> import databricks.koalas as ks
>>> ks.options.compute.ops_on_diff_frames = True
>>> kdf1 = ks.DataFrame({"C": [0.362, 0.227, 1.267, -0.562], "B": [1, 2, 3, 4]})
>>> kdf2 = ks.DataFrame({"A": [1, 1, 2, 2]})
>>> kdf1.groupby([kdf1.C + 1, kdf2.A]).sum()
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute '_internal'
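
The anchor idea described above can be illustrated with a minimal toy sketch in plain Python. This is not the Koalas implementation: Frame, Column, and needs_ops_on_diff_frames are hypothetical names invented for illustration, and the check shown is only an assumption about how comparing anchors could replace comparing Spark columns.

```python
class Frame:
    """Toy stand-in for a DataFrame: column name -> list of values."""

    def __init__(self, data):
        self.data = {name: list(values) for name, values in data.items()}

    def column(self, name):
        # A column taken directly from the frame is anchored to that frame.
        return Column(self.data[name], name, anchor=self)


class Column:
    """Toy stand-in for a Series that remembers the frame it is anchored to."""

    def __init__(self, values, name, anchor):
        self.values = list(values)
        self.name = name
        self.anchor = anchor

    def __add__(self, other):
        # Assumption for this sketch: modifying a column produces a new
        # column anchored to a fresh frame, not to the source frame.
        new_values = [v + other for v in self.values]
        return Column(new_values, self.name, anchor=Frame({self.name: new_values}))


def needs_ops_on_diff_frames(frame, keys):
    """Return True if any group key is anchored to a frame other than `frame`."""
    return any(key.anchor is not frame for key in keys)


frame1 = Frame({"C": [0.362, 0.227, 1.267, -0.562], "B": [1, 2, 3, 4]})
frame2 = Frame({"A": [1, 1, 2, 2]})

# Unmodified key from the same frame: anchors match.
unmodified = needs_ops_on_diff_frames(frame1, [frame1.column("C")])

# Modified first key plus a key from another frame, as in the example above.
mixed = needs_ops_on_diff_frames(frame1, [frame1.column("C") + 1, frame2.column("A")])
```

In this sketch, unmodified is False and mixed is True, because identity of the anchor alone decides whether a key belongs to the grouped frame; no column-by-column comparison is involved.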

@ueshin ueshin requested a review from HyukjinKwon June 23, 2020 23:16
ueshin (Collaborator, Author) commented Jun 24, 2020

Let me merge this now. Please feel free to comment on this.

@ueshin ueshin merged commit 58d0212 into databricks:master Jun 24, 2020
@ueshin ueshin deleted the groupby branch June 24, 2020 01:35
HyukjinKwon (Member) commented

Sorry for the late review, @ueshin. LGTM!
