BUG: Segmentation fault with groupby/transform #46566

Closed
3 tasks done
ian-r-rose opened this issue Mar 29, 2022 · 4 comments · Fixed by #46585
Labels: Apply, Bug, Groupby, Regression

@ian-r-rose

ian-r-rose commented Mar 29, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas

df = pandas.DataFrame(
    {
        "A": [1, 2, 3, 4, 5] * 4,
        "B": [1, 2, 3, 4, 5] * 4,
        "C": [1, 2, 3, 4, 5] * 4,
    }
)

df.groupby(["A", "B"]).transform(lambda x: x)

Issue Description

👋 Since about February 27, the above snippet has been generating a segmentation fault in pandas main. As far as I can tell, this is coming from get_group_index_sorter() in pandas.core.sorting.

Based on the timing and git history, it may be related to #45953, though I've been unable to identify the source of the problem thus far.

A few observations:

  1. The length of the series seems to matter. If I shorten the sample df to length 15, things work fine.
  2. It seems to matter whether I group by more than one field (grouping by just "A" works fine).
  3. The segfault only happens for transform. If I use apply, it works.
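
For anyone who wants to check these three observations without crashing their own interpreter, here is a minimal sketch that runs each variant in a child process and reports its exit status. The helper names (`make_snippet`, `run_isolated`, `probe_all`) are hypothetical and not part of pandas; the only pandas calls are the ones already in the example above.

```python
import signal
import subprocess
import sys


def make_snippet(reps, keys, method):
    """Build the source for one variant of the reproducer (hypothetical helper)."""
    return (
        "import pandas\n"
        f"df = pandas.DataFrame({{k: [1, 2, 3, 4, 5] * {reps} for k in 'ABC'}})\n"
        f"df.groupby({keys!r}).{method}(lambda x: x)\n"
    )


def run_isolated(source):
    """Run `source` in a fresh interpreter and return the child's exit code.

    On POSIX, a negative code means the child was killed by a signal,
    so a segfault would show up as -signal.SIGSEGV.
    """
    proc = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return proc.returncode


# The original reproducer plus the three variations listed above.
VARIANTS = {
    "original (length 20, two keys, transform)": (4, ["A", "B"], "transform"),
    "length 15": (3, ["A", "B"], "transform"),
    "single key": (4, ["A"], "transform"),
    "apply instead of transform": (4, ["A", "B"], "apply"),
}


def probe_all():
    """Run every variant in isolation and map its name to an exit code."""
    return {name: run_isolated(make_snippet(*args)) for name, args in VARIANTS.items()}


def describe(code):
    """Turn an exit code into a short label."""
    if code == 0:
        return "ok"
    if code == -signal.SIGSEGV:
        return "segfault"
    return f"exit code {code}"
```

Calling `probe_all()` and passing each value through `describe` gives a one-line status per variant. What it prints depends entirely on the installed pandas build, so no expected output is shown here.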

Expected Behavior

No segfault should occur.

Installed Versions

This shows up on pandas main.

Based on the nightly builds here, it seems like the first affected version was this one.

@ian-r-rose added the Bug and Needs Triage labels Mar 29, 2022
@ian-r-rose
Author

Update: I ran a git bisect, and it seems to come from #45953.

@ian-r-rose changed the title from "BUG: Segmentation fault on pandas nightly with groupby/transform" to "BUG: Segmentation fault with groupby/transform" Mar 29, 2022
@rhshadrach added the Groupby, Regression, and Apply labels and removed the Needs Triage label Mar 30, 2022
@rhshadrach added this to the 1.5 milestone Mar 30, 2022
@rhshadrach
Member

Thanks for the report! Confirmed on main, tagging as 1.5 to track.

@rhshadrach self-assigned this Mar 30, 2022
@rhshadrach
Member

I believe the method result_ilocs is missing a call to compress_group_index. Will put up a PR to fix shortly.
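
To illustrate why a missing compression step could matter, here is a pure-Python sketch of the general idea. It is not pandas' implementation: the helper names are hypothetical, and whether this is the exact failure path in this issue is an assumption. The point is only that combining two keys of 5 values each yields ids in a 0..24 space even though just 5 groups are observed, so a counting sort sized for 5 groups would index past its buffer unless the ids are first relabeled to 0..4.

```python
def flat_group_index(codes_per_key, sizes):
    """Combine per-key integer codes into one id per row (row-major order)."""
    out = [0] * len(codes_per_key[0])
    for codes, size in zip(codes_per_key, sizes):
        out = [o * size + c for o, c in zip(out, codes)]
    return out


def compress(ids):
    """Relabel the observed ids as 0..k-1, preserving order; return (ids, k)."""
    mapping = {v: i for i, v in enumerate(sorted(set(ids)))}
    return [mapping[v] for v in ids], len(mapping)


def counting_sort_indexer(ids, ngroups):
    """Stable counting sort of row positions by group id.

    Assumes every id is < ngroups. In Python a violation raises IndexError;
    compiled code without bounds checks could read or write out of bounds.
    """
    counts = [0] * (ngroups + 1)
    for g in ids:
        counts[g + 1] += 1
    for i in range(1, ngroups + 1):
        counts[i] += counts[i - 1]
    where = counts[:ngroups]  # start offset of each group in the output
    out = [0] * len(ids)
    for row, g in enumerate(ids):
        out[where[g]] = row
        where[g] += 1
    return out


# Two keys with identical codes 0..4, repeated 4 times (20 rows, 5 observed groups).
codes = [0, 1, 2, 3, 4] * 4
raw_ids = flat_group_index([codes, codes], [5, 5])
print(raw_ids[:5])  # [0, 6, 12, 18, 24]

compressed, ngroups = compress(raw_ids)
print(counting_sort_indexer(compressed, ngroups)[:4])  # [0, 5, 10, 15]

try:
    counting_sort_indexer(raw_ids, ngroups)  # ids up to 24, buffer sized for 5 groups
except IndexError:
    print("uncompressed ids overflow the counts buffer")
```

The sketch uses a dense relabeling for `compress`; any mapping onto 0..k-1 would do for the purpose of keeping ids within the bounds the sorter expects.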

@ian-r-rose
Author

Thanks for the quick fix!
