-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Are we allowing iterating over GroupBy? #131
Comments
I'm -1 on including this in a first revision of the standard. This is effectively to allow per group user defined functions and we've generally punted on user defined functions thus far. This has also been a historical performance footgun in cuDF since you can often run into situations where you have a large number of small groups. |
ok sounds good if seaborn wanted to use the standard, they'd have to do a pretty extensive refactor anyway, though that'd probably benefit them in the long-run anyway |
How would the example snippet be done without groupby? Getting the unique values, and then iterating over those and in the loop each time filter the data with a mask for equal to that one unique value? |
Yeah sounds fine |
Are we OK to just not do anything here? As in, leave it out of the Standard. then, if anyone wants to allow it, they're free to do so, but whatever, it won't be part of the Standard just like how for the Array API, if people want to allow extra things that aren't in the Standard (like string dtypes in numpy), they're free to do so |
That seems reasonable to me. |
sure, let's do that then, thanks |
For example in seaborn we see this:
https://github.com/mwaskom/seaborn/blob/5d9f37159bbd3ac44c8c8a06825583ba25648525/seaborn/_oldcore.py#L1253-L1261
Currently, both pandas and polars allow it: if you iterate of a
GroupBy
object, you get a tuple where the first element is the key and the second is the subset of the DataFrame corresponding to that keyThe text was updated successfully, but these errors were encountered: