Are we allowing iterating over GroupBy? #131

MarcoGorelli · 2023-04-04T10:14:01Z

For example in seaborn we see this:

https://github.com/mwaskom/seaborn/blob/5d9f37159bbd3ac44c8c8a06825583ba25648525/seaborn/_oldcore.py#L1253-L1261

    # Now actually update the matplotlib objects to do the conversion we want
    grouped = self.plot_data[var].groupby(self.converters[var], sort=False)
    for converter, seed_data in grouped:
        if self.var_types[var] == "categorical":
            if self._var_ordered[var]:
                order = self.var_levels[var]
            else:
                order = None
            seed_data = categorical_order(seed_data, order)
        converter.update_units(seed_data)

Currently, both pandas and polars allow it: if you iterate over a GroupBy object, each iteration yields a tuple where the first element is the group key and the second is the subset of the DataFrame corresponding to that key.
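To make the (key, subset) semantics concrete, here is a minimal pure-Python sketch. It is not how pandas or polars implement GroupBy. The column-oriented `Frame` type and the `iter_groups` helper are hypothetical, and groups are emitted in order of first appearance, roughly like pandas' `groupby(..., sort=False)` used in the seaborn snippet.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical column-oriented "dataframe": column name -> list of values.
Frame = Dict[str, List[Any]]


def iter_groups(frame: Frame, by: str) -> Iterator[Tuple[Any, Frame]]:
    """Yield (key, subset) pairs, mimicking iteration over a GroupBy.

    Groups are emitted in order of first appearance, roughly like
    pandas' groupby(..., sort=False).
    """
    # Record the row positions belonging to each key; dicts keep insertion order.
    positions: Dict[Any, List[int]] = {}
    for i, key in enumerate(frame[by]):
        positions.setdefault(key, []).append(i)
    for key, rows in positions.items():
        # Sub-frame containing only this group's rows, across every column.
        subset = {name: [col[i] for i in rows] for name, col in frame.items()}
        yield key, subset


frame = {"key": ["b", "a", "b", "c"], "value": [1, 2, 3, 4]}
for key, subset in iter_groups(frame, "key"):
    print(key, subset["value"])
# b [1, 3]
# a [2]
# c [4]
```

In the seaborn loop, `converter` plays the role of `key` and `seed_data` the role of `subset`.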


kkraus14 · 2023-04-05T03:23:54Z

I'm -1 on including this in a first revision of the standard. This effectively allows per-group user-defined functions, and we've generally punted on user-defined functions thus far.

This has also historically been a performance footgun in cuDF, since you can often run into situations where you have a large number of small groups.

MarcoGorelli · 2023-04-05T10:34:55Z

ok sounds good

If seaborn wanted to use the standard, they'd have to do a pretty extensive refactor, though that'd probably benefit them in the long run anyway.

jorisvandenbossche · 2023-04-05T11:57:10Z

> If seaborn wanted to use the standard, they'd have to do a pretty extensive refactor, though that'd probably benefit them in the long run anyway.

How would the example snippet be done without groupby? By getting the unique values, then iterating over those and, on each iteration, filtering the data with a mask for equality with that one unique value?

MarcoGorelli · 2023-04-05T12:47:21Z

> By getting the unique values, then iterating over those and, on each iteration, filtering the data with a mask for equality with that one unique value?

Yeah sounds fine
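The approach agreed on here (unique values, then one boolean mask per value) can be sketched in plain Python as follows. The helpers `unique_values`, `filter_by_mask` and `split_without_groupby`, and the column-oriented `Frame` type, are hypothetical stand-ins. They are not the Standard's API. The thread later links a `Column.unique` proposal (#135), but its exact signature is not assumed here.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical column-oriented "dataframe": column name -> list of values.
Frame = Dict[str, List[Any]]


def unique_values(values: List[Any]) -> List[Any]:
    """Distinct values in order of first appearance (stand-in for a Column.unique-style method)."""
    return list(dict.fromkeys(values))


def filter_by_mask(frame: Frame, mask: List[bool]) -> Frame:
    """Keep only the rows where mask is True (stand-in for boolean-mask filtering)."""
    return {name: [v for v, keep in zip(col, mask) if keep] for name, col in frame.items()}


def split_without_groupby(frame: Frame, by: str) -> List[Tuple[Any, Frame]]:
    """Emulate GroupBy iteration: unique keys, then one mask-and-filter pass per key."""
    result = []
    for key in unique_values(frame[by]):
        # Each key costs a full pass over the column, so total work is O(rows * groups).
        mask = [value == key for value in frame[by]]
        result.append((key, filter_by_mask(frame, mask)))
    return result


frame = {"key": ["b", "a", "b", "c"], "value": [1, 2, 3, 4]}
for key, subset in split_without_groupby(frame, "key"):
    print(key, subset["value"])
# b [1, 3]
# a [2]
# c [4]
```

Because every unique value triggers a scan of the whole column, this sketch does roughly rows × groups comparisons, so the many-small-groups case is expensive for this approach.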

MarcoGorelli · 2023-09-25T18:50:37Z

Are we OK to just not do anything here? As in, leave it out of the Standard. Then, if anyone wants to allow it, they're free to do so, but it won't be part of the Standard.

This is just like the Array API: if libraries want to allow extra things that aren't in the Standard (like string dtypes in NumPy), they're free to do so.

rgommers · 2023-09-28T12:17:20Z

That seems reasonable to me.

MarcoGorelli · 2023-09-28T12:52:17Z

sure, let's do that then, thanks

MarcoGorelli mentioned this issue Apr 12, 2023: Column.unique #135 (Closed)

MarcoGorelli mentioned this issue Jul 11, 2023: Add DataFrame.unique_indices #194 (Merged)

MarcoGorelli closed this as completed Sep 28, 2023

MarcoGorelli mentioned this issue Oct 1, 2023: Separate eager and lazy APIs #249 (Closed)
