Introduce Grouper objects internally #7561

dcherian · 2023-02-27T03:11:36Z

Builds on the refactoring in #7206

xref Update GroupBy constructor for grouping by multiple variables, dask arrays #6610
Use TimeResampleGrouper

Upstream bug pandas-dev/pandas#12813 is fixed

This reverts commit 2a36e21a031b9e061b932682758551956f3f06d2.

dcherian · 2023-03-30T04:13:10Z

@Illviljan could use some typing help here if you have the time :)

Illviljan

I'll look into it more later.

xarray/core/groupby.py

Illviljan · 2023-04-05T11:47:34Z

I'm not sure about the rest of the errors, @dcherian. Maybe IndexVariable needs to use the DataWithCoords mixin?

xarray/core/groupby.py:577: error: Value of type variable "DataAlignable" of "align" cannot be "Union[DataArray, IndexVariable]"  [type-var]
xarray/core/groupby.py:577: error: Value of type variable "DataAlignable" of "align" cannot be "Union[Dataset, DataArray, IndexVariable]"  [type-var]
xarray/tests/test_groupby.py:55: error: List item 1 has incompatible type "int"; expected "slice"  [list-item]

xarray/xarray/core/alignment.py

Lines 581 to 588 in d4db166

    
    def align(
        *objects: DataAlignable,
        join: JoinOptions = "inner",
        copy: bool = True,
        indexes=None,
        exclude=frozenset(),
        fill_value=dtypes.NA,
    ) -> tuple[DataAlignable, ...]:

xarray/xarray/core/alignment.py

Line 31 in d4db166

DataAlignable = TypeVar("DataAlignable", bound=DataWithCoords)

xarray/xarray/core/common.py

Lines 376 to 377 in d4db166

    
    class DataWithCoords(AttrAccessMixin):
        """Shared base class for Dataset and DataArray."""

dcherian · 2023-04-05T15:30:16Z

Variables don't have coordinates so that won't work.

mypy is correct here: it's a bug, and we don't test for grouping by index variables. A commit reverting to the old len check would be great, if you have the time.

It's not clear to me why we allow this actually. Seems like .groupby("DIMENSION") solves that use-case.
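
To illustrate why mypy rejects the call, here is a minimal sketch of a bounded `TypeVar`. The classes are empty stand-ins that only borrow the xarray names, and the toy `align` does no real alignment; only the bound on `DataAlignable` mirrors the definition quoted above.

```python
from __future__ import annotations

from typing import TypeVar


class DataWithCoords:
    """Stand-in for the shared base class of Dataset and DataArray."""


class DataArray(DataWithCoords):
    """Stand-in: inherits from DataWithCoords, so it satisfies the bound."""


class Dataset(DataWithCoords):
    """Stand-in: inherits from DataWithCoords, so it satisfies the bound."""


class IndexVariable:
    """Stand-in: deliberately NOT a DataWithCoords subclass."""


# Same shape as the quoted definition: arguments must be DataWithCoords.
DataAlignable = TypeVar("DataAlignable", bound=DataWithCoords)


def align(*objects: DataAlignable) -> tuple[DataAlignable, ...]:
    """Toy align that returns its inputs unchanged."""
    return tuple(objects)


ok = align(DataArray(), Dataset())  # both arguments are inside the bound

# A type checker is expected to flag the next call with a type-var error,
# because IndexVariable is outside the bound. Python does not enforce the
# bound at runtime, so the call itself still executes.
unchecked = align(DataArray(), IndexVariable())
```

Making the stand-in `IndexVariable` inherit from `DataWithCoords` would silence the checker in this sketch, but as noted above that is not appropriate for the real class, since variables have no coordinates.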

dcherian · 2023-04-26T16:00:32Z

I'd like to merge this soon. It's an internal refactor with no public API changes.

I think we can expose the Grouper objects publicly in a new PR.

Illviljan

I added some TODOs I've been thinking about, no stoppers.

xarray/core/groupby.py

Illviljan · 2023-04-26T16:10:29Z

xarray/core/groupby.py

+        return len(self)
+
+    def __len__(self) -> int:
+        return len(self.full_index)  # TODO: full_index not def, abstractmethod?


This will crash if .factorize hasn't been triggered before.

Yes, but it wouldn't make sense without it, and this class is internal-only. Well, it would make sense if the user told us the labels or bin edges, but again it's internal-only so eh...
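
The ordering dependency discussed here can be sketched as follows. `SketchGrouper` is a hypothetical toy class, not the Grouper in this PR; it only assumes, as the thread does, that `full_index` is populated by a factorize step. The explicit guard is one possible way to turn the crash into a clearer error.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchGrouper:
    """Toy grouper whose length is only known after factorizing."""

    values: list
    # Populated by factorize(); None until then.
    full_index: Optional[list] = field(default=None, init=False)

    def factorize(self) -> list[int]:
        """Compute the sorted unique labels and return integer codes."""
        self.full_index = sorted(set(self.values))
        lookup = {label: code for code, label in enumerate(self.full_index)}
        return [lookup[value] for value in self.values]

    def __len__(self) -> int:
        # Without this guard, len(None) would raise a less helpful TypeError.
        if self.full_index is None:
            raise RuntimeError("call factorize() before len()")
        return len(self.full_index)


grouper = SketchGrouper(["a", "b", "a"])
codes = grouper.factorize()  # must run before len(grouper)
n_groups = len(grouper)
```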

Illviljan · 2023-04-26T16:11:16Z

xarray/core/groupby.py

+
+@dataclass
+class BinGrouper(Grouper):
+    bins: Any  # TODO: What is the typing?


I didn't have the time to figure out the typing on this one.

At the moment it should be int | Sequence I think, so either number of bins, or actual bin edges in some Iterable where the order matters.

I never really like to use Sequence because np.ndarrays are not Sequences... And I think it could be quite common to supply the bins as numpy arrays?

Yes, int or sequences or arrays.

np.typing.ArrayLike?

Yes, but "nested sequences" won't work here: https://numpy.org/doc/stable/reference/typing.html#numpy.typing.ArrayLike, but maybe that's a tiny detail.
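
A small sketch of the trade-off in this thread, assuming NumPy is installed. The `Bins` alias and the `validate_bins` helper are hypothetical names used for illustration, not annotations or functions from xarray. The sketch shows that an `np.ndarray` does not pass an `isinstance` check against `Sequence`, and that a runtime dimensionality check can reject the nested sequences that `ArrayLike` would admit.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Union

import numpy as np

# Hypothetical alias: a bin count, or 1-D bin edges as a sequence or array.
Bins = Union[int, Sequence, np.ndarray]


def validate_bins(bins: Bins) -> Union[int, np.ndarray]:
    """Return a positive bin count, or 1-D bin edges as an array."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(bins, (int, np.integer)) and not isinstance(bins, bool):
        if bins < 1:
            raise ValueError("number of bins must be positive")
        return int(bins)
    edges = np.asarray(bins)
    # Nested sequences become 2-D (or higher) arrays and are rejected here.
    if edges.ndim != 1:
        raise ValueError("bin edges must be one-dimensional")
    return edges


# ndarray is not registered as a collections.abc.Sequence.
array_is_sequence = isinstance(np.array([0, 1, 2]), Sequence)
```

For example, `validate_bins(4)` returns `4` and `validate_bins([0, 5, 10])` returns a one-dimensional array, while `validate_bins([[0, 1], [2, 3]])` raises `ValueError`.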

mwtoews · 2023-05-26T02:06:49Z

xarray/core/groupby.py

+        else:
+            newgroup = group
+
+    if newgroup.size == 0:


With xarray 2023.5.0 I now seem to get "UnboundLocalError: local variable 'newgroup' referenced before assignment" when using groupby with an IndexVariable object.

Can you open a new issue please?
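
The class of bug reported here can be sketched in isolation. The code below is not the xarray code path, which is only partly visible in the diff above; the stand-in classes and both functions are hypothetical. It shows how a branch that handles two types but assigns `newgroup` for only one of them leads to `UnboundLocalError`.

```python
from dataclasses import dataclass


@dataclass
class DataArrayLike:
    """Hypothetical stand-in for a DataArray; only `size` matters here."""

    size: int


@dataclass
class IndexVariableLike:
    """Hypothetical stand-in for an IndexVariable."""

    size: int


def resolve_group_buggy(group):
    # Both types enter the outer branch, but only one assigns newgroup.
    if isinstance(group, (DataArrayLike, IndexVariableLike)):
        if isinstance(group, DataArrayLike):
            newgroup = group
    else:
        newgroup = DataArrayLike(size=len(group))
    if newgroup.size == 0:  # UnboundLocalError for IndexVariableLike input
        raise ValueError("group must not be empty")
    return newgroup


def resolve_group_fixed(group):
    # Every branch binds newgroup before it is read.
    if isinstance(group, DataArrayLike):
        newgroup = group
    elif isinstance(group, IndexVariableLike):
        newgroup = DataArrayLike(size=group.size)
    else:
        newgroup = DataArrayLike(size=len(group))
    if newgroup.size == 0:
        raise ValueError("group must not be empty")
    return newgroup
```

A test that calls the function with each accepted input type would catch this kind of gap, which matches the earlier remark that grouping by index variables was untested.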

github-actions bot added the topic-groupby label Feb 27, 2023

dcherian added the run-benchmark Run the ASV benchmark workflow label Feb 27, 2023

keewis mentioned this pull request Mar 7, 2023

Preserve base and loffset arguments in resample #7444

Merged

3 tasks

dcherian force-pushed the grouper-objects branch 3 times, most recently from 1147411 to 78f6bda on March 9, 2023 21:29

dcherian force-pushed the grouper-objects branch from 78f6bda to f46f5d9 on March 10, 2023 04:23

dcherian force-pushed the grouper-objects branch from 94aca25 to 6fba398 on March 18, 2023 03:47

dcherian mentioned this pull request Mar 20, 2023

Save groupby codes after factorizing, pass to flox #7206

Merged

3 tasks

dcherian added 12 commits March 29, 2023 21:55

Introduce Grouper objects.

71f5e10

Remove a copy after stacking for a groupby.

b9500ce

Upstream bug pandas-dev/pandas#12813 is fixed

Fix typing

44f1325

[WIP] typing

1168ab7

Cleanup

c905b74

[WIP]

22ad7fa

group as Variable?

22ac6de

Revert "group as Variable?"

912e5c5

This reverts commit 2a36e21a031b9e061b932682758551956f3f06d2.

Small cleanup

60abafe

De-duplicate alignment check

c6bfdaa

Fix resampling

a2290aa

Bugfix

e863045

dcherian force-pushed the grouper-objects branch from 6fba398 to e863045 on March 30, 2023 03:55

Partial reverts commit 22ad7fa.

0d0b2cd

dcherian added 3 commits March 30, 2023 20:50

fix tests

c5daa47

small cleanup

dda40f5

more cleanup

eb43043

Illviljan reviewed Mar 31, 2023

Apply suggestions from code review

8347313

dcherian mentioned this pull request Apr 6, 2023

Update GroupBy constructor for grouping by multiple variables, dask arrays #6610

Closed

dcherian and others added 4 commits April 18, 2023 16:04

Ignore type checking error.

fe0e421

Update groupby.py

0cc1ba3

dcherian changed the title ~~Introduce Grouper objects~~ Introduce Grouper objects internally Apr 26, 2023

Illviljan approved these changes Apr 26, 2023

Illviljan and others added 3 commits April 27, 2023 22:28

Move factorize to _factorize

2e10d3f

[pre-commit.ci] auto fixes from pre-commit.com hooks

d06bdeb

for more information, see https://pre-commit.ci

Update groupby.py

867629f

dcherian removed the needs review label Apr 28, 2023

dcherian commented Apr 28, 2023

Update xarray/core/groupby.py

89ab508

dcherian added the plan to merge Final call for comments label Apr 28, 2023

pre-commit-ci bot and others added 5 commits April 28, 2023 14:22

[pre-commit.ci] auto fixes from pre-commit.com hooks

8d7e6b8

for more information, see https://pre-commit.ci

Merge branch 'main' into grouper-objects

afe41db

Calculate group_indices only when necessary

dde8866

Revert "Calculate group_indices only when necessary"

b719976

This reverts commit 917c77efb05bacffcf901e61eabb9defc9a429d7.

Fix regression from deep copy

265f1dd

dcherian merged commit fde773e into pydata:main May 4, 2023

dcherian deleted the grouper-objects branch May 4, 2023 02:35

TomNicholas mentioned this pull request May 4, 2023

Generalize handling of chunked array types #7019

Merged

15 tasks

aulemahal mentioned this pull request May 9, 2023

⚠️ Nightly upstream-dev CI failed ⚠️ Ouranosinc/xclim#1368

Closed

mwtoews reviewed May 26, 2023

mwtoews mentioned this pull request Jun 14, 2023

Grouper object does not handle IndexVariable #7919

Closed

4 tasks
