-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Grouper objects internally #7561
Conversation
1147411
to
78f6bda
Compare
78f6bda
to
f46f5d9
Compare
94aca25
to
6fba398
Compare
Upstream bug pandas-dev/pandas#12813 is fixed
This reverts commit 2a36e21a031b9e061b932682758551956f3f06d2.
6fba398
to
e863045
Compare
@Illviljan could use some typing help here if you have the time :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it more later.
I'm not sure about the rest of the errors, @dcherian. Maybe IndexVariable needs to use the DataWithCoords mixin?
xarray/xarray/core/alignment.py Lines 581 to 588 in d4db166
xarray/xarray/core/alignment.py Line 31 in d4db166
Lines 376 to 377 in d4db166
|
Variables don't have coordinates so that won't work. mypy is correct here, it's a bug and we don't test for grouping by index variables. A commit reverting to the old It's not clear to me why we allow this actually. Seems like |
* main: (34 commits) Update whats-new.rst Fix binning by unsorted array (pydata#7762) Bump codecov/codecov-action from 3.1.1 to 3.1.2 (pydata#7760) Fix typing errors using mypy 1.2 (pydata#7752) [skip-ci] dev whats-new Add whats-new for v2023.04.0 (pydata#7757) remove the `black` hook (pydata#7756) reword the what's new entry for the `pandas` 2.0 dtype changes (pydata#7755) restructure the contributing guide (pydata#7681) Continue to use nanosecond-precision Timestamps in precision-sensitive areas (pydata#7731) minor doc updates to clarify extensions using accessors (pydata#7751) align: Avoid reindexing when join="exact" (pydata#7736) `pandas=2.0` support (pydata#7724) Clarify vectorized indexing documentation (pydata#7747) Avoid recasting a CFTimeIndex (pydata#7735) fix typo (pydata#7746) [pre-commit.ci] pre-commit autoupdate (pydata#7745) Bump pypa/gh-action-pypi-publish from 1.8.4 to 1.8.5 (pydata#7743) preserve boolean dtype in encoding (pydata#7720) [skip-ci] Add alignment benchmarks (pydata#7738) ...
* main: Bump codecov/codecov-action from 3.1.2 to 3.1.3 (pydata#7781) Fix whats-new [skip-ci] dev whats-new (pydata#7775) [skip-ci] Release 2023.04.2 (pydata#7774) Fix groupby_bins when labels are specified (pydata#7769) Docstrings examples for string methods (pydata#7669) Add dev whats-new Add benchmark against latest release on main. (pydata#7753)
I'd like to merge this soon. It's an internal refactor with no public API changes. I think we can expose the Grouper objects publicly in a new PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some TODOs I've been thinking about, no stoppers.
return len(self) | ||
|
||
def __len__(self) -> int: | ||
return len(self.full_index) # TODO: full_index not def, abstractmethod? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash if .factorize
hasn't been triggered before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but it wouldn't make sense without it, and this class is internal-only. Well it would make sense if the user told us the labels or bin edges, but again its internal-only so eh...
|
||
@dataclass | ||
class BinGrouper(Grouper): | ||
bins: Any # TODO: What is the typing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have the time to figure out the typing on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment it should be int | Sequence
I think, so either number of bins, or actual bin edges in some Iterable where the order matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never really like to use Sequence
because np.ndarray
s are not Sequences... And I think it could be quite common to supply the bins with numpy arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes int or sequences or arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.typing.ArrayLike?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but "nested sequences" won't work here:https://numpy.org/doc/stable/reference/typing.html#numpy.typing.ArrayLike, but maybe that's a tiny detail
for more information, see https://pre-commit.ci
This reverts commit 917c77efb05bacffcf901e61eabb9defc9a429d7.
* main: Introduce Grouper objects internally (pydata#7561) [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795) Fix groupby binary ops when grouped array is subset relative to other (pydata#7798) adjust the deprecation policy for python (pydata#7793) [pre-commit.ci] pre-commit autoupdate (pydata#7803) Allow the label run-upstream to run upstream CI (pydata#7787) Update asv links in contributing guide (pydata#7801) Implement DataArray.to_dask_dataframe() (pydata#7635) `ds.to_dict` with data as arrays, not lists (pydata#7739) Add lshift and rshift operators (pydata#7741) Use canonical name for set_horizonalalignment over alias set_ha (pydata#7786) Remove pandas<2 pin (pydata#7785) [pre-commit.ci] pre-commit autoupdate (pydata#7783)
else: | ||
newgroup = group | ||
|
||
if newgroup.size == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With xarray 2023.5.0 I seem to now get "UnboundLocalError: local variable 'newgroup' referenced before assignment" when using groupby with a IndexVariable object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a new issue please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #7919
Builds on the refactoring in #7206