Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor GroupBy internals #9389

Merged
merged 3 commits into from
Aug 21, 2024
Merged

Conversation

dcherian
Copy link
Contributor

  1. Simplify ResolvedGrouper by moving logic to EncodedGroups
  2. Stack outside ResolvedGrouper in GroupBy.init to prepare for
    multi-variable GroupBy

Broken out form #9372

  • Tests added

1. Simplify ResolvedGrouper by moving logic to EncodedGroups
2. Stack outside ResolvedGrouper in GroupBy.__init__ to prepare for
   multi-variable GroupBy
@dcherian dcherian force-pushed the groupby-refactoring-2 branch from 5b40ae3 to b264dc1 Compare August 21, 2024 04:32
# assign coord when the applied function does not return that coord
if coord is not None and dim not in applied_example.dims:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice simplification!

unique_coord: Variable | _DummyGroup | None = None,
coords: Coordinates | None = None,
):
from xarray.core.groupby import _codes_to_group_indices
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic moved from resolvedgrouper

Coordinates(
coords={name: (name, np.array(output_index))},
indexes={name: PandasIndex(output_index, dim=name)},
group_dims = set(grouper.group.dims)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only main logic change. It is needed to get the new tests to pass

@dcherian dcherian mentioned this pull request Aug 21, 2024
10 tasks
@dcherian dcherian requested a review from max-sixty August 21, 2024 05:04
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (was a quick scan, can look more tomorrow if helpful)

@dcherian
Copy link
Contributor Author

Thanks! That's good enough. It mostly undoes a previous refactor, haha.

@dcherian dcherian merged commit a56a407 into pydata:main Aug 21, 2024
28 checks passed
@dcherian dcherian deleted the groupby-refactoring-2 branch August 21, 2024 22:08
dcherian added a commit to TomNicholas/xarray that referenced this pull request Aug 22, 2024
* main: (214 commits)
  Adds copy parameter to __array__ for numpy 2.0 (pydata#9393)
  `numpy 2` compatibility in the `pydap` backend (pydata#9391)
  pyarrow dependency added to doc environment (pydata#9394)
  Extend padding functionalities (pydata#9353)
  refactor GroupBy internals (pydata#9389)
  Combine `UnsignedIntegerCoder` and `CFMaskCoder` (pydata#9274)
  passing missing parameters to ZarrStore.open_store when opening a datatree (pydata#9377)
  Fix tests on big-endian systems (pydata#9380)
  Improve error message on `ds['x', 'y']` (pydata#9375)
  Improve error message for missing coordinate index (pydata#9370)
  Add flaky to TestNetCDF4ViaDaskData (pydata#9373)
  Make chunk manager an option in `set_options` (pydata#9362)
  Revise (pydata#9371)
  Remove duplicate word from docs (pydata#9367)
  Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint (pydata#9243)
  Revise (pydata#9366)
  Fix rechunking to a frequency with empty bins. (pydata#9364)
  whats-new entry for dropping python 3.9 (pydata#9359)
  drop support for `python=3.9` (pydata#8937)
  Revise (pydata#9357)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 30, 2024
* main:
  Adds copy parameter to __array__ for numpy 2.0 (pydata#9393)
  `numpy 2` compatibility in the `pydap` backend (pydata#9391)
  pyarrow dependency added to doc environment (pydata#9394)
  Extend padding functionalities (pydata#9353)
  refactor GroupBy internals (pydata#9389)
  Combine `UnsignedIntegerCoder` and `CFMaskCoder` (pydata#9274)
  passing missing parameters to ZarrStore.open_store when opening a datatree (pydata#9377)
  Fix tests on big-endian systems (pydata#9380)
  Improve error message on `ds['x', 'y']` (pydata#9375)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants