Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduction APIs for groupby, groupby_bins, resample, rolling #2363

Closed
fujiisoup opened this issue Aug 13, 2018 · 1 comment · Fixed by #2366
Closed

Reduction APIs for groupby, groupby_bins, resample, rolling #2363

fujiisoup opened this issue Aug 13, 2018 · 1 comment · Fixed by #2366

Comments

@fujiisoup
Copy link
Member

From #2356

APIs for groupby, groupby_bins, resample, rolling are different, especially for multi-dimensional array.

import numpy as np
import xarray as xr
import pandas as pd

time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
ds = xr.Dataset({'foo': (('time', 'x'), np.random.randn(365 * 4, 5)), 'time': time, 
                 'x': [0, 1, 2, 1, 0]})

ds.rolling(time=2).mean()  # result dims : ('time', 'x')
ds.resample(time='M').mean()  # result dims : ('time', 'x')
ds['foo'].resample(time='M').mean()  # result dims : ('time', )  maybe a bug #2362
ds.groupby('time.month').mean()  # result dims : ('month', )
ds.groupby_bins('time', 3).mean()  # result dims : ('time_bins', )
  • In rolling and resample(for Dataset), reduction without argument is carried out along grouped dimension
  • In rolling, reduction along other dimesnion is not possible
  • In groupby and groupby_bins, reduction is applied to the grouped objects and if without argument, it reduces alongall the dimensions of each grouped object.

I think rollings API is most clean, but I am not sure it is worth to change these APIs.

The possible options would be

  1. Change APIs of groupby and groupby_bins so that they share similar API with rolling.
  2. Document clearly how to perform resample or groupby with multidimensional arrays.
@shoyer
Copy link
Member

shoyer commented Aug 13, 2018

This does seem to be a little inconsistent currently.

My original reasoning for the default groupby behavior was that that this felt more consistent with the behavior for non-grouped reductions, which reduces across all dimensions.

But it's probably less useful, and results in a lot of redundant code. I can only think of a few times when I've actually wanted this behavior, rather than summing over only the grouped dimension. Especially when going from 1D -> ND, this is a likely source of errors.

So instead, we could change this to:

ds.groupby('time.month').mean()  # result dims : ('month', 'x')
ds.groupby('time.month').mean(dim=None)  # result dims : ('month',)

Or maybe we could add a special constant xarray.ALL_DIMS to indicate all dimensions? This is probably the most readable version:

ds.groupby('time.month').mean(dim=xarray.ALL_DIMS)  # result dims : ('month',)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants