Reduction APIs for groupby, groupby_bins, resample, rolling #2363

fujiisoup · 2018-08-13T00:30:10Z

The reduction APIs for groupby, groupby_bins, resample, and rolling are inconsistent, especially for multi-dimensional arrays.

import numpy as np
import xarray as xr
import pandas as pd

time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
ds = xr.Dataset({'foo': (('time', 'x'), np.random.randn(365 * 4, 5)), 'time': time, 
                 'x': [0, 1, 2, 1, 0]})

ds.rolling(time=2).mean()  # result dims : ('time', 'x')
ds.resample(time='M').mean()  # result dims : ('time', 'x')
ds['foo'].resample(time='M').mean()  # result dims : ('time', )  maybe a bug #2362
ds.groupby('time.month').mean()  # result dims : ('month', )
ds.groupby_bins('time', 3).mean()  # result dims : ('time_bins', )

In rolling and resample (for Dataset), a reduction without arguments is carried out along the grouped dimension.
In rolling, reduction along any other dimension is not possible.
In groupby and groupby_bins, the reduction is applied to each grouped object and, without arguments, it reduces along all the dimensions of each grouped object.

I think rolling's API is the cleanest, but I am not sure it is worth changing these APIs.

The possible options would be

Change APIs of groupby and groupby_bins so that they share similar API with rolling.
Document clearly how to perform resample or groupby with multidimensional arrays.
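
For the documentation option, an example along the following lines might help. It is only a sketch: it passes `dim` explicitly so that the result does not depend on the default reduction behavior discussed above. The dataset is a smaller, daily variant of the one in this issue, not a new API.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One year of daily data with a second, non-grouped dimension 'x'.
time = pd.date_range("2000-01-01", periods=365, freq="D")
ds = xr.Dataset(
    {"foo": (("time", "x"), np.random.randn(365, 5))},
    coords={"time": time, "x": [0, 1, 2, 1, 0]},
)

# Naming the grouped dimension explicitly reduces only along 'time'
# within each month, so 'x' is kept in the result.
monthly = ds.groupby("time.month").mean(dim="time")
print(monthly["foo"].dims)  # ('month', 'x')
```

Being explicit about `dim` costs a few characters but makes the intent readable when an array grows from 1D to ND.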

shoyer · 2018-08-13T20:51:14Z

This does seem to be a little inconsistent currently.

My original reasoning for the default groupby behavior was that this felt more consistent with the behavior for non-grouped reductions, which reduce across all dimensions.

But it's probably less useful, and results in a lot of redundant code. I can only think of a few times when I've actually wanted this behavior, rather than summing over only the grouped dimension. Especially when going from 1D -> ND, this is a likely source of errors.

So instead, we could change this to:

ds.groupby('time.month').mean()  # result dims : ('month', 'x')
ds.groupby('time.month').mean(dim=None)  # result dims : ('month',)

Or maybe we could add a special constant xarray.ALL_DIMS to indicate all dimensions? This is probably the most readable version:

ds.groupby('time.month').mean(dim=xarray.ALL_DIMS)  # result dims : ('month',)
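
To make the proposed semantics concrete, here is a rough NumPy-only sketch of the sentinel pattern. The `ALL_DIMS` object and the `grouped_mean` helper below are hypothetical stand-ins written for illustration; they are not xarray code and say nothing about how xarray would implement this.

```python
import numpy as np

# Hypothetical sentinel standing in for the proposed xarray.ALL_DIMS.
ALL_DIMS = object()


def grouped_mean(values, labels, dim="group"):
    """Mean of a 2-D array, grouped by `labels` along axis 0.

    dim="group"   reduces only along the grouped axis (axis 1 is kept).
    dim=ALL_DIMS  reduces along every axis of each group.
    """
    values = np.asarray(values)
    labels = np.asarray(labels)
    result = {}
    for label in np.unique(labels):
        group = values[labels == label]  # rows belonging to this group
        if dim is ALL_DIMS:
            result[label.item()] = group.mean()
        else:
            result[label.item()] = group.mean(axis=0)
    return result


values = np.arange(12.0).reshape(4, 3)
labels = [0, 0, 1, 1]

per_group = grouped_mean(values, labels)               # keeps axis 1
collapsed = grouped_mean(values, labels, dim=ALL_DIMS)  # scalar per group
print(per_group[0])  # [1.5 2.5 3.5]
print(collapsed[0])  # 2.5
```

A named sentinel reads more clearly at the call site than `dim=None`, which could be mistaken for "use the default".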

fujiisoup mentioned this issue Aug 13, 2018

New Resample-Syntax leading to cancellation of dimensions #2356

Closed

fujiisoup mentioned this issue Aug 14, 2018

Future warning for default reduction dimension of groupby #2366

Merged

fujiisoup closed this as completed in #2366 Sep 28, 2018

dcherian mentioned this issue Jul 3, 2023

Computational pattern tutorial edits xarray-contrib/xarray-tutorial#186

Merged
