New Resample-Syntax leading to cancellation of dimensions #2356

Closed
rpnaut opened this issue Aug 9, 2018 · 8 comments

Comments

@rpnaut

rpnaut commented Aug 9, 2018

Example

Starting with the dataset located here: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/,
I want to calculate monthly sums of precipitation for each gridpoint in the daily data:

In [39]: data = array.open_dataset("eObs_gridded_0.22deg_rot_v14.0.TOT_PREC.1950-2016.nc_CutParamTimeUnitCor_FinalEvalGrid")
In [40]: data
Out[13]: 
<xarray.Dataset>
Dimensions:       (rlat: 136, rlon: 144, time: 153)
Coordinates:
  * rlon          (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 ...
  * rlat          (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 ...
  * time          (time) datetime64[ns] 2006-05-01T12:00:00 ...
Data variables:
    rotated_pole  int32 ...
    TOT_PREC      (time, rlat, rlon) float32 ...
Attributes:
    CDI:                       Climate Data Interface version 1.8.0 (http://m...
    Conventions:               CF-1.6
    history:                   Thu Jun 14 12:34:59 2018: cdo -O -s -P 4 remap...
    CDO:                       Climate Data Operators version 1.8.0 (http://m...
    cdo_openmp_thread_number:  4

In [41]: datamonth = data["TOT_PREC"].resample(time="M").sum()
In [42]: datamonth
Out[42]: 
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25   ,  465640.09375,  328445.90625,  836892.1875 ,  503601.5    ], dtype=float32)
Coordinates:
  time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...

Problem description

The problem is that the dimensions 'rlon' and 'rlat' and the corresponding coordinates have not survived the resample process. Only the time is present in the result.

Expected Output

I expect the spatial dimensions to still be present in the output of the monthly sums. Surprisingly, this is the case when using the old syntax:

In [41]: datamonth = data["TOT_PREC"].resample(dim="time",freq="M",how="sum")
/usr/bin/ipython3:1: FutureWarning: 
.resample() has been modified to defer calculations. Instead of passing 'dim' and how="sum", instead consider using .resample(time="M").sum('time') 
  #!/usr/bin/env python3

In [42]: datamonth
Out[42]: 
<xarray.DataArray 'TOT_PREC' (time: 5, rlat: 136, rlon: 144)>
array([[[  0.      ,   0.      , ...,   0.      ,   0.      ],
        [  0.      ,   0.      , ...,   0.      ,   0.      ],
        ..., 
        [  0.      ,   0.      , ...,  44.900028,  41.400024],
        [  0.      ,   0.      , ...,  49.10001 ,  46.5     ]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
  * rlon     (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 -21.28 ...
  * rlat     (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 -11.22 ...

What is wrong here?

And maybe I can also ask why the new syntax does not consider use cases with highly complex scripting? I do not like hardcoding a dimension name in my programs, i.e. time=${freq} instead of dim=${dim}; freq=${freq}.

@dcherian
Contributor

dcherian commented Aug 9, 2018

datamonth = data["TOT_PREC"].resample(time="M").sum(dim='time') should do what you want.
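
A minimal sketch of this suggestion on synthetic data. The small grid, the constant values, and the variable names are assumptions for illustration, not the poster's dataset. It uses the "MS" (month start) alias instead of "M", because recent pandas releases deprecate "M"; the monthly bins are the same, only their labels differ.

```python
import numpy as np
import pandas as pd
import xarray as xr

# 153 daily steps starting 2006-05-01 cover May to September 2006.
time = pd.date_range("2006-05-01", periods=153, freq="D")

# Hypothetical stand-in for TOT_PREC: 1.0 at every grid point and day.
precip = xr.DataArray(
    np.ones((time.size, 3, 4), dtype="float32"),
    dims=("time", "rlat", "rlon"),
    coords={"time": time},
    name="TOT_PREC",
)

# Reduce only along "time" within each monthly bin.
monthly = precip.resample(time="MS").sum(dim="time")

# rlat and rlon are still present; time now has one entry per month.
print(dict(monthly.sizes))
```

With constant input of 1.0, each monthly sum equals the number of days in that month, which makes the result easy to check by hand.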

@rpnaut
Author

rpnaut commented Aug 10, 2018

Thank you @dcherian. Do you think that giving the dimension time as an argument twice is useful?

Or maybe I have misunderstood everything:
Does the argument time='M' only mean frequency='M'? And is the name of the time dimension now given by the argument "dim"?
Or let me ask the question differently: what would be the syntax of your command if the time dimension had the name 'TIMES'?

@dcherian
Contributor

The repeated dimension follows pandas syntax. It's nice because the syntax is similar to the usual reduction DataArray.sum().

.resample(TIMES='M').sum(dim='TIMES') should work as long as TIMES is datetime64.

@fujiisoup
Member

Does the argument time='M' only mean frequency='M'?

It means resampling with frequency='M' along the coordinate named 'time'.

what would be the syntax of your command, if the time dimension has the name 'TIMES'?

It should be

data["TOT_PREC"].resample(TIMES="M").sum(dim='TIMES')
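
For scripts where the dimension name is only known at run time, one possible approach is to pass the keyword through dictionary unpacking, so that no name is hardcoded. The helper `resample_sum` and the synthetic data below are hypothetical and only meant as a sketch; "MS" is used instead of "M" because recent pandas releases deprecate "M".

```python
import numpy as np
import pandas as pd
import xarray as xr


def resample_sum(da, dim, freq):
    """Sum `da` over bins of `freq` along the datetime dimension `dim`.

    Hypothetical helper: **{dim: freq} is equivalent to writing
    da.resample(<dim>=<freq>) with the name spelled out.
    """
    return da.resample(**{dim: freq}).sum(dim=dim)


# Synthetic data whose time dimension is called "TIMES".
times = pd.date_range("2006-05-01", periods=153, freq="D")
precip = xr.DataArray(
    np.ones((times.size, 3, 4)),
    dims=("TIMES", "rlat", "rlon"),
    coords={"TIMES": times},
    name="TOT_PREC",
)

dim_name = "TIMES"  # could come from a config file or a command-line option
monthly = resample_sum(precip, dim_name, "MS")
print(dict(monthly.sizes))
```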

@fujiisoup
Member

BTW, is this API (repeated dimension names) intended?
I am asking because this is a little different from the rolling counterpart.
In rolling, we write
ds.rolling(time=3).sum(),
and we do not need to (and cannot) specify the dimension name in sum.
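
For comparison, a small sketch of the rolling behaviour described here, on synthetic data (the array and its dimension names are assumptions for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2006-05-01", periods=6, freq="D")
da = xr.DataArray(
    np.ones((time.size, 2)),
    dims=("time", "x"),
    coords={"time": time},
)

# The window is defined along "time"; sum() takes no dimension argument.
rolled = da.rolling(time=3).sum()

# The shape is unchanged. The first two time steps are NaN because the
# 3-step window is incomplete there.
print(dict(rolled.sizes))
print(rolled.values)
```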

@dcherian
Contributor

Oh sorry, I was mistaken. It looks like pandas does not require the repeated dimension. https://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling

The current API works like groupby, but I don't think that was the original intent (#1269 #1272).

@fujiisoup
Member

Thanks, @dcherian.
It looks like the original API was designed mainly for 1D arrays, and the documentation does not clearly describe how to apply it to multi-dimensional arrays.

I split this issue into two, #2362 and #2363.

@dcherian
Contributor

Fixed in 0.13.0
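
A quick way to check this, assuming a recent xarray release (0.13.0 or later) and synthetic data in place of the original file: calling sum() without a dimension argument should then reduce only along the resampled dimension. The "MS" alias is used because recent pandas releases deprecate "M".

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2006-05-01", periods=153, freq="D")
precip = xr.DataArray(
    np.ones((time.size, 3, 4)),
    dims=("time", "rlat", "rlon"),
    coords={"time": time},
    name="TOT_PREC",
)

# No dim argument: only "time" is reduced, rlat and rlon are kept.
monthly = precip.resample(time="MS").sum()
print(dict(monthly.sizes))
```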
