add support for monthly-resolution predictions #172

bradyrx · 2019-06-07T19:52:48Z

Description

This PR adds support for prediction ensembles at monthly resolution.

Fixes https://github.com/bradyrx/climpred/issues/44

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

How Has This Been Tested?

# Load in global average SST for monthly DPLE 
hind = xr.open_dataset('/glade/u/home/rbrady/scratch/monthly_SST_global_avg.nc')
hind = hind.rename({'anom': 'SST'})
print(hind)
>>> <xarray.Dataset>
>>> Dimensions:  (init: 62, lead: 122)
>>> Coordinates:
>>>   * init     (init) int32 1955 1956 1957 1958 1959 ... 2012 2013 2014 2015 2016
>>>   * lead     (lead) int32 1 2 3 4 5 6 7 8 9 ... 115 116 117 118 119 120 121 122
>>>     member   int32 ...
>>> Data variables:
>>>     SST      (init, lead) float32 ...

reference = xr.open_dataset('/glade/u/home/rbrady/scratch/FOSI_global_SST.nc')
# Convert FOSI to anomalies
reference = reference - reference.sel(time=slice('1964', '2014')).mean('time')
reference = reference.groupby('time.month') - reference.groupby('time.month').mean('time')

>>>  <xarray.Dataset>
>>>  Dimensions:  (time: 840)
>>>  Coordinates:
>>>      z_t      float32 500.0
>>>    * time     (time) datetime64[ns] 1948-01-01 1948-02-01 ... 2017-12-01
>>>      month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
>>>  Data variables:
>>>      SST      (time) float32 0.099155456 0.042655587 ... 0.29147333 0.17652681

# Add actual initialization dates to DPLE. November initializations.
hind['init'] = np.arange('1954-11', '2016-11', dtype='datetime64[M]')[::12]
reference['time'] = np.arange('1948-01', '2018-01', dtype='datetime64[M]')

# Subtract one from hindcast lead. The first timestep is actually the November comparison, 
# since initialization was November 1 and we're looking at the full month of November.
hind['lead'] = hind['lead'] - 1
cp.prediction.compute_hindcast(hind, reference, resolution='M').SST.plot()

Checklist (while developing)

I have added docstrings to all new functions.
I have commented my code, particularly in hard-to-understand areas
Tests added for pytest, if necessary.

Pre-Merge Checklist (final steps)

I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
I have squashed commits to a reasonable amount, and force-pushed the squashed commits.
All notebooks run with this branch; tested via treon.
pytest runs without breaking.

Todo

Add testing for monthly resolution
Add temporal resolution to classes
Check functions outside of compute_hindcast
Add datetime module?

bradyrx · 2019-06-07T19:56:36Z

The key here is that xarray only works with datetime64 for time coordinates and will only do this in [ns] (nanoseconds).

So, with datetime objects, this line:
imax = min(forecast.time.max(), reference.time.max() - nlags)

Will try to subtract 1 month but instead subtracts 1 nanosecond. So by forcing the following:
imax = min(forecast.time.max(), reference.time.max().values.astype('datetime64[M]') - nlags)

We actually subtract off the correct amount of time. [M] for month, [Y] for year.

This line then has to be changed to accomodate for adding the appropriate time as well:
a['time'] = [t + i for t in a.time.values.astype('datetime64[M]')]

The "real time" approach is great for this. However, I suggest then that we require all objects to have datetime coordinates and do not accept integers for instance. This ensures that we can use the 'real time' approach everywhere and can use datetime64 operations on the output. This means some more pre-processing will have to go in for the user, but it's good for them to have the actual initialization dates in.

We could add a datetime module to do some of what I did in the test code in the main PR text.

bradyrx · 2019-06-07T19:58:00Z

Note -- I plan to just implement monthly resolution with this PR. I think daily would actually be easier than seasonal here. But I want to get it working for monthly, then we can adapt for daily/seasonal?

bradyrx · 2019-06-07T19:59:33Z

@aaronspring, can you test compute_perfect_model with monthly output? I imagine that this won't be a problem since it's just comparing back to the model/control run itself. And there's nowhere in that function we're currently doing slicing like in the compute_hindcast function.

bradyrx · 2019-06-07T20:03:44Z

Another issue is that DPLE monthly has a first lead time of zero, truly. Since initialization occurs November 1st, the first lead is the month of November. I.e., a lead of zero in the eyes of this package. This works fine for compute_hindcast, but is an issue for compute_persistence due to this line:

ctrl_inits = reference.isel(time=slice(0, -lag))['time'].values
which returns an empty object with a lag of zero, of course.

aaronspring · 2019-06-08T13:52:17Z

Regarding the initial comment. When computing monthly skill, what kind of inputs do we require? I naively thought of those:

Hindcast: lead int, init int or datetime
Reference: time same as init?

While it might be more tricky to implement, i think allowing ints is rather easy for the decidable prediction user to provide. When the user should provide datetime s/he has to think about whether to put the November init date there and so on, even if it’s only the decidable prediction thing.

@bradyrx to do you think it’s useful to have ints and datetime? Or rather require one in the calculation but before a conversion if ints are involved?

aaronspring · 2019-06-08T13:57:31Z

Can you please show the annual skill in your example also? Thanks

aaronspring · 2019-06-08T15:13:47Z

Regarding time format. My perfect model runs are in the year 3000. That is changeable but some control runs might be 1000 years. For that purpose I suggest we use cftime. It has all the groupby features as datetime.

ahuang11 · 2019-06-08T16:03:06Z

climpred/prediction.py


    Returns:
        skill (xarray object): Predictability with main dimension `lag`.
    """
    nlags = hind.lead.size
+    if resolution not in ['Y', 'M']:
+        raise ValueError(


Maybe NotImplementedError

ahuang11 · 2019-06-08T16:03:39Z

climpred/prediction.py


    Returns:
        skill (xarray object): Predictability with main dimension `lag`.
    """
    nlags = hind.lead.size
+    if resolution not in ['Y', 'M']:


Maybe when constants.py is added in my PR, add TEMPORAL_RES = ['Y', 'M']

Also add this if ... to checks.py since I see it used in multiple places

aaronspring · 2019-06-08T18:43:31Z

monthly data added: https://github.com/bradyrx/climpred-data

control_mm
<xarray.Dataset>
Dimensions:  (area: 2, time: 3612)
Coordinates:
  * area     (area) object 'global' 'North_Atlantic'
  * time     (time) object 3000-01-31 00:00:00 ... 3300-12-31 00:00:00
Data variables:
    tos      (area, time) float32 ...
    sos      (area, time) float32 ...

ds_mm
<xarray.Dataset>
Dimensions:  (area: 2, init: 12, lead: 252, member: 10)
Coordinates:
  * area     (area) object 'global' 'North_Atlantic'
  * lead     (lead) int64 1 2 3 4 5 6 7 8 9 ... 245 246 247 248 249 250 251 252
  * init     (init) object 3014-01-31 00:00:00 ... 3257-01-31 00:00:00
  * member   (member) int64 0 1 2 3 4 5 6 7 8 9

now I realize, init should maybe be one month earlier. well thats what the discussion is about. but this at least is data to test with. probably we will have to define more clearly how the data needs to be formatted.

bradyrx · 2019-06-10T01:52:31Z

Will get back to the other comments, but on this:

Regarding time format. My perfect model runs are in the year 3000. That is changeable but some control runs might be 1000 years. For that purpose I suggest we use cftime. It has all the groupby features as datetime.

I agree. I think we can just force all xarray objects to have cftime to only have to anticipate one type of datetime. I'll look into this module.

ahuang11 · 2019-06-10T02:31:15Z

It'd be good to test this with NMME as well once ready.

…

On Sun, Jun 9, 2019, 6:52 PM Riley Brady ***@***.***> wrote: Will get back to the other comments, but on this: Regarding time format. My perfect model runs are in the year 3000. That is changeable but some control runs might be 1000 years. For that purpose I suggest we use cftime. It has all the groupby features as datetime. I agree. I think we can just force all xarray objects to have cftime to only have to anticipate one type of datetime. I'll look into this module. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/bradyrx/climpred/pull/172?email_source=notifications&email_token=ADU7FFSYGJX3QNVQU3GDP5DPZWXV7A5CNFSM4HVZ3RIKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXIXLKA#issuecomment-500266408>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADU7FFTGUKL7LJZX5K4XR6DPZWXV7ANCNFSM4HVZ3RIA> .

aaronspring · 2019-06-10T10:44:02Z

I agree. I think we can just force all xarray objects to have cftime to only have to anticipate one type of datetime. I'll look into this module.

I am glad to have cftime. Maybe we can convert all ints to cftime. another question that I ask myself now:
If we have inits as cftime, what should be the init and what time of reference would match the first lead?

If initialization took place at the turn of the year jan 1st xxxx-01-01 00:00:00, then lets use this as init and mid-jan xxxx-01-15 00:00:00 matching the first lead? Or rather end of the month, because output is usually rather formatted as this? Better if we can somehow make it independent of timescales lower than a month when calculating monthly skill, so we can worry less.

Its difficult matching the best setup for us internally but also consistent and useful for the user.

bradyrx · 2019-06-10T17:00:31Z

@aaronspring @ahuang11

using cftime in xarray

So xarray actually has a cftime_range function that we could use here. (http://xarray.pydata.org/en/stable/generated/xarray.cftime_range.html?highlight=cftime)

I loaded in FOSI (spans monthly from 1948-2015) and can add a cftime object by the following:
ds['time'] = xr.cftime_range('1948-01', '2015-12', freq='MS', name='time')

You could also imagine setting up initializations on every November 1st:
xr.cftime_range('1948-11', '2015-11', freq='MS')[::12].

cftime for `climpred`

If we have inits as cftime, what should be the init and what time of reference would match the first lead? If initialization took place at the turn of the year jan 1st xxxx-01-01 00:00:00, then lets use this as init and mid-jan xxxx-01-15 00:00:00 matching the first lead? Or rather end of the month, because output is usually rather formatted as this? Better if we can somehow make it independent of timescales lower than a month when calculating monthly skill, so we can worry less.

I'm not sure here.

inits should align with their actual initialization in my opinion. So DPLE is initialized on November 1 at midnight, I'd want cftime.datetime(2014, 1, 1).
Other simulations (like a control run) tend to handle their cftime as month-end as you said. For example, FOSI month one comes out as cftime.DatetimeNoLeap(249, 1, 31, 23, 59, 59, 0, 6, 31). This makes sense, since it's the full month average.

I have to play around with your "real time" setup in our functions to see how we can make this align. We should implement a cftime/time/datetime module to handle some of the tricky bits here. Here's a thread on ideas of how to shift time in cftime format (pydata/xarray#2244).

Some questions:
a. How do we shift the init cftime appropriately to align with the referfences?
b. What does lead-1 tend to really mean for modeling centers?

On (b), some comments for DPLE. We initialize Nov. 1 of each year at midnight. At annual resolution, this means lead-1 from Nov. 2014 initialization corresponds to the 2015 average. At monthly resolution, it gets tricky. Since we initialize at midnight on Nov 1., lead-1 monthly is actually the full November average. So that would align with (2) from above.

Initialization: 2015-11-01 00:00:00
Month-average from reference: 2015-11-30 00:00:00

So here, lead-1 should bring 2015-11-01 00:00:00 to 2015-11-30 00:00:00.

Thoughts? Want to get this straightened out before doing any more work here. I should also read some cftime documentation (I mean, outside of the package). How do modeling centers handle netCDF time? There's all sorts of different calendars, but usually their attribute says if it's Gregorian, noleap, etc.

References

cftime API for python: https://unidata.github.io/cftime/api.html
xarray cftime_range: http://xarray.pydata.org/en/stable/generated/xarray.cftime_range.html?highlight=cftime
xarray open_dataset has use_cftime flag: http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html#xarray.open_dataset
xarray example notebook with use of cftime: http://xarray.pydata.org/en/stable/weather-climate.html?highlight=cftime

aaronspring · 2019-06-11T11:22:23Z

I agree with all of this.

Inits should be exactly when initialization happened. 2015-11-01 00:00:00.
For MPI prediction systems its the same: init 1st nov and then first annual lead the next calender year. for seasonal, DJF is the first seasonal lead. for monthly probably, november would be the first.

if we put init to the timestep 1 before the first lead, it might work sometimes, but not all. that would have been easier.

I dont care whether we choose end of the month or the mid of the month as timestep for reference and then internally to compare. probably 2015-11-15 is easier because its static. (month end might be tricky for the leap years)

So in principle I am in favor of your suggestion:

Initialization: 2015-11-01 00:00:00
Month-average from reference: 2015-11-30 00:00:00
So here, lead-1 should bring 2015-11-01 00:00:00 to 2015-11-30 00:00:00

Are there functions like endofmonth, startofmonth,addmonths(1)?

bradyrx · 2019-06-11T21:29:05Z

I agree with your thoughts, but we have to be careful due to sections of the code like this:

Persistence finds common inits, based on the assumption that the monthly means for a reference are set as the first of the month, just as with inits.

bradyrx · 2019-06-12T20:38:46Z

@aaronspring , @ahuang11

I have to focus on writing a manuscript over the next couple weeks. I've been putting it off. I won't have time to work on this til then. So if either of you have ideas, feel free to push to this branch and play around with cftime.

ahuang11 · 2019-06-19T05:09:42Z

Are there functions like endofmonth, startofmonth,addmonths(1)?

Woops, missed your message @aaronspring, but pandas is amazing for timeseries (and already a requirement of xarray)

pd.datetime(2015, 5, 15) + pd.offsets.MonthBegin(1)
pd.datetime(2015, 5, 15) + pd.offsets.MonthEnd(1)
pd.datetime(2015, 5, 15) + pd.DateOffset(months=1)

Timestamp('2015-06-01 00:00:00')
Timestamp('2015-05-31 00:00:00')
Timestamp('2015-06-15 00:00:00')

Unfortunately, it doesn't support cftime.

ahuang11 · 2019-06-19T05:11:57Z

climpred/prediction.py

+    )
+    # Some lags force this into a numpy array for some reason.
+    imax = xr.DataArray(imax).rename('time')
+    forecast = forecast.where(forecast.time <= imax, drop=True)


I wonder if you could do:

forecast.sel(time=slice(imin, imax))

bradyrx · 2019-06-19T05:15:10Z

Woops, missed your message @aaronspring, but pandas is amazing for timeseries (and already a requirement of xarray)

Great point @ahuang11. I've played with this in the past and those addoffset functions are great.

ahuang11 · 2019-06-19T05:22:19Z

I think we should force everything to use cftime. Some decorator check to see if integers are provided; if so, convert to cftime. Once we use cftime, the temporal resolution (daily/monthly/yearly) should become irrelevant.

bradyrx · 2019-06-25T21:28:48Z

Closing for now since we will address this as a primary goal of v2.

aaronspring · 2019-09-21T08:23:13Z

when going for monthly I developed a function that generates uninitialized ensembles from a control, based on a start month to be specified. to be used in my perfect-model framework.

def bootstrap_uninit_pm_ensemble_from_control_mm(ds, control, init_month):
    """
    Create a pseudo-ensemble from control run.
    Note:
        Needed for block bootstrapping confidence intervals of a metric in perfect
        model framework. Takes randomly segments of length of ensemble dataset from
        control and rearranges them into ensemble and member dimensions.
    Args:
        ds (xarray object): ensemble simulation.
        control (xarray object): control simulation.
    Returns:
        ds_e (xarray object): pseudo-ensemble generated from control run.
    """
    nens = ds.init.size
    nmember = ds.member.size
    length = ds.lead.size
    c_start = control.time.dt.year[0]
    c_end = control.time.dt.year[-1]
    lead_time = ds['lead']

    def sel_years(control, year_s, length):
        new = control.sel(time=slice(str(year_s)+'-'+str(init_month).zfill(2),
                                     str(year_s + length//12 + 1)+'-'+str(init_month).zfill(2)))
        new = new.rename({'time': 'lead'}).isel(lead=slice(0, lead_time.size))
        new['lead'] = lead_time
        return new

    def create_pseudo_members(control):
        startlist = np.random.randint(c_start, c_end - length - 1, nmember)
        return xr.concat(
            (sel_years(control, start, length)
             for start in startlist), 'member'
        )

bradyrx added 3 commits June 7, 2019 13:33

make time slicing a function

db55ebb

docstrings and errors

68fb5dc

add support for monthly for compute_hindcast

881cc31

bradyrx added don't merge enhancement labels Jun 7, 2019

Merge branch 'master' into add_sub_annual

595b6d0

bradyrx closed this Jun 7, 2019

bradyrx reopened this Jun 7, 2019

ahuang11 reviewed Jun 8, 2019

View reviewed changes

Merge branch 'master' into add_sub_annual

b420f0b

ahuang11 reviewed Jun 19, 2019

View reviewed changes

bradyrx closed this Jun 25, 2019

bradyrx mentioned this pull request Aug 15, 2019

adapt for sub-annual predictions #44

Closed

4 tasks

aaronspring reopened this Sep 21, 2019

aaronspring closed this Sep 21, 2019

bradyrx deleted the add_sub_annual branch September 23, 2019 19:54

aaronspring mentioned this pull request Nov 24, 2019

Subannual #265

Closed

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for monthly-resolution predictions #172

add support for monthly-resolution predictions #172

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

aaronspring commented Jun 8, 2019 •

edited

Loading

aaronspring commented Jun 8, 2019

aaronspring commented Jun 8, 2019

ahuang11 Jun 8, 2019

ahuang11 Jun 8, 2019

ahuang11 Jun 8, 2019

aaronspring commented Jun 8, 2019

bradyrx commented Jun 10, 2019

ahuang11 commented Jun 10, 2019 via email

aaronspring commented Jun 10, 2019

bradyrx commented Jun 10, 2019

aaronspring commented Jun 11, 2019

bradyrx commented Jun 11, 2019

bradyrx commented Jun 12, 2019

ahuang11 commented Jun 19, 2019 •

edited

Loading

ahuang11 Jun 19, 2019

bradyrx commented Jun 19, 2019

ahuang11 commented Jun 19, 2019

bradyrx commented Jun 25, 2019

aaronspring commented Sep 21, 2019

add support for monthly-resolution predictions #172

add support for monthly-resolution predictions #172

Conversation

bradyrx commented Jun 7, 2019

Description

Type of change

How Has This Been Tested?

Checklist (while developing)

Pre-Merge Checklist (final steps)

Todo

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

bradyrx commented Jun 7, 2019

aaronspring commented Jun 8, 2019 • edited Loading

aaronspring commented Jun 8, 2019

aaronspring commented Jun 8, 2019

ahuang11 Jun 8, 2019

Choose a reason for hiding this comment

ahuang11 Jun 8, 2019

Choose a reason for hiding this comment

ahuang11 Jun 8, 2019

Choose a reason for hiding this comment

aaronspring commented Jun 8, 2019

bradyrx commented Jun 10, 2019

ahuang11 commented Jun 10, 2019 via email

aaronspring commented Jun 10, 2019

bradyrx commented Jun 10, 2019

using cftime in xarray

cftime for climpred

References

aaronspring commented Jun 11, 2019

bradyrx commented Jun 11, 2019

bradyrx commented Jun 12, 2019

ahuang11 commented Jun 19, 2019 • edited Loading

ahuang11 Jun 19, 2019

Choose a reason for hiding this comment

bradyrx commented Jun 19, 2019

ahuang11 commented Jun 19, 2019

bradyrx commented Jun 25, 2019

aaronspring commented Sep 21, 2019

aaronspring commented Jun 8, 2019 •

edited

Loading

cftime for `climpred`

ahuang11 commented Jun 19, 2019 •

edited

Loading