
bootstrap p_value, CI for hindcasts #108

Closed
aaronspring opened this issue May 3, 2019 · 10 comments
@aaronspring (Collaborator)

  • CESM: uninitialized DPLE from LENS
  • MPI: uninitialized from historical

Based on bootstrap_perfect_model (Goddard et al. 2013).
Gives: p_value, skill, and CI for init, uninit, and pers.
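
For illustration, a minimal sketch of the bootstrap logic described here, with hypothetical helper names (skill_fn, bootstrap_skill are not the climpred API): resample initializations with replacement, build a skill distribution, and read the confidence interval off its quantiles; the p_value then follows as the fraction of resampled uninitialized skills that reach the initialized skill.

# Minimal sketch, not the climpred API: `skill_fn` maps a set of inits
# to a skill estimate (e.g. an xr.DataArray over lead).
import numpy as np
import xarray as xr

def bootstrap_skill(skill_fn, inits, n_boot=500, ci=90):
    """Resample inits with replacement and collect skill estimates."""
    boot = []
    for _ in range(n_boot):
        sample = np.random.choice(inits, size=len(inits))
        boot.append(skill_fn(sample))
    boot = xr.concat(boot, dim='bootstrap')
    alpha = (100 - ci) / 200          # e.g. 0.05 for a 90% CI
    low = boot.quantile(alpha, dim='bootstrap')
    high = boot.quantile(1 - alpha, dim='bootstrap')
    return boot, low, high

# One-sided p_value, with hypothetical variables `boot_uninit` (bootstrap
# distribution of uninitialized skill) and `skill_init` (initialized skill):
# p_value = (boot_uninit >= skill_init).mean(dim='bootstrap')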

aaronspring self-assigned this May 3, 2019
@aaronspring (Collaborator, Author)

@bradyrx could you please also provide the CESM-LE with all members?

cesm_le = open_dataset('CESM-LE')

I would use this to resample from similar forcing years, as suggested in Goddard et al. (2013).
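
A sketch of what I mean, assuming cesm_le has a member dimension and an annual time axis covering the hindcast init years (names hypothetical):

# Sketch only: draw random uninitialized ensemble members at the hindcast
# init years, so the resampled "uninitialized" forecast sees the same
# external forcing as the initialized one (Goddard et al. 2013).
import numpy as np

def resample_uninit(cesm_le, inits, nmember):
    members = np.random.choice(cesm_le.member.values, size=nmember)
    return cesm_le.sel(member=members, time=inits)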

@bradyrx (Collaborator) commented May 3, 2019

Yes, I'll add that if I have time today. This is not required for #107, correct?

@aaronspring (Collaborator, Author)

No, just for this future PR about bootstrapping uninit. I will also try to add some MiKlip global output.

aaronspring changed the title bootstrap uninitialized forecasts from historical → bootstrap p_value, CI for hindcasts May 9, 2019
@aaronspring (Collaborator, Author)

@bradyrx
compute_reference uses _shift before applying the metric. That way, a different number of init/time steps goes into each lead: obs/ref time.size decreases with each lead. Is this ok? It looks a bit odd to me.

For the perfect-model setup I don't have different input lengths, so I also cannot reproduce compute_reference with compute_perfect_model.

@aaronspring (Collaborator, Author)

I wrote a few tests which a compute_reference function needs to pass to allow for bootstrapping with replacement.

The existing _shift-based function fails to give reasonable results for the last one, so I rewrote it as compute_reference_mod below.

import numpy as np

# dple/ersst are the loaded example hindcast and observations.
# TODO: check for useful output? For now these only check that the
# computation goes through without producing NaNs.

def test_compute_reference_non_continuous():
    """Test non-continuous inits."""
    assert not compute_reference_mod(
        dple.sel(init=dple.init.values[::2]), ersst).isnull().any()

test_compute_reference_non_continuous()


def test_compute_reference_random_order():
    """Test randomly ordered inits, as in resampling without replacement."""
    inits = np.copy(dple.init.values)
    np.random.shuffle(inits)
    assert not compute_reference_mod(
        dple.sel(init=inits), ersst.sel(time=inits)).isnull().any()

test_compute_reference_random_order()


# Note: compute_reference gives results around 0 on non-detrended inputs.
def test_compute_reference_inits_with_replacement():
    """Test resampling with replacement, as needed for bootstrapping."""
    inits = dple.init.values
    inits = np.random.choice(inits, size=len(inits))
    assert not compute_reference_mod(dple.sel(init=inits), ersst).isnull().any()

test_compute_reference_inits_with_replacement()


import xarray as xr
# check_xarray, get_comparison_function, get_metric_function, _e2r, _m2r,
# _pearson_r, _rmse, _mse, _mae are climpred-internal helpers.


def compute_reference_mod(ds,
                          reference,
                          metric='pearson_r',
                          comparison='e2r'):
    check_xarray(ds)
    check_xarray(reference)
    nlags = ds.lead.size

    comparison = get_comparison_function(comparison)
    if comparison not in [_e2r, _m2r]:
        raise ValueError("""Please input either 'e2r' or 'm2r' for your
            comparison.""")
    forecast, reference = comparison(ds, reference)

    metric = get_metric_function(metric)
    if metric not in [_pearson_r, _rmse, _mse, _mae]:
        raise ValueError("""Please input 'pearson_r', 'rmse', 'mse', or
            'mae' for your metric.""")

    # think in the real-time dimension: real time = init + lead
    forecast = forecast.rename({'init': 'time'})

    # keep only inits for which the reference covers all leads
    imin = max(forecast.time.min(), reference.time.min())
    imax = min(forecast.time.max(), reference.time.max() - nlags)
    forecast = forecast.where(forecast.time <= imax, drop=True)
    forecast = forecast.where(forecast.time >= imin, drop=True)
    reference = reference.where(reference.time >= imin, drop=True)

    assert forecast.time.max() - nlags <= reference.time.max()

    plag = []
    # iterate over all leads (accounts for lead.min() in [0, 1])
    for i in forecast.lead.values:
        # take the lead-i timeseries and convert init to real time
        a = forecast.sel(lead=i)
        a['time'] = [t + i for t in a.time.values]
        # align the reference to those real-time verification years
        b = reference.sel(time=a.time.values)
        plag.append(metric(a, b, dim='time'))
    skill = xr.concat(plag, 'lead')
    # was `dple.lead.values` (a leaked global); use the input's own leads
    skill['lead'] = forecast.lead.values
    return skill

This new function gives (slightly) different results, since all lead years now use the same number of init/time steps (see the comment above).

@bradyrx Is this new function acceptable to you? It follows the same line of thought as before, but instead of trimming time/init per lead to get the most pairs at each lead, it uses all inits equally.

The bootstrapping then follows naturally, as sketched below.
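
A minimal sketch of that bootstrap on top of compute_reference_mod (function name hypothetical; dple/ersst as in the tests above):

# Sketch only: resample inits with replacement and recompute the skill,
# giving a bootstrap distribution per lead.
import numpy as np
import xarray as xr

def bootstrap_compute_reference(ds, reference, n_boot=500):
    inits = ds.init.values
    boot = []
    for _ in range(n_boot):
        sample = np.random.choice(inits, size=len(inits))
        boot.append(compute_reference_mod(ds.sel(init=sample), reference))
    return xr.concat(boot, dim='bootstrap')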

@aaronspring (Collaborator, Author)

Overview of results. I am unsure which of the results is more important to you. I tend to ignore the non-detrended results, as that skill is artificial due to external forcing.
[four result plots attached]

For the CESM output, the skill degrades compared to before (just this GMSST case here!). I tried to compare the DPLE and MPI MiKlip inputs and didn't find huge differences. Maybe we handle init/time differently in the postprocessing of output; that's my only explanation.

The main reasoning behind my approach is that real time = init + lead. But that's also the case for your compute_reference function, isn't it?

@bradyrx: please have a look at the compute_reference_mod code and check for flaws in the design. I am quite sure I did it right, but maybe I got some part of the matching in real-time space wrong.

@bradyrx (Collaborator) commented May 15, 2019

This was in the gitter; reproduced here in answer to why I wrote it the way I did. Hopefully this is what you were asking.

So the size of the reference should decrease at each lead. For DPLE, we initialize every year in November from 1954-2017.

Let's say your reconstruction/assimilation covers 1955-2017 (which it does for CESM). At lead-one, we lose the last initialization of DPLE, since there is no 2018 data for FOSI. At lead-two, we lose the last two initializations of DPLE, since there is no 2018 or 2019 for FOSI (i.e., we lose the Nov 2016 and Nov 2017 initializations, since those correspond to 2018 and 2019 at lead-two).

And so on: you lose one time step for each lead. We used to trim the time series down to the common length, so if you are computing to lead-10 you just trim 10 steps beforehand and compare. But that didn't seem like doing statistics to their full capacity, so I made it adjust at each lead. So we get 62 comparisons at lead-1, 61 at lead-2, and so on; a toy count is sketched below.
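
A toy illustration of that per-lead counting (years schematic; the exact counts depend on the DPLE/FOSI overlap):

# Toy illustration: count verifiable (init, verification-year) pairs per
# lead when the reference ends in 2017; one pair is lost per extra lead.
inits = range(1954, 2018)            # Nov initializations, 1954-2017
ref_years = set(range(1955, 2018))   # reconstruction coverage

for lead in (1, 2, 3, 10):
    n = sum((init + lead) in ref_years for init in inits)
    print(f'lead {lead}: {n} pairs')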

@bradyrx (Collaborator) commented May 15, 2019

If I have time at home tonight I will review your code snippet. Otherwise I will review once it is a PR.

@bradyrx (Collaborator) commented May 22, 2019

bradyrx closed this as completed May 22, 2019