
bootstrap p_value, CI for hindcasts #108

Closed
aaronspring opened this issue May 3, 2019 · 10 comments
@aaronspring (Collaborator)

  • CESM: uninitialized DPLE from LENS
  • MPI: uninitialized from historical

Based on bootstrap_perfect_model (Goddard et al. 2013).
Gives: p_value, skill, and CI for init, uninit, and pers.
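
For illustration, a minimal sketch of the bootstrap logic described here, with hypothetical helper names (skill_fn, bootstrap_skill are not the climpred API): resample initializations with replacement, build a skill distribution, and read the confidence interval off its quantiles; the p_value then follows as the fraction of resampled uninitialized skills that reach the initialized skill.

# Minimal sketch, not the climpred API: `skill_fn` maps a set of inits
# to a skill estimate (e.g. an xr.DataArray over lead).
import numpy as np
import xarray as xr

def bootstrap_skill(skill_fn, inits, n_boot=500, ci=90):
    """Resample inits with replacement and collect skill estimates."""
    boot = []
    for _ in range(n_boot):
        sample = np.random.choice(inits, size=len(inits))
        boot.append(skill_fn(sample))
    boot = xr.concat(boot, dim='bootstrap')
    alpha = (100 - ci) / 200          # e.g. 0.05 for a 90% CI
    low = boot.quantile(alpha, dim='bootstrap')
    high = boot.quantile(1 - alpha, dim='bootstrap')
    return boot, low, high

# One-sided p_value, with hypothetical variables `boot_uninit` (bootstrap
# distribution of uninitialized skill) and `skill_init` (initialized skill):
# p_value = (boot_uninit >= skill_init).mean(dim='bootstrap')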

aaronspring self-assigned this May 3, 2019
@aaronspring (Collaborator, Author)

@bradyrx could you please also provide the CESM-LE with all members?

cesm_le = open_dataset('CESM-LE')

I would use this to resample from similar forcing years, as suggested in Goddard et al. (2013).
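
A sketch of what I mean, assuming cesm_le has a member dimension and an annual time axis covering the hindcast init years (names hypothetical):

# Sketch only: draw random uninitialized ensemble members at the hindcast
# init years, so the resampled "uninitialized" forecast sees the same
# external forcing as the initialized one (Goddard et al. 2013).
import numpy as np

def resample_uninit(cesm_le, inits, nmember):
    members = np.random.choice(cesm_le.member.values, size=nmember)
    return cesm_le.sel(member=members, time=inits)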

@bradyrx (Collaborator) commented May 3, 2019

Yes, I'll add that if I have time today. This is not required for #107, correct?

@aaronspring (Collaborator, Author)

No, just for this future PR about bootstrapping uninit. I will also try to add some MiKlip global output.

aaronspring changed the title bootstrap uninitialized forecasts from historical → bootstrap p_value, CI for hindcasts May 9, 2019
@aaronspring (Collaborator, Author)

@bradyrx
compute_reference uses _shift before applying the metric. That way, a different number of init/time steps goes into each lead: obs/ref time.size decreases with each lead. Is this ok? It looks a bit odd to me.

For the perfect-model setup I don't have different input lengths, so I also cannot reproduce compute_reference with compute_perfect_model.

@aaronspring (Collaborator, Author)

I wrote a few tests which a compute_reference function needs to pass to allow for bootstrapping with replacement.

The existing _shift-based function fails to give reasonable results for the last one, so I rewrote it as compute_reference_mod below.

import numpy as np

# dple/ersst are the loaded example hindcast and observations.
# TODO: check for useful output? For now these only check that the
# computation goes through without producing NaNs.

def test_compute_reference_non_continuous():
    """Test non-continuous inits."""
    assert not compute_reference_mod(
        dple.sel(init=dple.init.values[::2]), ersst).isnull().any()

test_compute_reference_non_continuous()


def test_compute_reference_random_order():
    """Test randomly ordered inits, as in resampling without replacement."""
    inits = np.copy(dple.init.values)
    np.random.shuffle(inits)
    assert not compute_reference_mod(
        dple.sel(init=inits), ersst.sel(time=inits)).isnull().any()

test_compute_reference_random_order()


# Note: compute_reference gives results around 0 on non-detrended inputs.
def test_compute_reference_inits_with_replacement():
    """Test resampling with replacement, as needed for bootstrapping."""
    inits = dple.init.values
    inits = np.random.choice(inits, size=len(inits))
    assert not compute_reference_mod(dple.sel(init=inits), ersst).isnull().any()

test_compute_reference_inits_with_replacement()


import xarray as xr
# check_xarray, get_comparison_function, get_metric_function, _e2r, _m2r,
# _pearson_r, _rmse, _mse, _mae are climpred-internal helpers.


def compute_reference_mod(ds,
                          reference,
                          metric='pearson_r',
                          comparison='e2r'):
    check_xarray(ds)
    check_xarray(reference)
    nlags = ds.lead.size

    comparison = get_comparison_function(comparison)
    if comparison not in [_e2r, _m2r]:
        raise ValueError("""Please input either 'e2r' or 'm2r' for your
            comparison.""")
    forecast, reference = comparison(ds, reference)

    metric = get_metric_function(metric)
    if metric not in [_pearson_r, _rmse, _mse, _mae]:
        raise ValueError("""Please input 'pearson_r', 'rmse', 'mse', or
            'mae' for your metric.""")

    # think in the real-time dimension: real time = init + lead
    forecast = forecast.rename({'init': 'time'})

    # keep only inits for which the reference covers all leads
    imin = max(forecast.time.min(), reference.time.min())
    imax = min(forecast.time.max(), reference.time.max() - nlags)
    forecast = forecast.where(forecast.time <= imax, drop=True)
    forecast = forecast.where(forecast.time >= imin, drop=True)
    reference = reference.where(reference.time >= imin, drop=True)

    assert forecast.time.max() - nlags <= reference.time.max()

    plag = []
    # iterate over all leads (accounts for lead.min() in [0, 1])
    for i in forecast.lead.values:
        # take the lead-i timeseries and convert init to real time
        a = forecast.sel(lead=i)
        a['time'] = [t + i for t in a.time.values]
        # align the reference to those real-time verification years
        b = reference.sel(time=a.time.values)
        plag.append(metric(a, b, dim='time'))
    skill = xr.concat(plag, 'lead')
    # was `dple.lead.values` (a leaked global); use the input's own leads
    skill['lead'] = forecast.lead.values
    return skill

This new function gives (slightly) different results, since all lead years now use the same number of init/time steps (see the comment above).

@bradyrx Is this new function acceptable to you? It follows the same line of thought as before, but instead of trimming time/init per lead to get the most pairs at each lead, it uses all inits equally.

The bootstrapping then follows naturally, as sketched below.
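
A minimal sketch of that bootstrap on top of compute_reference_mod (function name hypothetical; dple/ersst as in the tests above):

# Sketch only: resample inits with replacement and recompute the skill,
# giving a bootstrap distribution per lead.
import numpy as np
import xarray as xr

def bootstrap_compute_reference(ds, reference, n_boot=500):
    inits = ds.init.values
    boot = []
    for _ in range(n_boot):
        sample = np.random.choice(inits, size=len(inits))
        boot.append(compute_reference_mod(ds.sel(init=sample), reference))
    return xr.concat(boot, dim='bootstrap')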

@aaronspring (Collaborator, Author)

Overview of results. I am unsure which of the results is more important to you. I tend to ignore the non-detrended results, as that skill is artificial due to external forcing.
[four result plots attached]

For the CESM output, the skill degrades compared to before (just this GMSST case here!). I tried to compare the DPLE and MPI MiKlip inputs and didn't find huge differences. Maybe we handle init/time differently in the postprocessing of output; that's my only explanation.

The main reasoning behind my approach is that real time = init + lead. But that's also the case for your compute_reference function, isn't it?

@bradyrx: please have a look at the compute_reference_mod code and check for flaws in the design. I am quite sure I did it right, but maybe I got some part of the matching in real-time space wrong.

@bradyrx (Collaborator) commented May 15, 2019

This was in the gitter; reproduced here in answer to why I wrote it the way I did. Hopefully this is what you were asking.

So the size of the reference should decrease at each lead. For DPLE, we initialize every year in November from 1954-2017.

Let's say your reconstruction/assimilation covers 1955-2017 (which it does for CESM). At lead-one, we lose the last initialization of DPLE, since there is no 2018 data for FOSI. At lead-two, we lose the last two initializations of DPLE, since there is no 2018 or 2019 for FOSI (i.e., we lose the Nov 2016 and Nov 2017 initializations, since those correspond to 2018 and 2019 at lead-two).

And so on: you lose one time step for each lead. We used to trim the time series down to the common length, so if you are computing to lead-10 you just trim 10 steps beforehand and compare. But that didn't seem like doing statistics to their full capacity, so I made it adjust at each lead. So we get 62 comparisons at lead-1, 61 at lead-2, and so on; a toy count is sketched below.
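
A toy illustration of that per-lead counting (years schematic; the exact counts depend on the DPLE/FOSI overlap):

# Toy illustration: count verifiable (init, verification-year) pairs per
# lead when the reference ends in 2017; one pair is lost per extra lead.
inits = range(1954, 2018)            # Nov initializations, 1954-2017
ref_years = set(range(1955, 2018))   # reconstruction coverage

for lead in (1, 2, 3, 10):
    n = sum((init + lead) in ref_years for init in inits)
    print(f'lead {lead}: {n} pairs')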

@bradyrx (Collaborator) commented May 15, 2019

If I have time at home tonight I will review your code snippet. Otherwise I will review once it is a PR.

@bradyrx (Collaborator) commented May 22, 2019

bradyrx closed this as completed May 22, 2019