
refactor bootstrap: first resample, then metric #355

Merged: 43 commits into master, May 28, 2020

Conversation

@aaronspring (Collaborator) commented Apr 19, 2020

Description

  • first resample, then compute the metric, except for Hindcast with resample_dim='init', which fails otherwise (see the sketch after this list)
  • small modifications in asv
  • fix transform_coords=False only in xr.DataArrays (not in xr.Datasets)
  • linting: rename l --> lead, and use f-strings only where needed
  • no changes in classes yet (this should be another PR)
  • replace _get_chunksize with da.data.npartitions
  • move the bootstrap stats functions to the end of the file

Works on #145, #60, #369
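
As a rough illustration of the resample-first idea referenced in the list above (hypothetical names, a simple RMSE-of-the-ensemble-mean metric, and not climpred's actual implementation): draw all bootstrap indices up front, stack them along an iteration dimension, and evaluate the metric once over that larger object instead of looping metric-then-resample.

import numpy as np
import xarray as xr

def resample_then_metric(fct, obs, resample_dim="member", iterations=100, seed=0):
    """Draw all resampling indices first, then compute the metric once."""
    rng = np.random.default_rng(seed)
    n = fct.sizes[resample_dim]
    # all bootstrap indices at once: shape (iterations, n)
    idx = xr.DataArray(
        rng.integers(0, n, size=(iterations, n)),
        dims=["iteration", resample_dim],
    )
    resampled = fct.isel({resample_dim: idx})  # vectorized indexing adds an 'iteration' dim
    # example metric: RMSE of the ensemble mean, one value per iteration
    return np.sqrt(((resampled.mean(resample_dim) - obs) ** 2).mean("init"))

fct = xr.DataArray(np.random.rand(10, 5), dims=["init", "member"])
obs = xr.DataArray(np.random.rand(10), dims=["init"])
dist = resample_then_metric(fct, obs, iterations=500)  # bootstrap skill distribution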

Type of change

Please delete options that are not relevant.

  • Performance (if you touched existing code run asv to detect performance changes)
  • refactoring

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. This could point to a cell in the updated notebooks. Or a snippet of code with accompanying figures here.

Checklist (while developing)

  • I have added docstrings to all new functions.
  • I have commented my code, particularly in hard-to-understand areas.
  • Tests added for pytest, if necessary.
  • I have updated the sphinx documentation, if necessary.

Pre-Merge Checklist (final steps)

  • I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
  • I have squashed commits to a reasonable number, and force-pushed the squashed commits.

References

Please add any references to manuscripts, textbooks, etc.

@aaronspring aaronspring marked this pull request as draft April 19, 2020 11:52
@aaronspring aaronspring self-assigned this Apr 19, 2020
@aaronspring aaronspring mentioned this pull request Apr 19, 2020
@bradyrx (Collaborator) commented Apr 19, 2020

I'll review this once #354 is merged.

@coveralls commented Apr 23, 2020

Coverage Status

Coverage increased (+0.1%) to 90.975% when pulling 2d4fa2b on AS_bootstrap_metric_refactor into 416c214 on master.

@aaronspring (Collaborator, Author) commented Apr 23, 2020

I am unsure why I don't see the speedup here.

It is definitely faster than before the resample PR. For perfect-model, I now implement the bootstrapped uninitialized ensemble by generating more members than needed and then resampling from those.
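
A minimal sketch of that "generate more members than needed, then resample from the pool" approach, with made-up names and a toy way of constructing surrogate members from a control run; this is not climpred's internal implementation.

import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
control = xr.DataArray(rng.standard_normal(300), dims=["time"])  # toy control run

nmember, nlead, iterations = 10, 5, 500
pool_size = nmember * 20  # build a larger pool of uninitialized members once

starts = rng.integers(0, control.sizes["time"] - nlead, size=pool_size)
pool = xr.concat(
    [control.isel(time=slice(s, s + nlead)).rename({"time": "lead"}) for s in starts],
    dim="member",
)

# each bootstrap iteration only draws nmember members from the pre-built pool
idx = xr.DataArray(
    rng.integers(0, pool_size, size=(iterations, nmember)),
    dims=["iteration", "member"],
)
uninit = pool.isel(member=idx)  # has dims iteration, member and lead (order may vary)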

asv continuous -f 1.1 d82f115748f325c240ea7f2f185081ee365f1701 HEAD -b benchmarks_perfect_model.ComputeSmall.time_compute_perfect_model

[ 50.00%] · For climpred commit 3a7cf0a8 <pr-355/bradyrx/AS_bootstrap_metric_refactor> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-cftime-dask-numpy-xarray
[ 75.00%] ··· benchmarks_perfect_model.ComputeSmall.time_compute_perfect_model                                                                ok
[ 75.00%] ··· =========== =========== ============
              --                 comparison
              ----------- ------------------------
                 metric       m2m         m2c
              =========== =========== ============
                  rmse     52.7±10ms   13.0±0.5ms
               pearson_r   46.5±40ms    17.5±4ms
                 crpss     45.2±10ms   27.3±10ms
              =========== =========== ============

[ 75.00%] · For climpred commit d82f1157 <master~6> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-cftime-dask-numpy-xarray..
[ 75.00%] ·· Benchmarking conda-py3.6-cftime-dask-numpy-xarray
[100.00%] ··· benchmarks_perfect_model.ComputeSmall.time_compute_perfect_model                                                                ok
[100.00%] ··· =========== =========== ===========
              --                 comparison
              ----------- -----------------------
                 metric       m2m         m2c
              =========== =========== ===========
                  rmse      38.3±4ms   29.0±20ms
               pearson_r    44.2±8ms    16.7±1ms
                 crpss     49.2±20ms    21.1±6ms
              =========== =========== ===========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@aaronspring (Collaborator, Author) commented:

Do you get a speedup, @bradyrx? We should somehow compare a version from two weeks ago with this branch. However, I changed the keyword bootstrap to iterations, and I wonder how asv is supposed to test across that rename.
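
One way an asv benchmark could keep running on both sides of that rename is to try the new keyword and fall back to the old one. The sketch below is only an illustration: the positional arguments of bootstrap_perfect_model and the assumption that the old signature raises TypeError for the unknown keyword are not verified against the actual benchmark code.

import climpred

def _bootstrap_compat(init_pm, control, **kwargs):
    """Run bootstrap_perfect_model under either keyword spelling (illustrative only)."""
    try:
        # newer commits (this PR) use `iterations`
        return climpred.bootstrap.bootstrap_perfect_model(
            init_pm, control, iterations=16, **kwargs
        )
    except TypeError:
        # older commits still expect `bootstrap`
        return climpred.bootstrap.bootstrap_perfect_model(
            init_pm, control, bootstrap=16, **kwargs
        )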

The comment below was made on this hunk of the diff:

else:
    pers_output = False

else:  # if resample_dim == 'member':
@aaronspring (Collaborator, Author) commented:

This is the new approach: first bootstrapping, then the metric. Later on, in the classes implementation, we can tear those calls apart.

@aaronspring aaronspring marked this pull request as ready for review April 23, 2020 14:53
@aaronspring aaronspring requested a review from bradyrx April 23, 2020 14:58
@aaronspring (Collaborator, Author) commented:

Incredibly fast:

import climpred
from climpred.tutorial import load_dataset  # tutorial data loader

fosi = load_dataset('FOSI-SST')
le = load_dataset('CESM-LE')
dple = load_dataset('CESM-DP-SST')

%time bs = climpred.bootstrap.bootstrap_hindcast(dple, le, fosi, dim='init', resample_dim='member', iterations=1000)
/Users/aaron.spring/Coding/climpred/climpred/utils.py:141: UserWarning: Assuming annual resolution due to numeric inits. Change init to a datetime if it is another resolution.
  'Assuming annual resolution due to numeric inits. '
CPU times: user 1.89 s, sys: 82.9 ms, total: 1.97 s
Wall time: 1.99 s

@bradyrx (Collaborator) commented Apr 23, 2020

Sorry for the iterations keyword change. It seems to be causing problems.

Awesome that this ran so fast! 1.99s for 1000 iterations. Now, that's just 1D. Do you have a comparison of the time for an earlier version? I.e. what is the time of that call on the currently released version?

Maybe you can make a fake hindcast case (or use an MPI hindcast) with a 2D grid and many iterations using dask if you have time.
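
For reference, a rough sketch of a synthetic 2D, dask-backed hindcast/observation pair that such a stress test could start from; sizes, coordinate names, and chunking are arbitrary, dask is assumed to be installed, and nothing here comes from the PR itself.

import numpy as np
import xarray as xr

# synthetic hindcast with a 2D grid, backed by dask chunks (sizes are arbitrary)
ninit, nmember, nlead, nlat, nlon = 20, 10, 5, 45, 90

hind = xr.DataArray(
    np.random.rand(ninit, nmember, nlead, nlat, nlon),
    dims=["init", "member", "lead", "lat", "lon"],
    coords={"init": np.arange(1980, 1980 + ninit), "lead": np.arange(1, nlead + 1)},
).chunk({"lat": 45, "lon": 45})

# matching synthetic observations covering the verification window
obs = xr.DataArray(
    np.random.rand(ninit + nlead, nlat, nlon),
    dims=["time", "lat", "lon"],
    coords={"time": np.arange(1980, 1980 + ninit + nlead)},
).chunk({"lat": 45, "lon": 45})
# hind/obs could then be fed into a bootstrap call like the 1D example above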

@bradyrx (Collaborator) commented Apr 23, 2020

This is great. And yes, we can separate calls, etc. on the classes implementation. Is this ready for review? Ping me when it is. I'll be gone this afternoon through Sunday, so I can review first thing Monday.

@aaronspring (Collaborator, Author) commented:
Ready for review.

@aaronspring (Collaborator, Author) replied:

> Sorry for the iterations keyword change. It seems to be causing problems.
>
> Awesome that this ran so fast! 1.99s for 1000 iterations. Now, that's just 1D. Do you have a comparison of the time for an earlier version? I.e. what is the time of that call on the currently released version?
>
> Maybe you can make a fake hindcast case (or use an MPI hindcast) with a 2D grid and many iterations using dask if you have time.

The asv hindcast benchmark is doing that.

@aaronspring (Collaborator, Author) commented:

I think we should also check whether the new resample works on a dim without coordinate labels. I remember this failed.
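
A hypothetical check of that concern, resampling via integer indexing along a dimension that carries no coordinate labels; this is not an actual climpred test.

import numpy as np
import xarray as xr

# resampled dim ('member') deliberately has no coordinate labels
da = xr.DataArray(np.random.rand(10, 5), dims=["init", "member"])
assert "member" not in da.coords

rng = np.random.default_rng(0)
idx = xr.DataArray(
    rng.integers(0, da.sizes["member"], size=(100, da.sizes["member"])),
    dims=["iteration", "member"],
)
resampled = da.isel(member=idx)  # vectorized indexing adds an 'iteration' dim
assert dict(resampled.sizes) == {"init": 10, "iteration": 100, "member": 5}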

@aaronspring aaronspring linked an issue May 23, 2020 that may be closed by this pull request
@aaronspring (Collaborator, Author) commented May 23, 2020

On my MacBook:


       before           after         ratio
     [416c2148]       [ef189ad2]
     <AS_progressbar>       <AS_bootstrap_metric_refactor>
-       13.5±0.7s          269±2ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2c')
-         18.2±1s          292±2ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2c')
-       19.4±0.5s        298±0.7ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2c')
-         25.0±2s         312±10ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2m')
-         29.6±1s          335±1ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2m')
-       31.0±0.3s          344±1ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2m')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

On the supercomputer:

       before           after         ratio
     [416c2148]       [721f86db]
     <355~19^2^2>       <355>
-       15.8±0.1s        334±0.6ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2c')
-       21.6±0.1s        362±0.9ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2c')
-      24.0±0.07s          369±1ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2c')
-       29.0±0.1s          384±2ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2m')
-       36.2±0.1s          415±9ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2m')
-       38.5±0.4s        423±0.8ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2m')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@aaronspring aaronspring requested a review from bradyrx May 23, 2020 14:27
@aaronspring aaronspring changed the title WIP: refactor bootstrap: first resample, then metric refactor bootstrap: first resample, then metric May 23, 2020
@aaronspring aaronspring requested a review from ahuang11 May 25, 2020 16:06
@bradyrx (Collaborator) commented May 28, 2020

@aaronspring, thanks for all of the work on this. You've done a lot here! Glad we can speed up the bootstrapping a lot in this manner. Go ahead and squash & merge if it's ready, unless there's anything else that needs to be done here.

@aaronspring (Collaborator, Author) commented:

This won't be the last commit on parallelizing the bootstrapping, but it's a big leap.

@aaronspring aaronspring merged commit fc5bcb7 into master May 28, 2020
@bradyrx (Collaborator) commented May 28, 2020

Congrats on getting through this. Check #145, #60, #369 since they auto-closed. Might want some to stay open since there's still more to do here eventually.

@aaronspring (Collaborator, Author) commented:

I didn't know about the auto-close feature. Nice, but unexpected and potentially dangerous... fine with me for now.

@bradyrx (Collaborator) commented May 28, 2020

Yeah, I think it looks for "closes #" in the PR, or anything similar.

@aaronspring aaronspring deleted the AS_bootstrap_metric_refactor branch July 6, 2020 19:37

Successfully merging this pull request may close these issues.

_resample_iterations_idx memory, chunking, ... parallelize bootstrap
4 participants