
refactor bootstrap: first resample, then metric #355

Merged: 43 commits into master, May 28, 2020

Conversation

@aaronspring (Collaborator) commented Apr 19, 2020

Description

  • first resample, then compute the metric, except for Hindcast with resample_dim='init', which fails otherwise (see the sketch after this list)
  • small modifications in asv
  • fix transform_coords=False only in xr.DataArrays (not in xr.Datasets)
  • linting: rename l --> lead, and use f-strings only where needed
  • no changes in classes yet (this should be another PR)
  • replace _get_chunksize with da.data.npartitions
  • move the bootstrap stats functions to the end of the file

Works on #145, #60, #369
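
As a rough illustration of the resample-first idea referenced in the list above (hypothetical names, a simple RMSE-of-the-ensemble-mean metric, and not climpred's actual implementation): draw all bootstrap indices up front, stack them along an iteration dimension, and evaluate the metric once over that larger object instead of looping metric-then-resample.

import numpy as np
import xarray as xr

def resample_then_metric(fct, obs, resample_dim="member", iterations=100, seed=0):
    """Draw all resampling indices first, then compute the metric once."""
    rng = np.random.default_rng(seed)
    n = fct.sizes[resample_dim]
    # all bootstrap indices at once: shape (iterations, n)
    idx = xr.DataArray(
        rng.integers(0, n, size=(iterations, n)),
        dims=["iteration", resample_dim],
    )
    resampled = fct.isel({resample_dim: idx})  # vectorized indexing adds an 'iteration' dim
    # example metric: RMSE of the ensemble mean, one value per iteration
    return np.sqrt(((resampled.mean(resample_dim) - obs) ** 2).mean("init"))

fct = xr.DataArray(np.random.rand(10, 5), dims=["init", "member"])
obs = xr.DataArray(np.random.rand(10), dims=["init"])
dist = resample_then_metric(fct, obs, iterations=500)  # bootstrap skill distribution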

Type of change

Please delete options that are not relevant.

  • Performance (if you touched existing code run asv to detect performance changes)
  • refactoring

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. This could point to a cell in the updated notebooks. Or a snippet of code with accompanying figures here.

Checklist (while developing)

  • I have added docstrings to all new functions.
  • I have commented my code, particularly in hard-to-understand areas.
  • Tests added for pytest, if necessary.
  • I have updated the sphinx documentation, if necessary.

Pre-Merge Checklist (final steps)

  • I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
  • I have squashed commits to a reasonable number, and force-pushed the squashed commits.

References

Please add any references to manuscripts, textbooks, etc.

@aaronspring aaronspring marked this pull request as draft April 19, 2020 11:52
@aaronspring aaronspring self-assigned this Apr 19, 2020
@aaronspring aaronspring mentioned this pull request Apr 19, 2020
@bradyrx (Collaborator) commented Apr 19, 2020

I'll review this once #354 is merged.

@coveralls commented Apr 23, 2020

Coverage Status

Coverage increased (+0.1%) to 90.975% when pulling 2d4fa2b on AS_bootstrap_metric_refactor into 416c214 on master.

@aaronspring (Collaborator, Author) commented Apr 23, 2020

I am unsure why I don't see the speedup here.

It is definitely faster than before the resample PR. For perfect-model, I now implement the bootstrapped uninitialized ensemble by generating more members than needed and then resampling from those.
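
A minimal sketch of that "generate more members than needed, then resample from the pool" approach, with made-up names and a toy way of constructing surrogate members from a control run; this is not climpred's internal implementation.

import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
control = xr.DataArray(rng.standard_normal(300), dims=["time"])  # toy control run

nmember, nlead, iterations = 10, 5, 500
pool_size = nmember * 20  # build a larger pool of uninitialized members once

starts = rng.integers(0, control.sizes["time"] - nlead, size=pool_size)
pool = xr.concat(
    [control.isel(time=slice(s, s + nlead)).rename({"time": "lead"}) for s in starts],
    dim="member",
)

# each bootstrap iteration only draws nmember members from the pre-built pool
idx = xr.DataArray(
    rng.integers(0, pool_size, size=(iterations, nmember)),
    dims=["iteration", "member"],
)
uninit = pool.isel(member=idx)  # has dims iteration, member and lead (order may vary)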

asv continuous -f 1.1 d82f115748f325c240ea7f2f185081ee365f1701 HEAD -b benchmarks_perfect_model.ComputeSmall.time_compute_perfect_model

[ 50.00%] · For climpred commit 3a7cf0a8 <pr-355/bradyrx/AS_bootstrap_metric_refactor> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-cftime-dask-numpy-xarray
[ 75.00%] ··· benchmarks_perfect_model.ComputeSmall.time_compute_perfect_model                                                                ok
[ 75.00%] ··· =========== =========== ============
              --                 comparison
              ----------- ------------------------
                 metric       m2m         m2c
              =========== =========== ============
                  rmse     52.7±10ms   13.0±0.5ms
               pearson_r   46.5±40ms    17.5±4ms
                 crpss     45.2±10ms   27.3±10ms
              =========== =========== ============

[ 75.00%] · For climpred commit d82f1157 <master~6> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-cftime-dask-numpy-xarray..
[ 75.00%] ·· Benchmarking conda-py3.6-cftime-dask-numpy-xarray
[100.00%] ··· benchmarks_perfect_model.ComputeSmall.time_compute_perfect_model                                                                ok
[100.00%] ··· =========== =========== ===========
              --                 comparison
              ----------- -----------------------
                 metric       m2m         m2c
              =========== =========== ===========
                  rmse      38.3±4ms   29.0±20ms
               pearson_r    44.2±8ms    16.7±1ms
                 crpss     49.2±20ms    21.1±6ms
              =========== =========== ===========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@aaronspring (Collaborator, Author) commented:

Do you get a speedup, @bradyrx? We should somehow compare a version from two weeks ago with this branch. However, I changed the keyword bootstrap to iterations, and I wonder how asv is supposed to test across that rename.
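
One way an asv benchmark could keep running on both sides of that rename is to try the new keyword and fall back to the old one. The sketch below is only an illustration: the positional arguments of bootstrap_perfect_model and the assumption that the old signature raises TypeError for the unknown keyword are not verified against the actual benchmark code.

import climpred

def _bootstrap_compat(init_pm, control, **kwargs):
    """Run bootstrap_perfect_model under either keyword spelling (illustrative only)."""
    try:
        # newer commits (this PR) use `iterations`
        return climpred.bootstrap.bootstrap_perfect_model(
            init_pm, control, iterations=16, **kwargs
        )
    except TypeError:
        # older commits still expect `bootstrap`
        return climpred.bootstrap.bootstrap_perfect_model(
            init_pm, control, bootstrap=16, **kwargs
        )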

The comment below was made on this hunk of the diff:

else:
    pers_output = False

else:  # if resample_dim == 'member':
@aaronspring (Collaborator, Author) commented:

This is the new approach: first bootstrapping, then the metric. Later on, in the classes implementation, we can tear those calls apart.

@aaronspring aaronspring marked this pull request as ready for review April 23, 2020 14:53
@aaronspring aaronspring requested a review from bradyrx April 23, 2020 14:58
@aaronspring (Collaborator, Author) commented:

Incredibly fast:

import climpred
from climpred.tutorial import load_dataset  # tutorial data loader

fosi = load_dataset('FOSI-SST')
le = load_dataset('CESM-LE')
dple = load_dataset('CESM-DP-SST')

%time bs = climpred.bootstrap.bootstrap_hindcast(dple, le, fosi, dim='init', resample_dim='member', iterations=1000)
/Users/aaron.spring/Coding/climpred/climpred/utils.py:141: UserWarning: Assuming annual resolution due to numeric inits. Change init to a datetime if it is another resolution.
  'Assuming annual resolution due to numeric inits. '
CPU times: user 1.89 s, sys: 82.9 ms, total: 1.97 s
Wall time: 1.99 s

@bradyrx (Collaborator) commented Apr 23, 2020

Sorry for the iterations keyword change. It seems to be causing problems.

Awesome that this ran so fast! 1.99s for 1000 iterations. Now, that's just 1D. Do you have a comparison of the time for an earlier version? I.e. what is the time of that call on the currently released version?

Maybe you can make a fake hindcast case (or use an MPI hindcast) with a 2D grid and many iterations using dask if you have time.
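
For reference, a rough sketch of a synthetic 2D, dask-backed hindcast/observation pair that such a stress test could start from; sizes, coordinate names, and chunking are arbitrary, dask is assumed to be installed, and nothing here comes from the PR itself.

import numpy as np
import xarray as xr

# synthetic hindcast with a 2D grid, backed by dask chunks (sizes are arbitrary)
ninit, nmember, nlead, nlat, nlon = 20, 10, 5, 45, 90

hind = xr.DataArray(
    np.random.rand(ninit, nmember, nlead, nlat, nlon),
    dims=["init", "member", "lead", "lat", "lon"],
    coords={"init": np.arange(1980, 1980 + ninit), "lead": np.arange(1, nlead + 1)},
).chunk({"lat": 45, "lon": 45})

# matching synthetic observations covering the verification window
obs = xr.DataArray(
    np.random.rand(ninit + nlead, nlat, nlon),
    dims=["time", "lat", "lon"],
    coords={"time": np.arange(1980, 1980 + ninit + nlead)},
).chunk({"lat": 45, "lon": 45})
# hind/obs could then be fed into a bootstrap call like the 1D example above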

@bradyrx (Collaborator) commented Apr 23, 2020

This is great. And yes, we can separate calls, etc. on the classes implementation. Is this ready for review? Ping me when it is. I'll be gone this afternoon through Sunday, so I can review first thing Monday.

@aaronspring (Collaborator, Author) commented:
Ready for review.

@aaronspring (Collaborator, Author) replied:

> Sorry for the iterations keyword change. It seems to be causing problems.
>
> Awesome that this ran so fast! 1.99s for 1000 iterations. Now, that's just 1D. Do you have a comparison of the time for an earlier version? I.e. what is the time of that call on the currently released version?
>
> Maybe you can make a fake hindcast case (or use an MPI hindcast) with a 2D grid and many iterations using dask if you have time.

The asv hindcast benchmark is doing that.

@aaronspring (Collaborator, Author) commented:

I think we should also check whether the new resample works on a dim without coordinate labels. I remember this failed.
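
A hypothetical check of that concern, resampling via integer indexing along a dimension that carries no coordinate labels; this is not an actual climpred test.

import numpy as np
import xarray as xr

# resampled dim ('member') deliberately has no coordinate labels
da = xr.DataArray(np.random.rand(10, 5), dims=["init", "member"])
assert "member" not in da.coords

rng = np.random.default_rng(0)
idx = xr.DataArray(
    rng.integers(0, da.sizes["member"], size=(100, da.sizes["member"])),
    dims=["iteration", "member"],
)
resampled = da.isel(member=idx)  # vectorized indexing adds an 'iteration' dim
assert dict(resampled.sizes) == {"init": 10, "iteration": 100, "member": 5}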

@aaronspring aaronspring linked an issue May 23, 2020 that may be closed by this pull request
@aaronspring (Collaborator, Author) commented May 23, 2020

On my MacBook:


       before           after         ratio
     [416c2148]       [ef189ad2]
     <AS_progressbar>       <AS_bootstrap_metric_refactor>
-       13.5±0.7s          269±2ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2c')
-         18.2±1s          292±2ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2c')
-       19.4±0.5s        298±0.7ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2c')
-         25.0±2s         312±10ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2m')
-         29.6±1s          335±1ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2m')
-       31.0±0.3s          344±1ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2m')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

On the supercomputer:

       before           after         ratio
     [416c2148]       [721f86db]
     <355~19^2^2>       <355>
-       15.8±0.1s        334±0.6ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2c')
-       21.6±0.1s        362±0.9ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2c')
-      24.0±0.07s          369±1ms     0.02  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2c')
-       29.0±0.1s          384±2ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('crpss', 'm2m')
-       36.2±0.1s          415±9ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('rmse', 'm2m')
-       38.5±0.4s        423±0.8ms     0.01  benchmarks_perfect_model.ComputeSmall.time_bootstrap_perfect_model('pearson_r', 'm2m')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@aaronspring aaronspring requested a review from bradyrx May 23, 2020 14:27
@aaronspring aaronspring changed the title WIP: refactor bootstrap: first resample, then metric refactor bootstrap: first resample, then metric May 23, 2020
@aaronspring aaronspring requested a review from ahuang11 May 25, 2020 16:06
@bradyrx (Collaborator) commented May 28, 2020

@aaronspring, thanks for all of the work on this. You've done a lot here! Glad we can speed up the bootstrapping a lot in this manner. Go ahead and squash & merge if it's ready, unless there's anything else that needs to be done here.

@aaronspring (Collaborator, Author) commented:

This won't be the last commit on parallelizing the bootstrapping, but it's a big leap.

@aaronspring aaronspring merged commit fc5bcb7 into master May 28, 2020
@bradyrx (Collaborator) commented May 28, 2020

Congrats on getting through this. Check #145, #60, #369 since they auto-closed. Might want some to stay open since there's still more to do here eventually.

@aaronspring (Collaborator, Author) commented:

I didn't know about the auto-close feature. Nice, but unexpected and potentially dangerous... fine with me for now.

@bradyrx (Collaborator) commented May 28, 2020

Yeah, I think it looks for "closes #" in the PR, or anything similar.

@aaronspring aaronspring deleted the AS_bootstrap_metric_refactor branch July 6, 2020 19:37

Successfully merging this pull request may close these issues.

_resample_iterations_idx memory, chunking, ... parallelize bootstrap
4 participants