better bootstrapping #285

Merged
merged 34 commits into from
Feb 4, 2020
Changes from 10 commits
Commits
fd68215
bootstrap.stats parallelized properly
Dec 29, 2019
9da77a2
bootstrap_compute properly parallelized
Dec 29, 2019
21f4199
CL PR_template
Dec 29, 2019
28bc13a
dask demo
Jan 1, 2020
fd29223
dask notebook macbook timings
Jan 2, 2020
9c74bea
dask nb timings mistral
Jan 2, 2020
0690392
tried to fix my_quantile
Jan 2, 2020
45db8e9
my_quantile q=list
Jan 4, 2020
71859fe
comments
Jan 4, 2020
b21bcfb
Merge branch 'master' into AS_fix_bootstrapping
aaronspring Jan 5, 2020
1daf22f
Merge branch 'master' into AS_fix_bootstrapping
bradyrx Jan 12, 2020
ad55f63
minor changes requested
Jan 14, 2020
cd89817
no .load() in bootstrap
Jan 14, 2020
840597f
Merge branch 'master' into AS_fix_bootstrapping
aaronspring Jan 14, 2020
b006b25
contrib asv added
Jan 14, 2020
62521ec
new hindcast benchmarking
Jan 14, 2020
e0cb8b6
rm commented out _load_mem
Jan 14, 2020
96fb4eb
persist smp_hind speedup?
Jan 20, 2020
95150b9
not persist, fixed benchmarks
Jan 24, 2020
c11dd69
Merge branch 'master' into AS_fix_bootstrapping
aaronspring Jan 24, 2020
c25cca2
bf ds.get_axis_num
Jan 24, 2020
d96172d
Merge branch 'master' into AS_fix_bootstrapping
aaronspring Jan 30, 2020
22fee69
PR template and contrib asv
Jan 30, 2020
ec14d50
req changes
Jan 30, 2020
0f87ff0
dask examples docs
Jan 30, 2020
fcd493f
Update test_stats.py
aaronspring Jan 30, 2020
6bcc96a
formatting requested
Feb 3, 2020
c790afa
alphabetical order
Feb 3, 2020
2f2f5a1
require pandas 0.25.3
Feb 3, 2020
0fbb8b1
lint
Feb 3, 2020
f295aaf
resolved comments
Feb 3, 2020
5a27833
benchmarks prob dim=member
Feb 3, 2020
ba85852
get_chunksize takes xr.ds
Feb 4, 2020
abc8e3a
warning caption MB added
Feb 4, 2020
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -2,7 +2,7 @@

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

-Fixes # (issue)
+Closes #(issue)

## Type of change

8 changes: 8 additions & 0 deletions CHANGELOG.rst
@@ -58,6 +58,14 @@ Documentation
- Update `terminology page <terminology.html>`_ with more information on metrics
terminology. (:pr:`283`) `Riley X. Brady`_

New Features
------------

- speed-up in bootstrap functions: (:pr:`2xx`) `Aaron Spring`_.
- `xr.quantile` exchanged for `dask.map_blocks(np.percentile)`
- properly implemented handling for lazy results when inputs are chunked
- user gets warned when chunking is potentially necessary or unnecessary
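
One way to read the `np.percentile` bullet: a percentile taken along the bootstrap axis can be evaluated block by block, provided the blocks do not split that axis. The sketch below illustrates this with NumPy only, using `np.array_split` as a stand-in for dask chunks. The array shape, the names `skill`, `levels`, `full` and `blockwise`, and the percentile levels are assumptions for illustration, not taken from climpred.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bootstrapped skill with dims (bootstrap iteration, lon).
skill = rng.normal(size=(200, 8))
levels = [2.5, 97.5]  # assumed confidence bounds, in percent

# Reference: percentiles over the bootstrap axis of the full array.
full = np.percentile(skill, levels, axis=0)

# Blockwise: split along lon only, so every block keeps all iterations.
blocks = np.array_split(skill, 4, axis=1)
blockwise = np.concatenate(
    [np.percentile(block, levels, axis=0) for block in blocks], axis=1
)

print(np.allclose(full, blockwise))
```

Because each `lon` column is processed independently, the blockwise result matches the full-array result. The same property is what allows a per-chunk function to be mapped over a chunked array.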

climpred v1.2.0 (2019-12-17)
============================

2 changes: 1 addition & 1 deletion asv_bench/asv.conf.json
@@ -80,7 +80,7 @@
"numpy": [""],
"xarray": [""],
"dask": [""],
-"xskillscore": [""],
+"xskillscore": ["0.0.9"],
},

// Combinations of libraries/python versions can be excluded/included
81 changes: 59 additions & 22 deletions asv_bench/benchmarks/benchmarks_perfect_model.py
@@ -2,17 +2,28 @@
# See "Writing benchmarks" in the asv docs for more information.


import xarray as xr
import dask
import numpy as np
from . import randn, parameterized
import xarray as xr

from climpred.prediction import compute_perfect_model
from climpred.bootstrap import bootstrap_perfect_model
from climpred.constants import PM_COMPARISONS, PM_METRICS
from climpred.prediction import compute_perfect_model

from . import parameterized, randn, requires_dask

# faster than
# from climpred.constants import PM_COMPARISONS, PM_METRICS as METRICS
Review comment (Collaborator): This can also be updated.

Reply (Collaborator Author): I only want to take a subselection here. This applies to the two asv files. Comment added.

METRICS = ['rmse', 'pearson_r', 'crpss']
PM_COMPARISONS = ['m2m', 'm2c']

# faster
PM_METRICS = ['rmse', 'pearson_r', 'crpss']
# PM_COMPARISONS = ['m2e', 'e2c']
bootstrap = 4


def _ensure_loaded(res):
"""Compute lazy results; return already-loaded results unchanged."""
if dask.is_dask_collection(res):
res = res.compute()
return res
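
The benchmarks presumably wrap each call in `_ensure_loaded` because chunked inputs can yield a lazy result, and a timing that stopped there would measure only the construction of the task graph. Below is a minimal sketch of the pattern. It uses a hypothetical `LazyResult` class in place of a real dask collection (the helper above checks `dask.is_dask_collection` instead), and `expensive`, `calls` and `ensure_loaded` are illustrative names only.

```python
class LazyResult:
    """Hypothetical stand-in for a dask-backed result (not a dask class)."""

    def __init__(self, func):
        self._func = func

    def compute(self):
        # Deferred work only runs when compute() is called.
        return self._func()


calls = []


def expensive():
    calls.append("ran")
    return 42


def ensure_loaded(res):
    """Run deferred work if `res` is lazy; return anything else unchanged."""
    if isinstance(res, LazyResult):
        res = res.compute()
    return res


lazy = LazyResult(expensive)
work_before = len(calls)   # building the lazy object did no work yet
value = ensure_loaded(lazy)
work_after = len(calls)    # the work ran inside ensure_loaded
```

Timing `ensure_loaded(...)` rather than the bare call keeps eager and lazy benchmark variants comparable, since both then include the actual computation.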


class Generate:
@@ -25,13 +36,13 @@ class Generate:

     def make_ds(self):

-        # ds
+        # ds and control mimic smaller MPI perfect-model experiment
         self.ds = xr.Dataset()
         self.nmember = 3
         self.ninit = 4
         self.nlead = 3
-        self.nx = 90  # 4 deg
-        self.ny = 45  # 4 deg
+        self.nx = 64
+        self.ny = 64
         self.control_start = 3000
         self.control_end = 3300
         self.ntime = 300
@@ -95,29 +106,55 @@ class Compute(Generate):
     def setup(self, *args, **kwargs):
         self.make_ds()

-    @parameterized(['metric', 'comparison'], (PM_METRICS, PM_COMPARISONS))
+    @parameterized(['metric', 'comparison'], (METRICS, PM_COMPARISONS))
     def time_compute_perfect_model(self, metric, comparison):
         """Take time for compute_perfect_model."""
-        compute_perfect_model(
-            self.ds, self.control, metric=metric, comparison=comparison
+        _ensure_loaded(
+            compute_perfect_model(
+                self.ds, self.control, metric=metric, comparison=comparison
+            )
         )

-    @parameterized(['metric', 'comparison'], (['pearson_r', 'crpss'], PM_COMPARISONS))
+    @parameterized(['metric', 'comparison'], (METRICS, PM_COMPARISONS))
     def peakmem_compute_perfect_model(self, metric, comparison):
         """Take memory peak for compute_perfect_model for all comparisons."""
-        compute_perfect_model(
-            self.ds, self.control, metric=metric, comparison=comparison
+        _ensure_loaded(
+            compute_perfect_model(
+                self.ds, self.control, metric=metric, comparison=comparison
+            )
         )

-    def time_bootstrap_perfect_model(self):
+    @parameterized(['metric', 'comparison'], (METRICS, PM_COMPARISONS))
+    def time_bootstrap_perfect_model(self, metric, comparison):
         """Take time for bootstrap_perfect_model for one metric."""
-        bootstrap_perfect_model(
-            self.ds, self.control, metric='mae', comparison='e2c', bootstrap=5
+        _ensure_loaded(
+            bootstrap_perfect_model(
+                self.ds,
+                self.control,
+                metric=metric,
+                comparison=comparison,
+                bootstrap=bootstrap,
+            )
         )

-    @parameterized(['metric', 'comparison'], (['pearson_r', 'crpss'], PM_COMPARISONS))
+    @parameterized(['metric', 'comparison'], (METRICS, PM_COMPARISONS))
     def peakmem_bootstrap_perfect_model(self, metric, comparison):
         """Take memory peak for bootstrap_perfect_model."""
-        bootstrap_perfect_model(
-            self.ds, self.control, metric=metric, comparison=comparison, bootstrap=5
+        _ensure_loaded(
+            bootstrap_perfect_model(
+                self.ds,
+                self.control,
+                metric=metric,
+                comparison=comparison,
+                bootstrap=bootstrap,
+            )
         )
+
+
+class ComputeDask(Compute):
+    def setup(self, *args, **kwargs):
+        requires_dask()
+        super().setup(**kwargs)
+        # chunk along a spatial dimension to enable embarrassingly parallel computation
+        self.ds = self.ds.chunk({'lon': self.nx // bootstrap})
+        self.control = self.control.chunk({'lon': self.nx // bootstrap})
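
`ComputeDask.setup` sets the chunk length along `lon` to `self.nx // bootstrap`. The sketch below shows the resulting chunk layout, assuming the usual behaviour for an integer chunk size: equal pieces, plus a shorter remainder when the length does not divide evenly. `chunk_sizes` is a hypothetical helper written for this illustration, not part of dask or climpred.

```python
def chunk_sizes(dim_len, chunk):
    """Sizes of consecutive chunks when a dimension of length `dim_len`
    is split into pieces of at most `chunk` elements."""
    full, rest = divmod(dim_len, chunk)
    return (chunk,) * full + ((rest,) if rest else ())


bootstrap = 4  # module-level constant from the benchmark file

# Values from the diff: nx = 64 after the change, nx = 90 before it.
layout_new = chunk_sizes(64, 64 // bootstrap)
layout_old = chunk_sizes(90, 90 // bootstrap)

print(layout_new)
print(layout_old)
```

With `nx = 64` and `bootstrap = 4` this gives four chunks of 16 columns. With the earlier `nx = 90`, the same expression would give four chunks of 22 and a trailing chunk of 2.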
1 change: 1 addition & 0 deletions ci/environment-dev-3.6.yml
@@ -12,6 +12,7 @@ dependencies:
# IDE
- ipywidgets
- jupyterlab
- nb_conda_kernels
# Input/Output
- netcdf4
# Miscellaneous