Improve bootstrapping efficiency (#285)

* Improves implementation of `quantile` to speed up bootstrapping
* Adds substantial testing to `asv` to ensure we get speedups in code
* Adds `dask` example notebook for bootstrapping
pangeo-data · Feb 4, 2020 · commit 5310e82
1 parent d599eb0

Showing 16 changed files with 1,422 additions and 236 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -2,7 +2,7 @@
 
 Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
 
-Fixes # (issue)
+Closes #(issue)
 
 ## Type of change
 
@@ -12,6 +12,7 @@ Please delete options that are not relevant.
 -   [ ]  New feature (non-breaking change which adds functionality)
 -   [ ]  Breaking change (fix or feature that would cause existing functionality to not work as expected)
 -   [ ]  This change requires a documentation update
+-   [ ]  Performance (if you touched existing code, run `asv` to detect performance changes)
 
 # How Has This Been Tested?
 

diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -2,14 +2,28 @@
 What's New
 ==========
 
-climpred v2.0.1 (2020-01-xx)
+
+climpred v2.0.1 (2020-01-##)
 ============================
 
+New Features
+------------
+
+- speed-up in bootstrap functions: (:pr:`285`) `Aaron Spring`_.
+
+    *  ``xr.quantile`` exchanged for ``dask.array.map_blocks(np.percentile)``
+
+    *  properly implemented handling for lazy results when inputs are chunked
+
+    *  user gets warned when chunking is potentially (un)necessary
+
+
 Internals/Minor Fixes
 ---------------------
 - Gather all ``pytest.fixture``s in ``conftest.py``. (:pr:`313`) `Aaron Spring`_.
 - Move ``x_METRICS`` and ``COMPARISONS`` to ``metrics.py`` and ``comparisons.py`` in
   order to avoid circular import dependencies. (:pr:`315`) `Aaron Spring`_.
+- ``asv`` benchmarks for ``HindcastEnsemble`` (:pr:`285`) `Aaron Spring`_.
 
 
 climpred v2.0.0 (2020-01-22)
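
Note on the changelog entry above: it describes replacing a quantile call with a percentile reduction applied per block. The following is a minimal NumPy-only sketch of that kind of reduction over a bootstrap axis. It is not the code from this commit; the array shape, the `bootstrap_ci` helper and the `sig` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skill values: 500 bootstrap iterations, 3 leads, 4x4 grid.
n_boot, n_lead, n_y, n_x = 500, 3, 4, 4
skill = rng.normal(size=(n_boot, n_lead, n_y, n_x))


def bootstrap_ci(samples, sig=95, axis=0):
    """Return lower and upper percentile bounds along the bootstrap axis."""
    tail = (100 - sig) / 2
    # One call computes both bounds; the bootstrap axis is reduced away.
    low, high = np.percentile(samples, [tail, 100 - tail], axis=axis)
    return low, high


low, high = bootstrap_ci(skill)
print(low.shape, high.shape)
```

Because the reduction only acts along the bootstrap axis, the same function could in principle be applied independently to each spatial chunk, which is what makes a block-wise approach attractive for chunked inputs.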

diff --git a/HOWTOCONTRIBUTE.rst b/HOWTOCONTRIBUTE.rst
@@ -165,6 +165,39 @@ Preparing Pull Requests
 
   Please stick to `xarray <http://xarray.pydata.org/en/stable/contributing.html>`_'s testing recommendations.
 
+#. Running the performance test suite
+
+Performance matters and it is worth considering whether your code has introduced
+performance regressions. ``climpred`` is building a suite of benchmarks
+using `asv <https://asv.readthedocs.io/en/stable/>`__
+to enable easy monitoring of the performance of critical ``climpred`` operations.
+These benchmarks are all found in the ``asv_bench`` directory.
+
+If you need to run a benchmark, change your directory to ``asv_bench/`` and run::
+
+    $ asv continuous -f 1.1 upstream/master HEAD
+
+You can replace ``HEAD`` with the name of the branch you are working on.
+The ``-f 1.1`` factor makes ``asv`` report benchmarks that changed by more than 10%.
+The command uses ``conda`` by default for creating the benchmark
+environments.
+
+Running the full benchmark suite can take up to half an hour and use up a few GBs of
+RAM. Usually it is sufficient to paste only a subset of the results into the pull
+request to show that the committed changes do not cause unexpected performance
+regressions.  You can run specific benchmarks using the ``-b`` flag, which
+takes a regular expression.  For example, this will only run benchmarks from the
+``asv_bench/benchmarks/benchmarks_perfect_model.py`` file::
+
+    $ asv continuous -f 1.1 upstream/master HEAD -b ^benchmarks_perfect_model
+
+If you want to only run a specific group of tests from a file, you can do it
+using ``.`` as a separator. For example::
+
+    $ asv continuous -f 1.1 upstream/master HEAD -b benchmarks_perfect_model.Compute.time_bootstrap_perfect_model
+
+will only run the ``time_bootstrap_perfect_model`` benchmark of class ``Compute``
+defined in ``benchmarks_perfect_model.py``.
 
 #. Create a new changelog entry in ``CHANGELOG.rst``:
 

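Note on the ``-f 1.1`` option in the commands above: the factor is a ratio threshold between the timings of the two commits. The sketch below only illustrates that arithmetic under the assumption of a symmetric threshold; the `classify` helper is hypothetical, and `asv` itself also takes measurement noise into account before flagging a change.

```python
def classify(old, new, factor=1.1):
    """Label a timing change relative to a symmetric ratio threshold."""
    ratio = new / old
    if ratio > factor:
        return "slower"
    if ratio < 1 / factor:
        return "faster"
    return "unchanged"


print(classify(2.00, 2.30))  # ratio 1.15 -> slower
print(classify(2.00, 2.10))  # ratio 1.05 -> unchanged
print(classify(2.00, 1.70))  # ratio 0.85 -> faster
```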
diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json
@@ -80,7 +80,7 @@
       "numpy": [""],
       "xarray": [""],
       "dask": [""],
-      "xskillscore": [""],
+      "cftime": [""],
     },
 
     // Combinations of libraries/python versions can be excluded/included

diff --git a/asv_bench/benchmarks/__init__.py b/asv_bench/benchmarks/__init__.py
@@ -1,6 +1,7 @@
 # https://github.com/pydata/xarray/blob/master/asv_bench/benchmarks/__init__.py
 import itertools
 
+import dask
 import numpy as np
 
 _counter = itertools.count()
@@ -47,3 +48,10 @@ def randint(low, high=None, size=None, frac_minus=None, seed=0):
         x.flat[inds] = -1
 
     return x
+
+
+def ensure_loaded(res):
+    """Compute the result if it is a lazy dask collection."""
+    if dask.is_dask_collection(res):
+        res = res.compute()
+    return res
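
Note on `ensure_loaded`: a lazy dask result returns almost immediately, so timing it without computing would measure only graph construction. The sketch below repeats the helper from the hunk above and applies it to an eager and a lazy array; the example arrays are hypothetical and `dask` must be installed.

```python
import dask
import dask.array as da
import numpy as np


def ensure_loaded(res):
    """Compute the result if it is a lazy dask collection."""
    if dask.is_dask_collection(res):
        res = res.compute()
    return res


eager = np.ones((4, 4))
lazy = da.ones((4, 4), chunks=(2, 2))

# Eager input passes through untouched; lazy input is materialized.
loaded_eager = ensure_loaded(eager)
loaded_lazy = ensure_loaded(lazy)
print(type(loaded_eager).__name__, type(loaded_lazy).__name__)
```
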
diff --git a/asv_bench/benchmarks/benchmarks_hindcast.py b/asv_bench/benchmarks/benchmarks_hindcast.py
@@ -0,0 +1,175 @@
+import numpy as np
+import xarray as xr
+
+from climpred.bootstrap import bootstrap_hindcast
+from climpred.prediction import compute_hindcast
+
+from . import ensure_loaded, parameterized, randn, requires_dask
+
+# only take subselection of all possible metrics
+METRICS = ['rmse', 'pearson_r', 'crpss']
+# only take comparisons compatible with probabilistic metrics
+HINDCAST_COMPARISONS = ['m2o']
+
+BOOTSTRAP = 8
+
+
+class Generate:
+    """
+    Generate input data for benchmark.
+    """
+
+    timeout = 600
+    repeat = (2, 5, 20)
+
+    def make_hind_obs(self):
+        """Generates initialized hindcast, uninitialized historical and observational
+        data, mimicking a hindcast experiment."""
+        self.hind = xr.Dataset()
+        self.observations = xr.Dataset()
+        self.uninit = xr.Dataset()
+
+        self.nmember = 3
+        self.nlead = 3
+        self.nx = 64
+        self.ny = 64
+        self.init_start = 1960
+        self.init_end = 2000
+        self.ninit = self.init_end - self.init_start
+
+        FRAC_NAN = 0.0
+
+        inits = np.arange(self.init_start, self.init_end)
+        leads = np.arange(1, 1 + self.nlead)
+        members = np.arange(1, 1 + self.nmember)
+
+        lons = xr.DataArray(
+            np.linspace(0.5, 359.5, self.nx),
+            dims=('lon',),
+            attrs={'units': 'degrees east', 'long_name': 'longitude'},
+        )
+        lats = xr.DataArray(
+            np.linspace(-89.5, 89.5, self.ny),
+            dims=('lat',),
+            attrs={'units': 'degrees north', 'long_name': 'latitude'},
+        )
+        self.hind['var'] = xr.DataArray(
+            randn(
+                (self.nmember, self.ninit, self.nlead, self.nx, self.ny),
+                frac_nan=FRAC_NAN,
+            ),
+            coords={
+                'member': members,
+                'init': inits,
+                'lon': lons,
+                'lat': lats,
+                'lead': leads,
+            },
+            dims=('member', 'init', 'lead', 'lon', 'lat'),
+            name='var',
+            attrs={'units': 'var units', 'description': 'a description'},
+        )
+        self.observations['var'] = xr.DataArray(
+            randn((self.ninit, self.nx, self.ny), frac_nan=FRAC_NAN),
+            coords={'lon': lons, 'lat': lats, 'time': inits},
+            dims=('time', 'lon', 'lat'),
+            name='var',
+            attrs={'units': 'var units', 'description': 'a description'},
+        )
+
+        self.uninit['var'] = xr.DataArray(
+            randn(
+                (self.ninit, self.nx, self.ny, self.nmember), frac_nan=FRAC_NAN
+            ),
+            coords={
+                'lon': lons,
+                'lat': lats,
+                'time': inits,
+                'member': members,
+            },
+            dims=('time', 'lon', 'lat', 'member'),
+            name='var',
+            attrs={'units': 'var units', 'description': 'a description'},
+        )
+
+        self.hind.attrs = {'history': 'created for climpred benchmarking'}
+
+
+class Compute(Generate):
+    """
+    Benchmark time and peak memory of `compute_hindcast` and `bootstrap_hindcast`.
+    """
+
+    def setup(self, *args, **kwargs):
+        self.make_hind_obs()
+
+    @parameterized(['metric', 'comparison'], (METRICS, HINDCAST_COMPARISONS))
+    def time_compute_hindcast(self, metric, comparison):
+        """Take time for `compute_hindcast`."""
+        ensure_loaded(
+            compute_hindcast(
+                self.hind,
+                self.observations,
+                metric=metric,
+                comparison=comparison,
+            )
+        )
+
+    @parameterized(['metric', 'comparison'], (METRICS, HINDCAST_COMPARISONS))
+    def peakmem_compute_hindcast(self, metric, comparison):
+        """Take memory peak for `compute_hindcast`."""
+        ensure_loaded(
+            compute_hindcast(
+                self.hind,
+                self.observations,
+                metric=metric,
+                comparison=comparison,
+            )
+        )
+
+    @parameterized(['metric', 'comparison'], (METRICS, HINDCAST_COMPARISONS))
+    def time_bootstrap_hindcast(self, metric, comparison):
+        """Take time for `bootstrap_hindcast`."""
+        ensure_loaded(
+            bootstrap_hindcast(
+                self.hind,
+                self.uninit,
+                self.observations,
+                metric=metric,
+                comparison=comparison,
+                bootstrap=BOOTSTRAP,
+                dim='member',
+            )
+        )
+
+    @parameterized(['metric', 'comparison'], (METRICS, HINDCAST_COMPARISONS))
+    def peakmem_bootstrap_hindcast(self, metric, comparison):
+        """Take memory peak for `bootstrap_hindcast`."""
+        ensure_loaded(
+            bootstrap_hindcast(
+                self.hind,
+                self.uninit,
+                self.observations,
+                metric=metric,
+                comparison=comparison,
+                bootstrap=BOOTSTRAP,
+                dim='member',
+            )
+        )
+
+
+class ComputeDask(Compute):
+    """Benchmark `compute_hindcast` and `bootstrap_hindcast` time and peak memory.
+    This executes the same tests as `Compute` but on chunked data."""
+
+    def setup(self, *args, **kwargs):
+        requires_dask()
+        # magic taken from
+        # https://github.com/pydata/xarray/blob/stable/asv_bench/benchmarks/rolling.py
+        super().setup(**kwargs)
+        # chunk along a spatial dimension to enable embarrassingly parallel computation
+        self.hind = self.hind['var'].chunk({'lon': self.nx // BOOTSTRAP})
+        self.observations = self.observations['var'].chunk(
+            {'lon': self.nx // BOOTSTRAP}
+        )
+        self.uninit = self.uninit['var'].chunk({'lon': self.nx // BOOTSTRAP})
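
Note on the benchmark layout: the `parameterized` decorator is imported from the benchmarks package and its body is not part of this diff. The sketch below assumes it simply attaches the `param_names` and `params` attributes that `asv` reads, and then mimics by hand how each parameter combination is passed to `setup` and to the benchmark method. `ToyCompute` and `parameter_grid` are hypothetical names used only for illustration.

```python
import itertools


def parameterized(names, params):
    """Attach asv's `param_names` and `params` attributes to a benchmark."""

    def decorator(func):
        func.param_names = names
        func.params = params
        return func

    return decorator


class ToyCompute:
    """Hypothetical benchmark class; asv collects methods by `time_` prefix."""

    def setup(self, *args, **kwargs):
        # Parameter values are also passed to setup, hence *args.
        self.data = list(range(1000))

    @parameterized(["metric", "comparison"], (["rmse", "pearson_r"], ["m2o"]))
    def time_toy(self, metric, comparison):
        sum(self.data)


def parameter_grid(func):
    """Return the Cartesian product of the parameter lists."""
    return list(itertools.product(*func.params))


combos = parameter_grid(ToyCompute.time_toy)
bench = ToyCompute()
for metric, comparison in combos:
    bench.setup(metric, comparison)
    bench.time_toy(metric, comparison)
print(combos)
```

With three metrics and one comparison, as in `METRICS` and `HINDCAST_COMPARISONS` above, each decorated method would run for three parameter combinations.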