
Speed up quantiles with sorting #1513

Merged: 63 commits from speed-up-quantile into main on Jul 23, 2024

Conversation

SarahG-579462 (Contributor) commented Oct 26, 2023

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • nbutils.quantile gets a speed-up of more than 2.5x through a combination of changes in nbutils.quantile and nbutils._quantile
  • This does not cover nbutils.vec_quantiles (used for adapt_freq), but similar principles could be applied there
  • It adds the option of using the fastnanquantile module, which is very fast

Does this PR introduce a breaking change?

No

Other information:

  • The new low-level function to compute quantiles, nbutils._quantile, is a 1d jitted version of xclim.core.utils._nan_quantile (see the sketch below)
  • Manual benchmarking can be performed in the notebook benchmarks/sdba_quantile.ipynb, attached to this PR.
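
For illustration, a minimal sketch (not the actual xclim code; names and details are assumptions) of what a 1D, njit-compatible nan-quantile kernel can look like: NaNs are dropped, the remaining values are sorted once, and every requested quantile is read off the sorted array with linear interpolation.

import numpy as np
from numba import njit

@njit(cache=True)
def _nan_quantile_1d(arr, q):
    # Sketch only: 1D nan-aware quantile usable inside @njit, so no `axis`
    # argument is needed in the jitted code. `arr` is a 1D float array,
    # `q` a 1D array of quantiles in [0, 1].
    out = np.empty(q.size, dtype=np.float64)
    valid = np.sort(arr[~np.isnan(arr)])  # drop NaNs, sort once for all quantiles
    n = valid.size
    if n == 0:
        out[:] = np.nan
        return out
    for k in range(q.size):
        pos = q[k] * (n - 1)  # fractional index into the sorted values
        lo = int(np.floor(pos))
        hi = int(np.ceil(pos))
        frac = pos - lo
        out[k] = valid[lo] * (1.0 - frac) + valid[hi] * frac
    return out

For example, _nan_quantile_1d(np.random.randn(1000), np.linspace(0, 1, 50)) returns 50 quantiles after a single sort.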


github-actions bot commented Oct 26, 2023

Welcome, new contributor!

It appears that this is your first Pull Request. To give credit where it's due, we ask that you add your information to the AUTHORS.rst and .zenodo.json:

  • The relevant author information has been added to AUTHORS.rst and .zenodo.json

Please make sure you've read our contributing guide. We look forward to reviewing your Pull Request shortly ✨

The github-actions bot added the sdba (Issues concerning the sdba submodule) and docs (Improvements to documentation) labels on Oct 26, 2023
juliettelavoie (Contributor) commented Nov 1, 2023

For a 30-year period over QC, the bias adjustment (initial DQM, then Npdf, then restore) with these parameters:

n_iter: 30
nquantiles: 50
group: time
n_escore: 1

took 8 hours on this branch and 8h40 on master. (Timing for the faster_mbcn branch is coming.)

EDIT: Not sure if it makes a difference, but the test for this branch ran mostly during work hours and the test for master ran during the night.

UPDATE: It took 6h30 with the faster_mbcn branch. @coxipi, your branch does help in my case!

Zeitsperre (Collaborator)

@SarahG-579462

checking consistency... /home/docs/checkouts/readthedocs.org/user_builds/xclim/checkouts/1513/docs/notebooks/sdba-speed.ipynb: WARNING: document isn't included in any toctree

Small FYI, you need to add the notebook to the exclude_patterns in docs/conf.py if you don't want it rendered.
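
For reference, a sketch of that change (the notebook path is taken from the warning above; the rest of conf.py is assumed):

# docs/conf.py (sketch)
exclude_patterns = [
    # keep the benchmark notebook out of the rendered docs and the toctree check
    "notebooks/sdba-speed.ipynb",
]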

coxipi (Contributor) commented Mar 12, 2024

So I hijacked the PR a bit to test general benchmark tools. I needed to install pytest-benchmark and pytest-benchmark[histogram]. The idea would be to have a benchmarking suite, similar in structure to the testing suite. For a test similar to your current notebook, I run:

pytest -n0 --dist no benchmarks/sdba/test_nbutils.py

where test_nbutils.py is

import importlib
import os

import numpy as np
import pytest

from xclim.sdba import nbutils

nq = 50
i = 0


@pytest.mark.parametrize(
    "use_nanquantile,size",
    [
        (True, 100),
        (True, 200),
        (True, 500),
        (True, 1000),
        (False, 100),
        (False, 200),
        (False, 500),
        (False, 1000),
    ],
)
def test_nanquantile_simple(benchmark, use_nanquantile, size):
    np.random.seed(0)
    arr = np.random.randn(size)
    # the flag is read at import time, hence the reload of nbutils
    os.environ["USE_NANQUANTILE"] = "True" if use_nanquantile else ""
    importlib.reload(nbutils)

    def func():
        return nbutils._choosequantile(arr, np.linspace(0, 1, nq))

    # trigger jit compilation on the first run so it is not counted in the benchmark
    global i
    if i == 0:
        func()
        i += 1
    benchmark(func)

(I should have named it test__choosequantile, but anyway.)

I can then get a histogram of the results:
[Image: pytest-benchmark histogram (benchmark_multi)]
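
(Aside, hedged: with pytest-benchmark[histogram] installed, such a plot can be produced by adding pytest-benchmark's --benchmark-histogram option to the command above, e.g. pytest -n0 --dist no --benchmark-histogram=benchmark_multi benchmarks/sdba/test_nbutils.py; the filename prefix here is simply taken from the image name.)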

It's a bit annoying that the order of parameters in pytest.mark.parametrize is not respected. I could sort the names alphabetically, but then the "1000" points come before the smaller samples. Anyway.

Anyway, it seems pretty plug-and-play. I find the visualization in the notebook is better, so we might want to keep a notebook for benchmarks too? Simpler tests that are not comparing multiple methods would perhaps benefit from plain numerical assertions, just to check that the performance doesn't drop below a certain level.

coxipi (Contributor) commented Mar 13, 2024

I did a test with EmpiricalQuantileMapping, 30 years, 3 locations, dayofyear-31.
Sorting is still advantageous, but the effect is marginal. Perhaps there is an overhead that is much greater than the actual computation of the quantiles? False/True refers to use_nanquantile.
[Image: benchmark results for EmpiricalQuantileMapping]

In comparison, group="time" performs better for sortquantile: it takes about 85-90% of the nanquantile running time. In that case there must be less overhead (the complicated map_blocks structure plays a smaller role here?).
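
For context, a rough sketch of the kind of call being timed here (synthetic single-site data; nquantiles and other values are placeholders, not necessarily what was benchmarked):

import numpy as np
import xarray as xr
from xclim import sdba

# Synthetic daily series for illustration; sdba needs a units attribute.
time = xr.cftime_range("1981-01-01", periods=30 * 365, freq="D", calendar="noleap")
ref = xr.DataArray(
    np.random.gamma(4.0, 1.0, time.size),
    dims=("time",),
    coords={"time": time},
    attrs={"units": "mm/d"},
)
hist = ref.copy(data=ref.values * 1.2)  # biased "model" series, keeps the units attribute
sim = hist

# "dayofyear-31": day-of-year groups with a 31-day moving window
group = sdba.Grouper("time.dayofyear", window=31)
EQM = sdba.EmpiricalQuantileMapping.train(ref, hist, nquantiles=50, kind="+", group=group)
scen = EQM.adjust(sim, interp="linear")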

SarahG-579462 (Contributor, Author)

> I did a test with EmpiricalQuantileMapping, 30 years, 3 locations, dayofyear-31. Sorting is still advantageous, but the effect is marginal. Perhaps there is an overhead that is much greater than the actual computation of the quantiles? False/True refers to use_nanquantile.

This is really interesting, thanks. I totally support you taking over this PR; I don't really have time to work on it these days.

SarahG-579462 (Contributor, Author)

The present implementation is not right: the call to nan_quantile is not properly jitted. Among other things, some operations in nan_quantile use the axis argument, which is unsupported for many functions under njit. The fastnanquantile library shows how those axis-dependent parts must be done outside the njit calls. Also, np.asanyarray doesn't seem to be supported; I'm not sure why this is triggered in some contexts and not in others.

The function would either need to be properly jitted, or be kept unjitted and its performance studied again. Since sdba and xclim are to be separated, I feel some redundancy would not be bad on this front; I was considering simply reverting some commits and continuing to use sortquantile. Thoughts?

During testing, I made a version of Abel's nan_quantile that supports only one dimension and is jittable. Would it be preferable to use this? Abel's unjitted version is really not much slower than the jitted one, because it uses fairly pure numpy functions.
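
To illustrate the axis point, a sketch (assuming a jitted 1D kernel like the _nan_quantile_1d sketched earlier; this is not the code merged here): the axis handling stays in plain NumPy, and only the inner 1D kernel is jitted.

import numpy as np

def nan_quantile_along_last_axis(arr, q):
    # Collapse the leading axes and apply the jitted 1D kernel slice by slice,
    # so nothing inside the njit code ever needs an `axis` argument.
    q = np.asarray(q, dtype=np.float64)
    flat = arr.reshape(-1, arr.shape[-1])
    out = np.empty((flat.shape[0], q.size))
    for j in range(flat.shape[0]):
        out[j] = _nan_quantile_1d(flat[j], q)
    return out.reshape(arr.shape[:-1] + (q.size,))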

coxipi (Contributor) commented Jul 9, 2024

Should be good to go; I added a few tests.

coxipi (Contributor) commented Jul 9, 2024

Would it be possible for you to add fastnanquantile as a dependency that can be installed via tox (e.g. under deps: fastnanquantile: fastnanquantile) and add that to one of the PyPI/tox CI builds (in main.yml) so that we can test against it?

@Zeitsperre under test-pypi?
EDIT: I will wait for your return; exploring this part of the codebase is not good for my cardiac rhythm.
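
For what it's worth, a sketch of the kind of factor-conditional tox dependency being suggested (hypothetical; not xclim's actual tox.ini or environment names):

# tox.ini (sketch): the "fastnanquantile:" prefix means the dependency is only
# installed in environments whose name includes that factor, e.g. py3.11-fastnanquantile
[testenv]
deps =
    fastnanquantile: fastnanquantile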

The github-actions bot added the CI (Automation and Continuous Integration) label on Jul 12, 2024
github-actions bot commented Jul 12, 2024

Note

It appears that this Pull Request modifies the main.yml workflow.

On inspection, the XCLIM_TESTDATA_BRANCH environment variable is set to the most recent tag (v2023.12.14).

No further action is required.

coveralls commented Jul 12, 2024

Coverage Status

coverage: 90.417% (-0.3%) from 90.679% when pulling df6963b on speed-up-quantile into 1b83536 on main.

Zeitsperre (Collaborator)

@aulemahal I leave you to do the final review and merge. I won't pretend to be the expert here.

coxipi (Contributor) commented Jul 19, 2024

@SarahG-579462 can you take one last look? It should be good now.

SarahG-579462 (Contributor, Author) left a review comment


Looks good to me! A note about fastmath below.

[Resolved review threads on xclim/sdba/nbutils.py]
coxipi merged commit 05e721b into main on Jul 23, 2024 (21 checks passed)
coxipi deleted the speed-up-quantile branch on Jul 23, 2024 at 16:01