weighted operations: performance optimisations #3883
maybe relevant: #1995
%load_ext line_profiler

import numpy as np
import xarray as xr
from xarray.core.weighted import Weighted as w

shape_weights = (1000, 1000)
shape_data = (1000, 1000, 10)
add_nans = False

def lprun_weighted(shape_weights, shape_data, add_nans, skipna=None):
    weights = xr.DataArray(np.random.randn(*shape_weights))

    data = np.random.randn(*shape_data)

    # add approximately 25% NaNs
    if add_nans:
        c = int(data.size * 0.25)
        data.ravel()[np.random.choice(data.size, c, replace=False)] = np.nan

    data = xr.DataArray(data)

    return data.weighted(weights).mean(skipna=skipna)

%lprun -f w._reduce -f w._weighted_mean -f w._sum_of_weights -f w._weighted_sum -f w.__init__ -f lprun_weighted -u 1e-03 lprun_weighted(shape_weights, shape_data, add_nans, skipna=None)
xarray/xarray/core/weighted.py line 143 in d1f7cb8 and xarray/xarray/core/weighted.py line 130 in d1f7cb8
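For reference, here is a minimal standalone sketch of roughly what those two steps compute, assuming the xr.dot-based approach discussed in the weighted PR. The function names below are simplified stand-ins for the methods linked above, not the actual library code, and details such as the handling of all-NaN slices are only indicative.

import numpy as np
import xarray as xr

def weighted_sum(da, weights, dim=None):
    # NaNs must not contribute to the numerator, so replace them with 0
    # before contracting with the weights; with dim=None, xr.dot contracts
    # over the dimensions shared between da and weights
    return xr.dot(da.fillna(0.0), weights, dims=dim)

def sum_of_weights(da, weights, dim=None):
    # weights of NaN elements must be excluded, otherwise the mean is
    # normalised by too large a denominator
    mask = da.notnull()
    total = xr.dot(mask, weights, dims=dim)
    # avoid dividing by 0 for all-NaN (or all-zero-weight) slices
    return total.where(total != 0)

def weighted_mean(da, weights, dim=None):
    return weighted_sum(da, weights, dim=dim) / sum_of_weights(da, weights, dim=dim)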
There was a discussion about the performance of the weighted mean/sum, both in terms of memory footprint and speed, and there may indeed be some things that can be optimized; see the posts at the end of the PR. However, the optimal implementation will probably depend on the use case, and some profiling will be required.
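As one possible starting point for that profiling, here is a minimal sketch (not from the discussion above; the function names are illustrative and the timings are machine-dependent) comparing the broadcast-multiply-then-sum formulation of the weighted sum with an xr.dot-based contraction, using the array sizes from the snippet above:

import timeit

import numpy as np
import xarray as xr

weights = xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))
data = xr.DataArray(np.random.randn(1000, 1000, 10), dims=("x", "y", "time"))

def wsum_via_multiply():
    # broadcasts weights to (x, y, time), materialising an ~80 MB float64
    # intermediate before reducing
    return (data * weights).sum(("x", "y"))

def wsum_via_dot():
    # einsum-based contraction over x and y; no broadcast intermediate
    return xr.dot(data, weights, dims=("x", "y"))

# both formulations give the same weighted sum (up to floating point)
assert np.allclose(wsum_via_multiply(), wsum_via_dot())

for func in (wsum_via_multiply, wsum_via_dot):
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1e3:.1f} ms per call")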
I'll just open an issue to keep track of this.
@seth-p