Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch some attributes over to dynamic calculation? #158

Closed
bradyrx opened this issue Mar 23, 2020 · 7 comments
Closed

Switch some attributes over to dynamic calculation? #158

bradyrx opened this issue Mar 23, 2020 · 7 comments

Comments

@bradyrx
Copy link

bradyrx commented Mar 23, 2020

I am moving a discussion from pangeo over to here: pangeo-data/pangeo#764. xarray has a method on their CFTimeIndex to shift the cftime.

E.g.,

import xarray as xr
times = xr.cftime_range('1950', '1952', freq='AS')
print(times)
>>> CFTimeIndex([1950-01-01 00:00:00, 1951-01-01 00:00:00, 1952-01-01 00:00:00], dtype='object')
print(times.shift(1, 'YS'))
>>> CFTimeIndex([1951-01-01 00:00:00, 1952-01-01 00:00:00, 1953-01-01 00:00:00], dtype='object')

I am using this heavily in my package to shift large CFTimeIndex's many times in succession. It's running fairly slow and we're finding that the bottleneck is most likely in cftime rather than xarray.CFTimeIndex.shift(). Please see @spencerkclark's timing comments toward the bottom of that issue discussion.

It looks like the calculation of dayofwk and dayofyr is the bottleneck here, since it's being computed on initialization of the object (

cftime/cftime/_cftime.pyx

Lines 1468 to 1473 in 9f7c576

if self.dayofwk < 0:
jd = JulianDayFromDate(self,calendar='365_day')
year,month,day,hour,mn,sec,ms,dayofwk,dayofyr =\
DateFromJulianDay(jd,return_tuple=True,calendar='365_day')
self.dayofwk = dayofwk
self.dayofyr = dayofyr
). @spencerkclark showed that there are huge speedups from commenting that out.

Would it be feasible to make these dynamic attributes? E.g. they are calculated as a property of the object only when requested by the user?

Details:

cftime version: 1.0.4.2

environment: Mac OSX, python 3.7.3

@jswhit
Copy link
Collaborator

jswhit commented Mar 23, 2020

Can you try PR #160 (branch issue158) and see if speeds things up as you would expect?

@bradyrx
Copy link
Author

bradyrx commented Mar 24, 2020

This looks awesome, @jswhit! Thanks for the quick turnaround. @spencerkclark I think this nails the problem. Thoughts? We get ~200x speedup on cftime.replace, 400x speedup on cftime +datetime.timedelta, and 20x speedup on xr.CFTimeIndex.shift().

Current setup:

import xarray as xr
import cftime
import datetime
print(xr.__version__)
>>> 0.15.1
print(cftime.__version__)
>>> 1.1.1.1

date = cftime.DatetimeNoLeap(2000, 1, 1)
%timeit date.replace(year=2001)
>>> 222 µs ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timedelta = datetime.timedelta(days=9)
%timeit date + timedelta
>>> 207 µs ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

times = xr.cftime_range('1950', '2000', freq='AS')
%timeit times.shift(1, 'YS')
>>> 14.3 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

With #160:

import xarray as xr
import cftime
import datetime
print(xr.__version__)
>>> 0.15.1
print(cftime.__version__)
>>> 1.1.1.1

date = cftime.DatetimeNoLeap(2000, 1, 1)
%timeit date.replace(year=2001)
>>> 1.15 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

timedelta = datetime.timedelta(days=9)
%timeit date + timedelta
>>> 512 ns ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

times = xr.cftime_range('1950', '2000', freq='AS')
%timeit times.shift(1, 'YS')
>>> 796 µs ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

replace

timedelta

shift

@spencerkclark
Copy link
Collaborator

Indeed this is excellent; clever to cache the results of these in #160 as well! Thanks @jswhit for implementing things and @bradyrx for starting the discussion. I'm glad we were able to work this out together.

jswhit added a commit that referenced this issue Mar 24, 2020
change dayofwk and dayofyr into properties (issue #158)
@jswhit
Copy link
Collaborator

jswhit commented Mar 24, 2020

Merged now - will be in the 1.1.2 release. Let me know if you need a release sooner rather than later. Does the xarray test suite still pass with this change?

@spencerkclark
Copy link
Collaborator

Good question; one failing test does appear, which seems related to the changes:

___________________________________________ test_dayofyear_after_cftime_range[D] ____________________________________________

freq = 'D'

    @pytest.mark.parametrize("freq", ["A", "M", "D"])
    def test_dayofyear_after_cftime_range(freq):
        pytest.importorskip("cftime", minversion="1.0.2.1")
        result = cftime_range("2000-02-01", periods=3, freq=freq).dayofyear
        expected = pd.date_range("2000-02-01", periods=3, freq=freq).dayofyear
>       np.testing.assert_array_equal(result, expected)
E       AssertionError:
E       Arrays are not equal
E
E       Mismatch: 66.7%
E       Max absolute difference: 33
E       Max relative difference: 0.97058824
E        x: array([32,  1,  1])
E        y: array([32, 33, 34])

tests/test_cftime_offsets.py:1184: AssertionError

A minimal example of the issue would be:

In [1]: import cftime; import datetime

In [2]: date = cftime.DatetimeGregorian(2000, 2, 1)

In [3]: date.dayofyr
Out[3]: 32

In [4]: date_after_timedelta_addition = date + datetime.timedelta(days=1)

In [5]: date_after_timedelta_addition.dayofyr
Out[5]: 1

We would expect [5] to return 33.

@spencerkclark
Copy link
Collaborator

#163 provides a fix for the issue above.

@jswhit
Copy link
Collaborator

jswhit commented Mar 28, 2020

Merged #163

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants