WIP Add a CFTimeIndex-enabled xr.cftime_range function #2301

Merged
merged 23 commits into from
Sep 19, 2018

spencerkclark
Member

@spencerkclark spencerkclark commented Jul 19, 2018

  • Closes add CFTimeIndex enabled date_range function #2142
  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

I took the approach first discussed here by @shoyer and followed pandas by creating simplified offset classes for use with cftime objects to implement a CFTimeIndex-enabled cftime_range function. I still may clean things up a bit and add a few more tests, but I wanted to post this in its current state to show some progress, as I think it is more or less working. I will try to ping folks when it is ready for a more detailed review.

Here are a few examples:

In [1]: import xarray as xr

In [2]: xr.cftime_range('2000-02-01', '2002-05-05', freq='3M', calendar='noleap')
Out[2]:
CFTimeIndex([2000-02-28 00:00:00, 2000-05-31 00:00:00, 2000-08-31 00:00:00,
             2000-11-30 00:00:00, 2001-02-28 00:00:00, 2001-05-31 00:00:00,
             2001-08-31 00:00:00, 2001-11-30 00:00:00, 2002-02-28 00:00:00],
            dtype='object')

In [3]: xr.cftime_range('2000-02-01', periods=4, freq='3A-JUN', calendar='noleap')
Out[3]:
CFTimeIndex([2000-06-30 00:00:00, 2003-06-30 00:00:00, 2006-06-30 00:00:00,
             2009-06-30 00:00:00],
            dtype='object')

In [4]: xr.cftime_range(end='2000-02-01', periods=4, freq='3A-JUN')
Out[4]:
CFTimeIndex([1990-06-30 00:00:00, 1993-06-30 00:00:00, 1996-06-30 00:00:00,
             1999-06-30 00:00:00],
            dtype='object')

Hopefully the offset classes defined here would also be useful for implementing things like resample for CFTimeIndex objects (#2191) and CFTimeIndex.shift (#2244).
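
To show where the dates in the first example come from, here is a minimal pure-Python sketch of month-end stepping in a noleap calendar. It is not the implementation in this PR: `month_end`, `add_months`, and `month_end_range` are hypothetical helpers, and dates are plain `(year, month, day)` tuples rather than cftime objects.

```python
# Days per month in the CF "noleap" calendar (February always has 28 days).
NOLEAP_DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_end(year, month):
    """Return the last day of the month as a (year, month, day) tuple."""
    return (year, month, NOLEAP_DAYS_IN_MONTH[month - 1])


def add_months(year, month, n):
    """Shift a (year, month) pair forward by n months."""
    zero_based = (month - 1) + n
    return year + zero_based // 12, zero_based % 12 + 1


def month_end_range(start, end, n):
    """Hypothetical helper: month ends from start to end, every n months."""
    year, month, _ = start
    # Roll forward to the end of the starting month.
    current = month_end(year, month)
    dates = []
    while current <= end:  # tuples compare year, then month, then day
        dates.append(current)
        year, month = add_months(current[0], current[1], n)
        current = month_end(year, month)
    return dates


dates = month_end_range((2000, 2, 1), (2002, 5, 5), n=3)
```

Under these assumptions the sketch yields nine month ends, from 2000-02-28 through 2002-02-28, which are the same dates as in Out[2] above.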



def _adjust_n_years(other, n, month, reference_day):
    """Adjust the number of times an annual offset is applied based on
Contributor

W291 trailing whitespace

+----------------------+----------------------------------------------------------------+
| all_leap, 366_day    | ``cftime.DatetimeAllLeap``                                     |
+----------------------+----------------------------------------------------------------+
| 360_day              | ``cftime.Datetime360Day``                                      |
Contributor

E501 line too long (93 > 79 characters)

+----------------------+----------------------------------------------------------------+
| 360_day              | ``cftime.Datetime360Day``                                      |
+----------------------+----------------------------------------------------------------+
| julian               | ``cftime.DatetimeJulian``                                      |
Contributor

E501 line too long (93 > 79 characters)


>>> xr.date_range(start='2000', periods=6, freq='2MS', calendar='noleap')
CFTimeIndex([2000-01-01 00:00:00, 2000-03-01 00:00:00, 2000-05-01 00:00:00,
             2000-07-01 00:00:00, 2000-09-01 00:00:00, 2000-11-01 00:00:00],
Contributor

E501 line too long (80 > 79 characters)


>>> xr.date_range(start='0001', periods=6, freq='2MS', calendar='standard')
CFTimeIndex([0001-01-01 00:00:00, 0001-03-01 00:00:00, 0001-05-01 00:00:00,
             0001-07-01 00:00:00, 0001-09-01 00:00:00, 0001-11-01 00:00:00],
Contributor

E501 line too long (80 > 79 characters)

As in the standard pandas function, three of the ``start``, ``end``,
``periods``, or ``freq`` arguments must be specified at a given time, with
the other set to ``None``. See the `pandas documentation
<https://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html#pandas.date_range>`_
Contributor

W291 trailing whitespace

Add docstring for xr.date_range

Fix failing test

Fix test skipping logic

Coerce result of zip to a list in test setup

Add and clean up tests

Fix skip logic

Skip roll_forward and roll_backward tests if cftime is not installed

Expose all possible arguments to pd.date_range

Add more detail to docstrings

flake8

Add a what's new entry

Add a short example to time-series.rst
@spencerkclark
Member Author

@jhamman @shoyer when you get a chance, I think this is ready for review.

I did a few more things since first pushing this PR:

  • Previously CFTimeIndex would raise an error if it contained an empty array; I'm not really sure what the use case would be, but to be consistent with the way DatetimeIndex behaves, I made modifications to allow for this (in some cases date_range returns an empty index). I also added the ability to give a CFTimeIndex a name.
  • The repr of CFTimeIndex previously provided no information regarding the calendar type of the index; I found this somewhat inconvenient, so I added a custom __unicode__ method to CFTimeIndex, which pandas.Index uses to build its repr. I think the result looks decent (I updated the examples above), but the implementation is a bit crude (relying on a number of private API functions of the base Index class). Maybe there is a better way or maybe we should put this off until later?
  • I added arguments to the constructor for CFTimeIndex that make it more analogous to the constructor for a DatetimeIndex, allowing one to either pass dates directly, or create dates using arguments one can pass to date_range (inspired by Exposing/documenting CFTimeIndex as public API #2140 (comment)).

I hope the general approach seems reasonable, though if you had something else in mind for how to implement this originally, I'd be open to changing things. I tried to keep things as basic as I could (the pandas date_range function is fairly involved!).

@shoyer
Member

shoyer commented Jul 23, 2018

I haven't had the chance to look in detail at this yet, but one small point I would suggest is renaming it from xarray.date_range to xarray.cftime_range. This makes it more obvious that the function is really for users of cftime.

@spencerkclark spencerkclark changed the title WIP Add a CFTimeIndex-enabled xr.date_range function WIP Add a CFTimeIndex-enabled xr.cftime_range function Aug 3, 2018
@jhamman
Member

jhamman commented Aug 29, 2018

@spencerkclark - sync this branch with master and we'll get this all wrapped up.

@spencerkclark
Member Author

Thanks @jhamman! This should be synced up and ready for review. The test failure under the dask-dev build doesn't appear to be related.

@shoyer
Member

shoyer commented Aug 31, 2018

if you merge in master again, the test suite should be passing after #2393

@spencerkclark
Member Author

Awesome, thanks @shoyer.

@jhamman
Member

jhamman commented Sep 11, 2018

I took a quick look at this today. It looks quite complete and I'm eager to get this into master. I'm personally having trouble breaking off enough time to give it a full review. If anyone else in @pydata/xarray has some time to give this a close look, I'm sure @spencerkclark would really appreciate it.

elif type(other) == type(self):
    return type(self)(self.n - other.n)
else:
    raise NotImplementedError
Member

Use return NotImplemented here (which Python will reraise as TypeError)
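
A minimal sketch of the suggested pattern, assuming a simplified stand-in class (the `Offset` below is hypothetical and much smaller than the class in this PR). Returning the `NotImplemented` sentinel lets Python try the reflected method on the other operand; if that also fails, the interpreter raises `TypeError` itself.

```python
class Offset:
    """Hypothetical stand-in for an offset class; not the code from this PR."""

    def __init__(self, n=1):
        self.n = n

    def __sub__(self, other):
        if type(other) == type(self):
            return type(self)(self.n - other.n)
        # Returning the sentinel lets Python try other.__rsub__ and, if
        # that is missing or also declines, raise TypeError on our behalf.
        return NotImplemented


difference = Offset(5) - Offset(2)

try:
    Offset(5) - "not an offset"
except TypeError as error:
    message = str(error)
```

Here `str` defines no `__rsub__`, so the subtraction ends in a `TypeError` without the class raising anything explicitly.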

_freq = None

def __init__(self, n=1):
    self.n = n
Member

do you want to add checks here to verify that n is an integer? or maybe floats are OK, too?

Member Author

Indeed n must be an integer. I added a check here. I also added a check on the month argument provided to the YearOffset classes.
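
The checks described here could look roughly like the following sketch. The class bodies, the choice of `TypeError` versus `ValueError`, and the error messages are assumptions made for illustration; the merged code may differ.

```python
class Offset:
    """Hypothetical sketch of argument validation; not the merged code."""

    def __init__(self, n=1):
        if not isinstance(n, int):
            raise TypeError(
                "n must be an integer, got {!r} of type {}".format(
                    n, type(n).__name__))
        self.n = n


class YearOffset(Offset):
    """Hypothetical annual offset anchored to a given month."""

    def __init__(self, n=1, month=12):
        super().__init__(n)
        if not isinstance(month, int):
            raise TypeError("month must be an integer")
        # A month outside 1..12 has the right type but an invalid value.
        if not 1 <= month <= 12:
            raise ValueError("month must be between 1 and 12, got %d" % month)
        self.month = month


june_anchored = YearOffset(3, month=6)
```

Validating in the constructor means a frequency such as a fractional number of years fails immediately rather than producing wrong dates later.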

elif isinstance(date_str_or_date, cftime.datetime):
    return date_str_or_date
else:
    raise ValueError('date_str_or_date must be a string or a '
Member

should be TypeError
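
The distinction can be sketched with the standard library's `datetime` standing in for `cftime.datetime`. The function `to_datetime` below is a hypothetical analogue, not the helper from this PR: a malformed string has the right type but a bad value (`ValueError`), while an argument that is neither a string nor a date has the wrong type (`TypeError`).

```python
from datetime import datetime


def to_datetime(date_str_or_date):
    """Hypothetical analogue of the conversion helper under review."""
    if isinstance(date_str_or_date, str):
        # Right type, but the value may still be malformed; fromisoformat
        # raises ValueError in that case.
        return datetime.fromisoformat(date_str_or_date)
    elif isinstance(date_str_or_date, datetime):
        return date_str_or_date
    else:
        # Wrong type altogether, so TypeError is the conventional choice.
        raise TypeError(
            "date_str_or_date must be a string or a datetime; got {!r}"
            .format(date_str_or_date))


parsed = to_datetime("2000-02-01")
```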


For dates from standard calendars within the ``pandas.Timestamp``-valid
range, this function operates as a thin wrapper around
``pandas.date_range``.
Member

I don't think this use-case is important to support, and interfaces that can return different types can be difficult to understand. I would instead always return a CFTimeIndex.

@@ -149,10 +182,22 @@ class CFTimeIndex(pd.Index):
'The microseconds of the datetime')
date_type = property(get_date_type)

def __new__(cls, data):
Member

Let's keep the constructor the same here, rather than overriding it to match the signature of cftime_range. (I know this is a deviation from pandas, but frankly I think the pandas behavior is a confusing mistake.)

Member Author

Sure thing -- I retained my addition of the optional name argument so that we can support it in cftime_range (and obviously the constructor here as well).

else:
    attrs.append(('calendar', repr(infer_calendar_name(self._data))))

prepr = (pd.compat.u(",%s") %
Member

you can just use u",%s" rather than pd.compat.u (which is only needed for old versions of Python 3 that we no longer support)

Adapted from pandas.core.indexes.base.__unicode__
"""
klass = self.__class__.__name__
data = self._format_data()
Member

This is a little dangerous: we're using private methods here which might go away without notice. I have been burned by this in the past (specifically in xarray with pandas).

The other reason to hold off on copying the pandas repr() is that we don't implement casting like pd.to_datetime() inside the CFTimeIndex constructor, e.g.,

In [2]: pd.DatetimeIndex(['2000', '2001'])
Out[2]: DatetimeIndex(['2000-01-01', '2001-01-01'], dtype='datetime64[ns]', freq=None)

(to be clear, I don't think this is a good idea either, but at least it ensures that the repr roundtrips faithfully)

Member Author

Yes, admittedly this was a bit of a hack. For now I reverted things back to simply use the default repr, which CFTimeIndex inherits from pd.Index. Eventually it would be nice if the repr for CFTimeIndex indicated its calendar type, but I'll make a separate issue for that, which we can handle later in a cleaner way.

to_offset, get_date_type, _MONTH_ABBREVIATIONS, _cftime_range,
to_cftime_datetime, cftime_range)
from xarray import CFTimeIndex
from . import has_cftime
Member

Instead of checking has_cftime and skipif decorators on every function, use cftime = pytest.importorskip('cftime').

('2000', None, 5, 'A', 'foo'),
('2000', '1999', None, 'A', 'foo')]
)
def test_cftime_range(start, end, periods, freq, name, calendar):
Member

Can you add a selection of calendar-specific tests, e.g., verify that generating a year's worth of dates with noleap, all_leap and 360_day does the right thing?
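
As a rough illustration of what "the right thing" means for these three calendars, the sketch below enumerates one year of daily dates from month-length tables. It is pure Python and does not call xarray or cftime; `DAYS_IN_MONTH` and `dates_in_year` are hypothetical names. A real test would presumably compare the index returned for a daily frequency against counts like these.

```python
# Month lengths for three CF calendars: only February differs between
# noleap and all_leap, while 360_day uses twelve 30-day months.
DAYS_IN_MONTH = {
    "noleap": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
    "all_leap": [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
    "360_day": [30] * 12,
}


def dates_in_year(year, calendar):
    """Hypothetical helper: every (year, month, day) in one calendar year."""
    return [
        (year, month, day)
        for month, length in enumerate(DAYS_IN_MONTH[calendar], start=1)
        for day in range(1, length + 1)
    ]


year_lengths = {cal: len(dates_in_year(2001, cal)) for cal in DAYS_IN_MONTH}
```

The expected lengths are 365, 366, and 360 days respectively, and February 30 exists only in the 360_day calendar.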

@@ -0,0 +1,765 @@
"""Time offset classes for use with cftime.datetime objects"""
Member

Assuming that most of this file is copied/adapted from pandas, please add a copy of the pandas copyright notice at the top.

Member Author

I added a copy of the license for pandas to the top of the module. Is that what you meant? Should we do the same in cftimeindex.py?

Member

> I added a copy of the license for pandas to the top of the module. Is that what you meant?

Yes, this is what I had in mind. A comment would also be OK rather than the docstring.

> Should we do the same in cftimeindex.py?

Sure, this is probably a good idea

@@ -7,6 +7,7 @@

from xarray.core import pycompat
from xarray.core.utils import is_scalar
from .times import infer_calendar_name
Contributor

F401 '.times.infer_calendar_name' imported but unused

import pandas as pd
import xarray as xr

from datetime import timedelta
from xarray.coding.cftimeindex import (
parse_iso8601, CFTimeIndex, assert_all_valid_date_type,
_parsed_string_to_bounds, _parse_iso8601_with_reso)
from xarray.coding.cftime_offsets import get_date_type, YearBegin
Contributor

F401 'xarray.coding.cftime_offsets.get_date_type' imported but unused
F401 'xarray.coding.cftime_offsets.YearBegin' imported but unused

import pandas as pd
import xarray as xr

from datetime import timedelta
from xarray.coding.cftimeindex import (
parse_iso8601, CFTimeIndex, assert_all_valid_date_type,
_parsed_string_to_bounds, _parse_iso8601_with_reso)
from xarray.coding.cftime_offsets import get_date_type, YearBegin
from xarray.coding.times import infer_calendar_name
Contributor

F401 'xarray.coding.times.infer_calendar_name' imported but unused

@@ -240,7 +287,7 @@ def __sub__(self, other):
         elif type(other) == type(self) and other.month == self.month:
             return type(self)(self.n - other.n, month=self.month)
         else:
-            raise NotImplementedError
+            raise NotImplemented
Member

should be return NotImplemented, not raise NotImplemented
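
The difference matters because `NotImplemented` is a sentinel value, not an exception class, so raising it fails with an unrelated-looking `TypeError`. A small sketch (the function name is hypothetical):

```python
def subtract_by_raising(a, b):
    # Mistake: NotImplemented is a singleton value, not an exception, so
    # this statement itself fails before any operator fallback can happen.
    raise NotImplemented


try:
    subtract_by_raising(1, 2)
except TypeError as error:
    raised_message = str(error)

# NotImplementedError, by contrast, is a genuine exception class.
sentinel_is_exception = isinstance(NotImplemented, BaseException)
error_is_exception = issubclass(NotImplementedError, Exception)
```

In Python 3 the message complains that exceptions must derive from `BaseException`, which hides the real intent of the code.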

@spencerkclark
Member Author

Many thanks for the initial review @shoyer. I think I got to everything so far. I agree restricting cftime_range to only return CFTimeIndexes makes things much simpler to reason about.

    return -self + other

def __apply__(self):
    raise NotImplementedError
Member

should also be return NotImplemented to defer implementation to subclasses

@shoyer shoyer merged commit 5b87b6e into pydata:master Sep 19, 2018
@shoyer
Member

shoyer commented Sep 19, 2018

thanks @spencerkclark !

Development

Successfully merging this pull request may close these issues.

add CFTimeIndex enabled date_range function
4 participants