-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Add a CFTimeIndex-enabled xr.cftime_range function #2301
Conversation
xarray/coding/cftime_offsets.py
Outdated
|
||
|
||
def _adjust_n_years(other, n, month, reference_day): | ||
"""Adjust the number of times an annual offset is applied based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W291 trailing whitespace
xarray/coding/cftime_offsets.py
Outdated
+----------------------+----------------------------------------------------------------+ | ||
| all_leap, 366_day | ``cftime.DatetimeAllLeap`` | | ||
+----------------------+----------------------------------------------------------------+ | ||
| 360_day | ``cftime.Datetime360Day`` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (93 > 79 characters)
xarray/coding/cftime_offsets.py
Outdated
+----------------------+----------------------------------------------------------------+ | ||
| 360_day | ``cftime.Datetime360Day`` | | ||
+----------------------+----------------------------------------------------------------+ | ||
| julian | ``cftime.DatetimeJulian`` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (93 > 79 characters)
xarray/coding/cftime_offsets.py
Outdated
|
||
>>> xr.date_range(start='2000', periods=6, freq='2MS', calendar='noleap') | ||
CFTimeIndex([2000-01-01 00:00:00, 2000-03-01 00:00:00, 2000-05-01 00:00:00, | ||
2000-07-01 00:00:00, 2000-09-01 00:00:00, 2000-11-01 00:00:00], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (80 > 79 characters)
xarray/coding/cftime_offsets.py
Outdated
|
||
>>> xr.date_range(start='0001', periods=6, freq='2MS', calendar='standard') | ||
CFTimeIndex([0001-01-01 00:00:00, 0001-03-01 00:00:00, 0001-05-01 00:00:00, | ||
0001-07-01 00:00:00, 0001-09-01 00:00:00, 0001-11-01 00:00:00], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (80 > 79 characters)
xarray/coding/cftime_offsets.py
Outdated
As in the standard pandas function, three of the ``start``, ``end``, | ||
``periods``, or ``freq`` arguments must be specified at a given time, with | ||
the other set to ``None``. See the `pandas documentation | ||
<https://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html#pandas.date_range>`_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W291 trailing whitespace
Add docstring for xr.date_range Fix failing test Fix test skipping logic Coerce result of zip to a list in test setup Add and clean up tests Fix skip logic Skip roll_forward and roll_backward tests if cftime is not installed Expose all possible arguments to pd.date_range Add more detail to docstrings flake8 Add a what's new entry Add a short example to time-series.rst
70f3cd0
to
009fbe3
Compare
@jhamman @shoyer when you get a chance, I think this is ready for review. I did a few more things since first pushing this PR:
I hope the general approach seems reasonable, though if you had something else in mind for how to implement this originally, I'd be open to changing things. I tried to keep things as basic as I could (the pandas |
I haven't had the chance to look in detail at this yet, but one small point I would suggest is renaming it from |
@spencerkclark - sync this branch with master and we'll get this all wrapped up. |
Thanks @jhamman! This should be synced up and ready for review. The test failure under the dask-dev build doesn't appear to be related. |
if you merge in master again, the test suite should be passing after #2393 |
Awesome, thanks @shoyer. |
I took a quick look at this today. It looks quite complete and I'm eager to get this in to master. I'm personally having trouble breaking off enough time to give it a full review. If anyone else in @pydata/xarray has some time to give this a close look, I'm sure @spencerkclark would really appreciate it. |
xarray/coding/cftime_offsets.py
Outdated
elif type(other) == type(self): | ||
return type(self)(self.n - other.n) | ||
else: | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use return NotImplemented
here (which Python will reraise as TypeError)
xarray/coding/cftime_offsets.py
Outdated
_freq = None | ||
|
||
def __init__(self, n=1): | ||
self.n = n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to add checks here to verify that n
is an integer? or maybe floats are OK, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed n
must be an integer. I added a check here. I also added a check on the month
argument provided to the YearOffset
classes.
xarray/coding/cftime_offsets.py
Outdated
elif isinstance(date_str_or_date, cftime.datetime): | ||
return date_str_or_date | ||
else: | ||
raise ValueError('date_str_or_date must be a string or a ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be TypeError
xarray/coding/cftime_offsets.py
Outdated
|
||
For dates from standard calendars within the ``pandas.Timestamp``-valid | ||
range, this function operates as a thin wrapper around | ||
``pandas.date_range``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this use-case is important to support, and interfaces that can return different types can be difficult to understand. I would instead always return a CFTimeIndex
.
@@ -149,10 +182,22 @@ class CFTimeIndex(pd.Index): | |||
'The microseconds of the datetime') | |||
date_type = property(get_date_type) | |||
|
|||
def __new__(cls, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the constructor the same here, rather than overriding it to match the signature of cftime_range
. (I know this is a deviation from pandas, but frankly I think the pandas behavior is a confusing mistake.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing -- I retained my addition of the optional name argument so that we can support it in cftime_range
(and obviously the constructor here as well).
xarray/coding/cftimeindex.py
Outdated
else: | ||
attrs.append(('calendar', repr(infer_calendar_name(self._data)))) | ||
|
||
prepr = (pd.compat.u(",%s") % |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use u",%s"
rather than pd.compat.u
(which is only needed for old versions of Python 3 that we no longer support)
xarray/coding/cftimeindex.py
Outdated
Adapted from pandas.core.indexes.base.__unicode__ | ||
""" | ||
klass = self.__class__.__name__ | ||
data = self._format_data() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little dangerous: we're using private methods here which might go away without notice. I have been burned by this in the past (specifically in xarray with pandas).
The other reason to hold off on copying the pandas repr() is that we don't implement casting like pd.to_datetime()
instead the CFTimeIndex constructor, e.g.,
In [2]: pd.DatetimeIndex(['2000', '2001'])
Out[2]: DatetimeIndex(['2000-01-01', '2001-01-01'], dtype='datetime64[ns]', freq=None)
(to be clear, I don't think this is a good idea either, but at least it ensures that the repr roundtrips faithfully)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, admittedly this was a bit of a hack. For now I reverted things back to simply use the default repr, which CFTimeIndex inherits from pd.Index
. Eventually it would be nice if the repr for CFTimeIndex indicated its calendar type, but I'll make a separate issue for that, which we can handle later in a cleaner way.
xarray/tests/test_cftime_offsets.py
Outdated
to_offset, get_date_type, _MONTH_ABBREVIATIONS, _cftime_range, | ||
to_cftime_datetime, cftime_range) | ||
from xarray import CFTimeIndex | ||
from . import has_cftime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking has_cftime
and skipif
decorators on every function, use cftime = pytest.importorskip('cftime')
.
xarray/tests/test_cftime_offsets.py
Outdated
('2000', None, 5, 'A', 'foo'), | ||
('2000', '1999', None, 'A', 'foo')] | ||
) | ||
def test_cftime_range(start, end, periods, freq, name, calendar): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add selection of calendar specific tests, e.g., verify that generating generating a year's worth of dates with noleap, all_leap and 360_day does the right thing?
@@ -0,0 +1,765 @@ | |||
"""Time offset classes for use with cftime.datetime objects""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that most of this file is copied/adapted from pandas, please add copy of the pandas copyright notice at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a copy of the license for pandas to the top of the module. Is that what you meant? Should we do the same in cftimeindex.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a copy of the license for pandas to the top of the module. Is that what you meant?
Yes, this is what I had in mind. A comment would also be OK rather than the docstring.
Should we do the same in cftimeindex.py?
Sure, this is probably a good idea
xarray/coding/cftimeindex.py
Outdated
@@ -7,6 +7,7 @@ | |||
|
|||
from xarray.core import pycompat | |||
from xarray.core.utils import is_scalar | |||
from .times import infer_calendar_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 '.times.infer_calendar_name' imported but unused
xarray/tests/test_cftimeindex.py
Outdated
import pandas as pd | ||
import xarray as xr | ||
|
||
from datetime import timedelta | ||
from xarray.coding.cftimeindex import ( | ||
parse_iso8601, CFTimeIndex, assert_all_valid_date_type, | ||
_parsed_string_to_bounds, _parse_iso8601_with_reso) | ||
from xarray.coding.cftime_offsets import get_date_type, YearBegin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 'xarray.coding.cftime_offsets.get_date_type' imported but unused
F401 'xarray.coding.cftime_offsets.YearBegin' imported but unused
xarray/tests/test_cftimeindex.py
Outdated
import pandas as pd | ||
import xarray as xr | ||
|
||
from datetime import timedelta | ||
from xarray.coding.cftimeindex import ( | ||
parse_iso8601, CFTimeIndex, assert_all_valid_date_type, | ||
_parsed_string_to_bounds, _parse_iso8601_with_reso) | ||
from xarray.coding.cftime_offsets import get_date_type, YearBegin | ||
from xarray.coding.times import infer_calendar_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 'xarray.coding.times.infer_calendar_name' imported but unused
xarray/coding/cftime_offsets.py
Outdated
@@ -240,7 +287,7 @@ def __sub__(self, other): | |||
elif type(other) == type(self) and other.month == self.month: | |||
return type(self)(self.n - other.n, month=self.month) | |||
else: | |||
raise NotImplementedError | |||
raise NotImplemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be return NotImplemented
, not raise NotImplemented
Many thanks for the initial review @shoyer. I think I got to everything so far. I agree restricting |
xarray/coding/cftime_offsets.py
Outdated
return -self + other | ||
|
||
def __apply__(self): | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also be return NotImplemented
to differ implementation to subclasses
thanks @spencerkclark ! |
whats-new.rst
for all changes andapi.rst
for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)I took the approach first discussed here by @shoyer and followed pandas by creating simplified offset classes for use with cftime objects to implement a
CFTimeIndex
-enabledcftime_range
function. I still may clean things up a bit and add a few more tests, but I wanted to post this in its current state to show some progress, as I think it is more or less working. I will try to ping folks when it is ready for a more detailed review.Here are a few examples:
Hopefully the offset classes defined here would also be useful for implementing things like
resample
forCFTimeIndex
objects (#2191) andCFTimeIndex.shift
(#2244).