Switch enable_cftimeindex to True by default (#2516)
* Switch enable_cftimeindex to True by default

* Add a friendlier error message when plotting cftime objects

* Mention that the non-standard calendars are used in climate science

* Add GH issue references to docs

* Deprecate enable_cftimeindex option

* Add CFTimeIndex.to_datetimeindex method

* Add friendlier error message for resample

* lint

* Address review comments

* Take into account microsecond attribute in cftime_to_nptime

* Add test for decoding dates with microsecond-resolution units

This would have failed before including the microsecond attribute
of each date in cftime_to_nptime in eaa4a44.

* Fix typo in time-series.rst

* Formatting

* Fix test_decode_cf_datetime_non_iso_strings

* Prevent warning emitted from set_options.__exit__
spencerkclark authored and shoyer committed Nov 1, 2018
1 parent 17815b4 commit 656f8bd
Showing 15 changed files with 447 additions and 387 deletions.
1 change: 1 addition & 0 deletions doc/api-hidden.rst
@@ -152,3 +152,4 @@
    plot.FacetGrid.map
 
    CFTimeIndex.shift
+   CFTimeIndex.to_datetimeindex
102 changes: 63 additions & 39 deletions doc/time-series.rst
@@ -71,10 +71,11 @@ One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
 native representation of dates to those that fall between the years 1678 and
 2262. When a netCDF file contains dates outside of these bounds, dates will be
 returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
-can be used for indexing. The :py:class:`~xarray.CFTimeIndex` enables only a subset of
-the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only enabled
-when using the standalone version of ``cftime`` (not the version packaged with
-earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more information.
+will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
+the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
+fully compatible with the standalone version of ``cftime`` (not the version
+packaged with earlier versions of ``netCDF4``). See :ref:`CFTimeIndex` for more
+information.
 
 Datetime indexing
 -----------------
@@ -221,18 +222,28 @@ Non-standard calendars and dates outside the Timestamp-valid range
 Through the standalone ``cftime`` library and a custom subclass of
 :py:class:`pandas.Index`, xarray supports a subset of the indexing
 functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
-dates from non-standard calendars or dates using a standard calendar, but
-outside the `Timestamp-valid range`_ (approximately between years 1678 and
-2262). This behavior has not yet been turned on by default; to take advantage
-of this functionality, you must have the ``enable_cftimeindex`` option set to
-``True`` within your context (see :py:func:`~xarray.set_options` for more
-information). It is expected that this will become the default behavior in
-xarray version 0.11.
-
-For instance, you can create a DataArray indexed by a time
-coordinate with a no-leap calendar within a context manager setting the
-``enable_cftimeindex`` option, and the time index will be cast to a
-:py:class:`~xarray.CFTimeIndex`:
+dates from non-standard calendars commonly used in climate science or dates
+using a standard calendar, but outside the `Timestamp-valid range`_
+(approximately between years 1678 and 2262).
+
+.. note::
+
+   As of xarray version 0.11, by default, :py:class:`cftime.datetime` objects
+   will be used to represent times (either in indexes, as a
+   :py:class:`~xarray.CFTimeIndex`, or in data arrays with dtype object) if
+   any of the following are true:
+
+   - The dates are from a non-standard calendar
+   - Any dates are outside the Timestamp-valid range.
+
+   Otherwise pandas-compatible dates from a standard calendar will be
+   represented with the ``np.datetime64[ns]`` data type, enabling the use of a
+   :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
+   and their full set of associated features.
+
+For example, you can create a DataArray indexed by a time
+coordinate with dates from a no-leap calendar and a
+:py:class:`~xarray.CFTimeIndex` will automatically be used:
 
 .. ipython:: python
 
@@ -241,27 +252,11 @@ coordinate with a no-leap calendar within a context manager setting the
    dates = [DatetimeNoLeap(year, month, 1) for year, month in
            product(range(1, 3), range(1, 13))]
-   with xr.set_options(enable_cftimeindex=True):
-       da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'],
-                         name='foo')
+   da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
 
-.. note::
-
-   With the ``enable_cftimeindex`` option activated, a :py:class:`~xarray.CFTimeIndex`
-   will be used for time indexing if any of the following are true:
-
-   - The dates are from a non-standard calendar
-   - Any dates are outside the Timestamp-valid range
-
-   Otherwise a :py:class:`pandas.DatetimeIndex` will be used. In addition, if any
-   variable (not just an index variable) is encoded using a non-standard
-   calendar, its times will be decoded into :py:class:`cftime.datetime` objects,
-   regardless of whether or not they can be represented using
-   ``np.datetime64[ns]`` objects.
-
 xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
-creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For instance, we can
-create the same dates and DataArray we created above using:
+creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For
+instance, we can create the same dates and DataArray we created above using:
 
 .. ipython:: python
 
@@ -317,13 +312,42 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
 
 .. ipython:: python
 
-   da.to_netcdf('example.nc')
-   xr.open_dataset('example.nc')
+   da.to_netcdf('example-no-leap.nc')
+   xr.open_dataset('example-no-leap.nc')
 
 .. note::
 
-   Currently resampling along the time dimension for data indexed by a
-   :py:class:`~xarray.CFTimeIndex` is not supported.
+   While much of the time series functionality that is possible for standard
+   dates has been implemented for dates from non-standard calendars, there are
+   still some remaining important features that have yet to be implemented,
+   for example:
+
+   - Resampling along the time dimension for data indexed by a
+     :py:class:`~xarray.CFTimeIndex` (:issue:`2191`, :issue:`2458`)
+   - Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes
+     (:issue:`2164`).
+
+   For some use-cases it may still be useful to convert from
+   a :py:class:`~xarray.CFTimeIndex` to a :py:class:`pandas.DatetimeIndex`,
+   despite the difference in calendar types (e.g. to allow the use of some
+   forms of resample with non-standard calendars). The recommended way of
+   doing this is to use the built-in
+   :py:meth:`~xarray.CFTimeIndex.to_datetimeindex` method:
+
+   .. ipython:: python
+
+       modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')
+       da = xr.DataArray(range(24), [('time', modern_times)])
+       da
+       datetimeindex = da.indexes['time'].to_datetimeindex()
+       da['time'] = datetimeindex
+       da.resample(time='Y').mean('time')
+
+   However in this case one should use caution to only perform operations which
+   do not depend on differences between dates (e.g. differentiation,
+   interpolation, or upsampling with resample), as these could introduce subtle
+   and silent errors due to the difference in calendar types between the dates
+   encoded in your data and the dates stored in memory.
 
 .. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
 .. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
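The note above warns against operations that depend on differences between dates after a calendar change. The stdlib-only sketch below (not xarray code) makes that concrete by counting days between the same pairs of dates two ways. The first uses a hypothetical 365-day "noleap" day numbering. The second uses Python's `datetime.date`, which follows the proleptic Gregorian calendar. The helper names `noleap_ordinal`, `days_between_noleap` and `days_between_standard` are illustrative only.

```python
from datetime import date

# Cumulative days before each month in a 365-day ("noleap") calendar,
# where February always has 28 days.
_NOLEAP_MONTH_START = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def noleap_ordinal(year, month, day):
    """Day count from a hypothetical epoch in a noleap calendar (sketch)."""
    return year * 365 + _NOLEAP_MONTH_START[month - 1] + (day - 1)


def days_between_noleap(start, end):
    return noleap_ordinal(*end) - noleap_ordinal(*start)


def days_between_standard(start, end):
    # datetime.date uses the proleptic Gregorian calendar.
    return (date(*end) - date(*start)).days


start, end = (2000, 2, 28), (2000, 3, 1)
print(days_between_noleap(start, end))    # 1: no Feb 29 in a noleap calendar
print(days_between_standard(start, end))  # 2: 2000 is a leap year

start, end = (2000, 1, 1), (2001, 1, 1)
print(days_between_noleap(start, end))    # 365
print(days_between_standard(start, end))  # 366
```

The labels on each date are unchanged, but the elapsed time between them shifts by a day. Differentiation, interpolation or upsampling performed after `to_datetimeindex()` would silently pick up that shift.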
16 changes: 16 additions & 0 deletions doc/whats-new.rst
@@ -33,6 +33,22 @@ v0.11.0 (unreleased)
 Breaking changes
 ~~~~~~~~~~~~~~~~
 
+- For non-standard calendars commonly used in climate science, xarray will now
+  always use :py:class:`cftime.datetime` objects, rather than by default try to
+  coerce them to ``np.datetime64[ns]`` objects. A
+  :py:class:`~xarray.CFTimeIndex` will be used for indexing along time
+  coordinates in these cases. A new method,
+  :py:meth:`~xarray.CFTimeIndex.to_datetimeindex`, has been added
+  to aid in converting from a :py:class:`~xarray.CFTimeIndex` to a
+  :py:class:`pandas.DatetimeIndex` for the remaining use-cases where
+  using a :py:class:`~xarray.CFTimeIndex` is still a limitation (e.g. for
+  resample or plotting). Setting the ``enable_cftimeindex`` option is now a
+  no-op and emits a ``FutureWarning``.
+- ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`.
+  Call :py:meth:`Dataset.transpose` directly instead.
+- Iterating over a ``Dataset`` now includes only data variables, not coordinates.
+  Similarly, calling ``len`` and ``bool`` on a ``Dataset`` now
+  includes only data variables.
 - Finished deprecation cycles:
   - ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`.
     Call :py:meth:`Dataset.transpose` directly instead.
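The entry above makes setting `enable_cftimeindex` a no-op that emits a `FutureWarning`. The commit message also mentions preventing a warning from `set_options.__exit__`. The sketch below shows one way to get both behaviors, assuming a simple dict-based option store. It warns only when the caller explicitly passes the deprecated key, and it restores previous values in `__exit__` through a path that never warns. `OPTIONS` and `set_options` here are hypothetical stand-ins, not xarray's actual implementation.

```python
import warnings

# Hypothetical global option store for this sketch.
OPTIONS = {"enable_cftimeindex": True}


class set_options:
    """Hypothetical options context manager (sketch, not xarray's code)."""

    def __init__(self, **kwargs):
        self.old = {}
        for key, value in kwargs.items():
            if key not in OPTIONS:
                raise ValueError("unknown option {!r}".format(key))
            if key == "enable_cftimeindex":
                # Warn only on explicit user input, never on restore.
                warnings.warn(
                    "The enable_cftimeindex option is now a no-op "
                    "and will be removed in a future version.",
                    FutureWarning, stacklevel=2)
            self.old[key] = OPTIONS[key]
        self._apply(kwargs)

    def _apply(self, options_dict):
        # Applying values does not warn, so __exit__ stays silent.
        OPTIONS.update(options_dict)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._apply(self.old)


with set_options(enable_cftimeindex=False):  # emits one FutureWarning
    pass
# Leaving the block restores the old value without warning a second time.
```

The design choice is to separate validation and warning (in `__init__`) from the raw update (`_apply`). Restoring state is then just another call to the silent path.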
53 changes: 53 additions & 0 deletions xarray/coding/cftimeindex.py
@@ -42,6 +42,7 @@
 from __future__ import absolute_import
 
 import re
+import warnings
 from datetime import timedelta
 
 import numpy as np
@@ -50,6 +51,8 @@
 from xarray.core import pycompat
 from xarray.core.utils import is_scalar
 
+from .times import cftime_to_nptime, infer_calendar_name, _STANDARD_CALENDARS
+
 
 def named(name, pattern):
     return '(?P<' + name + '>' + pattern + ')'
@@ -381,6 +384,56 @@ def _add_delta(self, deltas):
         # pandas. No longer used as of pandas 0.23.
         return self + deltas
 
+    def to_datetimeindex(self, unsafe=False):
+        """If possible, convert this index to a pandas.DatetimeIndex.
+
+        Parameters
+        ----------
+        unsafe : bool
+            Flag to turn off warning when converting from a CFTimeIndex with
+            a non-standard calendar to a DatetimeIndex (default ``False``).
+
+        Returns
+        -------
+        pandas.DatetimeIndex
+
+        Raises
+        ------
+        ValueError
+            If the CFTimeIndex contains dates that are not possible in the
+            standard calendar or outside the pandas.Timestamp-valid range.
+
+        Warns
+        -----
+        RuntimeWarning
+            If converting from a non-standard calendar to a DatetimeIndex.
+
+        Warnings
+        --------
+        Note that for non-standard calendars, this will change the calendar
+        type of the index. In that case the result of this method should be
+        used with caution.
+
+        Examples
+        --------
+        >>> import xarray as xr
+        >>> times = xr.cftime_range('2000', periods=2, calendar='gregorian')
+        >>> times
+        CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00], dtype='object')
+        >>> times.to_datetimeindex()
+        DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]', freq=None)
+        """  # noqa: E501
+        nptimes = cftime_to_nptime(self)
+        calendar = infer_calendar_name(self)
+        if calendar not in _STANDARD_CALENDARS and not unsafe:
+            warnings.warn(
+                'Converting a CFTimeIndex with dates from a non-standard '
+                'calendar, {!r}, to a pandas.DatetimeIndex, which uses dates '
+                'from the standard calendar. This may lead to subtle errors '
+                'in operations that depend on the length of time between '
+                'dates.'.format(calendar), RuntimeWarning)
+        return pd.DatetimeIndex(nptimes)
+
 
 def _parse_iso8601_without_reso(date_type, datetime_str):
     date, _ = _parse_iso8601_with_reso(date_type, datetime_str)
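The control flow of the new `to_datetimeindex` method can be mirrored with the standard library alone. This sketch is an analogue, not the xarray method. It converts `(year, month, day)` tuples with `datetime.datetime` instead of calling `cftime_to_nptime`, and `STANDARD_CALENDARS` is a hypothetical stand-in for `_STANDARD_CALENDARS`. As in the diff, conversion happens first, so impossible dates raise `ValueError`. A `RuntimeWarning` is emitted only for a non-standard calendar when `unsafe` is false.

```python
import warnings
from datetime import datetime

# Hypothetical stand-in for the calendar names treated as "standard".
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def to_standard_datetimes(dates, calendar, unsafe=False):
    """Convert (year, month, day) tuples from `calendar` to datetimes.

    Raises ValueError (from datetime itself) for dates that do not exist
    in the standard calendar, e.g. February 30 from a 360-day calendar.
    """
    converted = [datetime(*d) for d in dates]
    if calendar not in STANDARD_CALENDARS and not unsafe:
        warnings.warn(
            "Converting dates from a non-standard calendar, {!r}, to the "
            "standard calendar. This may lead to subtle errors in operations "
            "that depend on the length of time between dates.".format(calendar),
            RuntimeWarning)
    return converted


# Warns: "noleap" is not a standard calendar and unsafe is False.
to_standard_datetimes([(2000, 1, 1), (2000, 1, 2)], "noleap")
# Silent: the caller has explicitly opted out of the warning.
to_standard_datetimes([(2000, 1, 1), (2000, 1, 2)], "noleap", unsafe=True)
```

Keeping `unsafe` as a warning switch rather than a conversion switch means invalid dates still fail loudly even when the caller has acknowledged the calendar change.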
60 changes: 27 additions & 33 deletions xarray/coding/times.py
@@ -12,7 +12,6 @@
 from ..core import indexing
 from ..core.common import contains_cftime_datetimes
 from ..core.formatting import first_n_items, format_timestamp, last_item
-from ..core.options import OPTIONS
 from ..core.pycompat import PY3
 from ..core.variable import Variable
 from .variables import (
@@ -61,8 +60,9 @@ def _require_standalone_cftime():
     try:
         import cftime  # noqa: F401
     except ImportError:
-        raise ImportError('Using a CFTimeIndex requires the standalone '
-                          'version of the cftime library.')
+        raise ImportError('Decoding times with non-standard calendars '
+                          'or outside the pandas.Timestamp-valid range '
+                          'requires the standalone cftime package.')
 
 
 def _netcdf_to_numpy_timeunit(units):
@@ -84,41 +84,32 @@ def _unpack_netcdf_time_units(units):
     return delta_units, ref_date
 
 
-def _decode_datetime_with_cftime(num_dates, units, calendar,
-                                 enable_cftimeindex):
+def _decode_datetime_with_cftime(num_dates, units, calendar):
     cftime = _import_cftime()
-    if enable_cftimeindex:
-        _require_standalone_cftime()
 
     if cftime.__name__ == 'cftime':
         dates = np.asarray(cftime.num2date(num_dates, units, calendar,
                                            only_use_cftime_datetimes=True))
     else:
         # Must be using num2date from an old version of netCDF4 which
         # does not have the only_use_cftime_datetimes option.
         dates = np.asarray(cftime.num2date(num_dates, units, calendar))
 
     if (dates[np.nanargmin(num_dates)].year < 1678 or
             dates[np.nanargmax(num_dates)].year >= 2262):
-        if not enable_cftimeindex or calendar in _STANDARD_CALENDARS:
+        if calendar in _STANDARD_CALENDARS:
             warnings.warn(
                 'Unable to decode time axis into full '
                 'numpy.datetime64 objects, continuing using dummy '
                 'cftime.datetime objects instead, reason: dates out '
                 'of range', SerializationWarning, stacklevel=3)
     else:
-        if enable_cftimeindex:
-            if calendar in _STANDARD_CALENDARS:
-                dates = cftime_to_nptime(dates)
-        else:
-            try:
-                dates = cftime_to_nptime(dates)
-            except ValueError as e:
-                warnings.warn(
-                    'Unable to decode time axis into full '
-                    'numpy.datetime64 objects, continuing using '
-                    'dummy cftime.datetime objects instead, reason:'
-                    '{0}'.format(e), SerializationWarning, stacklevel=3)
+        if calendar in _STANDARD_CALENDARS:
+            dates = cftime_to_nptime(dates)
     return dates
 
 
-def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
+def _decode_cf_datetime_dtype(data, units, calendar):
     # Verify that at least the first and last date can be decoded
     # successfully. Otherwise, tracebacks end up swallowed by
     # Dataset.__repr__ when users try to view their lazily decoded array.
@@ -128,8 +119,7 @@ def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
                                     last_item(values) or [0]])
 
     try:
-        result = decode_cf_datetime(example_value, units, calendar,
-                                    enable_cftimeindex)
+        result = decode_cf_datetime(example_value, units, calendar)
     except Exception:
         calendar_msg = ('the default calendar' if calendar is None
                         else 'calendar %r' % calendar)
@@ -145,8 +135,7 @@ def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
     return dtype
 
 
-def decode_cf_datetime(num_dates, units, calendar=None,
-                       enable_cftimeindex=False):
+def decode_cf_datetime(num_dates, units, calendar=None):
     """Given an array of numeric dates in netCDF format, convert it into a
     numpy array of date time objects.
@@ -200,8 +189,7 @@ def decode_cf_datetime(num_dates, units, calendar=None,
 
     except (OutOfBoundsDatetime, OverflowError):
         dates = _decode_datetime_with_cftime(
-            flat_num_dates.astype(np.float), units, calendar,
-            enable_cftimeindex)
+            flat_num_dates.astype(np.float), units, calendar)
 
     return dates.reshape(num_dates.shape)
 
@@ -291,7 +279,16 @@ def cftime_to_nptime(times):
     times = np.asarray(times)
     new = np.empty(times.shape, dtype='M8[ns]')
     for i, t in np.ndenumerate(times):
-        dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
+        try:
+            # Use pandas.Timestamp in place of datetime.datetime, because
+            # NumPy casts it safely to np.datetime64[ns] for dates outside
+            # 1678 to 2262 (this is not currently the case for
+            # datetime.datetime).
+            dt = pd.Timestamp(t.year, t.month, t.day, t.hour, t.minute,
+                              t.second, t.microsecond)
+        except ValueError as e:
+            raise ValueError('Cannot convert date {} to a date in the '
+                             'standard calendar. Reason: {}.'.format(t, e))
         new[i] = np.datetime64(dt)
     return new
 
@@ -404,15 +401,12 @@ def encode(self, variable, name=None):
     def decode(self, variable, name=None):
         dims, data, attrs, encoding = unpack_for_decoding(variable)
 
-        enable_cftimeindex = OPTIONS['enable_cftimeindex']
         if 'units' in attrs and 'since' in attrs['units']:
             units = pop_to(attrs, encoding, 'units')
             calendar = pop_to(attrs, encoding, 'calendar')
-            dtype = _decode_cf_datetime_dtype(
-                data, units, calendar, enable_cftimeindex)
+            dtype = _decode_cf_datetime_dtype(data, units, calendar)
             transform = partial(
-                decode_cf_datetime, units=units, calendar=calendar,
-                enable_cftimeindex=enable_cftimeindex)
+                decode_cf_datetime, units=units, calendar=calendar)
             data = lazy_elemwise_func(data, transform, dtype)
 
         return Variable(dims, data, attrs, encoding)
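The `cftime_to_nptime` change matters because the removed line rebuilt each date without its `microsecond` field. Times encoded with sub-second units therefore lost precision silently. The sketch below illustrates the difference using stdlib `datetime.datetime` in place of `pandas.Timestamp`, so, unlike the real code, it does not enforce the 1678 to 2262 nanosecond-precision bounds. `FakeCFDate` is a hypothetical namedtuple standing in for a `cftime.datetime`.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a cftime.datetime: only the fields read below.
FakeCFDate = namedtuple(
    "FakeCFDate", "year month day hour minute second microsecond")


def convert_without_microseconds(t):
    # Mirrors the removed line: the microsecond field is silently dropped.
    return datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)


def convert(t):
    # Mirrors the added code: keep microseconds, and re-raise dates that
    # do not exist in the standard calendar with a clearer message.
    try:
        return datetime(t.year, t.month, t.day, t.hour, t.minute,
                        t.second, t.microsecond)
    except ValueError as e:
        raise ValueError("Cannot convert date {} to a date in the "
                         "standard calendar. Reason: {}.".format(t, e))


t = FakeCFDate(2000, 1, 1, 0, 0, 0, 500)
print(convert_without_microseconds(t))  # 2000-01-01 00:00:00
print(convert(t))                       # 2000-01-01 00:00:00.000500
```

A date such as February 30, which is valid in a 360-day calendar, makes `convert` raise a `ValueError` that names the offending date. That is the behavior `to_datetimeindex` documents under "Raises".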