Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch enable_cftimeindex to True by default #2516

Merged
merged 17 commits into from
Nov 1, 2018

Conversation

spencerkclark
Copy link
Member

@spencerkclark spencerkclark commented Oct 26, 2018

As discussed in #2437 and #2505, this sets the option enable_cftimeindex to True by default.

  • Fully documented, including whats-new.rst for all changes.

@pep8speaks
Copy link

pep8speaks commented Oct 26, 2018

Hello @spencerkclark! Thanks for updating the PR.

Comment last updated on October 27, 2018 at 18:13 Hours UTC

@dcherian
Copy link
Contributor

Should we add nice error messages to resample and plot telling the user what to do if they try those two with cftimeindex?

@@ -223,16 +224,26 @@ Through the standalone ``cftime`` library and a custom subclass of
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars or dates using a standard calendar, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify "non-standard calendars" -> "non-standard calendars commonly used in climate science"

This paragraph is missing a bit of context for non-climate users :)


- Resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex`
- Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to link to the GitHub issues for these.


If at any time you would like to restore the old default behavior, which was
to attempt to decode datetimes into ``np.datetime64[ns]`` objects whenever
possible (regardless of calendar type), you can set ``enable_cftimeindex`` to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather encourage people to convert their data to a standard calendar via normal xarray/cftime APIs. That's a cleaner solution than setting global configuration.

Can we add an example of this with existing APIs? Or maybe we should augment CFTimeIndex with a new to_datetime64 method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather encourage people to convert their data to a standard calendar via normal xarray/cftime APIs. That's a cleaner solution than setting global configuration.

This is a really good point and would make things a lot easier to understand. Currently we use xarray.coding.times.cftime_to_nptime internally; I can convert it to a better-documented public function (perhaps with friendlier error handling too).

@@ -33,6 +33,12 @@ v0.11.0 (unreleased)
Breaking changes
~~~~~~~~~~~~~~~~

- The option ``enable_cftimeindex`` has now been set to ``True`` by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about eventually removing this option entirely, and making it simply a "no-op" for now (raising FutureWarning). In the long term I think we would prefer to avoid keeping around extraneous global options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense -- that will clean up the datetime decoding logic (and tests) quite a bit.

@@ -291,7 +279,12 @@ def cftime_to_nptime(times):
times = np.asarray(times)
new = np.empty(times.shape, dtype='M8[ns]')
for i, t in np.ndenumerate(times):
dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
try:
dt = pd.Timestamp(t.year, t.month, t.day, t.hour, t.minute,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to using pd.Timestamp here, because directly converting from datetime.datetime to np.datetime64[ns] did not raise an error for dates outside 1678 to 2262:

In [1]: from datetime import datetime; import numpy as np

In [2]: np.datetime64(datetime(1, 1, 1), 'ns')
Out[2]: numpy.datetime64('1754-08-30T22:43:41.128654848')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good -- please note this in a comment in the code

The usual rule is that if you have to comment on your pull request to explain why you did something, you should just comment the code instead! :)

@@ -291,7 +279,12 @@ def cftime_to_nptime(times):
times = np.asarray(times)
new = np.empty(times.shape, dtype='M8[ns]')
for i, t in np.ndenumerate(times):
dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason we did not include the microsecond attribute of the datetime here previously?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly microsecond is only a recent addition to cftime.datetime?

'calendar, {!r}, to a pandas.DatetimeIndex, which uses dates '
'from the standard calendar. This may lead to subtle errors '
'in operations that depend on the length of time between '
'dates.'.format(calendar), RuntimeWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, but can we add a keyword argument for silencing the warning? Maybe index.to_datetimeindex(unsafe=True) could silence the warning?

@@ -291,7 +279,12 @@ def cftime_to_nptime(times):
times = np.asarray(times)
new = np.empty(times.shape, dtype='M8[ns]')
for i, t in np.ndenumerate(times):
dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly microsecond is only a recent addition to cftime.datetime?

@@ -291,7 +279,12 @@ def cftime_to_nptime(times):
times = np.asarray(times)
new = np.empty(times.shape, dtype='M8[ns]')
for i, t in np.ndenumerate(times):
dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
try:
dt = pd.Timestamp(t.year, t.month, t.day, t.hour, t.minute,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good -- please note this in a comment in the code

The usual rule is that if you have to comment on your pull request to explain why you did something, you should just comment the code instead! :)

@@ -683,6 +684,20 @@ def resample(self, freq=None, dim=None, how=None, skipna=None,
else:
raise TypeError("Dimension name should be a string; "
"was passed %r" % dim)

if isinstance(self.indexes[dim_name], CFTimeIndex):
raise TypeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about NotImplementedError for now? We do plan to support this eventually

@@ -83,6 +85,11 @@ class set_options(object):

def __init__(self, **kwargs):
self.old = OPTIONS.copy()
if ENABLE_CFTIMEINDEX in kwargs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this as a function in SETTERS instead? That lets us keep the option specific logic outside of this class.

@@ -52,7 +54,7 @@ class set_options(object):
Default: ``'inner'``.
- ``enable_cftimeindex``: flag to enable using a ``CFTimeIndex``
for time indexes with non-standard calendars or dates outside the
Timestamp-valid range. Default: ``False``.
Timestamp-valid range. Default: ``True``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this now -- no need to document a deprecated/unused option.

@shoyer
Copy link
Member

shoyer commented Oct 31, 2018

I don't know why this failed in one of the Appveyor builds:

================================== FAILURES ===================================
___________________________ test_enable_cftimeindex ___________________________
    def test_enable_cftimeindex():
        with pytest.raises(ValueError):
            xarray.set_options(enable_cftimeindex=None)
        with pytest.warns(FutureWarning, match='no-op'):
            with xarray.set_options(enable_cftimeindex=True):
>               assert OPTIONS['enable_cftimeindex']
E               Failed: DID NOT WARN. No warnings of type (<type 'exceptions.FutureWarning'>,) was emitted. The list of emitted warnings is: [].
xarray\tests\test_options.py:38: Failed

@spencerkclark
Copy link
Member Author

Thanks @shoyer -- that was a bit puzzling but I think I sorted it out. I haven't done a careful diagnosis, but my hunch is that the previous build was running into this issue: pytest-dev/pytest#2917. A fix was merged a few weeks ago, but the Python 2.7 32-bit build is/was using a relatively old version of pytest (3.5 versus the most recent 3.9.3).

In the previous build, the FutureWarning for setting the enable_cftimeindex option was being emitted earlier in test_backends.py in the test_open_mfdataset_manyfiles test (before we were watching for it in the options tests). It was being raised there, because upon calling set_options.__exit__, every option (not just the options that were set in a with statement) would be reset to its global value. For example doing something like this would emit the warning (which even outside of this minor testing issue, I think is undesirable):

In [1]: import xarray

In [2]: with xarray.set_options(display_width=40):
  ...:     pass
  ...:
/Users/spencerclark/xarray-dev/xarray/xarray/core/options.py:49: FutureWarning: The enable_cftimeindex option is now a no-op and will be removed in a future version of xarray.
 FutureWarning)

In 6d08d3b I made changes to set_options to only fill the self.old dictionary with the old values for the keyword arguments specified in the constructor, and then had the __exit__ method only reset the values that were temporarily set before. The win-32 build now passes, and we no longer have the undesirable warning behavior shown above. Do you think that was a reasonable fix?

@shoyer
Copy link
Member

shoyer commented Nov 1, 2018

In 6d08d3b I made changes to set_options to only fill the self.old dictionary with the old values for the keyword arguments specified in the constructor, and then had the exit method only reset the values that were temporarily set before. The win-32 build now passes, and we no longer have the undesirable warning behavior shown above. Do you think that was a reasonable fix?

This looks like a reasonable fix to me

@shoyer shoyer merged commit 656f8bd into pydata:master Nov 1, 2018
@spencerkclark spencerkclark deleted the enable-cftimeindex branch November 1, 2018 17:52
dcherian pushed a commit to yohai/xarray that referenced this pull request Dec 16, 2018
* upstream/master: (122 commits)
  add missing , and article in error message (pydata#2557)
  Add libnetcdf, libhdf5, pydap and cfgrib to xarray.show_versions() (pydata#2555)
  revert to dev version for 0.11.1
  Release xarray v0.11
  DOC: update whatsnew for xarray 0.11 release (pydata#2548)
  Drop the hack needed to use CachingFileManager as we don't use it anymore. (pydata#2544)
  add full test env for py37 ci env (pydata#2545)
  Remove old-style resample example in documentation (pydata#2543)
  Stop loading tutorial data by default (pydata#2538)
  Remove the old syntax for resample. (pydata#2541)
  Remove use of deprecated, unused keyword. (pydata#2540)
  Deprecate inplace (pydata#2524)
  Zarr chunking (GH2300) (pydata#2487)
  Include multidimensional stacking groupby in docs (pydata#2493) (pydata#2536)
  Switch enable_cftimeindex to True by default (pydata#2516)
  Raise more informative error when converting tuples to Variable. (pydata#2523)
  Global option to always keep/discard attrs on operations (pydata#2482)
  Remove tests where answers change in cftime 1.0.2.1 (pydata#2522)
  Finish deprecation cycle for DataArray.__contains__ checking array values (pydata#2520)
  Fix bug where OverflowError is not being raised (pydata#2519)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants