Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue to use nanosecond-precision Timestamps in precision-sensitive areas #7731

Merged
merged 12 commits into from
Apr 13, 2023

Conversation

spencerkclark
Copy link
Member

This addresses the remaining cftime-related test failures in #7707 by introducing a function that always returns a nanosecond-precision Timestamp object. Despite no corresponding test failures, for safety I grepped and went ahead and replaced the pd.Timestamp constructor with this function in a few other areas. I also updated our documentation to replace any mentions of the "Timestamp-valid range" with "nanosecond-precision range" since Timestamps are now more flexible, and included a note that we have an issue open for relaxing this nanosecond-precision assumption in xarray eventually.

While in principle I think it would be fine if CFTimeIndex.to_datetimeindex returned a DatetimeIndex with non-nanosecond-precision values, since we don't use to_datetimeindex anywhere outside of tests, in its current state it was returning nonsense values:

>>> import pandas as pd
>>> import xarray as xr
>>> times = xr.cftime_range("0001", periods=5)
>>> times.to_datetimeindex()
DatetimeIndex(['1754-08-30 22:43:41.128654848',
               '1754-08-31 22:43:41.128654848',
               '1754-09-01 22:43:41.128654848',
               '1754-09-02 22:43:41.128654848',
               '1754-09-03 22:43:41.128654848'],
              dtype='datetime64[ns]', freq=None)

This is due to the assumption in cftime_to_nptime that the resulting array will have nanosecond-precision values. We can (and should) address this eventually, but for the sake of quickly supporting pandas version two I decided to be conservative and punt this off to be part of #7493. cftime_to_nptime is used in places other than to_datetimeindex, so modifying it has other impacts downstream.

@keewis
Copy link
Collaborator

keewis commented Apr 6, 2023

to properly test this, I guess we'd need to merge #7724 first?

@jsignell
Copy link
Contributor

jsignell commented Apr 6, 2023

to properly test this, I guess we'd need to merge #7724 first?

Otherwise the env in the upstream test will never solve right?

@keewis
Copy link
Collaborator

keewis commented Apr 6, 2023

sorry, I forgot about being able to use the upstream-dev environment for testing. That should solve just fine since by design it ignores all pinned dependencies, but we currently get occasional segfaults and unrelated failing tests, and we only test on a single OS / python version.

@spencerkclark
Copy link
Member Author

I'm fine waiting until #7724 is merged to let our main CI cover this. Indeed the upstream tests are flaky. Locally I just installed pandas 2 via pip to do testing during development.

@keewis keewis mentioned this pull request Apr 7, 2023
@github-actions github-actions bot added CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Apr 12, 2023
xarray/core/pdcompat.py Outdated Show resolved Hide resolved
@dcherian
Copy link
Contributor

dcherian commented Apr 12, 2023

RTD failures are real:

WARNING: [autosummary] failed to import xarray.CFTimeIndex.is_all_dates.
Possible hints:
* ImportError: 
* AttributeError: type object 'CFTimeIndex' has no attribute 'is_all_dates'
* ModuleNotFoundError: No module named 'xarray.CFTimeIndex'
WARNING: [autosummary] failed to import xarray.CFTimeIndex.is_mixed.
Possible hints:
* ImportError: 
* AttributeError: type object 'CFTimeIndex' has no attribute 'is_mixed'
* ModuleNotFoundError: No module named 'xarray.CFTimeIndex'
WARNING: [autosummary] failed to import xarray.CFTimeIndex.is_monotonic.
Possible hints:
* ImportError: 
* AttributeError: type object 'CFTimeIndex' has no attribute 'is_monotonic'
* ModuleNotFoundError: No module named 'xarray.CFTimeIndex'
WARNING: [autosummary] failed to import xarray.CFTimeIndex.is_type_compatible.
Possible hints:
* AttributeError: type object 'CFTimeIndex' has no attribute 'is_type_compatible'
* ImportError: 
* ModuleNotFoundError: No module named 'xarray.CFTimeIndex'
WARNING: [autosummary] failed to import xarray.CFTimeIndex.set_value.
Possible hints:
* ImportError: 
* AttributeError: type object 'CFTimeIndex' has no attribute 'set_value'
* ModuleNotFoundError: No module named 'xarray.CFTimeIndex'
WARNING: [autosummary] failed to import xarray.CFTimeIndex.to_native_types.
Possible hints:
* ImportError: 
* AttributeError: type object 'CFTimeIndex' has no attribute 'to_native_types'
* ModuleNotFoundError: No module named 'xarray.CFTimeIndex'

@spencerkclark
Copy link
Member Author

Thanks all for the help! Fingers crossed things should be all green now. Happy to address any more review comments.

@dcherian
Copy link
Contributor

There are a bunch of warnings in the tests that could be silenced:

D:\a\xarray\xarray\xarray\tests\test_dataset.py:516: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.

But we can also just merge quickly to get a release out

@spencerkclark
Copy link
Member Author

Thanks for noting that @dcherian -- I think I got to all of them now.

@dcherian
Copy link
Contributor

Thanks for patiently working through this Spencer. I'll merge now, and then we can release tomorrow.

@dcherian dcherian merged commit c9c1c6d into pydata:main Apr 13, 2023
@spencerkclark spencerkclark deleted the enforce-nanosecond-precision branch April 13, 2023 15:17
dcherian added a commit to dcherian/xarray that referenced this pull request Apr 18, 2023
* main: (34 commits)
  Update whats-new.rst
  Fix binning by unsorted array (pydata#7762)
  Bump codecov/codecov-action from 3.1.1 to 3.1.2 (pydata#7760)
  Fix typing errors using mypy 1.2 (pydata#7752)
  [skip-ci] dev whats-new
  Add whats-new for v2023.04.0 (pydata#7757)
  remove the `black` hook (pydata#7756)
  reword the what's new entry for the `pandas` 2.0 dtype changes (pydata#7755)
  restructure the contributing guide (pydata#7681)
  Continue to use nanosecond-precision Timestamps in precision-sensitive areas (pydata#7731)
  minor doc updates to clarify extensions using accessors (pydata#7751)
  align: Avoid reindexing when join="exact" (pydata#7736)
  `pandas=2.0` support (pydata#7724)
  Clarify vectorized indexing documentation (pydata#7747)
  Avoid recasting a CFTimeIndex (pydata#7735)
  fix typo (pydata#7746)
  [pre-commit.ci] pre-commit autoupdate (pydata#7745)
  Bump pypa/gh-action-pypi-publish from 1.8.4 to 1.8.5 (pydata#7743)
  preserve boolean dtype in encoding (pydata#7720)
  [skip-ci] Add alignment benchmarks (pydata#7738)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration tools dependencies Pull requests that update a dependency file topic-cftime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants