Ensure maximum accuracy when encoding and decoding np.datetime64[ns] values #4684

spencerkclark · 2020-12-12T21:43:57Z

Closes Millisecond precision is lost on datetime64 during IO roundtrip #4045
Tests added
Passes isort . && black . && mypy . && flake8
User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR cleans up the logic used to encode and decode times with pandas so that by default we use int64 values in both directions for all precisions down to nanosecond. If a user specifies an encoding (or a file is read in) such that float values would be required, things still work as they did before. I do this mainly by following the approach I described here: #4045 (comment).
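To illustrate why integers are preferable here, the following is a minimal sketch (not code from this PR) of how a nanosecond-precision time can be corrupted by passing through `float64`. The specific timestamp is an arbitrary example chosen for illustration.

```python
import numpy as np

# np.datetime64[ns] stores a time as an int64 count of nanoseconds since
# 1970-01-01. This example value is arbitrary.
time = np.datetime64("2017-01-01T00:00:00.001000001", "ns")
as_int = time.astype("int64")

# float64 represents integers exactly only up to 2**53 (about 9.0e15), and
# present-day nanosecond counts are far larger than that, so a round trip
# through float64 changes the low-order digits.
via_float = np.int64(np.float64(as_int))

# Staying in int64 for the whole round trip is lossless.
via_int = np.int64(as_int)

print("original:     ", as_int)
print("through float:", via_float)
print("through int:  ", via_int)
```

Any encoding path that converts such values to floats risks this kind of loss, which is why keeping int64 values in both directions matters.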

In the process of doing this I made a few changes to coding.times._decode_datetime_with_pandas:

I removed the checks on the minimum and maximum dates to decode, as the issue those checks were imposed for (invalid timestamps in the future #975) was fixed in pandas way back in 2016 (TimedeltaIndex + Timestamp -> no overflow error pandas-dev/pandas#14068).
I used an alternate approach for fixing Unexpected decoded time in xarray >= 0.10.1 #2002, which allows us to continue to use the optimization made in Speed up decode_cf_datetime #1414 without having to cast the input array to a float dtype first.
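The integer-only decoding path described above can be sketched roughly as follows. This is a simplified illustration, not the actual body of `coding.times._decode_datetime_with_pandas`; the helper name `decode_int_offsets` and its arguments are hypothetical.

```python
import numpy as np
import pandas as pd


def decode_int_offsets(values, unit, reference_date):
    """Hypothetical helper: decode integer offsets from a reference date."""
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.integer):
        raise TypeError("this sketch only handles integer-encoded times")
    # Integer input keeps the arithmetic exact; no cast to float is needed.
    deltas = pd.to_timedelta(values, unit=unit)
    # Adding a TimedeltaIndex to a Timestamp is done by pandas, which is
    # expected to raise on overflow rather than silently wrap around, so no
    # manual minimum/maximum date checks are made here.
    return (deltas + pd.Timestamp(reference_date)).values


decoded = decode_int_offsets([0, 1, 1500], "ms", "2000-01-01")
print(decoded)
```

The sketch relies on pandas for overflow detection, which is the reason the explicit range checks could be dropped.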

Note this will change the default units that are chosen for encoding times in some instances -- previously we would never default to anything more precise than seconds -- but I think this change is for the better.
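As a rough illustration of how default units finer than seconds can arise, here is a minimal sketch that picks the coarsest unit for which every offset from the reference date is a whole number, then encodes with int64. It is an assumption-laden simplification, not xarray's implementation: the names `_UNITS_IN_NS` and `infer_units_and_encode` are hypothetical, it uses plain NumPy integer arithmetic for the divisibility check, and it ignores `NaT` values.

```python
import numpy as np

# Candidate units, coarsest first, with their length in nanoseconds.
_UNITS_IN_NS = {
    "days": 86_400_000_000_000,
    "hours": 3_600_000_000_000,
    "minutes": 60_000_000_000,
    "seconds": 1_000_000_000,
    "milliseconds": 1_000_000,
    "microseconds": 1_000,
    "nanoseconds": 1,
}


def infer_units_and_encode(times, reference_date):
    """Hypothetical helper: return (int64 offsets, CF-style units string)."""
    times = np.asarray(times, dtype="datetime64[ns]")
    ref = np.datetime64(reference_date, "ns")
    deltas_ns = (times - ref).astype("int64")
    for name, size in _UNITS_IN_NS.items():
        # A zero remainder for every element means this unit is exact.
        if np.all(deltas_ns % size == 0):
            return deltas_ns // size, f"{name} since {reference_date}"
    raise AssertionError("unreachable: nanoseconds always divide evenly")


daily, daily_units = infer_units_and_encode(
    ["2000-01-01", "2000-01-03"], "2000-01-01"
)
fine, fine_units = infer_units_and_encode(
    ["2000-01-01T00:00:00.000", "2000-01-01T00:00:00.250"], "2000-01-01"
)
print(daily, daily_units)
print(fine, fine_units)
```

In this sketch, whole-day data still encodes in days, while data with sub-second offsets selects a finer unit so that integers remain sufficient.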

cc: @aldanor

@hmaarrfk this overlaps a little with your work in #4400, so I'm giving you credit here too (I hope you don't mind!).

It probably doesn't really matter though.

dcherian

thanks @spencerkclark this looks good to me.

doc/whats-new.rst

aldanor · 2020-12-15T15:29:30Z

Looks great, thanks! Do I understand this correctly - you won't have to specify encoding manually, as int64 encoding will be picked by default for M8[ns] dtype?

spencerkclark · 2020-12-15T19:48:03Z

Yup exactly -- with this PR, if nothing is specified in the encoding, int64 values will always be used.

dcherian · 2021-01-03T23:38:58Z

Thanks @spencerkclark

spencerkclark added 15 commits December 12, 2020 10:50

Use integers when possible to encode/decode times

637fb87

Improve tests

e2efadb

Further improvements to tests

6a14aa0

Remove optimization in favor of maximum correctness

9b8cfda

Remove print statements

ce4bfb5

Restore optimization

032b577

Add a what's new entry

f3a870f

Add test for decoding timedeltas with nanosecond units too

2740483

Some minor cleanups

0c2b41b

Add comment to motivate new test

db487ce

Add some print statements to try and debug things on Windows

4803882

xfail round-trip test on Windows; remove print statements

ce6d072

Don't xfail Windows tests for now; we should figure why they fail

41d24f1

Fix things on Windows

57814f7

Use pandas for divisibility check for older NumPy compatibility

3ff3cd8

spencerkclark marked this pull request as draft December 13, 2020 21:09

spencerkclark added 2 commits December 13, 2020 18:07

Reduce changes needed; improve comments

3675008

Checking remainder against zero nanoseconds is more straightforward

ed2bce6

spencerkclark mentioned this pull request Dec 14, 2020

Millisecond precision is lost on datetime64 during IO roundtrip #4045

Closed

spencerkclark marked this pull request as ready for review December 14, 2020 00:51

dcherian approved these changes Dec 15, 2020

spencerkclark added 3 commits December 16, 2020 06:17

Add a note to the breaking changes section

450037d

Merge branch 'master' into encode-dates-with-ints-if-possible

d6dc260

Merge branch 'master' into encode-dates-with-ints-if-possible

2775a60

dcherian merged commit ed25573 into pydata:master Jan 3, 2021

spencerkclark mentioned this pull request Jan 4, 2021

Ensure maximum accuracy when encoding and decoding cftime.datetime values #4758

Merged

4 tasks

spencerkclark mentioned this pull request Jan 23, 2021

Further improvements to datetime roundtripping Unidata/cftime#225

Merged

spencerkclark deleted the encode-dates-with-ints-if-possible branch February 7, 2021 23:30

znicholls added a commit to znicholls/xarray that referenced this pull request Mar 18, 2021

Re-introduce fix removed in pydata#4684

4f481af

znicholls mentioned this pull request Mar 18, 2021

BUG: Future time decoding #5050

Merged

3 tasks

spencerkclark mentioned this pull request Apr 28, 2022

Unable to decode a date in nanoseconds #4183

Closed

dcherian mentioned this pull request Mar 26, 2023

[WIP] Support nano second time encoding. #4400

Closed

5 tasks

negin513 mentioned this pull request Apr 11, 2023

Subset_data point creates solar files with integer time values with older xarray -- needs conda env update ESCOMP/CTSM#1974

Closed

kmuehlbauer mentioned this pull request Apr 28, 2023

Fill values in time arrays (numpy.datetime64) are lost in zarr #7790

Closed

4 tasks

spencerkclark mentioned this pull request Sep 14, 2023

Preserve nanosecond resolution when encoding/decoding times #7827

Merged

9 tasks

spencerkclark mentioned this pull request Jun 23, 2024

Default time encoding of nanoseconds is NOT good. #9154

Open

5 tasks
