Ensure maximum accuracy when encoding and decoding np.datetime64[ns] values #4684

Merged


Member

@spencerkclark spencerkclark commented Dec 12, 2020

This PR cleans up the logic used to encode and decode times with pandas so that by default we use int64 values in both directions for all precisions down to nanosecond. If a user specifies an encoding (or a file is read in) such that float values would be required, things still work as they did before. I do this mainly by following the approach I described here: #4045 (comment).
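To illustrate why integer values matter at nanosecond precision, here is a minimal sketch in plain Python. It does not use xarray or pandas; the helper `ns_since_epoch` is a hypothetical name introduced only for this example. It shows that a nanosecond count for a recent date exceeds what a 64-bit float can represent exactly, while an integer keeps every digit.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def ns_since_epoch(dt, extra_ns=0):
    """Hypothetical helper: integer nanoseconds since 1970-01-01 UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000 + extra_ns


# A timestamp one nanosecond past midnight on 2020-12-12 (UTC).
stamp = ns_since_epoch(datetime(2020, 12, 12, tzinfo=timezone.utc), extra_ns=1)

# Storing the count as an integer preserves it exactly.
int_roundtrip = int(stamp)

# Storing it as a 64-bit float does not: the value is larger than 2**53,
# so neighbouring representable floats are more than 1 ns apart.
float_roundtrip = int(float(stamp))

print(stamp > 2**53)             # the count is outside float64's exact integer range
print(int_roundtrip == stamp)    # integer storage round-trips
print(float_roundtrip == stamp)  # float storage drops the trailing nanosecond
```

The same reasoning applies to any on-disk representation: once the encoded values are floats, sub-microsecond detail for present-day dates cannot survive a round trip.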

In the process of doing this I made a few changes to coding.times._decode_datetime_with_pandas:

Note this will change the default units that are chosen for encoding times in some instances -- previously we would never default to anything more precise than seconds -- but I think this change is for the better.
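As a rough sketch of what choosing finer default units can look like, the snippet below picks the coarsest unit in which every time offset is still a whole number, so the values can be stored as integers. This is an assumption-laden illustration in plain Python, not the code in this PR: `UNITS_IN_NS` and `infer_integer_unit` are hypothetical names.

```python
# Candidate units, coarsest first, with their length in nanoseconds.
UNITS_IN_NS = [
    ("days", 86_400 * 10**9),
    ("hours", 3_600 * 10**9),
    ("minutes", 60 * 10**9),
    ("seconds", 10**9),
    ("milliseconds", 10**6),
    ("microseconds", 10**3),
    ("nanoseconds", 1),
]


def infer_integer_unit(offsets_ns):
    """Return the coarsest unit that divides every offset exactly.

    `offsets_ns` is an iterable of integer nanosecond offsets from a
    reference date. Hypothetical helper for illustration only.
    """
    offsets_ns = list(offsets_ns)
    for name, size in UNITS_IN_NS:
        if all(offset % size == 0 for offset in offsets_ns):
            return name
    # Unreachable for integer input, since every integer divides by 1.
    return "nanoseconds"


print(infer_integer_unit([0, 86_400 * 10**9]))  # whole days
print(infer_integer_unit([0, 1_500_000]))       # 1.5 ms needs microseconds
print(infer_integer_unit([0, 1]))               # 1 ns needs nanoseconds
```

Under a rule like this, data with millisecond or finer detail gets a correspondingly finer unit instead of being forced into fractional seconds.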

cc: @aldanor

@hmaarrfk this overlaps a little with your work in #4400, so I'm giving you credit here too (I hope you don't mind!).

@spencerkclark spencerkclark marked this pull request as draft December 13, 2020 21:09
Contributor

@dcherian dcherian left a comment

thanks @spencerkclark this looks good to me.

doc/whats-new.rst (review thread resolved)
@aldanor

aldanor commented Dec 15, 2020

Looks great, thanks! Do I understand this correctly - you won't have to specify encoding manually, as int64 encoding will be picked by default for M8[ns] dtype?

@spencerkclark
Member Author

Yup exactly -- with this PR, if nothing is specified in the encoding, int64 values will always be used.

@dcherian
Contributor

dcherian commented Jan 3, 2021

Thanks @spencerkclark


Successfully merging this pull request may close these issues.

Millisecond precision is lost on datetime64 during IO roundtrip
3 participants