improve accuracy of to_pytimedelta #57841
Conversation
Thanks for the contribution @rohanjain101

I think both the title of the PR and the description are misleading. You are not really improving the accuracy of to_pytimedelta, and an increase of a nanosecond doesn't cause an increase of a microsecond when calling to_pytimedelta. You simply found the point where the rounding changes to the next nanosecond.

I do agree that the way the function works is misleading. The problem is that most people would expect this code:

>>> datetime.timedelta(microseconds=12.5)
datetime.timedelta(microseconds=12)

to round up to 13 instead of to 12. For better or worse, this is how Python rounding works:

>>> round(12.5)
12

The round function, or the same algorithm, is probably used inside datetime.timedelta, so we get the behavior you experienced when using pandas' to_pytimedelta.

The reasoning for how round works is complex, but it's the default rounding strategy for a reason. I'm not sure whether in this particular case the rounding strategy you propose would be better. If you think there is a reason to change it, please explain it in the PR description, or better yet, open an issue to discuss.
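For reference, round uses "banker's rounding": ties go to the nearest even value, which is why adjacent halves round in opposite directions:

```python
# Python's built-in round breaks ties toward the even neighbor.
print(round(12.5))  # 12 -- 12 is even, so the tie rounds down
print(round(13.5))  # 14 -- 14 is even, so the tie rounds up
```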
I assume from the information provided that you just found Python's rounding strategy misleading, so I'll close this issue, but if there is a better reason than that for this PR, please feel free to add a comment and reopen.
Ok, but I don't believe the reason this behavior is occurring is related to Python's round function. It's because Python's true division of integers (the / operator) goes through a floating point, which introduces accuracy issues for large integers (and in this case, the result is rounded anyway). The issue is with the division, not the rounding done by timedelta.
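For illustration, here is a minimal sketch of the precision loss (the nanosecond value is chosen to show the effect, not taken from the PR):

```python
import datetime

# True division of a large integer goes through a float, which cannot
# represent every integer quotient above 2**53 exactly.
ns = 1_152_921_504_606_847_487  # 2**60 + 511 nanoseconds

print(ns / 1000)         # 1152921504606847.5 -- the exact quotient is ...847.487
print(divmod(ns, 1000))  # (1152921504606847, 487) -- exact integer arithmetic

# Fed the float, timedelta sees a .5 tie and rounds half-to-even up to
# ...848 microseconds, while the exact 487 ns remainder should round down.
print(datetime.timedelta(microseconds=ns / 1000))
```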
Sorry, you are correct. I didn't realize the problem with the floating point precision. I wonder what the impact of this change on performance is; did you run a benchmark to compare the execution times of both implementations? @MarcoGorelli @jbrockmendel thoughts on this change?
Could you add a whatsnew note in v3.0.0.rst? I think this change looks OK.

(FWIW, Timestamp.to_pydatetime also just seems to truncate the nanoseconds component. It would be nice to match that behavior, but that would be an API change.)
Added whatsnew. Are there existing benchmarks for comparing performance?
Thanks @rohanjain101
* improve accuracy of to_pytimedelta
* f
* f
* whatsnew
* f

---------

Co-authored-by: Rohan Jain <[email protected]>
609988 for microseconds, but the expected value is 609987.
In this example, a difference of 1 nanosecond shouldn't result in a difference of 1 microsecond when converting from the ns to the us unit.
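A minimal sketch of an exact-arithmetic conversion along these lines (the helper name and the tie-breaking rule are illustrative assumptions, not the actual pandas implementation):

```python
import datetime

def ns_to_pytimedelta(value_ns: int) -> datetime.timedelta:
    """Hypothetical helper: convert nanoseconds to datetime.timedelta
    using integer arithmetic so large values keep full precision."""
    us, rem_ns = divmod(value_ns, 1000)  # exact, no float round-trip
    if rem_ns >= 500:                    # round leftover nanoseconds up
        us += 1
    return datetime.timedelta(microseconds=us)

# A 1 ns difference no longer flips the microsecond result spuriously:
# ...487 ns remainder correctly rounds down to ...847 microseconds.
print(ns_to_pytimedelta(1_152_921_504_606_847_487))
```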