improve accuracy of to_pytimedelta #57841

rohanjain101 · 2024-03-14T17:12:55Z

>>> a = pd.Timedelta(1152921504609987375)
>>> a.to_pytimedelta()
datetime.timedelta(days=13343, seconds=86304, microseconds=609988)
>>>

609988 for microseconds but expected is 609987

>>> a = pd.Timedelta(1152921504609987374)
>>> a.to_pytimedelta()
datetime.timedelta(days=13343, seconds=86304, microseconds=609987)
>>>

In this example, a difference of 1 nanosecond, shouldn't result in a difference of 1 microsecond when converted from ns to us unit.

…o_td

datapythonista

Thanks for the contribution @rohanjain101

I think both the title of the PR and the description are misleading. You are not really improving the accuracy of to_pytimedelta, and an increase in a nanosecond doesn't cause an increase in a microsecond when calling to_pytimedelta. You simply found the point where the rounding change to the next nanosecond.

I do agree that the way the function works is misleading. The problem is that most people would expect this code:

>>> datetime.timedelta(microseconds=12.5)
datetime.timedelta(microseconds=12)

to round up to 13 instead to 12. For good or for bad, this is how Python rounding works:

>>> round(12.5)
12

The round function or the same algorithm is probably used inside of datetime.timedelta, so we get the behavior your experienced when using pandas to_pytimedelta.

The reasoning for how round works is complex, but it's the default rounding strategy for a reason. I'm not sure if in this particular case the rounding strategy you propose could be better. If you think there is a reason to change it, please let us know the reason in the PR description, or maybe better, open an issue to discuss.

I assume from the information provided that you just found Python's rounding strategy misleading, so I'll close this issue, but if there is a better reason than that for this PR, please feel free to add a comment and reopen.

rohanjain101 · 2024-03-14T23:00:04Z

Ok, but the reason this behavior is occurring I don't believe is related to python's round function. It's because python's integer division goes through a floating point which introduces accuracy issues for large integers (and in this case, result is rounded anyway). The issue is with the division, not the rounding done by timedelta.

datapythonista · 2024-03-14T23:23:32Z

Sorry, you are correct. I didn't realize the problem with the floating point precision.

I wonder what's the impact on performance of this change, did you execute a benchmark to compare the execution times of both implementations?

@MarcoGorelli @jbrockmendel thoughts on this change?

mroeschke

Could you add a whatsnew note in v3.0.0.rst? I think this change looks OK.

(FWIW, Timestamp.to_pydatetime also just seems to truncate the nanoseconds component. It would be nice to match that behavior but that would be an API change)

rohanjain101 · 2024-03-20T18:08:28Z

Added whatsnew, are there existing benchmarks for comparing performance?

mroeschke · 2024-03-25T18:11:03Z

Thanks @rohanjain101

* improve accuracy of to_pytimedelta * f * f * whatsnew * f --------- Co-authored-by: Rohan Jain <[email protected]>

Rohan Jain added 2 commits March 14, 2024 13:11

improve accuracy of to_pytimedelta

3419deb

f

779be5b

rohanjain101 closed this Mar 14, 2024

rohanjain101 reopened this Mar 14, 2024

Merge branch 'main' into to_td

bcdb990

rohanjain101 marked this pull request as ready for review March 14, 2024 21:49

rohanjain101 requested a review from MarcoGorelli as a code owner March 14, 2024 21:49

Rohan Jain added 2 commits March 14, 2024 17:51

f

73af5fd

Merge branch 'to_td' of https://github.com/rohanjain101/pandas into t…

658b310

…o_td

datapythonista reviewed Mar 14, 2024

View reviewed changes

datapythonista closed this Mar 14, 2024

datapythonista added the Timedelta Timedelta data type label Mar 14, 2024

datapythonista reopened this Mar 14, 2024

mroeschke reviewed Mar 20, 2024

View reviewed changes

whatsnew

8680ee5

rohanjain101 and others added 3 commits March 20, 2024 14:08

Merge branch 'pandas-dev:main' into to_td

1d18f4a

f

1c83d02

Merge branch 'main' into to_td

fa3dafc

rohanjain101 requested a review from mroeschke March 25, 2024 12:24

mroeschke approved these changes Mar 25, 2024

View reviewed changes

mroeschke added this to the 3.0 milestone Mar 25, 2024

mroeschke merged commit cf40e56 into pandas-dev:main Mar 25, 2024
46 checks passed

rohanjain101 deleted the to_td branch March 25, 2024 23:04

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024

improve accuracy of to_pytimedelta (pandas-dev#57841)

8fe650c

* improve accuracy of to_pytimedelta * f * f * whatsnew * f --------- Co-authored-by: Rohan Jain <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve accuracy of to_pytimedelta #57841

improve accuracy of to_pytimedelta #57841

rohanjain101 commented Mar 14, 2024 •

edited

Loading

datapythonista left a comment

rohanjain101 commented Mar 14, 2024 •

edited

Loading

datapythonista commented Mar 14, 2024

mroeschke left a comment

rohanjain101 commented Mar 20, 2024

mroeschke commented Mar 25, 2024

improve accuracy of to_pytimedelta #57841

improve accuracy of to_pytimedelta #57841

Conversation

rohanjain101 commented Mar 14, 2024 • edited Loading

datapythonista left a comment

Choose a reason for hiding this comment

rohanjain101 commented Mar 14, 2024 • edited Loading

datapythonista commented Mar 14, 2024

mroeschke left a comment

Choose a reason for hiding this comment

rohanjain101 commented Mar 20, 2024

mroeschke commented Mar 25, 2024

rohanjain101 commented Mar 14, 2024 •

edited

Loading

rohanjain101 commented Mar 14, 2024 •

edited

Loading