Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wraparound/overflow in date_range #19740

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -716,7 +716,7 @@ Datetimelike
- Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
- Bug in :func:`Timestamp.floor` :func:`DatetimeIndex.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`)
- Bug in :func:`to_datetime` where passing an out-of-bounds datetime with ``errors='coerce'`` and ``utc=True`` would raise ``OutOfBoundsDatetime`` instead of parsing to ``NaT`` (:issue:`19612`)
-
- Bug in :func:`date_range` with frequency of ``Day`` or higher where dates sufficiently far in the future could wrap around to the past instead of raising ``OutOfBoundsDatetime`` (:issue:`19740`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number here as well


Timezones
^^^^^^^^^
Expand Down
42 changes: 40 additions & 2 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -2137,11 +2137,11 @@ def _generate_regular_range(start, end, periods, offset):
tz = start.tz
elif start is not None:
b = Timestamp(start).value
e = b + np.int64(periods) * stride
e = _reraise_overflow_as_oob(b, periods, stride, side='start')
tz = start.tz
elif end is not None:
e = Timestamp(end).value + stride
b = e - np.int64(periods) * stride
b = _reraise_overflow_as_oob(e, periods, stride, side='end')
tz = end.tz
else:
raise ValueError("at least 'start' or 'end' should be specified "
Expand All @@ -2166,6 +2166,44 @@ def _generate_regular_range(start, end, periods, offset):
return data


def _reraise_overflow_as_oob(endpoint, periods, stride, side='start'):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to :
move this to datetimelike.py
remove the side arg (and just o on passing in)

the reason is we have several other uses of checked_add_with_arr which are catching a scalar add overflow and should be quite similar to this (the other is in Timedelta).

the other uses I saw were for array-like input and they don't raise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The side arg is needed if we want the error message to be specific.

All non-test usages of checked_add_with_arr in datetimelike/datetimes/timedeltas. We may be able to consolidate them in datetimelike. (though im pretty sure that numeric classes should have overflow checking implemented at some point)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is you are adding a function here, where if there are only 2 uses, its shorter / simpler just to use a try/except directly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is you are adding a function here, where if there are only 2 uses, its shorter / simpler just to use a try/except directly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... you mean like the original implementation of the PR?

Calculate the second endpoint for passing to np.arange, checking
to avoid an integer overflow. Catch OverflowError and re-raise
as OutOfBoundsDatetime.

Parameters
----------
endpoint : int
periods : int
stride : int
side : {'start', 'end'}

Returns
-------
other_end : int

Raises
------
OutOfBoundsDatetime
"""
# GH#19740 raise instead of incorrectly wrapping around
assert side in ['start', 'end']
if side == 'end':
stride *= -1

try:
other_end = checked_add_with_arr(np.int64(endpoint),
np.int64(periods) * stride)
except OverflowError:
raise libts.OutOfBoundsDatetime('Cannot generate range with '
'{side}={endpoint} and '
'periods={periods}'
.format(side=side, endpoint=endpoint,
periods=periods))
return other_end


def date_range(start=None, end=None, periods=None, freq='D', tz=None,
normalize=False, name=None, closed=None, **kwargs):
"""
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/indexes/datetimes/test_date_range.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
import pandas.util.testing as tm
import pandas.util._test_decorators as td
from pandas import compat
from pandas.errors import OutOfBoundsDatetime
from pandas import date_range, bdate_range, offsets, DatetimeIndex, Timestamp
from pandas.tseries.offsets import (generate_range, CDay, BDay, DateOffset,
MonthEnd, prefix_mapping)
Expand Down Expand Up @@ -78,6 +79,12 @@ def test_date_range_timestamp_equiv_preserve_frequency(self):


class TestDateRanges(TestData):
def test_date_range_out_of_bounds(self):
# GH#19740
with pytest.raises(OutOfBoundsDatetime):
date_range('2016-01-01', periods=100000, freq='D')
with pytest.raises(OutOfBoundsDatetime):
date_range(end='1763-10-12', periods=100000, freq='D')

def test_date_range_gen_error(self):
rng = date_range('1/1/2000 00:00', '1/1/2000 00:18', freq='5min')
Expand Down