Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix freq setter for DatetimeIndex/TimedeltaIndex and deprecate for PeriodIndex #20772

Merged
merged 2 commits into from
Apr 30, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -890,8 +890,10 @@ Deprecations
of the ``Series`` and ``Index`` classes have been deprecated and will be
removed in a future version (:issue:`20419`).
- ``DatetimeIndex.offset`` is deprecated. Use ``DatetimeIndex.freq`` instead (:issue:`20716`)
- Setting ``PeriodIndex.freq`` (which was not guaranteed to work correctly) is deprecated. Use :meth:`PeriodIndex.asfreq` instead (:issue:`20678`)
- ``Index.get_duplicates()`` is deprecated and will be removed in a future version (:issue:`20239`)


.. _whatsnew_0230.prior_deprecations:

Removal of prior version deprecations/changes
Expand Down Expand Up @@ -1046,6 +1048,7 @@ Datetimelike
- Bug in :func:`to_datetime` where passing an out-of-bounds datetime with ``errors='coerce'`` and ``utc=True`` would raise ``OutOfBoundsDatetime`` instead of parsing to ``NaT`` (:issue:`19612`)
- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` addition and subtraction where name of the returned object was not always set consistently. (:issue:`19744`)
- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` addition and subtraction where operations with numpy arrays raised ``TypeError`` (:issue:`19847`)
- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` where setting the ``freq`` attribute was not fully supported (:issue:`20678`)

Timedelta
^^^^^^^^^
Expand Down
41 changes: 39 additions & 2 deletions pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,43 @@ def floor(self, freq):
def ceil(self, freq):
return self._round(freq, np.ceil)

@classmethod
def _validate_frequency(cls, index, freq, **kwargs):
"""
Validate that a frequency is compatible with the values of a given
DatetimeIndex or TimedeltaIndex

Parameters
----------
index : DatetimeIndex or TimedeltaIndex
The index on which to determine if the given frequency is valid
freq : DateOffset
The frequency to validate
"""
inferred = index.inferred_freq
if index.empty or inferred == freq.freqstr:
return None

on_freq = cls._generate(
index[0], None, len(index), None, freq, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is generating the index needed to know if it a correct frequence? Checking the inferred frequency is not enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, see it was also done like that below (still wondering though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's necessary, at least to a certain extent. There are cases where multiple frequencies are valid for a given set of dates, so you can have a valid frequency that's not the inferred frequency, e.g. ['2018-01-01', '2018-01-02', '2018-01-03'] could be calendar day or business day. Users can also define custom frequencies that would need to be validated but would not be the inferred frequency.

There are cases where you don't need to generate the entire index to reject invalid frequencies, e.g. if the second generated value doesn't match. But I don't immediately see how to get around generating the entire index to determine if a frequency is valid. I suppose you could maybe optimize memory usage with huge indexes by doing some type of elementwise validation, only keeping the current generated element in memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I see, that's for sure a valid case.

For me, improving the performance of the validation is not very high priority.

if not np.array_equal(index.asi8, on_freq.asi8):
msg = ('Inferred frequency {infer} from passed values does not '
'conform to passed frequency {passed}')
raise ValueError(msg.format(infer=inferred, passed=freq.freqstr))

@property
def freq(self):
"""Return the frequency object if it is set, otherwise None"""
return self._freq

@freq.setter
def freq(self, value):
if value is not None:
value = frequencies.to_offset(value)
self._validate_frequency(self, value)

self._freq = value


class DatetimeIndexOpsMixin(object):
""" common ops mixin to support a unified interface datetimelike Index """
Expand Down Expand Up @@ -401,7 +438,7 @@ def __getitem__(self, key):
@property
def freqstr(self):
"""
Return the frequency object as a string if its set, otherwise None
Return the frequency object as a string if it is set, otherwise None
"""
if self.freq is None:
return None
Expand All @@ -410,7 +447,7 @@ def freqstr(self):
@cache_readonly
def inferred_freq(self):
"""
Tryies to return a string representing a frequency guess,
Tries to return a string representing a frequency guess,
generated by infer_freq. Returns None if it can't autodetect the
frequency.
"""
Expand Down
22 changes: 2 additions & 20 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -454,15 +454,7 @@ def __new__(cls, data=None,

if verify_integrity and len(subarr) > 0:
if freq is not None and not freq_infer:
inferred = subarr.inferred_freq
if inferred != freq.freqstr:
on_freq = cls._generate(subarr[0], None, len(subarr), None,
freq, tz=tz, ambiguous=ambiguous)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing the tz in the new version is not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, and I don't think it was actually needed in the old version either. At the point at which this is being done, subarr and it's elements should already be tz aware, and cls._generate will infer tz from subarr[0].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yep, indeed, that sounds correct (and tested it with a small example)

if not np.array_equal(subarr.asi8, on_freq.asi8):
raise ValueError('Inferred frequency {0} from passed '
'dates does not conform to passed '
'frequency {1}'
.format(inferred, freq.freqstr))
cls._validate_frequency(subarr, freq, ambiguous=ambiguous)

if freq_infer:
inferred = subarr.inferred_freq
Expand Down Expand Up @@ -836,7 +828,7 @@ def __setstate__(self, state):
np.ndarray.__setstate__(data, nd_state)

self.name = own_state[0]
self.freq = own_state[1]
self._freq = own_state[1]
self._tz = timezones.tz_standardize(own_state[2])

# provide numpy < 1.7 compat
Expand Down Expand Up @@ -1726,16 +1718,6 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
else:
raise

@property
def freq(self):
"""get/set the frequency of the Index"""
return self._freq

@freq.setter
def freq(self, value):
"""get/set the frequency of the Index"""
self._freq = value

@property
def offset(self):
"""get/set the frequency of the Index"""
Expand Down
19 changes: 16 additions & 3 deletions pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -219,7 +219,7 @@ class PeriodIndex(DatelikeOps, DatetimeIndexOpsMixin, Int64Index):
_is_numeric_dtype = False
_infer_as_myclass = True

freq = None
_freq = None

_engine_type = libindex.PeriodEngine

Expand Down Expand Up @@ -367,7 +367,7 @@ def _from_ordinals(cls, values, name=None, freq=None, **kwargs):
result.name = name
if freq is None:
raise ValueError('freq is not specified and cannot be inferred')
result.freq = Period._maybe_convert_freq(freq)
result._freq = Period._maybe_convert_freq(freq)
result._reset_identity()
return result

Expand Down Expand Up @@ -560,6 +560,19 @@ def is_full(self):
values = self.values
return ((values[1:] - values[:-1]) < 2).all()

@property
def freq(self):
"""Return the frequency object if it is set, otherwise None"""
return self._freq

@freq.setter
def freq(self, value):
msg = ('Setting PeriodIndex.freq has been deprecated and will be '
'removed in a future version; use PeriodIndex.asfreq instead. '
'The PeriodIndex.freq setter is not guaranteed to work.')
warnings.warn(msg, FutureWarning, stacklevel=2)
Copy link
Member Author

@jschendel jschendel Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we really need to deprecate setting this: it's user facing, but I think the only case it works is if you set to the existing frequency. Could potentially remove the setter entirely, causing an AttributeError to be raised, or could raise a custom AttributeError within the setter to give a better message directing users to .asfreq.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IMO it was buggy, and as you say, only if it was the same frequency, it was working. But given that corner case, it maybe does not hurt to do the deprecation I would say.

I think the custom AttributeError message is a good idea in the future.

self._freq = value

def asfreq(self, freq=None, how='E'):
"""
Convert the PeriodIndex to the specified frequency `freq`.
Expand Down Expand Up @@ -1060,7 +1073,7 @@ def __setstate__(self, state):
np.ndarray.__setstate__(data, nd_state)

# backcompat
self.freq = Period._maybe_convert_freq(own_state[1])
self._freq = Period._maybe_convert_freq(own_state[1])

else: # pragma: no cover
data = np.empty(state)
Expand Down
17 changes: 5 additions & 12 deletions pandas/core/indexes/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,8 @@
import pandas.core.common as com
import pandas.core.dtypes.concat as _concat
from pandas.util._decorators import Appender, Substitution, deprecate_kwarg
from pandas.core.indexes.datetimelike import TimelikeOps, DatetimeIndexOpsMixin
from pandas.core.indexes.datetimelike import (
TimelikeOps, DatetimeIndexOpsMixin)
from pandas.core.tools.timedeltas import (
to_timedelta, _coerce_scalar_to_timedelta_type)
from pandas.tseries.offsets import Tick, DateOffset
Expand Down Expand Up @@ -195,7 +196,7 @@ def _add_comparison_methods(cls):
_is_numeric_dtype = True
_infer_as_myclass = True

freq = None
_freq = None

def __new__(cls, data=None, unit=None, freq=None, start=None, end=None,
periods=None, closed=None, dtype=None, copy=False,
Expand Down Expand Up @@ -251,15 +252,7 @@ def __new__(cls, data=None, unit=None, freq=None, start=None, end=None,
if verify_integrity and len(data) > 0:
if freq is not None and not freq_infer:
index = cls._simple_new(data, name=name)
inferred = index.inferred_freq
if inferred != freq.freqstr:
on_freq = cls._generate(
index[0], None, len(index), name, freq)
if not np.array_equal(index.asi8, on_freq.asi8):
raise ValueError('Inferred frequency {0} from passed '
'timedeltas does not conform to '
'passed frequency {1}'
.format(inferred, freq.freqstr))
cls._validate_frequency(index, freq)
index.freq = freq
return index

Expand Down Expand Up @@ -327,7 +320,7 @@ def _simple_new(cls, values, name=None, freq=None, **kwargs):
result = object.__new__(cls)
result._data = values
result.name = name
result.freq = freq
result._freq = freq
result._reset_identity()
return result

Expand Down
35 changes: 34 additions & 1 deletion pandas/tests/indexes/datetimes/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,9 @@
from pandas import (DatetimeIndex, PeriodIndex, Series, Timestamp,
date_range, _np_version_under1p10, Index,
bdate_range)
from pandas.tseries.offsets import BMonthEnd, CDay, BDay
from pandas.tseries.offsets import BMonthEnd, CDay, BDay, Day, Hour
from pandas.tests.test_base import Ops
from pandas.core.dtypes.generic import ABCDateOffset


@pytest.fixture(params=[None, 'UTC', 'Asia/Tokyo', 'US/Eastern',
Expand Down Expand Up @@ -405,6 +406,38 @@ def test_equals(self):
assert not idx.equals(list(idx3))
assert not idx.equals(pd.Series(idx3))

@pytest.mark.parametrize('values', [
['20180101', '20180103', '20180105'], []])
@pytest.mark.parametrize('freq', [
'2D', Day(2), '2B', BDay(2), '48H', Hour(48)])
@pytest.mark.parametrize('tz', [None, 'US/Eastern'])
def test_freq_setter(self, values, freq, tz):
# GH 20678
idx = DatetimeIndex(values, tz=tz)

# can set to an offset, converting from string if necessary
idx.freq = freq
assert idx.freq == freq
assert isinstance(idx.freq, ABCDateOffset)

# can reset to None
idx.freq = None
assert idx.freq is None

def test_freq_setter_errors(self):
# GH 20678
idx = DatetimeIndex(['20180101', '20180103', '20180105'])

# setting with an incompatible freq
msg = ('Inferred frequency 2D from passed values does not conform to '
'passed frequency 5D')
with tm.assert_raises_regex(ValueError, msg):
idx.freq = '5D'

# setting with non-freq string
with tm.assert_raises_regex(ValueError, 'Invalid frequency'):
idx.freq = 'foo'

def test_offset_deprecated(self):
# GH 20716
idx = pd.DatetimeIndex(['20180101', '20180102'])
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/indexes/period/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -401,6 +401,18 @@ def test_equals(self, freq):
assert not idx.equals(list(idx3))
assert not idx.equals(pd.Series(idx3))

def test_freq_setter_deprecated(self):
# GH 20678
idx = pd.period_range('2018Q1', periods=4, freq='Q')

# no warning for getter
with tm.assert_produces_warning(None):
idx.freq

# warning for setter
with tm.assert_produces_warning(FutureWarning):
idx.freq = pd.offsets.Day()


class TestPeriodIndexSeriesMethods(object):
""" Test PeriodIndex and Period Series Ops consistency """
Expand Down
36 changes: 36 additions & 0 deletions pandas/tests/indexes/timedeltas/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@
_np_version_under1p10)
from pandas._libs.tslib import iNaT
from pandas.tests.test_base import Ops
from pandas.tseries.offsets import Day, Hour
from pandas.core.dtypes.generic import ABCDateOffset


class TestTimedeltaIndexOps(Ops):
Expand Down Expand Up @@ -306,6 +308,40 @@ def test_equals(self):
assert not idx.equals(list(idx2))
assert not idx.equals(pd.Series(idx2))

@pytest.mark.parametrize('values', [['0 days', '2 days', '4 days'], []])
@pytest.mark.parametrize('freq', ['2D', Day(2), '48H', Hour(48)])
def test_freq_setter(self, values, freq):
# GH 20678
idx = TimedeltaIndex(values)

# can set to an offset, converting from string if necessary
idx.freq = freq
assert idx.freq == freq
assert isinstance(idx.freq, ABCDateOffset)

# can reset to None
idx.freq = None
assert idx.freq is None

def test_freq_setter_errors(self):
# GH 20678
idx = TimedeltaIndex(['0 days', '2 days', '4 days'])

# setting with an incompatible freq
msg = ('Inferred frequency 2D from passed values does not conform to '
'passed frequency 5D')
with tm.assert_raises_regex(ValueError, msg):
idx.freq = '5D'

# setting with a non-fixed frequency
msg = '<2 \* BusinessDays> is a non-fixed frequency'
with tm.assert_raises_regex(ValueError, msg):
idx.freq = '2B'

# setting with non-freq string
with tm.assert_raises_regex(ValueError, 'Invalid frequency'):
idx.freq = 'foo'


class TestTimedeltas(object):

Expand Down