ENH: raise ValueError if invalid period freq pass to asfreq when the index of df is a PeriodIndex #56945
@MarcoGorelli, could you please take a look at this PR?
pandas/core/resample.py
Outdated
@@ -2827,6 +2827,16 @@ def asfreq(
if how is None:
    how = "E"

if isinstance(freq, str):
a few lines down, there's obj.index.asfreq(freq, how=how) - should this validation go in there?
thanks, you are right. I moved the check if isinstance(freq, str) to the definition of asfreq in pandas/core/indexes/period.py
thanks for working on this
a lot of these validation paths go through to_offset - could we just check that if you run to_offset(..., is_period=True) then it raises an informative error message if the return value doesn't have a _period_dtype_code attribute?
So here
pandas/pandas/_libs/tslibs/offsets.pyx
Lines 4840 to 4849 in 02011f2
if isinstance(freq, BaseOffset):
    return freq
if isinstance(freq, tuple):
    raise TypeError(
        f"to_offset does not support tuples {freq}, pass as a string instead"
    )
elif PyDelta_Check(freq):
    return delta_to_tick(freq)
instead of returning, assign the return value to a variable (say, delta, as that's what's used below)
then, just after
pandas/pandas/_libs/tslibs/offsets.pyx
Lines 4964 to 4965 in 02011f2
if delta is None:
    raise ValueError(INVALID_FREQ_ERR_MSG.format(freq))
raise if is_period and delta doesn't have a _period_dtype_code attribute?
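A minimal sketch of the suggestion above, written in plain Python rather than Cython. SketchOffset, to_offset_sketch, the _PERIOD_CODES table and its placeholder codes are hypothetical stand-ins, not pandas APIs. The only thing illustrated is the control flow: every branch stores its result in delta, and the period check runs once at the end.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

INVALID_FREQ_ERR_MSG = "Invalid frequency: {0}"

# Hypothetical alias table: None means "valid offset, but no period code".
# The integer codes are arbitrary placeholders.
_PERIOD_CODES = {"M": 1, "D": 2, "BMS": None}


@dataclass(frozen=True)
class SketchOffset:
    """Hypothetical stand-in for BaseOffset (not the pandas class)."""

    name: str
    n: int = 1
    period_dtype_code: Optional[int] = None  # plays the role of _period_dtype_code


def to_offset_sketch(freq, is_period=False):
    """Resolve freq into one variable, then validate it in a single place."""
    delta = None
    if isinstance(freq, SketchOffset):
        delta = freq
    elif isinstance(freq, tuple):
        raise TypeError(
            f"to_offset does not support tuples {freq}, pass as a string instead"
        )
    elif isinstance(freq, timedelta):
        # Simplification: the sketch only models whole-day timedeltas.
        delta = SketchOffset("D", n=freq.days, period_dtype_code=_PERIOD_CODES["D"])
    elif isinstance(freq, str) and freq in _PERIOD_CODES:
        delta = SketchOffset(freq, period_dtype_code=_PERIOD_CODES[freq])

    # Unknown input: the standard invalid-frequency error.
    if delta is None:
        raise ValueError(INVALID_FREQ_ERR_MSG.format(freq))
    # Known offset that cannot be used as a period frequency.
    if is_period and delta.period_dtype_code is None:
        raise ValueError(f"{delta.name} is not supported as period frequency")
    return delta
```

With this shape, to_offset_sketch("BMS") still returns an offset, to_offset_sketch("BMS", is_period=True) raises a ValueError, and an unknown string such as "ABC" raises the standard invalid-frequency error either way.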
pandas/core/indexes/period.py
Outdated
elif offset.name == freq.replace(f"{offset.n}", ""):
    raise ValueError(
        f"Invalid offset: '{offset.name}' for converting time series "
        f"with PeriodIndex."
    )
I don't really understand this, but I think you don't need it - just raise ValueError(INVALID_FREQ_ERR_MSG.format(f"{freq}")) in the else branch below?
I use two different error messages here because we raise an error for an invalid freq such as “ABC” and also for freq=“BMS”, which is valid for offsets but invalid for periods. I agree that it can be confusing, so it's better to use the standard error message f"Invalid frequency: {freq}"
pandas/core/indexes/period.py
Outdated
    )
else:
    raise ValueError(INVALID_FREQ_ERR_MSG.format(f"{freq}"))

arr = self._data.asfreq(freq, how)
can you follow this asfreq and put this validation even deeper?
EDIT: as mentioned in the review, it might be better to go all the way down and do this validation within to_offset itself?
thanks, I did as you suggested and moved this validation to to_offset. It works very well.
pandas/_libs/tslibs/offsets.pyx
Outdated
-    return delta_to_tick(freq)
+    delta = delta_to_tick(freq)
maybe on line 4841, we can do
if isinstance(freq, BaseOffset):
    delta = freq
so then that gets validated too?
is it possible to do this assignment on line 4841 as well? just in case an offset which isn't valid for periods is passed
I am unsure if it's possible. I tried to do this assignment on line 4841
if isinstance(freq, BaseOffset):
    delta = freq
but then I got failures.
The reason: if we replace return freq with delta = freq, we go to line 4962 and assign delta to None, and then on line 4965 we raise a ValueError.
Which is why, instead of the assignment delta = freq, I added the check
if is_period and not hasattr(freq, "_period_dtype_code"):
    raise ValueError(f"{freq.base} is not supported as period frequency")
I think you can just move this down to before elif PyDelta_Check(freq):
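The failure described above, and why moving the branch into the chain helps, can be sketched in plain Python. This assumes the structure implied by the comments (a standalone if, followed by an if/elif/else chain whose else resets delta to None); SketchOffset, standalone_if and inside_chain are hypothetical names, not pandas code.

```python
from datetime import timedelta


class SketchOffset:
    """Hypothetical stand-in for BaseOffset."""

    def __init__(self, name):
        self.name = name


def standalone_if(freq):
    # The assignment happens, but the chain below still runs afterwards.
    if isinstance(freq, SketchOffset):
        delta = freq

    if isinstance(freq, tuple):
        raise TypeError("tuples are not supported")
    elif isinstance(freq, timedelta):
        delta = "tick"
    else:
        delta = None  # a SketchOffset lands here and loses its value

    if delta is None:
        raise ValueError(f"Invalid frequency: {freq}")
    return delta


def inside_chain(freq):
    if isinstance(freq, tuple):
        raise TypeError("tuples are not supported")
    elif isinstance(freq, SketchOffset):
        delta = freq  # now part of the chain, so the else is skipped
    elif isinstance(freq, timedelta):
        delta = "tick"
    else:
        delta = None

    if delta is None:
        raise ValueError(f"Invalid frequency: {freq}")
    return delta
```

Passing a SketchOffset to standalone_if raises the invalid-frequency ValueError, while inside_chain returns the same object unchanged.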
thanks, I saw you already made a commit for it
nice one - does this mean it's also possible to get rid of the except in
pandas/pandas/core/resample.py
Lines 2814 to 2820 in acd914d
if hasattr(freq, "_period_dtype_code"):
    freq = freq_to_period_freqstr(freq.n, freq.name)
else:
    raise ValueError(
        f"Invalid offset: '{freq.base}' for converting time series "
        f"with PeriodIndex."
    )
?
I am afraid we can't get rid of this block yet. Without this block we silently convert the index, for example:
should be good thanks, just one minor comment
does this change anything since 2.2? if so, I'd say that a whatsnew note in 3.0 is warranted (or in 2.2.1 if it should be backported, for example if anything introduced in 2.2 wasn't correct and needs fixing here)
-    msg = "WeekOfMonth is not supported as period frequency"
-    with pytest.raises(TypeError, match=msg):
+    msg = "WOM-1MON is not supported as period frequency"
+    with pytest.raises(ValueError, match=msg):
         Period("2012-01-02", freq="WOM-1MON")
nice! yeah better to match what the user passed if possible, good one
I am unsure if it's possible to do this assignment on line 4841. If I do it
if isinstance(freq, BaseOffset):
    delta = freq
I get failures. The reason: if we replace return freq with delta = freq, we go to line 4962 and assign delta to None, and then on line 4965 we raise a ValueError.
Instead of the assignment delta = freq, I added the check
if is_period and not hasattr(freq, "_period_dtype_code"):
    raise ValueError(f"{freq.base} is not supported as period frequency")
yes, I think we have an incorrect conversion in 2.2
I fixed it in this PR and added an entry to the 2.2.1 whatsnew. Now we raise a ValueError, e.g.
wonderful, thanks a tonne @natmokval !
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
thanks @MarcoGorelli, for helping me with this PR!
…d freq pass to asfreq when the index of df is a PeriodIndex' (cherry picked from commit cb97ce6)
…period freq pass to asfreq when the index of df is a PeriodIndex) (#57292) 'Backport PR #56945: ENH: raise ValueError if invalid period freq pass to asfreq when the index of df is a PeriodIndex' (cherry picked from commit cb97ce6) Co-authored-by: Natalia Mokeeva <[email protected]>
…index of df is a PeriodIndex (pandas-dev#56945)
xref #55785, #52064
PeriodIndex.asfreq silently converts offsets such as offsets.MonthBegin(), offsets.BusinessMonthEnd(), etc. (which have no attribute '_period_dtype_code') to a period frequency (in this case 'M').
Reproducible Example:
the correct behaviour would be raising an Error:
Another problem: so far in the example below
we get on main
the correct behaviour would be raising an Error:
added to the definition of asfreq a check that the string denoting the frequency is supported as a period frequency. If the index of a DataFrame is a PeriodIndex and the frequency is an invalid period frequency, a ValueError is raised.