-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/time series interpolation is wrong 21351 #56515
Fix/time series interpolation is wrong 21351 #56515
Conversation
Any idea what is causing this? Seems unlikely to be " |
if not is_period_index: | ||
final_index = result.index | ||
missing_data_points_index = obj.index.difference(final_index) | ||
if len(missing_data_points_index) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the opposite case where the difference is empty, i think a the sort_index and .loc
below (which makes a copy) can be avoided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel Can you please clarify? "in the opposite case where the difference is empty" --> then missing_data_points_index
will be empty, so its length will be zero. In that case, the if-expression evaluates to False
, so that the loc/sort is indeed not executed. Maybe I am overlooking something.
IIUC in the motivating case we have a series whose index has freq1 and a new freq2 that are not multiples of each other. Is what you're doing here similar to
And if so, would that be more efficient/clearer than what you're doing here? I guess it wouldn't work if the original index doesn't have a freq? Or might blow up if the LCD happens to be tiny? |
is there a doctest this changes/fixes? |
@jbrockmendel I don't think this is a "pathological" case, but in general, this type of resampling should be agnostic to what sort of index it is. It is helpful to think of the input data as a "measurement" of a function The problem with the previous implementation is that those anchor points were, more or less randomly, excluded from the interpolation process. All we need to do is make sure none of the original data points is missing because this would mean throwing away valid information. This is achieved by result = concat(
[result, obj.loc[missing_data_points_index]]
) The remaining code is just gymnastics around this. I can help make this more clear in the code or here, but I don't think it can be simplified further. |
@jbrockmendel I spent quite a while looking at this now. The problem here is caused by interpolation applied to a |
@jbrockmendel I took another look and added an implementation for multi-indexes (resulting from groupby). This fixes |
And regarding |
thanks for your PR there's still some failing tests:
If these were wrong to begin with, could you please updated them so they have the correct values so that they pass? That would need doing as part of this PR CI would need to be green for this to be mergeable (furthermore, people are much more likely to review a PR which is green) |
…terpolate` and the related explanation about consideration of anchor points when interpolating downsampled series with non-aligned result index.
Fixes assumption in `test_interp_basic_with_non_range_index`. If the index is [1, 2, 3, 5] and values are [1, 2, np.nan, 4], it is wrong to expect that interpolation will result in 3 for the missing value in case of linear interpolation. It will rather be 2.666...
…ting from groupby-operations
… on series with datetime index using business days only (test case `pandas.tests.series.methods.test_interpolate.TestSeriesInterpolateData.test_interpolate`).
… on irregular index (test case `pandas.tests.series.methods.test_interpolate.TestSeriesInterpolateData.test_nan_irregular_index`).
065129c
to
0294464
Compare
…scipy is not installed
All green now @MarcoGorelli @jbrockmendel |
thanks - aiming to get to this this month |
I don't think the failing test cases are related to my code @MarcoGorelli @mroeschke , what do I do in this case? It occurred after syncing with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @cbpygit for having stuck with this one!
Leaving open a bit in case there's further comments
Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a whatsnew note in v3.0.0.rst
under the Other
section of the Bug Fix
group? Thanks for sticking with this almost there!
…-wrong-21351' into fix/time-series-interpolation-is-wrong-21351
Thanks @cbpygit |
* fix: Fixes wrong doctest output in `pandas.core.resample.Resampler.interpolate` and the related explanation about consideration of anchor points when interpolating downsampled series with non-aligned result index. * Resolved merge conflicts * fix: Fixes wrong test case assumption for interpolation Fixes assumption in `test_interp_basic_with_non_range_index`. If the index is [1, 2, 3, 5] and values are [1, 2, np.nan, 4], it is wrong to expect that interpolation will result in 3 for the missing value in case of linear interpolation. It will rather be 2.666... * fix: Make sure frequency indexes are preserved with new interpolation approach * fix: Fixes new-style up-sampling interpolation for MultiIndexes resulting from groupby-operations * fix: Fixes wrong test case assumption when using linear interpolation on series with datetime index using business days only (test case `pandas.tests.series.methods.test_interpolate.TestSeriesInterpolateData.test_interpolate`). * fix: Fixes wrong test case assumption when using linear interpolation on irregular index (test case `pandas.tests.series.methods.test_interpolate.TestSeriesInterpolateData.test_nan_irregular_index`). * fix: Adds test skips for interpolation methods that require scipy if scipy is not installed * fix: Makes sure keyword arguments "downcast" is not passed to scipy interpolation methods that are not using `interp1d` or spline. * fix: Adjusted expected warning type in `test_groupby_resample_interpolate_off_grid`. * fix: Fixes failing interpolation on groupby if the index has `name`=None. Adds this check to an existing test case. * Trigger Actions * feat: Raise error on attempt to interpolate a MultiIndex data frame, providing a useful error message that describes a working alternative syntax. Fixed related test cases and added test that makes sure the error is raised. * Apply suggestions from code review Co-authored-by: Matthew Roeschke <[email protected]> * refactor: Adjusted error type assertion in test case * refactor: Removed unused parametrization definitions and switched to direct parametrization for interpolation methods in tests. * fix: Adds forgotten "@" before pytest.mark.parametrize * refactor: Apply suggestions from code review * refactor: Switched to ficture params syntax for test case parametrization * Update pandas/tests/resample/test_time_grouper.py Co-authored-by: Matthew Roeschke <[email protected]> * Update pandas/tests/resample/test_base.py Co-authored-by: Matthew Roeschke <[email protected]> * refactor: Fixes too long line * tests: Fixes test that fails due to unimportant index name comparison * docs: Added entry in whatsnew * Empty-Commit * Empty-Commit * Empty-Commit * docs: Sorted whatsnew * docs: Adjusted bug fix note and moved it to the right section --------- Co-authored-by: Marco Edward Gorelli <[email protected]> Co-authored-by: Matthew Roeschke <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.