Linear interpolation fails on tz-aware Series #25508

Closed
xoolive opened this issue Mar 1, 2019 · 10 comments
Labels: Bug; Datetime (Datetime data dtype); ExtensionArray (Extending pandas with custom dtypes or arrays); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@xoolive

xoolive commented Mar 1, 2019

The following code fails on pandas 0.24.1. On 0.23.4 it does not raise, but the result is wrong.

from datetime import datetime, timezone, timedelta

import pandas as pd

start = datetime(2019, 1, 1, tzinfo=timezone.utc)

df = pd.DataFrame.from_records(
    {
        "date": [start + timedelta(days=i) for i in range(5)],
        "value": list(range(5)),
        "ref_date": [start + timedelta(days=i) for i in range(5)],
    }
)

df.set_index("date").resample("1H").interpolate("linear")

Things work again if:

  • one makes start tz-naive;
  • one removes column ref_date.
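
The second observation suggests a workaround that uses only public API: interpolate the numeric columns and leave the tz-aware column untouched. The following is a minimal sketch under that assumption, not a fix for the underlying bug. It passes `pd.Timedelta(hours=1)` instead of the `"1H"` alias so that it does not depend on which frequency strings a given pandas version accepts, and the names `resampled` and `numeric_cols` are illustrative.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

start = datetime(2019, 1, 1, tzinfo=timezone.utc)

df = pd.DataFrame(
    {
        "date": [start + timedelta(days=i) for i in range(5)],
        "value": list(range(5)),
        "ref_date": [start + timedelta(days=i) for i in range(5)],
    }
)

# Upsample to hourly rows first; the new rows hold NaN / NaT.
resampled = df.set_index("date").resample(pd.Timedelta(hours=1)).asfreq()

# Interpolate only the numeric columns, so the tz-aware "ref_date"
# column never reaches the interpolation code path.
numeric_cols = resampled.select_dtypes(include="number").columns
resampled[numeric_cols] = resampled[numeric_cols].interpolate(method="linear")
```

With this approach `ref_date` keeps its `NaT` gaps, so it only helps when the datetime column does not itself need to be interpolated.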
@TomAugspurger
Contributor

Right now Block.interpolate is overloaded for .interpolate and .fillna, and dispatches based on whether the method is a fillna method or an interpolation method.

It looks like ExtensionBlock.interpolate, which is what DatetimeTZBlock inherits, only handles the fillna side of things.

Long-term, we can make our interp methods work with extension arrays (or maybe add interpolate to the interface).

Short term, we can add a .interpolate to DatetimeTZBlock that calls Block.interpolate.
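
To make the dispatch problem concrete, here is a toy model of the situation described above. It is a sketch only: the `Toy*` classes, `FILL_METHODS`, and the returned strings are hypothetical stand-ins and are not the pandas implementation. It shows a base method that dispatches on the method name, a subclass that overrides it with a fill-only version, and the short-term fix of pointing the datetime subclass back at the base method.

```python
FILL_METHODS = {"pad", "ffill", "backfill", "bfill"}


class ToyBlock:
    """Toy stand-in for a generic block; not the pandas class."""

    def interpolate(self, method: str = "pad") -> str:
        # One entry point serves both fillna-style and interpolation-style methods.
        if method in FILL_METHODS:
            return self._fill(method)
        return self._interpolate(method)

    def _fill(self, method: str) -> str:
        return f"filled with {method}"

    def _interpolate(self, method: str) -> str:
        return f"interpolated with {method}"


class ToyExtensionBlock(ToyBlock):
    """Toy stand-in for an extension block that only handles the fill side."""

    def interpolate(self, method: str = "pad") -> str:
        if method not in FILL_METHODS:
            raise ValueError(f"Invalid fill method. Got {method}")
        return self._fill(method)


class ToyDatetimeTZBlock(ToyExtensionBlock):
    """Inherits the fill-only behaviour, so "linear" is rejected."""


class PatchedDatetimeTZBlock(ToyExtensionBlock):
    """Short-term fix in miniature: reuse the base class's dispatching method."""

    interpolate = ToyBlock.interpolate
```

In this model `ToyDatetimeTZBlock().interpolate("linear")` raises `ValueError`, while `PatchedDatetimeTZBlock().interpolate("linear")` reaches the interpolation branch.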

TomAugspurger added the Datetime, Missing-data, Effort Medium, and ExtensionArray labels Mar 1, 2019
TomAugspurger added this to the Contributions Welcome milestone Mar 1, 2019
@xoolive
Author

xoolive commented Mar 1, 2019

Indeed, I added the following in my piece of code that uses the library and it makes things work.

from pandas.core.internals import Block, DatetimeTZBlock
DatetimeTZBlock.interpolate = Block.interpolate

I can see what you mean about ExtensionBlock here.

I could push a PR with the (ugly) fix you suggest, plus good comments around it, but I don't feel good about doing that...

Maybe if I understand why Block.interpolate cannot apply to ExtensionBlock, I could give it a shot. Do you have a few pointers for that?

@TomAugspurger
Contributor

TomAugspurger commented Mar 1, 2019 via email

@xoolive
Author

xoolive commented Mar 1, 2019

OK, so if I understand correctly, the difference from a tz-naive Series is that you now have to re-attach the timezone after converting back from the i8 values?

If so, I can try that, then have a look at similar issues.
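
The round trip described here (datetimes to integer nanoseconds, interpolate, back to datetimes, re-attach the timezone) can be sketched with public API only. This is an illustration under stated assumptions rather than the fix that went into pandas: `interpolate_tz_aware` is a hypothetical helper name, and because the interpolation runs on float64, results at this magnitude are only accurate to roughly the microsecond level.

```python
import numpy as np
import pandas as pd


def interpolate_tz_aware(series: pd.Series, **kwargs) -> pd.Series:
    """Hypothetical helper: interpolate a tz-aware datetime Series.

    Works on integer nanoseconds since the epoch, then re-attaches the timezone.
    """
    tz = series.dt.tz
    missing = series.isna().to_numpy()

    # Drop the timezone (the values are UTC instants) and view them as int64 nanoseconds.
    utc_naive = series.dt.tz_convert("UTC").dt.tz_localize(None)
    nanos = utc_naive.to_numpy(dtype="datetime64[ns]").astype("int64")

    # NaT is stored as a sentinel integer, so replace it with NaN before interpolating.
    as_float = pd.Series(nanos.astype("float64"), index=series.index)
    as_float[missing] = np.nan
    filled = as_float.interpolate(**kwargs)

    # Convert back to datetimes and re-attach the original timezone.
    # Caveat: float64 cannot represent every nanosecond at this magnitude.
    result = pd.to_datetime(filled.round(), unit="ns", utc=True)
    return result.dt.tz_convert(tz)


s = pd.Series(pd.to_datetime(["2019-01-01", None, "2019-01-03"], utc=True))
out = interpolate_tz_aware(s, method="linear")
```

Keyword arguments are forwarded to `Series.interpolate`, so options such as `limit` behave as they do for a numeric Series.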

@xoolive
Author

xoolive commented Jan 31, 2020

With pandas 1.0.0, I had to adapt my fix as follows.
It may not be the proper way to do it.

from pandas.core.internals import DatetimeTZBlock

def _tz_interpolate(data, *args, **kwargs):
    return data.astype(int).interpolate(*args, **kwargs).astype(data.dtype)

DatetimeTZBlock.interpolate = _tz_interpolate

@mroeschke mroeschke added the Bug label Apr 1, 2020
@ChrCoello

The block

from pandas.core.internals import Block, DatetimeTZBlock
DatetimeTZBlock.interpolate = Block.interpolate

doesn't solve the bug in version 1.3.4.

@xoolive
Author

xoolive commented Dec 10, 2021

@ChrCoello this is my current hack:

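from typing import Any

import pandas as pd
from pandas.core.internals import DatetimeTZBlock

# Compare (major, minor) numerically: as plain strings, "1.10" would sort before "1.3".
if tuple(int(part) for part in pd.__version__.split(".")[:2]) < (1, 3):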

    def _tz_interpolate(
        data: DatetimeTZBlock, *args: Any, **kwargs: Any
    ) -> DatetimeTZBlock:
        return data.astype(int).interpolate(*args, **kwargs).astype(data.dtype)

    DatetimeTZBlock.interpolate = _tz_interpolate

else:
    # - with version 1.3.0, interpolate returns a list
    # - Windows require "int64" as "int" may be interpreted as "int32" and raise
    #   an error (was not raised before 1.3.0)

    def _tz_interpolate(
        data: DatetimeTZBlock, *args: Any, **kwargs: Any
    ) -> DatetimeTZBlock:
        coerced = data.coerce_to_target_dtype("int64")
        interpolated, *_ = coerced.interpolate(*args, **kwargs)
        return interpolated

    DatetimeTZBlock.interpolate = _tz_interpolate

@ChrCoello

Thanks @xoolive for the fast answer.
It seems that this workaround is not directly usable on my machine. From what I understand of the error message, the assignment of the new _tz_interpolate hack to DatetimeTZBlock.interpolate is not taking effect.

(venv) PS C:\code\rebel-cloud-ml> python -m rebel_cloud_ml
Traceback (most recent call last):
  File "c:\bin\Python\3.8.10\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\bin\Python\3.8.10\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\code\rebel-cloud-ml\rebel_cloud_ml\__main__.py", line 178, in <module>
    df = add_features(df, holiday_pth=holiday_pth)
  File "C:\code\rebel-cloud-ml\rebel_cloud_ml\__main__.py", line 137, in add_features
    df = handling_missing_values(df)
  File "C:\code\rebel-cloud-ml\rebel_cloud_ml\ml\utils.py", line 29, in handling_missing_values
    df.interpolate(method="linear")
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\frame.py", line 10712, in interpolate
    return super().interpolate(
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\generic.py", line 6899, in interpolate
    new_data = obj._mgr.interpolate(
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\internals\managers.py", line 377, in interpolate
    return self.apply("interpolate", **kwargs)
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\internals\blocks.py", line 1369, in interpolate
    new_values = values.fillna(value=fill_value, method=method, limit=limit)
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\arrays\_mixins.py", line 218, in fillna
    value, method = validate_fillna_kwargs(
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\util\_validators.py", line 372, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "C:\code\rebel-cloud-ml\venv\lib\site-packages\pandas\core\missing.py", line 120, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

@ChrCoello

I have applied the interpolation earlier in the pre-processing stack, with success. Not a critical bug on my side :)

@jbrockmendel
Member

Closed by #51005
