interpolate(method='linear') bug in pandas or pint-pandas? #112

Open
MichaelTiemannOSC opened this issue Feb 12, 2022 · 3 comments

@MichaelTiemannOSC
Collaborator

I added a comment to this Issue in pandas: pandas-dev/pandas#41565

But now I wonder whether it's a Pandas problem (the ExtensionArray implementation of interpolate) or a Pint-Pandas problem (the lack of a PintArray implementation of interpolate).

Here's interpolate working as expected, with float64 as the base type:

>>> import pandas as pd
>>> s = pd.Series([1, None, 3], dtype=float)
>>> s
0    1.0
1    NaN
2    3.0
dtype: float64
>>> s.interpolate(method="linear")
0    1.0
1    2.0
2    3.0
dtype: float64
>>> 

Here it is failing with PintArray:

>>> import pandas as pd
>>> import pint_pandas
>>> from pint_pandas import PintArray as PA_
>>> s = pd.Series(PA_([1., None, 3.], dtype='pint[m]'), dtype='pint[m]')
>>> s
0    1.0
1    nan
2    3.0
dtype: pint[meter]
>>> s.interpolate(method="linear")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/series.py", line 5423, in interpolate
    return super().interpolate(
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/generic.py", line 6899, in interpolate
    new_data = obj._mgr.interpolate(
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 377, in interpolate
    return self.apply("interpolate", **kwargs)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1369, in interpolate
    new_values = values.fillna(value=fill_value, method=method, limit=limit)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/arrays/base.py", line 716, in fillna
    value, method = validate_fillna_kwargs(value, method)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/util/_validators.py", line 372, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/missing.py", line 120, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

The working interpolate function comes from Block.interpolate (in blocks.py) and uses this try clause to muscle through:

        try:
            m = missing.clean_fill_method(method)
        except ValueError:
            m = None

The non-working interpolate function comes from EABackedBlock.interpolate (also in blocks.py), which is just a weak interface to fillna.
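
To make the difference between the two code paths concrete, here is a minimal pure-Python sketch. It is a simplified model, not the actual pandas source: `clean_fill_method` is a stand-in that only reproduces the error seen in the traceback, and `block_interpolate` / `ea_block_interpolate` are hypothetical names that return a string describing which branch would run.

```python
def clean_fill_method(method):
    """Simplified stand-in: accept only the pad/backfill aliases."""
    aliases = {"ffill": "pad", "pad": "pad", "bfill": "backfill", "backfill": "backfill"}
    if method not in aliases:
        raise ValueError(
            f"Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got {method}"
        )
    return aliases[method]


def block_interpolate(method):
    """Model of the numeric path: an unknown fill method falls through to interpolation."""
    try:
        m = clean_fill_method(method)
    except ValueError:
        m = None
    if m is not None:
        return f"fillna({m})"
    return f"interpolate({method})"


def ea_block_interpolate(method):
    """Model of the extension-array path: always forwards to fillna, so 'linear' raises."""
    return f"fillna({clean_fill_method(method)})"


print(block_interpolate("linear"))  # interpolate(linear)
print(block_interpolate("ffill"))   # fillna(pad)
try:
    ea_block_interpolate("linear")
except ValueError as exc:
    print(exc)  # Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
```

In this model the only difference is the `try`/`except`: the numeric path treats "not a fill method" as a signal to interpolate, while the extension-array path lets the same `ValueError` propagate.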

Is it Pandas or Pint-Pandas that needs to implement a linear method for the ExtensionArray that is a PintArray?

@andrewgsavage
Collaborator

I've not seen any pandas interface tests about interpolation so I think it's an issue to raise in pandas.

@burnpanck
Contributor

From what I gather from pandas-dev/pandas#25508 (a similar issue for timezone-aware columns), the intention is that the base ExtensionBlock.interpolate should handle that (it definitely shouldn't just forward to fillna; at the very least it should raise NotImplementedError). However, there seem to be open questions about the API between extension types and the base implementation needed to support interpolate. In the datetime-column case, it looks like workarounds are being used at the extension-type level. Pint-pandas could do that too.
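
One shape such a workaround could take is "strip the units, interpolate the bare magnitudes, reattach the units". The sketch below illustrates only that idea in plain Python and makes no claim about the pint-pandas API: a quantity column is modelled as a list of magnitudes plus a unit string, and `interpolate_linear` and `interpolate_with_units` are hypothetical helper names. It assumes equally spaced points and fills interior gaps only, leaving leading and trailing gaps untouched.

```python
def interpolate_linear(values):
    """Fill interior None gaps along a straight line between known neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def interpolate_with_units(magnitudes, unit):
    """Workaround shape: interpolate bare magnitudes, then reattach the unit."""
    return interpolate_linear(magnitudes), unit


print(interpolate_with_units([1.0, None, 3.0], "m"))  # ([1.0, 2.0, 3.0], 'm')
```

With a real PintArray the same three steps would presumably go through a float64 Series for the middle step, since that is the path that already works in the first example of this issue.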

@jbrockmendel

This was an issue in pandas. An interpolate method has been added to EAs that you can implement. There isn't much by way of testing, though.
