API: Series[EA].fillna fallback behavior with incompatible value #45153

jbrockmendel · 2022-01-01T02:39:07Z

import pandas as pd

dti = pd.DatetimeIndex([pd.NaT, "2016-01-01"], tz="UTC")
tdi = dti - dti[1]
pi = dti.tz_localize(None).to_period("D")
ci = pi.astype("category")
ii = pd.IntervalIndex([None])

pd.Series(dti).fillna("foo")  # <- casts
pd.Series(tdi).fillna("foo")  # <- raises
pd.Series(pi).fillna("foo")  # <- casts, but untested
pd.Series(ci).fillna("foo")  # <- raises
pd.Series(ii).fillna("foo")  # <- raises, untested

ATM fillna behavior with incompatible values is pretty inconsistent. With numpy dtypes (except dt64 and td64) we always cast. With dt64 and dt64tz we always cast.

With td64 we always raise, but we only have tests for the value being an integer. Back circa 2018 we used to coerce integers to timedelta, so pd.Series(tdi).fillna(1) would interpret the 1 as pd.Timedelta(seconds=1) (not nanos!). We deprecated that coercion and now raise, but I don't think there was much thought given to whether to raise versus cast to object.

With Categorical we raise. We have 2 tests specific to that.

With PeriodDtype we cast and with IntervalDtype we raise, but we have tests for neither.

I don't have a particularly strong opinion on this, but would like to be consistent.

jreback · 2022-01-01T02:53:17Z

i think we should move to a world where we raise for an incompatible value by default but allow control thru errors='raise' (default) or errors='allow'

these are equivalent to (and maybe should be the same as) the proposed cast='safe' and 'unsafe'
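
A minimal sketch of how such a keyword could behave from the user's side. `fillna_with_errors` is a hypothetical helper, not a pandas API; only the names `errors='raise'` and `errors='allow'` come from the comment above. It assumes "incompatible" can be detected after the fact as "pandas either raised or changed the dtype", which is cruder than what a real implementation would do.

```python
import pandas as pd


def fillna_with_errors(ser, value, errors="raise"):
    """Hypothetical wrapper emulating an ``errors`` keyword for Series.fillna."""
    if errors not in ("raise", "allow"):
        raise ValueError("errors must be 'raise' or 'allow'")
    try:
        result = ser.fillna(value)
    except (TypeError, ValueError) as err:
        # Assumption: any TypeError/ValueError here means "value does not fit".
        if errors == "raise":
            raise TypeError(f"cannot fill {ser.dtype} with {value!r}") from err
        # errors="allow": fall back to object dtype, which can hold anything.
        return ser.astype(object).fillna(value)
    if errors == "raise" and result.dtype != ser.dtype:
        # pandas cast silently; under errors="raise" that counts as an error.
        raise TypeError(f"filling with {value!r} would change dtype {ser.dtype}")
    return result


cat = pd.Series(["a", None], dtype="category")
allowed = fillna_with_errors(cat, "foo", errors="allow")
print(allowed.dtype, allowed.tolist())
```

With the default `errors="raise"`, the same call raises `TypeError` regardless of whether the underlying dtype would have raised or cast on its own.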

jbrockmendel · 2022-01-01T03:35:21Z

> i think we should move to a world where we raise for an incompatible value by default but allow control thru errors='raise' (default) or errors='allow'

i've been thinking something similar for .where and .mask... but in 1.4 we're deprecating the 'errors' keyword since it wasn't actually used. could revert that deprecation in time for the rc and then add a deprecation for actually using the keyword correctly?

jreback · 2022-01-01T05:28:54Z

i think it's ok to leave it
we can always reverse or change it later i think

jbrockmendel · 2022-01-21T02:45:10Z

I've spent some time figuring out what it would take to be consistent across methods/dtypes and what other issues are intertwined with this.

First a reminder of the status quo. The relevant methods are NDFrame fillna, where, mask, replace, shift, unstack, reindex and the various __setitem__ methods.

Each of these methods involves setting some 'other' (or fill_value) value into an array (np.ndarray | ExtensionArray) 'values'. This discussion is about what we do when 'other' cannot be set into 'values'.

In some cases we raise. In others we coerce 'values' to a dtype that can hold 'other'. This coercion happens in Block.coerce_to_target_dtype.

When 'values' is an np.ndarray, we always coerce.
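
For illustration, a small sketch of the numpy-backed case using only public `Series.where` and `Series.fillna`. The exact resulting dtype may depend on the pandas version, so the comments only claim that the dtype changes.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])             # int64 values
floats = pd.Series([1.0, np.nan, 3.0])  # float64 values

# where: position 1 is replaced by a value int64 cannot hold
where_result = ints.where(np.array([True, False, True]), "foo")

# fillna: the NaN is replaced by a value float64 cannot hold
fillna_result = floats.fillna("foo")

# Neither call raises; 'values' is coerced to a dtype that can hold 'other'.
print(ints.dtype, "->", where_result.dtype, where_result.tolist())
print(floats.dtype, "->", fillna_result.dtype, fillna_result.tolist())
```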

With ExtensionArray (EA) 'values', we are inconsistent. Most cases raise, so here I'll list cases that coerce.

- fillna
    - DatetimeArray and PeriodArray (but not TimedeltaArray) coerce
    - Except for cases that go through EABackedBlock.interpolate, which will raise

- mask(inplace=True)
    - IntervalDtype will coerce from e.g. Interval[int] to Interval[float], but will raise rather than coercing to object
    - DatetimeArray, TimedeltaArray, PeriodArray will coerce

- where, mask(inplace=False)
    - IntervalDtype will coerce from e.g. Interval[int] to Interval[float], but will raise rather than coercing to object
    - DatetimeArray and TimedeltaArray (but not PeriodArray xref GH#45148) will coerce

- replace, replace_list
    - IntervalDtype, DatetimeArray, TimedeltaArray, PeriodArray will coerce
    - Categorical will coerce using *special* logic implemented in Categorical._replace

- __setitem__
    - IntervalArray, DatetimeArray, TimedeltaArray, PeriodArray coerce
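
The list above describes behavior at the time of writing and is likely to differ between pandas versions. A rough sketch for re-checking it on an installed version follows; the helper names (`build_cases`, `probe`, `OPERATIONS`) are made up for this example, the incompatible value "foo" is taken from the original post, and no particular output is claimed.

```python
import warnings

import numpy as np
import pandas as pd


def build_cases():
    """Two-element Series per dtype: NA at position 0, a valid value at 1."""
    dti = pd.DatetimeIndex([pd.NaT, "2016-01-01"], tz="UTC")
    tdi = dti - dti[1]
    pi = dti.tz_localize(None).to_period("D")
    ia = pd.arrays.IntervalArray.from_tuples([np.nan, (0, 1)])
    return {
        "datetime64tz": pd.Series(dti),
        "timedelta64": pd.Series(tdi),
        "period": pd.Series(pi),
        "category": pd.Series(pi.astype("category")),
        "interval": pd.Series(ia),
    }


def _mask_inplace(ser, other):
    out = ser.copy()
    out.mask(np.array([True, False]), other, inplace=True)
    return out


def _setitem(ser, other):
    out = ser.copy()
    out.iloc[0] = other
    return out


OPERATIONS = {
    "fillna": lambda ser, other: ser.fillna(other),
    "where": lambda ser, other: ser.where(np.array([False, True]), other),
    "mask(inplace=True)": _mask_inplace,
    "replace": lambda ser, other: ser.replace(ser.iloc[1], other),
    "__setitem__": _setitem,
}


def probe(op, ser, other="foo"):
    """Classify one operation as raising, keeping the dtype, or coercing."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # deprecation warnings are not the point here
        try:
            result = op(ser, other)
        except Exception as err:
            return f"raises {type(err).__name__}"
    if result.dtype == ser.dtype:
        return "keeps dtype"
    return f"coerces to {result.dtype}"


cases = build_cases()
report = pd.DataFrame(
    {
        op_name: {case_name: probe(op, ser) for case_name, ser in cases.items()}
        for op_name, op in OPERATIONS.items()
    }
)
print(pd.__version__)
print(report.to_string())
```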

The options to make these consistent are roughly:

- Deprecate/change to always raise.
    - xref the analogous discussion for setitem: DISCUSS/API: setitem-like operations should only update inplace and never fallback with upcast (i.e never change the dtype) #39584
- Deprecate/change to always coerce.
- Deprecate/change to coerce but not to object (like IntervalDtype does with 'where')
    - https://mail.python.org/pipermail/pandas-dev/2021-October/001408.html
- Add a keyword, e.g. 'errors', with a deprecation cycle so that it is actually used.
    - Leaves out __setitem__
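
To make the "coerce but not to object" option concrete, here is a small sketch on a numpy-backed Series. `where_no_object` is a hypothetical helper, not a pandas API, and it checks the result after the fact, whereas a real implementation would presumably sit in the coercion logic itself.

```python
import numpy as np
import pandas as pd


def where_no_object(ser, cond, other):
    """Hypothetical: like Series.where, but refuse a fallback to object dtype."""
    result = ser.where(cond, other)
    if result.dtype == object and ser.dtype != object:
        raise TypeError(
            f"setting {other!r} into {ser.dtype} would require object dtype"
        )
    return result


ints = pd.Series([1, 2, 3])
keep = np.array([True, False, True])

# int64 -> float64 is a coercion that stays within numeric dtypes: allowed
widened = where_no_object(ints, keep, 1.5)
print(widened.dtype, widened.tolist())  # float64 [1.0, 1.5, 3.0]

# int64 -> object would be needed to hold a Timestamp: refused
try:
    where_no_object(ints, keep, pd.Timestamp("2016-01-01"))
except TypeError as err:
    print("raised:", err)
```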

The big trouble with either a keyword or always-coerce is allowing EAs to specify their own coercion logic (xref #24246) and implementing Categorical coercion logic in a reasonable way.

To qualify as An Elegant Solution, we'd want the coercion logic for Categorical to be used in Categorical._replace to avoid having bespoke logic there, as well as resolve merge/concat inconsistencies #41626, #24093, #42840, #15332, ... (don't have a complete list of these, haven't fully vetted these)

The keyword option is the least opinionated option, but it would mean a significantly increased API surface to test that I'm not looking forward to.

jbrockmendel · 2023-07-12T21:36:16Z

@MarcoGorelli is this closed by PDEP6?

MarcoGorelli · 2023-07-24T11:27:34Z

> @MarcoGorelli is this closed by PDEP6?

~~Not by the PRs I currently have open, but I think it'd be in scope~~ Yes, but only for the fillna(..., inplace=True) case

jbrockmendel added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jan 1, 2022

This was referenced Jan 4, 2022

POC/API/DEPR: errors kwd for fillna #45190

Closed

[ArrayManager] Array version of fillna logic #41104

Closed

mroeschke added the ExtensionArray (Extending pandas with custom dtypes or arrays), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Series (Series data structure) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Jan 13, 2022

jbrockmendel mentioned this issue Jan 31, 2022

BUG: TypeError when attempting to replace Nullable integer data type with a float value #45729

Closed

roib20 mentioned this issue Feb 1, 2022

BUG: Replacing pd.NA by None has no effect #45601

Closed

jbrockmendel added the PDEP missing values (Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint) label Oct 25, 2023
