[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288
Comments
This works fine if you use an actual integer value to fill, so there's not really much point in using Int64 in this case, since you're still asking for an object. In any case, I suppose it should still coerce to object for you, as using float here would. Investigation and PRs are always welcome.
I vaguely recall some discussion on whether ExtensionArray.fillna should allow coercing the array to the dtype of the `fill_value`. I don't recall if we reached a final conclusion. It's somewhat inconvenient to have to manually call `.astype` before filling with a value of a different dtype, but the type stability ensured by `ExtensionArray[T].fillna -> ExtensionArray[T]` is nice.
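The trade-off described above can be sketched briefly, assuming a recent pandas release (behaviour may differ from 0.24.1, where this issue was reported). An integer fill value keeps `Int64` on both the Series and the underlying `IntegerArray`; filling with a value of another type works once you opt out of `Int64` explicitly with `.astype(object)`.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Type-stable path: an integer fill value keeps the nullable Int64 dtype,
# both on the Series and on the underlying ExtensionArray.
filled = s.fillna(0)
filled_array = s.array.fillna(0)
print(filled.dtype)                 # Int64
print(type(filled_array).__name__)  # IntegerArray

# Filling with a value of a different type: leave Int64 explicitly first.
as_object = s.astype(object).fillna("")
print(as_object.dtype)              # object
print(as_object.tolist())
```

The explicit `.astype(object)` is the "manual" step mentioned above; it makes the loss of the integer dtype visible in the code instead of happening implicitly.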
(In reply to William Ayd's comment above, posted Tue, Feb 12, 2019 at 10:35 PM.)
In fact, since I'm using pandas for an ETL tool, this doesn't look nice to me. As a workaround, I call "astype(object)", remove the ".0" suffix, and then fill the NaN values.
@jelther getting slightly off topic, but if you are getting zeros appended to your integers, it's because they are being cast to float at some point. Explicitly constructing your data frame with …
@WillAyd, I think this is the best alternative, but I can't specify the dtype because I'm extracting the data from a SQL Server database with "pandas.read_sql". I don't see anything in the documentation about how to specify the dtypes when selecting the data.
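One way around this is to cast right after loading, regardless of how the driver returned the column. The sketch below makes two assumptions: an in-memory SQLite database (Python's standard `sqlite3` module) stands in for SQL Server, and the `orders` table and `order_id` column are hypothetical names. An integer column containing NULLs typically arrives as float64, which is where the ".0" comes from.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for the SQL Server source; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (None,)])
conn.commit()

df = pd.read_sql("SELECT order_id FROM orders ORDER BY rowid", conn)
conn.close()

# Integers mixed with NULLs usually come back as float64 (1.0, 2.0, NaN).
# Cast to the nullable integer dtype immediately; NULLs become <NA>.
df["order_id"] = df["order_id"].astype("Int64")

# For an export that needs blanks instead of missing values, switch to
# object explicitly and then fill; no ".0" suffix appears.
export = df["order_id"].astype(object).fillna("")
print(export.tolist())
```

Casting once at the load boundary keeps the rest of the pipeline on `Int64`, so the float round-trip never has to be undone with string manipulation.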
+1. For example, with DatetimeArray the fill value also needs to be datetime-like.
I would assume that fillna coerces the dtype the same way arithmetic does:

>>> pd.Series([1, 2, None], dtype='Int64') + 0.5
0    1.5
1    2.5
2    NaN
dtype: float64

However:

>>> pd.Series([1, 2, None], dtype='Int64').fillna('')
TypeError: <U1 cannot be converted to an IntegerDtype
This behaviour is also displayed when using float64:

>>> pd.Series([1, 2, None], dtype='float64').fillna('')
0    1
1    2
2
dtype: object
Why do you prefer coercing the series to the dtype of the fill value, rather than the other way around? It's not clear to me that one is preferable to the other.
Because I need to fill the … But I also understand there is something to be said for not doing so. Maybe a boolean argument such as …
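Until pandas decides on this, an opt-in coercion flag can be approximated on the user side. The sketch below is hypothetical: `fillna_coerce` and its `coerce` parameter are illustrative names, not part of pandas. It tries the strict, type-stable `fillna` first and only falls back to object dtype when the fill value is rejected and `coerce=True`.

```python
import pandas as pd


def fillna_coerce(series: pd.Series, value, coerce: bool = True) -> pd.Series:
    """Hypothetical helper (not a pandas API): fill missing values, falling
    back to object dtype when the fill value does not fit the dtype."""
    try:
        # Strict path: succeeds when `value` is compatible with the dtype
        # (e.g. an int for Int64) and keeps the original dtype.
        return series.fillna(value)
    except (TypeError, ValueError):
        if not coerce:
            raise
        # Opt-in coercion: upcast to object first, as float64 does implicitly.
        return series.astype(object).fillna(value)


s = pd.Series([1, 2, None], dtype="Int64")
print(fillna_coerce(s, 0).dtype)   # Int64
print(fillna_coerce(s, "").dtype)  # object
```

With `coerce=False` the helper simply re-raises whatever the strict `fillna` raised, so the default type-stable behaviour remains available.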
Definitely on the IntegerArray the …
Resolves pandas-dev#25472, resolves pandas-dev#25288.
@jbrockmendel The docs for …
@alexreg the methods on ExtensionArray subclasses are, in general, stricter about what they allow than the Series/DataFrame methods. The less strict behavior is implemented on the Block subclasses.
@jbrockmendel Oh, I see. That makes more sense now. This seems like a slightly tricky matter to get exactly right, and I'd rather not take it on in my existing PR (or even in another one). Do you have any inclination to have a go at it yourself?
I'll get to it eventually if no one else does, but it isn't a priority for me ATM.
@jbrockmendel If you don't mind quickly pointing me to what you think are the right functions/methods to look at, I may have a go at it (fairly) soon.
BTW, I don't think we should change this for the nullable dtypes. I personally think the no-casting behaviour is better (although inconsistent with other dtypes), and unless we have a discussion about which long-term behaviour we want and decide we want casting, IMO we should keep the strict behaviour of fillna for nullable dtypes for now (also on the block level).
Are you then also saying there is no room for a flag (e.g. …
@alexreg notwithstanding Joris's (reasonable) objection, the place where you would change the behavior is in …
Not necessarily; as far as I know, that has not been brought up in the discussion so far.
I brought it up here: #25288 (comment). I still think that, from a user perspective, it makes more sense to be able to just use what you are suggesting; the problem now is that sometimes I can …
@jbrockmendel Thanks. For what it's worth, while I certainly see @jorisvandenbossche's point, and feel that sort of behaviour would be appropriate under other circumstances, I agree with @jorijnsmit here. At the end of the day, pandas has set a precedent for virtually ubiquitous implicit coercion, and not coercing here would indeed seem inconsistent. If we don't want implicit coercion here, then we probably shouldn't have it in lots of other places either, but that design decision was made long ago, and it was probably the right one, given pandas' emphasis on ergonomics over things like strong typing.
Code Sample
Problem description
Using the new nullable type Int64, it is not possible to fill NaN values with a non-integer value.
Error raised
Expected Output
The new DataFrame should have its NaN values replaced with the value passed to the .fillna() method.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.24.1
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.18
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None