[BUG] Assigning scalar boolean to a Series w/ nulls results in wrong data type #9337
Comments
From offline discussion, the following example illustrates the discrepancy in coercing (or not coercing) from the original default dtype to bool:

import cudf
import pandas as pd
df = cudf.DataFrame({'val': [None, None, None]})
print(df.val.dtype)
df["val"] = True
print(df)
print(df.val.dtype, "\n")
df = pd.DataFrame({'val': [None, None, None]})
print(df.val.dtype)
df["val"] = True
print(df)
print(df.val.dtype)
# cuDF
float64
val
0 1.0
1 1.0
2 1.0
float64
# pandas
object
val
0 True
1 True
2 True
bool
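One way to avoid depending on dtype inference during scalar assignment is to build the replacement column with an explicit bool dtype. The sketch below uses pandas only, so it runs without a GPU. The helper name `assign_bool_scalar` is hypothetical, and whether the same pattern sidesteps the cuDF behavior shown above is an assumption that has not been verified here.

```python
import pandas as pd


def assign_bool_scalar(df, column, value):
    """Return a copy of `df` where `column` is filled with `value` as bool.

    Hypothetical helper for illustration; not part of pandas or cuDF.
    """
    out = df.copy()
    # Build the full column up front with an explicit dtype instead of
    # letting scalar assignment decide the resulting dtype.
    out[column] = pd.Series([bool(value)] * len(out), index=out.index, dtype="bool")
    return out


df = pd.DataFrame({"val": [None, None, None]})
initial_dtype = df["val"].dtype  # pandas infers object for an all-None column
result = assign_bool_scalar(df, "val", True)
print(initial_dtype, result["val"].dtype)
```

Passing the dtype explicitly makes the intent visible at the call site, at the cost of materializing the column by hand.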
As Nick mentioned, the crux of the problem is that we default to float64 for an all-null Series, whereas pandas defaults to object:
In [8]: cudf.Series([None, None])
Out[8]:
0 <NA>
1 <NA>
dtype: float64
In [10]: pd.Series([None, None])
Out[10]:
0 None
1 None
dtype: object

We could default to object instead:
In [13]: df = cudf.DataFrame({"val": cudf.Series([None, None], dtype="object")})
In [14]: df
Out[14]:
val
0 <NA>
1 <NA>
In [15]: df["val"] = True
In [16]: df
Out[16]:
val
0 True
1 True
…9803)
Fixes: #9337
- [x] This PR changes the default `dtype` of an all-nulls column from `float64` to `object`.
- [x] To have `np.nan` values read as a `float` column, `nan_as_null` has to be passed as `False` in the `cudf.DataFrame` constructor.
  - This change is in line with what the `cudf.Series` constructor already supports.
- [x] Added `has_nans` and `nan_count` properties, which are needed for some of the checks.
- [x] Cached `nan_count`, since it is used repeatedly in math operations; the cache is cleared in the regular `_clear_cache` call.
- [x] Fixed pytests that would otherwise break due to this breaking change of types.
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- https://github.com/brandon-b-miller
- Ashwin Srinath (https://github.com/shwina)
URL: #9803
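The caching pattern described in the PR summary (a `nan_count` that is computed once and dropped by `_clear_cache`) can be sketched in plain Python. This is a toy illustration under stated assumptions, not cuDF's implementation: the class `FloatColumnSketch` and its `set_values` method are hypothetical, and only the names `nan_count`, `has_nans`, and `_clear_cache` come from the PR text.

```python
import math
from functools import cached_property


class FloatColumnSketch:
    """Toy stand-in for a float column; not cuDF's actual column class."""

    def __init__(self, values):
        self._values = list(values)

    @cached_property
    def nan_count(self):
        # Computed on first access, then stored in the instance __dict__.
        # None (null) entries are deliberately not counted as NaN.
        return sum(
            1 for v in self._values if isinstance(v, float) and math.isnan(v)
        )

    @property
    def has_nans(self):
        return self.nan_count > 0

    def _clear_cache(self):
        # Drop the cached value so the next access recomputes it.
        self.__dict__.pop("nan_count", None)

    def set_values(self, values):
        # Hypothetical mutator: any change to the data invalidates the cache.
        self._values = list(values)
        self._clear_cache()


col = FloatColumnSketch([1.0, float("nan"), None, float("nan")])
print(col.nan_count, col.has_nans)
col.set_values([1.0, 2.0])
print(col.nan_count, col.has_nans)
```

Keeping invalidation in a single `_clear_cache` method means every mutating operation has one place to call, which is the same motivation the PR gives for clearing the cache in the regular `_clear_cache` call.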
Using latest nightly cudf conda packages:
Assigning w/ booleans at DF creation time works as expected:
But assigning scalar after initializing w/ all nulls gives an unexpected float64 dtype where I'd expect a bool:
Pandas's behavior: