The rest of this issue's text is preserved for the record, but is no longer relevant.
ORIGINAL ISSUE INTENT
When using cudf.Series.factorize with na_sentinel=None an error is encountered, whereas pandas supports this option.
File "/cudf/python/cudf/cudf/core/series.py", line 2517, in label_encoding
dtype = min_scalar_type(max(len(cats), na_sentinel), 8)
TypeError: '>' not supported between instances of 'NoneType' and 'int'
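The TypeError in the traceback can be reproduced without cuDF. This is a minimal sketch in plain Python: `cats` here is a made-up stand-in list, not the value cuDF computes internally. It only shows that `max()` fails when one argument is `None`.

```python
# Stand-in values (assumptions for illustration, not cuDF internals).
cats = ["a", "b", "c"]
na_sentinel = None

try:
    # Mirrors the expression in the traceback: max() has to compare
    # None against an int, which Python 3 does not allow.
    max(len(cats), na_sentinel)
except TypeError as exc:
    message = str(exc)
    print(message)
```

Any fix would need to handle the `None` case before this comparison is reached.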
Describe the solution you'd like
As described here, when None is passed, any nans in the input data should result in nan being part of the set of uniques returned by the function. In cuDF this likely reads the same, except with a true <NA> substituted for nan.
Is your feature request related to a problem? Please describe.
EDITED 12/16/2022
As of pandas 1.5, the na_sentinel parameter is deprecated in favor of a boolean flag, use_na_sentinel. We need to update cudf to match the new behavior to retain compatibility with pandas 2.0.
The rest of this issue's text is preserved for the record, but is no longer relevant.
ORIGINAL ISSUE INTENT
When using cudf.Series.factorize with na_sentinel=None an error is encountered, whereas pandas supports this option.
Describe the solution you'd like
As described here, when None is passed, any nans in the input data should result in nan being part of the set of uniques returned by the function. In cuDF this likely reads the same, except with a true <NA> substituted for nan.
Describe alternatives you've considered
Additional context
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html
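To make the requested semantics concrete, here is a rough pure-Python model of the two modes. It is a sketch only: `factorize_sketch` is a hypothetical helper, not the pandas or cuDF implementation, and `None` stands in for a missing value (nan in pandas, <NA> in cuDF).

```python
def factorize_sketch(values, use_na_sentinel=True):
    """Hypothetical model of factorize; None represents a missing value."""
    codes = []
    uniques = []
    positions = {}  # maps each seen value to its index in `uniques`
    for value in values:
        if value is None and use_na_sentinel:
            # Missing values get the sentinel code and are left out of uniques.
            codes.append(-1)
            continue
        if value not in positions:
            # First appearance: register the value (including None when
            # use_na_sentinel is False) in order of appearance.
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


values = ["b", None, "a", "b"]
print(factorize_sketch(values, use_na_sentinel=True))   # ([0, -1, 1, 0], ['b', 'a'])
print(factorize_sketch(values, use_na_sentinel=False))  # ([0, 1, 2, 0], ['b', None, 'a'])
```

With the flag set to False, the missing value is treated like any other category, which is the behavior the original request asked for under na_sentinel=None.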