[FEA] Support `use_na_sentinel` in factorize #6946

brandon-b-miller · 2020-12-08T21:20:36Z

Is your feature request related to a problem? Please describe.

EDITED 12/16/2022
As of pandas 1.5, the na_sentinel parameter is deprecated in favor of a boolean flag, use_na_sentinel. We need to update cuDF to match the new behavior to retain compatibility with pandas 2.0.
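A minimal sketch of how the legacy parameter maps onto the new flag could help with the migration. It assumes our reading of the pandas 1.5 deprecation guidance: na_sentinel=None corresponds to use_na_sentinel=False, na_sentinel=-1 corresponds to use_na_sentinel=True, and any other integer sentinel is emulated by coding missing values as -1 and then relabeling them. The helper name `translate_na_sentinel` is hypothetical and is not part of pandas or cuDF.

```python
from typing import Optional, Tuple


def translate_na_sentinel(na_sentinel: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Hypothetical shim: map a legacy na_sentinel onto use_na_sentinel.

    Returns (use_na_sentinel, custom_code). custom_code is the value a caller
    would substitute for -1 in the resulting codes, or None if no
    substitution is needed.
    """
    if na_sentinel is None:
        # Legacy "treat missing values as a regular category" behavior.
        return False, None
    if na_sentinel == -1:
        # Legacy default: missing values are coded as -1.
        return True, None
    # Any other sentinel: factorize with -1, then relabel -1 to na_sentinel.
    return True, na_sentinel
```

Returning the custom code separately keeps the factorization itself limited to the two behaviors the new flag supports.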

The rest of this issue's text is preserved for the record, but is no longer relevant.

ORIGINAL ISSUE INTENT

When using cudf.Series.factorize with na_sentinel=None, an error is raised, whereas pandas supports this option:

  File "/cudf/python/cudf/cudf/core/series.py", line 2517, in label_encoding
    dtype = min_scalar_type(max(len(cats), na_sentinel), 8)
TypeError: '>' not supported between instances of 'NoneType' and 'int'
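The traceback comes from asking Python 3 to order None against an int inside max(). One way to size the code dtype without that comparison is to include the sentinel only when one is in use. The sketch below is a hypothetical pure-Python helper, `code_dtype_bits`, that picks the smallest signed integer width holding every code; it does not reproduce cuDF's own min_scalar_type utility, whose exact signature we are not assuming here.

```python
from typing import Optional


def code_dtype_bits(num_categories: int, na_sentinel: Optional[int]) -> int:
    """Hypothetical helper: smallest signed int width (bits) for all codes.

    Codes run from 0 to num_categories - 1; the sentinel, if any, must fit too.
    """
    values = [0, max(num_categories - 1, 0)]
    if na_sentinel is not None:
        # Only consider the sentinel when it is actually used; comparing
        # None with an int is what raises the TypeError in the traceback.
        values.append(na_sentinel)
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("codes do not fit in a 64-bit signed integer")
```

For example, 200 categories with a -1 sentinel need codes up to 199, which exceeds the int8 maximum of 127, so the helper returns 16.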

Describe the solution you'd like
As described in the pandas documentation linked under Additional context, when None is passed, any NaNs in the input data should result in NaN being included in the set of uniques returned by the function. In cuDF this likely reads the same, except that a true <NA> takes the place of NaN.
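The requested semantics can be illustrated with a small pure-Python sketch. `factorize_sketch` is a hypothetical function, not the pandas or cuDF implementation. With use_na_sentinel=True, missing values get code -1 and are left out of uniques. With use_na_sentinel=False, they share a code of their own and a single missing entry (None standing in for <NA>) appears in uniques. Codes are assigned in order of first appearance; placing the missing entry at its first-appearance position in uniques is a simplifying assumption and may not match every pandas version.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple


def _is_missing(value: Any) -> bool:
    # Treat None and float NaN as missing, loosely mirroring <NA>/NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def factorize_sketch(
    values: Sequence[Any], use_na_sentinel: bool = True
) -> Tuple[List[int], List[Any]]:
    """Hypothetical factorize: integer codes in order of first appearance."""
    codes: List[int] = []
    uniques: List[Any] = []
    lookup: Dict[Any, int] = {}
    na_code: Optional[int] = None  # code for missing values if not sentinel-coded
    for value in values:
        if _is_missing(value):
            if use_na_sentinel:
                codes.append(-1)  # missing values are excluded from uniques
                continue
            if na_code is None:
                na_code = len(uniques)
                uniques.append(None)  # one missing entry, standing in for <NA>
            codes.append(na_code)
            continue
        if value not in lookup:
            lookup[value] = len(uniques)
            uniques.append(value)
        codes.append(lookup[value])
    return codes, uniques
```

For instance, factorize_sketch(["a", None, "b", "a"]) gives codes [0, -1, 1, 0] with uniques ["a", "b"], while passing use_na_sentinel=False gives codes [0, 1, 2, 0] with uniques ["a", None, "b"].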

Describe alternatives you've considered

Additional context
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html


github-actions · 2021-02-16T20:20:14Z

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

github-actions · 2021-05-17T21:04:51Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

brandon-b-miller added feature request New feature or request Python Affects Python cuDF API. labels Dec 8, 2020

brandon-b-miller mentioned this issue Dec 8, 2020

Share factorize implementation with Index and cudf module #6885

Merged

github-actions bot added the stale label Feb 16, 2021

github-actions bot added the inactive-90d label May 17, 2021

vyasr mentioned this issue Jul 23, 2022

[DISCUSSION] Replacing na_sentinel with null mask for Series.label_encoding #5622

Closed

vyasr changed the title ~~[FEA] Support na_sentinel=None in factorize~~ [FEA] Support use_na_sentinel in factorize Dec 17, 2022

vyasr removed inactive-90d labels Feb 23, 2024

vyasr added this to cuDF Python Nov 5, 2024

github-project-automation bot moved this to Todo in cuDF Python Nov 5, 2024
