[FEA] Support use_na_sentinel in factorize #6946

Open
brandon-b-miller opened this issue Dec 8, 2020 · 2 comments
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@brandon-b-miller
Contributor

brandon-b-miller commented Dec 8, 2020

Is your feature request related to a problem? Please describe.

EDITED 12/16/2022
As of pandas 1.5, the na_sentinel parameter is deprecated in favor of a boolean flag, use_na_sentinel. We need to update cuDF to match the new behavior in order to retain compatibility with pandas 2.0.
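
To make the two settings of the flag concrete, here is a minimal pure-Python sketch of the semantics. It is not the pandas or cuDF implementation: `factorize_sketch` and `is_missing` are hypothetical helpers, missing values are modelled as `None` or float NaN, and placing the missing value in the uniques at its first appearance is an assumption of this sketch (the real libraries may order it differently).

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def factorize_sketch(values, use_na_sentinel=True):
    """Return (codes, uniques), with uniques in order of first appearance.

    Hypothetical helper for illustration only; not the pandas or cuDF code.
    """
    codes = []
    uniques = []
    positions = {}  # maps each non-missing value to its integer code
    na_code = None  # code given to the missing value once it is seen
    for value in values:
        if is_missing(value):
            if use_na_sentinel:
                # Missing values get the sentinel -1 and stay out of uniques.
                codes.append(-1)
                continue
            if na_code is None:
                # Missing values become one ordinary entry in uniques.
                na_code = len(uniques)
                uniques.append(None)
            codes.append(na_code)
        else:
            if value not in positions:
                positions[value] = len(uniques)
                uniques.append(value)
            codes.append(positions[value])
    return codes, uniques


data = ["a", None, "b", "a", None]
with_sentinel = factorize_sketch(data, use_na_sentinel=True)
without_sentinel = factorize_sketch(data, use_na_sentinel=False)
print(with_sentinel)     # ([0, -1, 1, 0, -1], ['a', 'b'])
print(without_sentinel)  # ([0, 1, 2, 0, 1], ['a', None, 'b'])
```

With the sentinel enabled, missing entries are coded as -1 and excluded from the uniques; with it disabled, the missing value receives a regular non-negative code and appears among the uniques.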

The rest of this issue's text is preserved for the record, but is no longer relevant.

ORIGINAL ISSUE INTENT

When using cudf.Series.factorize with na_sentinel=None an error is encountered, whereas pandas supports this option.

  File "/cudf/python/cudf/cudf/core/series.py", line 2517, in label_encoding
    dtype = min_scalar_type(max(len(cats), na_sentinel), 8)
TypeError: '>' not supported between instances of 'NoneType' and 'int'
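
The error appears to come from the `max(len(cats), na_sentinel)` call in the traceback: Python's built-in `max` compares its arguments, and `None` cannot be ordered against an `int`. The snippet below reproduces just that comparison without cuDF; `label_bound` is a hypothetical stand-in for the expression, not a cuDF function.

```python
def label_bound(num_categories, na_sentinel):
    """Hypothetical stand-in for the max(...) expression in the traceback."""
    return max(num_categories, na_sentinel)


# An integer sentinel such as -1 compares fine.
bound = label_bound(3, -1)
print(bound)

# A None sentinel cannot be compared with an int, so max() raises TypeError.
message = None
try:
    label_bound(3, None)
except TypeError as exc:
    message = str(exc)
    print(message)
```

Any fix would need to handle the `None` case before this comparison is reached.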

Describe the solution you'd like
As described in the pandas documentation (linked under Additional context), when None is passed, any NaNs in the input data should result in NaN being part of the set of uniques returned by the function. In cuDF this likely reads the same, except that true <NA> takes the place of NaN.

Describe alternatives you've considered

Additional context
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions github-actions bot added the stale label Feb 16, 2021
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@vyasr vyasr changed the title [FEA] Support na_sentinel=None in factorize [FEA] Support use_na_sentinel in factorize Dec 17, 2022
@vyasr vyasr added this to cuDF Python Nov 5, 2024