[FEA] Propagate nulls through isin #7556

Open
brandon-b-miller opened this issue Mar 10, 2021 · 3 comments
Assignees
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@brandon-b-miller
Contributor

Is your feature request related to a problem? Please describe.
In pandas, we can check whether the values of a Series or DataFrame are contained in some other container, such as a list or another DataFrame, by using isin. Currently, cuDF does not handle nulls correctly here. On branch-0.19, if the DataFrame or Series being checked contains an <NA>, we get False in that position:

>>> values = cudf.Series([1,2,3])
>>> df = cudf.DataFrame({'a':[1,2,None]})
>>> df
      a
0     1
1     2
2  <NA>
>>> df.isin(values)
       a
0   True
1   True
2  False

Instead, we should get another <NA> in that position, as pandas does when using nullable dtypes:

>>> values = pd.Series([1,2,3], dtype='Int64')
>>> df = pd.DataFrame({'a':pd.Series([1,2,None], dtype='Int64')})
>>> df
      a
0     1
1     2
2  <NA>
>>> df.isin(values)
      a
0  True
1  True
2  <NA>

The fillna that causes us to return False is being removed in PR #7490, but we'll still need to rework how we test this functionality so that it compares against pandas nullable types. With non-nullable pandas types, pandas also returns False for the missing value, which is why our results have matched so far.
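The dtype dependence described above can be checked directly in pandas. The following is a minimal sketch, assuming a reasonably recent pandas; the variable names (`df_float`, `df_nullable`, and so on) are illustrative and are not taken from the cuDF test suite.

```python
import pandas as pd

values = pd.Series([1, 2, 3])

# Non-nullable column: None is stored as NaN and the dtype is float64.
df_float = pd.DataFrame({"a": [1, 2, None]})
res_float = df_float.isin(values)

# Nullable column: the missing value is stored as pd.NA and the dtype is Int64.
df_nullable = pd.DataFrame({"a": pd.Series([1, 2, None], dtype="Int64")})
res_nullable = df_nullable.isin(values.astype("Int64"))

# Show the result for each case, plus where the result itself is missing.
print(res_float["a"].tolist())
print(res_nullable["a"].isna().tolist())
```

Comparing the two printed lists shows why tests written against non-nullable pandas columns cannot detect whether nulls are propagated.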

Describe the solution you'd like
We should get an <NA> everywhere the Series or DataFrame in question already has an <NA>, and our tests should be updated to reflect that.
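To make the requested semantics concrete, here is a small pure-Python sketch of a reference that a test could compare against. The helper name `expected_isin` is hypothetical, and `None` stands in for <NA>; this is not how cuDF or pandas implement isin.

```python
def expected_isin(column, values):
    """Membership test that keeps nulls (None) as nulls instead of False."""
    lookup = set(values)
    # A null input stays null; every other element gets a plain bool.
    return [None if x is None else x in lookup for x in column]


print(expected_isin([1, 2, None], [1, 2, 3]))  # [True, True, None]
```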

Describe alternatives you've considered
We could make this change as part of PR #7490, but it would be somewhat tangential to that PR's purpose.


@github-actions

github-actions bot commented Apr 9, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

github-actions bot commented Feb 7, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@wence-
Contributor

wence- commented Nov 22, 2023

Amusingly, this only happens with DataFrame.isin; when calling isin on Series objects, pandas behaves like cudf.
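The Series case can be checked with a short snippet. This is a sketch assuming a recent pandas with nullable dtypes; the names `s` and `res` are illustrative.

```python
import pandas as pd

# Nullable integer Series with one missing value.
s = pd.Series([1, 2, None], dtype="Int64")

# Series.isin against a plain list of candidate values.
res = s.isin([1, 2, 3])

print(res.tolist())
print(res.isna().any())  # whether any <NA> survived into the result
```

If the second line prints False, the missing row was not propagated, which matches the behaviour described in the comment above.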
