[FEA] Support for an additional parameter in `is_integer`, `all_integer`, `is_float` & `all_float` #5130

galipremsagar · 2020-05-07T18:21:43Z

Is your feature request related to a problem? Please describe.
The current APIs(is_integer, all_integer, is_float & all_float) return False if there are null values in the input, But since we actually differ with pandas in this aspect and we support null values in int & float columns, we'd like to have the above APIs return True when there is a null in the input column.

Describe the solution you'd like
From the discussion we had here: #5054 (comment) It appears to be that we'd like to have an additional parameter to support this feature.

Describe alternatives you've considered
This is currently being handled on the python side with performing a dropna and then calling the above APIs.

Additional context
This request came up as part of #5054

The text was updated successfully, but these errors were encountered:

jrhemstad · 2020-05-07T20:02:09Z

Why is this desired behavior? null is not an integer, right?

kkraus14 · 2020-05-07T21:13:49Z

@jrhemstad we're using these functions to specifically check if we can typecast safely. I.E. expected behavior from the Python side is to throw if a non-integer or non-float is detected because we can't safely typecast it. On the other hand, we can safely carry over nulls without issue so the value is irrelevant.

Alternative, we could return null in cases where the value is null, which we'd then need to do a follow up operation of replace_nulls with True or False against depending on the usage.

jrhemstad · 2020-05-07T22:59:50Z

expected behavior from the Python side is to throw if a non-integer or non-float is detected because we can't safely typecast it.

I don't understand why this requires returning true for is_integer(null).

If you want to ensure all of the elements in a column are valid integers before typecasting, you run is_integer(...) which returns a nullable boolean column, and then you do an any/all reduction which ignores null elements.

What am I missing?

kkraus14 · 2020-05-07T23:54:06Z

I believe that currently is_integer(...) returns False on nulls instead of null. If it returns null we're good to go.

jrhemstad · 2020-05-08T00:08:29Z

I believe that currently is_integer(...) returns False

Ahhhh, now this makes sense. Okay.

galipremsagar · 2020-05-08T00:08:49Z

So, in this PR: #5054, we have used all_integers & all_floats under an understanding of them being reductions of is_integer & is_float respectively. But this seems to not be the case.

is_integer & is_float seem to return null when there is null, and when we apply all reduction the nulls get ignored.

But all_integers & all_floats seem to return False for null.

>>> import cudf
>>> import cudf._lib.strings.char_types as c
>>> s = cudf.Series(["10", np.nan, "1"])
>>> cudf.Series(c.is_integer(s._column)).all()
True
>>> s = cudf.Series(["abc", None, "1"])
>>> cudf.Series(c.is_integer(s._column))
0    False
1     null
2     True
dtype: bool
>>> s = cudf.Series(["10", None, "1"])
>>> cudf.Series(c.is_integer(s._column)).all()
True
>>> c.all_integers(s._column)
False

>>> cudf.Series(c.is_float(s._column))
0    True
1    null
2    True
dtype: bool
>>> cudf.Series(c.is_float(s._column)).all()
True
>>> c.all_floats(s._column)
False

So, if this is expected behavior of all_floats & all_integers we'll fall back to using is_integer + all/ is_float + all.

kkraus14 · 2020-05-08T00:36:23Z

Happy to be incorrect here then! Seems like we can just use is_float(column).all() and is_integer(column).all() and we can likely remove the all_floats and all_integers from libcudf 😄.

galipremsagar · 2020-05-08T00:39:40Z

Happy to be incorrect here then! Seems like we can just use is_float(column).all() and is_integer(column).all() and we can likely remove the all_floats and all_integers from libcudf 😄.

Apologies for the confusion Keith & Jake.

Shall we close this issue then? or wait for comments from @jrhemstad ?

cc: @davidwendt

kkraus14 · 2020-05-08T00:40:11Z

@galipremsagar I think if we're good to go with using is_integer and is_float along with a reduction we can close this issue out.

Maybe raise an issue that we should consider removing all_floats and all_integers as they're unneeded.

galipremsagar · 2020-05-08T00:50:29Z

Sure 👍 Closing this FEA.

galipremsagar added feature request New feature or request Needs Triage Need team to review and classify labels May 7, 2020

galipremsagar mentioned this issue May 7, 2020

[REVIEW] Change String typecasting to be inline with Pandas #5054

Merged

3 tasks

kkraus14 added libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) and removed Needs Triage Need team to review and classify labels May 7, 2020

galipremsagar closed this as completed May 8, 2020

galipremsagar mentioned this issue May 8, 2020

[FEA] Add a new parameter for all_integer & all_float to evaluate null values as True #5136

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Support for an additional parameter in `is_integer`, `all_integer`, `is_float` & `all_float` #5130

[FEA] Support for an additional parameter in `is_integer`, `all_integer`, `is_float` & `all_float` #5130

galipremsagar commented May 7, 2020

jrhemstad commented May 7, 2020

kkraus14 commented May 7, 2020

jrhemstad commented May 7, 2020

kkraus14 commented May 7, 2020

jrhemstad commented May 8, 2020

galipremsagar commented May 8, 2020 •

edited

Loading

kkraus14 commented May 8, 2020

galipremsagar commented May 8, 2020

kkraus14 commented May 8, 2020

galipremsagar commented May 8, 2020

[FEA] Support for an additional parameter in is_integer, all_integer, is_float & all_float #5130

[FEA] Support for an additional parameter in is_integer, all_integer, is_float & all_float #5130

Comments

galipremsagar commented May 7, 2020

jrhemstad commented May 7, 2020

kkraus14 commented May 7, 2020

jrhemstad commented May 7, 2020

kkraus14 commented May 7, 2020

jrhemstad commented May 8, 2020

galipremsagar commented May 8, 2020 • edited Loading

kkraus14 commented May 8, 2020

galipremsagar commented May 8, 2020

kkraus14 commented May 8, 2020

galipremsagar commented May 8, 2020

[FEA] Support for an additional parameter in `is_integer`, `all_integer`, `is_float` & `all_float` #5130

[FEA] Support for an additional parameter in `is_integer`, `all_integer`, `is_float` & `all_float` #5130

galipremsagar commented May 8, 2020 •

edited

Loading