-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINOR][DOCS][PYTHON] Adding missing boolean type for replacement value in fillna #17688
Conversation
Can you fix up the PR title - it seems to get truncated. Also add |
Hi @felixcheung Thanks for the review. I have added a small testcase. |
Test build #3669 has finished for PR 17688 at commit
|
My apologies for style failures, i have fixed them. |
Jenkins, retest this please |
Test build #76042 has finished for PR 17688 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@holdenk what do you think?
We should also update the list of types a few lines up while we are fixing this. thanks a lot for catching this @vundela |
@holdenk Thanks for the review. Can you please let me know the line number where you are expecting list of types missing. Is this for fillna or other API? |
LGTM too. I just quickly checked if there are similar instances but I could not find and I checked R's one and Scala one. |
@vundela L1237 |
good catch - instead of duplicating it, perhaps just say |
@@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None): | |||
Value to replace null values with. | |||
If the value is a dict, then `subset` is ignored and `value` must be a mapping | |||
from column name (string) to replacement value. The replacement value must be | |||
an int, long, float, or string. | |||
an int, long, float, boolean, or string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this indicates the replacement If the value is a dict
whereas param value
can't be a bool as below:
>>> from pyspark.sql import Row
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna({"a": True}).first()
Row(a=True)
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna(True).first()
Row(a=None)
I can't find def fill(value: Boolean)
in functions.scala
. Namely, (I guess) this will call it with int
(a parent type). So,
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(True).first()
Row(a=1)
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(False).first()
Row(a=0)
So, the current status looks correct to me.
BTW, probably, we should throw an exception in
if not isinstance(value, (float, int, long, basestring, dict)):
raise ValueError("value should be a float, int, long, string, or dict")
in Python boolean is a int - https://www.python.org/dev/peps/pep-0285/
- Should bool inherit from int?
=> Yes.
>>> isinstance(True, int)
True
However, this looks just a documentation fix and I guess there are many instances similar with it. I think it is fine with not fixing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I'd say that the eventual improvement would maybe be offering fill
for bool for symetry with the rest of the types but its not necessary here rather than type checking for bool on the input.
LGTM, thanks @HyukjinKwon for noticing the lack of bool in the scala code. |
@holdenk, Thanks for reviewing it. I guess L1237 can't be changed until we support boolean type. L1240 specifically talks about the types of values supported in dict. Please let me know if you think otherwise. |
Yup, it's fine hence the LGTM :) |
Thanks @holdenk |
LGTM too. |
…ue in fillna ## What changes were proposed in this pull request? Currently pyspark Dataframe.fillna API supports boolean type when we pass dict, but it is missing in documentation. ## How was this patch tested? >>> spark.createDataFrame([Row(a=True),Row(a=None)]).fillna({"a" : True}).show() +----+ | a| +----+ |true| |true| +----+ Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Srinivasa Reddy Vundela <[email protected]> Closes #17688 from vundela/fillna_doc_fix. (cherry picked from commit 6613046) Signed-off-by: Felix Cheung <[email protected]>
merged to master/2.2 |
What changes were proposed in this pull request?
Currently pyspark Dataframe.fillna API supports boolean type when we pass dict, but it is missing in documentation.
How was this patch tested?
Please review http://spark.apache.org/contributing.html before opening a pull request.