[MINOR][DOCS][PYTHON] Adding missing boolean type for replacement value in fillna #17688

Closed
wants to merge 1 commit into from

@vundela commented Apr 19, 2017

What changes were proposed in this pull request?

Currently, the PySpark DataFrame.fillna API supports the boolean type for replacement values when a dict is passed, but this is missing from the documentation.

How was this patch tested?

>>> from pyspark.sql import Row
>>> spark.createDataFrame([Row(a=True), Row(a=None)]).fillna({"a": True}).show()
+----+
|   a|
+----+
|true|
|true|
+----+


@felixcheung (Member) commented Apr 19, 2017

Can you fix up the PR title? It seems to be truncated. Also, add [PYTHON] to the title.
Is there an existing test this could be added to?

@vundela changed the title from "[MINOR][DOCS] Adding missing boolean type for replacement value in fi…" to "[MINOR][DOCS][PYTHON] Adding missing boolean type for replacement value in fillna" on Apr 20, 2017

@vundela (Author) commented Apr 21, 2017

Hi @felixcheung, thanks for the review. I have added a small test case.

@SparkQA commented Apr 21, 2017

Test build #3669 has finished for PR 17688 at commit acae068.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vundela (Author) commented Apr 21, 2017

My apologies for the style failures; I have fixed them.

@felixcheung (Member) commented:

Jenkins, retest this please

@SparkQA commented Apr 21, 2017

Test build #76042 has finished for PR 17688 at commit 1ec1c0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member) left a comment:

LGTM
@holdenk, what do you think?

@holdenk (Contributor) commented Apr 21, 2017

We should also update the list of types a few lines up while we are fixing this. Thanks a lot for catching this, @vundela.

@vundela (Author) commented Apr 21, 2017

@holdenk, thanks for the review. Can you please tell me the line number where you expect the list of types to be missing? Is this for fillna or another API?

@HyukjinKwon (Member) commented:

LGTM too. I quickly checked for similar instances but could not find any; I also checked the R and Scala versions.

@holdenk (Contributor) commented Apr 22, 2017

@vundela L1237

@felixcheung (Member) commented:

Good catch. Instead of duplicating it, perhaps just say "supported data types" or "supported data types above".

@@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None):
 Value to replace null values with.
 If the value is a dict, then `subset` is ignored and `value` must be a mapping
 from column name (string) to replacement value. The replacement value must be
-an int, long, float, or string.
+an int, long, float, boolean, or string.
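The dict contract in this docstring line can be illustrated without Spark. The sketch below is a pure-Python toy, not PySpark's implementation: rows are plain dicts, `fill_nulls_from_dict` is a hypothetical helper, and it assumes Python 3 (where `long` is just `int`). It ignores column data types entirely and only checks that each replacement value is one of the documented types.

```python
from typing import Any, Dict, List, Optional

# Types the updated docstring lists for dict replacement values.
# Python 3 has no separate `long`, and bool already subclasses int,
# so bool is listed here for readability only.
ALLOWED_REPLACEMENT_TYPES = (int, float, bool, str)


def fill_nulls_from_dict(
    rows: List[Dict[str, Optional[Any]]],
    replacements: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: replace None per column using a {column: value} map."""
    for column, value in replacements.items():
        if not isinstance(column, str):
            raise TypeError(f"column name must be a string, got {column!r}")
        if not isinstance(value, ALLOWED_REPLACEMENT_TYPES):
            raise TypeError(f"unsupported replacement for {column!r}: {value!r}")
    filled = []
    for row in rows:
        # Copy each row so the caller's data is left untouched.
        new_row = dict(row)
        for column, value in replacements.items():
            if column in new_row and new_row[column] is None:
                new_row[column] = value
        filled.append(new_row)
    return filled


# Mirrors the PR's test: a boolean column with one null, filled with True.
print(fill_nulls_from_dict([{"a": True}, {"a": None}], {"a": True}))
# [{'a': True}, {'a': True}]
```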
@HyukjinKwon (Member) commented Apr 23, 2017

I think this line describes the replacement values used when `value` is a dict, whereas the `value` parameter itself can't be a bool, as shown below:

>>> from pyspark.sql import Row
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna({"a": True}).first()
Row(a=True)
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna(True).first()
Row(a=None)

I can't find `def fill(value: Boolean)` in functions.scala, so (I guess) a bool is passed to the int variant instead, since int is bool's parent type. So:

>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(True).first()
Row(a=1)
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(False).first()
Row(a=0)
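The three results above follow from Python treating `True` and `False` as the integers 1 and 0. The toy model below is not Spark's code; `fill_scalar` and its column-type check are hypothetical, assuming (as guessed above) that a bool is routed to a numeric fill that only touches numeric columns. It reproduces the observed outputs: nulls in a boolean column stay null, while nulls in an integer column become 1 or 0.

```python
from typing import List, Optional


def fill_scalar(column: List[Optional[object]], column_type: str, value: object) -> List[Optional[object]]:
    """Hypothetical scalar fill used only to model the behaviour shown above."""
    # bool is a subclass of int, so True/False take this numeric branch.
    if isinstance(value, (int, float)):
        if column_type != "integer":
            # Only integer columns are modelled as numeric here, so a
            # boolean column keeps its nulls.
            return list(column)
        # int(True) == 1 and int(False) == 0.
        return [int(value) if cell is None else cell for cell in column]
    # Non-numeric fill values are out of scope for this toy model.
    return list(column)


print(fill_scalar([None, True], "boolean", True))  # [None, True]
print(fill_scalar([None, 0], "integer", True))     # [1, 0]
print(fill_scalar([None, 0], "integer", False))    # [0, 0]
```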

So, the current status looks correct to me.

BTW, we should probably also raise an exception for bool values in this check:

if not isinstance(value, (float, int, long, basestring, dict)):
    raise ValueError("value should be a float, int, long, string, or dict")

because in Python a boolean is an int (https://www.python.org/dev/peps/pep-0285/):

  1. Should bool inherit from int?

=> Yes.

>>> isinstance(True, int)
True
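A minimal sketch of the stricter check suggested above, written for Python 3, so `long` and `basestring` collapse into `int` and `str`. `check_fill_value` is a hypothetical stand-in for the validation at the top of `fillna`, not the actual PySpark code. Because `isinstance(True, int)` is true, bool has to be rejected explicitly; a dict whose values are bools still passes, since only the top-level `value` is checked.

```python
def check_fill_value(value: object) -> None:
    """Hypothetical validation: accept float, int, str, or dict, but not bool."""
    # bool must be tested explicitly: it subclasses int, so the tuple check
    # alone would silently accept True and False.
    if isinstance(value, bool) or not isinstance(value, (float, int, str, dict)):
        raise ValueError("value should be a float, int, string, or dict")


for candidate in (1, 1.5, "x", {"a": True}, True):
    try:
        check_fill_value(candidate)
        print(repr(candidate), "accepted")
    except ValueError as exc:
        print(repr(candidate), "rejected:", exc)
```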

However, this is just a documentation fix, and I guess there are many similar instances. I think it is fine not to fix it here.

Reply from a contributor:

That makes sense. I'd say the eventual improvement would be to offer fill for bool, for symmetry with the rest of the types, rather than type-checking for bool on the input; but it's not necessary here.
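Offering a dedicated bool fill means testing for bool before int when choosing which typed fill to call. The sketch below is hypothetical: `pick_fill_overload` and the returned labels are illustrative names, not Spark APIs. It only shows why the check order matters in Python.

```python
def pick_fill_overload(value: object) -> str:
    """Hypothetical dispatcher mapping a Python fill value to a typed fill path."""
    # Check bool before int: isinstance(True, int) is True, so the reverse
    # order would route booleans to the numeric path.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError(f"unsupported fill value: {value!r}")


def pick_fill_overload_wrong_order(value: object) -> str:
    """Same dispatcher with int tested first, to show the pitfall."""
    if isinstance(value, int):
        return "long"
    if isinstance(value, bool):  # never reached for True/False
        return "boolean"
    return pick_fill_overload(value)


print(pick_fill_overload(True))              # boolean
print(pick_fill_overload_wrong_order(True))  # long
```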

@holdenk (Contributor) commented Apr 23, 2017

LGTM, thanks @HyukjinKwon for noticing the lack of bool in the Scala code.

@vundela (Author) commented Apr 24, 2017

@holdenk, thanks for reviewing it. I guess L1237 can't be changed until we support the boolean type; L1240 specifically describes the types of values supported in a dict. Please let me know if you think otherwise.

@holdenk (Contributor) commented Apr 24, 2017

Yup, it's fine, hence the LGTM :)

@vundela (Author) commented Apr 24, 2017

Thanks, @holdenk

@HyukjinKwon (Member) commented:

LGTM too.

@asfgit closed this in 6613046 on May 1, 2017
asfgit pushed a commit that referenced this pull request May 1, 2017
…ue in fillna


Author: Srinivasa Reddy Vundela

Closes #17688 from vundela/fillna_doc_fix.

(cherry picked from commit 6613046)
Signed-off-by: Felix Cheung

@felixcheung (Member) commented:

Merged to master/2.2.

5 participants