[SPARK-9101] [PySpark] Add missing NullType #7499
Jenkins, test this please. |
cc @davies |
I'm just wondering whether there is a real use case that needs NullType. Currently, it's only used during type inference. |
Test build #37728 has finished for PR 7499 at commit
|
It can happen if there is a null literal -- I'm not sure what happens in Python though. |
@rxin, @davies, the JIRA ticket contains an example of a query that fails due to this issue: https://issues.apache.org/jira/browse/SPARK-9101 @sixers, it might be nice to add a regression test based on the simple example you gave in the JIRA. |
@JoshRosen I see, thanks! |
This is my first contribution to Spark; could you give me some direction on where to put this test? In general, what is broken is parsing the schema of a Java DataFrame (with NullType). It's done lazily here: spark/python/pyspark/sql/dataframe.py Line 182 in 692378c
which eventually uses spark/python/pyspark/sql/types.py Line 708 in 692378c
So it also breaks in other cases like this:
sqlContext.createDataFrame(sc.parallelize([(None,1),(None,2), (None,3), (None, 4)]), samplingRatio=0.5).collect()
sqlContext.createDataFrame([[None]], schema=StructType([StructField("col", NullType(), True)])).collect()
Because of that, I think tests should be written for _parse_datatype_json_value. There are some tests in spark/python/pyspark/sql/types.py Line 651 in 692378c
Tests for simple types are dynamic, created by iterating over spark/python/pyspark/sql/types.py Line 660 in 692378c
In general I think there are two options:
I'm not sure if it brings any value. What do you think? Should I go with one of those, or do you see other places where I can introduce a test for that? |
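The failure mode described in this comment can be modeled without Spark. The sketch below is a simplified stand-in, not the actual pyspark.sql.types source: the class names mirror the discussion, while type_name, build_registry, and parse_atomic are hypothetical helpers. It assumes atomic types are looked up by a name derived from the class name, and that a NullType column is serialized as the string "null".

```python
# Simplified stand-in for the type registry discussed above.
# NOT the Spark source: helper names here are hypothetical.

class DataType:
    @classmethod
    def type_name(cls):
        # Derive the serialized name from the class name,
        # e.g. "NullType" -> "null", "StringType" -> "string".
        return cls.__name__[:-len("Type")].lower()


class StringType(DataType):
    pass


class LongType(DataType):
    pass


class NullType(DataType):
    pass


def build_registry(atomic_types):
    """Map each serialized type name to its class."""
    return {t.type_name(): t for t in atomic_types}


def parse_atomic(json_value, registry):
    """Return an instance of the atomic type named by json_value."""
    if json_value in registry:
        return registry[json_value]()
    raise ValueError("Could not parse datatype: %s" % json_value)


# Registry without NullType (models the bug) and with it (models the fix).
before_fix = build_registry([StringType, LongType])
after_fix = build_registry([StringType, LongType, NullType])

try:
    parse_atomic("null", before_fix)
    parsed_before_fix = True
except ValueError:
    parsed_before_fix = False

parsed_after_fix = parse_atomic("null", after_fix)
```

Under these assumptions, any schema containing a "null" field fails to parse until NullType is added to the list that feeds the registry, which is the change named in the PR title.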
@sixers, my suggestion was to add an end-to-end test, like spark/python/pyspark/sql/tests.py Line 130 in 163e3f1
test_select_null_literal .
The fact that this bug went unnoticed for so long implies that our Python suite doesn't contain any tests which try to select null literals, which is why I wanted to add such a test. |
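An end-to-end check along these lines might look like the sketch below. It is not the test that was merged: the function names are hypothetical, and sql_context is assumed to be a pyspark.sql.SQLContext (or any object with a compatible sql() method) supplied by the caller, so the sketch itself imports nothing.

```python
def select_null_literal(sql_context):
    """Run a query that selects a null literal and return its first row.

    sql_context is assumed to behave like pyspark.sql.SQLContext:
    .sql(query) returns a DataFrame-like object with a .first() method.
    """
    df = sql_context.sql("select null as col")
    return df.first()


def check_select_null_literal(sql_context):
    """Raise AssertionError unless the query yields a single null column."""
    row = select_null_literal(sql_context)
    # A pyspark Row is a tuple subclass, so positional access is used here.
    assert len(row) == 1
    assert row[0] is None
```

In a real suite this would be called with the test fixture's SQL context. The point of going end to end is that the schema of the result has to be parsed on the Python side before a row can be returned.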
I see, thanks for the suggestion, I added this test. |
Jenkins, this is ok to test. |
Test build #37847 has finished for PR 7499 at commit
|
Thanks! Going to merge this. |
Actually I'm having some trouble with ASF git. I will merge when that works. |
I merged it. |
JIRA: https://issues.apache.org/jira/browse/SPARK-9101
Author: Mateusz Buśkiewicz <[email protected]>
Closes #7499 from sixers/spark-9101 and squashes the following commits:
dd75aa6 [Mateusz Buśkiewicz] [SPARK-9101] [PySpark] Test for selecting null literal
97e3f2f [Mateusz Buśkiewicz] [SPARK-9101] [PySpark] Add missing NullType to _atomic_types in pyspark.sql.types
(cherry picked from commit 02181fb)
Signed-off-by: Reynold Xin <[email protected]>
Note: I merged it in master (1.5.0), as well as branch-1.4 (1.4.2). |