-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq predicates with nulls for Parquet #3367
Conversation
FilterApi.eq(binaryColumn(n), Binary.fromByteArray(v.asInstanceOf[Array[Byte]])) | ||
(n: String, v: Any) => FilterApi.eq( | ||
binaryColumn(n), | ||
Option(v).map(b => Binary.fromByteArray(v.asInstanceOf[Array[Byte]])).orNull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binary.fromString
and Binary.fromByteArray
don't accept null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add this as a comment.
Test build #23612 has started for PR 3367 at commit
|
Test build #23612 has finished for PR 3367 at commit
|
Test FAILed. |
Build failure due to syncing issue between GitHub and ASF Git repo. |
retest this please |
Test build #23654 has started for PR 3367 at commit
|
Test build #23654 has finished for PR 3367 at commit
|
Test FAILed. |
Test build #530 has started for PR 3367 at commit
|
Test build #530 has finished for PR 3367 at commit
|
@@ -85,6 +86,7 @@ class ParquetQuerySuite extends QueryTest with FunSuiteLike with BeforeAndAfterA | |||
TestData // Load test data tables. | |||
|
|||
var testRDD: SchemaRDD = null | |||
var originalParquetFilterPushdownEnabled = TestSQLContext.parquetFilterPushDown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why var
?
Minor comments, otherwise LGTM. |
Addressed all styling issues. Thanks! |
12c9d1c
to
cc41281
Compare
Test build #24026 has started for PR 3367 at commit
|
Test build #24027 has started for PR 3367 at commit
|
Test build #24026 has finished for PR 3367 at commit
|
Test PASSed. |
Test build #24027 has finished for PR 3367 at commit
|
Test PASSed. |
Thanks! Merged to master. |
This is a follow-up of #3367 and #3644. At the time #3644 was written, #3367 hadn't been merged yet, thus `IsNull` and `IsNotNull` filters are not covered in the first version of `ParquetFilterSuite`. This PR adds corresponding test cases. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3748) <!-- Reviewable:end --> Author: Cheng Lian <[email protected]> Closes #3748 from liancheng/test-null-filters and squashes the following commits: 1ab943f [Cheng Lian] IsNull and IsNotNull Parquet filter test case for boolean type bcd616b [Cheng Lian] Adds Parquet filter pushedown tests for IsNull and IsNotNull
Predicates like
a = NULL
anda < NULL
can't be pushed down since ParquetLt
,LtEq
,Gt
,GtEq
doesn't accept null value. Note thatEq
andNotEq
can only be used withnull
to represent predicates likea IS NULL
anda IS NOT NULL
.However, normally this issue doesn't cause NPE because any value compared to
NULL
resultsNULL
, and Spark SQL automatically optimizes outNULL
predicate in theSimplifyFilters
rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as blocker and I do NOT think we need to backport this to branch-1.1This PR restricts
Lt
,LtEq
,Gt
andGtEq
to non-null values only, and only usesEq
with null value to pushdownIsNull
andIsNotNull
. Also, added support for ParquetNotEq
filter for completeness and (tiny) performance gain, it's also used to pushdownIsNotNull
.