-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13167][SQL] Include rows with null values for partition column when reading from JDBC datasources. #11063
Conversation
@@ -213,14 +212,21 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging { | |||
url: String, | |||
table: String, | |||
parts: Array[Partition], | |||
connectionProperties: Properties): DataFrame = { | |||
connectionProperties: Properties, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually breaks api compatibility
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing the patch, Reynold.
This particular jdbc method where I made the signature changes is not public. It i defined as private def jdbc ..
@rxin Thank you for reviewing the PR. As I mentioned in my comment, I did not change the public method. Any suggestions to improve this fix ? |
@@ -45,7 +47,8 @@ private[sql] object JDBCRelation { | |||
* incorrect values may cause the partitioning to be poor, but no data | |||
* will fail to be represented. | |||
*/ | |||
def columnPartition(partitioning: JDBCPartitioningInfo): Array[Partition] = { | |||
def columnPartition(partitioning: JDBCPartitioningInfo, | |||
schema: StructType, url: String): Array[Partition] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some documentation to this function to explain the parameters?
just do them with @param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also 4 space indent for function params
…null value partition column rows
f4358bb
to
1e6a631
Compare
Thanks for input, Reynold . Update the PR to specify the is null clause in the first partition where clause. Please review. |
Thanks - can you update the pull request description to reflect the latest change? |
sure. Updated the description. |
Test build #2599 has finished for PR 11063 at commit
|
Thanks - I'm merging this in master. |
Test build #2600 has finished for PR 11063 at commit
|
Thank you. |
… when reading from JDBC datasources. Rows with null values in partition column are not included in the results because none of the partition where clause specify is null predicate on the partition column. This fix adds is null predicate on the partition column to the first JDBC partition where clause. Example: JDBCPartition(THEID < 1 or THEID is null, 0),JDBCPartition(THEID >= 1 AND THEID < 2,1), JDBCPartition(THEID >= 2, 2) Author: sureshthalamati <[email protected]> Closes apache#11063 from sureshthalamati/nullable_jdbc_part_col_spark-13167.
…olumn when reading from JDBC datasources. apache#11063
Rows with null values in partition column are not included in the results because none of the partition
where clause specify is null predicate on the partition column. This fix adds is null predicate on the partition column to the first JDBC partition where clause.
Example:
JDBCPartition(THEID < 1 or THEID is null, 0),JDBCPartition(THEID >= 1 AND THEID < 2,1),
JDBCPartition(THEID >= 2, 2)