[SPARK-13167][SQL] Include rows with null values for partition column when reading from JDBC datasources. #11063

sureshthalamati · 2016-02-03T22:38:45Z

Rows with null values in the partition column are not included in the results because none of the partition
where clauses specifies an IS NULL predicate on the partition column. This fix adds an IS NULL predicate on the partition column to the where clause of the first JDBC partition.

Example:
JDBCPartition(THEID < 1 or THEID is null, 0),
JDBCPartition(THEID >= 1 AND THEID < 2, 1),
JDBCPartition(THEID >= 2, 2)
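
The where clauses in the example can be reproduced with a small standalone sketch. This is a simplified illustration, not the actual `JDBCRelation.columnPartition` code: the object name `NullAwarePartitions`, the helper `whereClauses`, the stride formula, and the bounds used in the usage line (lower bound 0, upper bound 4, three partitions) are all assumptions chosen to match the example. In SQL, a comparison such as `THEID < 1` evaluates to unknown when `THEID` is null, so a null row matches none of the range predicates unless one clause also tests for null.

```scala
// Hypothetical sketch: builds one WHERE clause per JDBC partition and
// attaches the IS NULL predicate to the first partition only.
object NullAwarePartitions {

  /** Returns the WHERE clause for each partition, in partition order. */
  def whereClauses(
      column: String,
      lower: Long,
      upper: Long,
      numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "this sketch assumes at least two partitions")
    require(lower < upper, "lower bound must be below upper bound")

    // Assumed stride computation: width of each interior partition.
    val stride = upper / numPartitions - lower / numPartitions

    (0 until numPartitions).map { i =>
      val lo = lower + i * stride // inclusive lower edge of partition i
      val hi = lo + stride        // exclusive upper edge of partition i
      if (i == 0) {
        // First partition is unbounded below and also picks up null rows.
        s"$column < $hi or $column is null"
      } else if (i == numPartitions - 1) {
        // Last partition is unbounded above.
        s"$column >= $lo"
      } else {
        s"$column >= $lo AND $column < $hi"
      }
    }
  }
}
```

With the assumed bounds, `NullAwarePartitions.whereClauses("THEID", 0L, 4L, 3)` yields the three clauses shown in the example. Putting the null test in exactly one partition means null rows are read once and never duplicated across partitions.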

rxin · 2016-02-04T05:58:59Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

@@ -213,14 +212,21 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
      url: String,
      table: String,
      parts: Array[Partition],
-      connectionProperties: Properties): DataFrame = {
+      connectionProperties: Properties,


this actually breaks API compatibility.

Thank you for reviewing the patch, Reynold.
The jdbc method whose signature I changed is not public. It is defined as private def jdbc ..

sureshthalamati · 2016-02-23T06:32:10Z

@rxin Thank you for reviewing the PR. As I mentioned in my comment, I did not change the public method. Do you have any suggestions for improving this fix?

rxin · 2016-02-23T08:39:58Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala

@@ -45,7 +47,8 @@ private[sql] object JDBCRelation {
   * incorrect values may cause the partitioning to be poor, but no data
   * will fail to be represented.
   */
-  def columnPartition(partitioning: JDBCPartitioningInfo): Array[Partition] = {
+  def columnPartition(partitioning: JDBCPartitioningInfo,
+    schema: StructType, url: String): Array[Partition] = {


can you add some documentation to this function to explain the parameters?

just do them with @param

also 4 space indent for function params

sureshthalamati · 2016-03-01T23:20:01Z

Thanks for the input, Reynold. I updated the PR to specify the IS NULL clause in the where clause of the first partition. Please review.

rxin · 2016-03-01T23:45:44Z

Thanks - can you update the pull request description to reflect the latest change?

sureshthalamati · 2016-03-02T00:40:29Z

sure. Updated the description.

SparkQA · 2016-03-02T01:31:56Z

Test build #2599 has finished for PR 11063 at commit 1e6a631.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-03-02T01:33:59Z

Thanks - I'm merging this in master.

SparkQA · 2016-03-02T01:42:06Z

Test build #2600 has finished for PR 11063 at commit 1e6a631.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sureshthalamati · 2016-03-02T02:17:49Z

Thank you.

… when reading from JDBC datasources.

Rows with null values in the partition column are not included in the results because none of the partition where clauses specifies an IS NULL predicate on the partition column. This fix adds an IS NULL predicate on the partition column to the where clause of the first JDBC partition.

Example:
JDBCPartition(THEID < 1 or THEID is null, 0),
JDBCPartition(THEID >= 1 AND THEID < 2, 1),
JDBCPartition(THEID >= 2, 2)

Author: sureshthalamati <[email protected]>

Closes apache#11063 from sureshthalamati/nullable_jdbc_part_col_spark-13167.

adding null predicate to the first partition where clause to include null value partition column rows

1e6a631

sureshthalamati force-pushed the nullable_jdbc_part_col_spark-13167 branch from f4358bb to 1e6a631 Compare March 1, 2016 23:17

asfgit closed this in e42724b Mar 2, 2016

zzcclp added a commit to zzcclp/spark that referenced this pull request Aug 19, 2016

[EXT][SPARK-13167][SQL] Include rows with null values for partition column when reading from JDBC datasources. apache#11063

7c880cd
