[SPARK-23268][SQL]Reorganize packages in data source V2 #20435

gengliangwang · 2018-01-30T09:41:12Z

What changes were proposed in this pull request?

1. Create a new package for partitioning/distribution related classes.
   Spark will add new concrete implementations of Distribution in future releases, so these classes should have a package of their own.
2. Move the streaming related classes to the packages org.apache.spark.sql.sources.v2.reader.streaming and org.apache.spark.sql.sources.v2.writer.streaming, instead of org.apache.spark.sql.sources.v2.streaming.reader and org.apache.spark.sql.sources.v2.streaming.writer.
   This way there is no reader/writer package inside the streaming package, which was quite confusing.
Before change:

v2
├── reader
├── streaming
│   ├── reader
│   └── writer
└── writer

After change:

v2
├── reader
│   └── streaming
└── writer
    └── streaming

How was this patch tested?

Unit test.

SparkQA · 2018-01-30T09:44:32Z

Test build #86814 has finished for PR 20435 at commit 3dc5622.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-30T11:28:50Z

Test build #86817 has finished for PR 20435 at commit 719cae2.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long])

SparkQA · 2018-01-30T11:57:37Z

Test build #86820 has finished for PR 20435 at commit 17f8a5e.

This patch fails to build.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2018-01-30T12:54:19Z

Test build #86824 has finished for PR 20435 at commit c7c0a1d.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long])

SparkQA · 2018-01-30T13:14:40Z

Test build #86827 has finished for PR 20435 at commit d272952.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-30T14:20:21Z

Test build #86830 has finished for PR 20435 at commit f451194.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-01-30T16:46:26Z

cc @jose-torres

SparkQA · 2018-01-30T17:30:20Z

Test build #86832 has finished for PR 20435 at commit 0b6b59e.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-01-30T18:04:57Z

retest this please

gatorsmile · 2018-01-30T18:08:21Z

cc @zsxwing @marmbrus too

jose-torres · 2018-01-30T18:09:27Z

Streaming part LGTM; I have no particular opinion or context on the distribution stuff.

gengliangwang · 2018-01-30T18:12:29Z

retest this please

gatorsmile · 2018-01-30T18:12:36Z

LGTM to adding the new package of partitioning/distribution.

zsxwing · 2018-01-30T18:13:19Z

cc @marmbrus

SparkQA · 2018-01-30T21:26:36Z

Test build #86839 has finished for PR 20435 at commit 0b6b59e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-30T21:26:56Z

Test build #86838 has finished for PR 20435 at commit 0b6b59e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-30T21:41:45Z

Test build #86840 has finished for PR 20435 at commit 0b6b59e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-01-31T03:00:59Z

external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala

@@ -20,14 +20,16 @@ package org.apache.spark.sql.kafka010
 import org.apache.kafka.common.TopicPartition

 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset


Can we keep it the same as before?

import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2, PartitionOffset}
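
The renamed import suggested above keeps two same-named types usable side by side in one file. Here is a minimal sketch of that pattern. The objects `legacy` and `v2` are hypothetical stand-ins for the two Spark packages, and the traits inside them are illustrative only, not Spark's actual classes.

```scala
// Stand-ins for the two packages that both define a type named Offset.
// These are hypothetical and only mimic the naming collision.
object legacy {
  trait Offset { def json: String }
}

object v2 {
  object reader {
    object streaming {
      trait Offset { def json: String }
      trait PartitionOffset
    }
  }
}

object RenamedImportSketch {
  import legacy.Offset
  // Renaming on import avoids the clash with legacy.Offset and keeps call sites short.
  import v2.reader.streaming.{Offset => OffsetV2, PartitionOffset}

  final case class LegacyDemo(n: Long) extends Offset { def json: String = n.toString }
  final case class V2Demo(n: Long) extends OffsetV2 { def json: String = s"v2:$n" }
  final case class DemoPartition(id: Int) extends PartitionOffset

  /** Reports which of the two Offset types a value belongs to. */
  def kind(o: Any): String = o match {
    case _: OffsetV2 => "v2"
    case _: Offset   => "legacy"
    case _           => "other"
  }
}
```

Compared with importing the `streaming` package and writing `streaming.Offset` at each use, the alias keeps the existing `OffsetV2` references in the file unchanged, which is why it produces a smaller diff.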

ueshin · 2018-01-31T04:58:05Z

external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala

@@ -20,14 +20,16 @@ package org.apache.spark.sql.kafka010
 import org.apache.kafka.common.TopicPartition

 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset

 /**
 * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of subscribed topics and


Should this Offset be streaming.Offset?

ueshin · 2018-01-31T05:00:07Z

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala

-import org.apache.spark.sql.sources.v2.streaming.StreamWriteSupport
-import org.apache.spark.sql.sources.v2.streaming.writer.StreamWriter
-import org.apache.spark.sql.sources.v2.writer._
+import org.apache.spark.sql.sources.v2.writer.{StreamWriteSupport, _}


import org.apache.spark.sql.sources.v2.writer._?

cloud-fan · 2018-01-31T13:16:51Z

external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala


 /**
 * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of subscribed topics and
 * their offsets.
 */
 private[kafka010]
-case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2 {
+case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long])
+  extends OffsetV2 {


unnecessary change?

cloud-fan · 2018-01-31T13:18:15Z

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala

@@ -403,7 +403,7 @@ class MicroBatchExecution(
          val current = committedOffsets.get(reader).map(off => reader.deserializeOffset(off.json))
          reader.setOffsetRange(
            toJava(current),
-            Optional.of(available.asInstanceOf[OffsetV2]))
+            Optional.of(available.asInstanceOf[streaming.Offset]))


shall we still use OffsetV2?

SparkQA · 2018-01-31T16:31:02Z

Test build #86873 has finished for PR 20435 at commit e609060.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-31T17:29:12Z

Test build #86879 has finished for PR 20435 at commit 1d90cf1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2

cloud-fan · 2018-02-01T02:57:13Z

LGTM, also cc @rdblue

gatorsmile · 2018-02-01T04:33:23Z

Thanks! Merged to master/2.3

## What changes were proposed in this pull request?

1. Create a new package for partitioning/distribution related classes. As Spark will add new concrete implementations of `Distribution` in new releases, it is good to have a new package for partitioning/distribution related classes.
2. Move streaming related classes to the packages `org.apache.spark.sql.sources.v2.reader/writer.streaming`, instead of `org.apache.spark.sql.sources.v2.streaming.reader/writer`, so that there won't be a reader/writer package inside the streaming package, which is quite confusing.

Before change:
```
v2
├── reader
├── streaming
│   ├── reader
│   └── writer
└── writer
```

After change:
```
v2
├── reader
│   └── streaming
└── writer
    └── streaming
```

## How was this patch tested?

Unit test.

Author: Wang Gengliang <[email protected]>

Closes #20435 from gengliangwang/new_pkg.

(cherry picked from commit 56ae326)
Signed-off-by: gatorsmile <[email protected]>

## What changes were proposed in this pull request?

This is a followup of #20435.

While reorganizing the packages for streaming data source v2, the top level stream read/write support interfaces should not be in the reader/writer package, but should be in the `sources.v2` package, to follow the `ReadSupport`, `WriteSupport`, etc.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #20509 from cloud-fan/followup.

(cherry picked from commit a75f927)
Signed-off-by: Wenchen Fan <[email protected]>

## What changes were proposed in this pull request?

This is a followup of #20435.

While reorganizing the packages for streaming data source v2, the top level stream read/write support interfaces should not be in the reader/writer package, but should be in the `sources.v2` package, to follow the `ReadSupport`, `WriteSupport`, etc.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #20509 from cloud-fan/followup.


gengliangwang added 3 commits January 30, 2018 20:41

create a new package for partitioning/distribution related classes

3e6955b

re-org streaming

d1c4fcb

Fix

c7c0a1d

gengliangwang force-pushed the new_pkg branch from 17f8a5e to c7c0a1d Compare January 30, 2018 12:50

Fix

d272952

Fix

f451194

Fix

0b6b59e

cloud-fan reviewed Jan 31, 2018

ueshin reviewed Jan 31, 2018

address comments

e609060

cloud-fan reviewed Jan 31, 2018

Address comments

1d90cf1

asfgit closed this in 56ae326 Feb 1, 2018

cloud-fan mentioned this pull request Feb 5, 2018

[SPARK-23268][SQL][followup] Reorganize packages in data source V2 #20509

Closed
