[SPARK-23268][SQL] Reorganize packages in data source V2 #20435
Conversation
Test build #86814 has finished for PR 20435 at commit
Test build #86817 has finished for PR 20435 at commit
Test build #86820 has finished for PR 20435 at commit
Force-pushed from 17f8a5e to c7c0a1d
Test build #86824 has finished for PR 20435 at commit
Test build #86827 has finished for PR 20435 at commit
Test build #86830 has finished for PR 20435 at commit
cc @jose-torres
Test build #86832 has finished for PR 20435 at commit
retest this please
Streaming part LGTM; I have no particular opinion or context on the distribution stuff.
retest this please
LGTM on adding the new package for partitioning/distribution.
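For context on that package: a minimal sketch of an implementor's view of the new `org.apache.spark.sql.sources.v2.reader.partitioning` interfaces introduced by this PR (the interface and its two methods exist in the post-reorg package; the concrete class below is hypothetical):

```scala
import org.apache.spark.sql.sources.v2.reader.partitioning.{ClusteredDistribution, Distribution, Partitioning}

// Hypothetical Partitioning reported by a source whose rows are already
// clustered by `keys` across `parts` input partitions.
class KeyClusteredPartitioning(keys: Array[String], parts: Int) extends Partitioning {
  override def numPartitions(): Int = parts

  // Satisfy only a ClusteredDistribution over exactly the same columns.
  override def satisfy(distribution: Distribution): Boolean = distribution match {
    case c: ClusteredDistribution => c.clusteredColumns.sameElements(keys)
    case _ => false
  }
}
```

Putting these in their own package leaves room for the new concrete `Distribution` implementations the description anticipates, without crowding the top-level reader package.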
cc @marmbrus
Test build #86839 has finished for PR 20435 at commit
Test build #86838 has finished for PR 20435 at commit
Test build #86840 has finished for PR 20435 at commit
```diff
@@ -20,14 +20,16 @@ package org.apache.spark.sql.kafka010

 import org.apache.kafka.common.TopicPartition

 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset
```
can we keep it the same as before?

```scala
import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2, PartitionOffset}
```
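A sketch of why the alias matters in this file (assuming the Spark 2.3 class layout after this PR; the helper object is illustrative): two unrelated classes named `Offset` are in scope, so the v2 one is renamed at import time.

```scala
// The internal offset type used by the existing execution engine...
import org.apache.spark.sql.execution.streaming.Offset
// ...and the public v2 offset, renamed so the two names never clash.
import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2, PartitionOffset}

object OffsetAliasDemo {
  // Illustrative helper: both types can now be referenced without
  // fully qualified package paths.
  def describe(internal: Offset, v2: OffsetV2): String =
    s"internal=${internal.json}, v2=${v2.json}"
}
```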
```diff
@@ -20,14 +20,16 @@ package org.apache.spark.sql.kafka010

 import org.apache.kafka.common.TopicPartition

 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset

 /**
  * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of subscribed topics and
```
Should this `Offset` be `streaming.Offset`?
```diff
-import org.apache.spark.sql.sources.v2.streaming.StreamWriteSupport
-import org.apache.spark.sql.sources.v2.streaming.writer.StreamWriter
-import org.apache.spark.sql.sources.v2.writer._
+import org.apache.spark.sql.sources.v2.writer.{StreamWriteSupport, _}
```
`import org.apache.spark.sql.sources.v2.writer._`?
```diff
 /**
  * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of subscribed topics and
  * their offsets.
  */
 private[kafka010]
-case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2 {
+case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long])
+  extends OffsetV2 {
```
unnecessary change?
```diff
@@ -403,7 +403,7 @@ class MicroBatchExecution(
     val current = committedOffsets.get(reader).map(off => reader.deserializeOffset(off.json))
     reader.setOffsetRange(
       toJava(current),
-      Optional.of(available.asInstanceOf[OffsetV2]))
+      Optional.of(available.asInstanceOf[streaming.Offset]))
```
shall we still use `OffsetV2`?
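The two referencing styles being weighed here, side by side (a compile-only sketch assuming the post-reorg Spark 2.3 classpath; `OffsetStyles` is illustrative):

```scala
// Style used by this PR: import the package itself and qualify the class.
import org.apache.spark.sql.sources.v2.reader.streaming
// Style suggested in review: rename at import time, as before the reorg.
import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2}

object OffsetStyles {
  // Both spellings resolve to the same class.
  def sameClass: Boolean = classOf[streaming.Offset] == classOf[OffsetV2]
}
```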
Test build #86873 has finished for PR 20435 at commit
Test build #86879 has finished for PR 20435 at commit
LGTM, also cc @rdblue
Thanks! Merged to master/2.3
Merged commit (branch-2.3 cherry-pick; message as in the PR description below): Author: Wang Gengliang <[email protected]>. Closes #20435 from gengliangwang/new_pkg. (cherry picked from commit 56ae326) Signed-off-by: gatorsmile <[email protected]>
Followup commit #20509:

## What changes were proposed in this pull request?

This is a followup of #20435. While reorganizing the packages for streaming data source v2, the top level stream read/write support interfaces should not be in the reader/writer package, but should be in the `sources.v2` package, to follow `ReadSupport`, `WriteSupport`, etc.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #20509 from cloud-fan/followup.

(cherry picked from commit a75f927) Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request?

1. Create a new package for partitioning/distribution related classes. As Spark will add new concrete implementations of `Distribution` in new releases, it is good to have a new package for partitioning/distribution related classes.

2. Move streaming related classes to the package `org.apache.spark.sql.sources.v2.reader/writer.streaming`, instead of `org.apache.spark.sql.sources.v2.streaming.reader/writer`, so that there won't be reader/writer packages inside the streaming package, which is quite confusing.

Before change:

```
v2
├── reader
├── streaming
│   ├── reader
│   └── writer
└── writer
```

After change:

```
v2
├── reader
│   └── streaming
└── writer
    └── streaming
```

## How was this patch tested?

Unit test.
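As an illustration of what the move means for implementors, here is a minimal sketch of a custom offset written against the relocated interface (the class and its JSON payload are hypothetical; only the import path and the abstract `json()` contract come from this PR):

```scala
// Post-reorg location (was org.apache.spark.sql.sources.v2.streaming.reader.Offset).
import org.apache.spark.sql.sources.v2.reader.streaming.Offset

// Hypothetical offset tracking a single long position. The abstract contract
// (serialize the offset to JSON) is unchanged by the package move.
class ExampleOffset(value: Long) extends Offset {
  override def json(): String = s"""{"value":$value}"""
}
```

Only import paths change for sources written against the old layout; no method signatures are touched by the reorganization.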