
[SPARK-4608][Streaming] Reorganize StreamingContext implicit to improve API convenience #3464

Closed
wants to merge 5 commits

Conversation

@zsxwing (Member) commented on Nov 26, 2014

There is only one implicit function, toPairDStreamFunctions, in StreamingContext. This PR does a reorganization similar to SPARK-4397.

I compiled the following code against Spark Streaming 1.1.0 and ran it with this PR. Everything works fine.

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object StreamingApp {

  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FileWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.textFileStream("/some/path")
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
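
For context, here is a minimal, self-contained sketch of the reorganization pattern this PR follows, modeled on SPARK-4397. This is not the actual Spark source; all names below are illustrative stand-ins. The idea is that the implicit conversion lives in the companion object of DStream, so it sits in the implicit scope of every DStream[(K, V)] and no import of StreamingContext._ is required:

import scala.language.implicitConversions

object ImplicitScopeSketch {

  // Toy stand-in for org.apache.spark.streaming.dstream.DStream.
  trait DStream[T]

  // Toy stand-in for PairDStreamFunctions, which adds key-value operations.
  class PairDStreamFunctions[K, V](self: DStream[(K, V)]) {
    def reduceByKey(func: (V, V) => V): DStream[(K, V)] = self
  }

  object DStream {
    // Declared in the companion object of DStream, this conversion is part of
    // the implicit scope of every DStream[(K, V)], so callers get reduceByKey
    // and friends without any extra import.
    implicit def toPairDStreamFunctions[K, V](
        stream: DStream[(K, V)]): PairDStreamFunctions[K, V] =
      new PairDStreamFunctions(stream)
  }

  // Compiles without importing StreamingContext._
  def demo(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
    pairs.reduceByKey(_ + _)
}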

@zsxwing (Member, Author) commented on Nov 26, 2014

/cc @tdas

@SparkQA commented on Nov 26, 2014

Test build #23864 has started for PR 3464 at commit c15162c.

  • This patch merges cleanly.

@tdas (Contributor) commented on Nov 26, 2014

Could you add a test to verify that this actually works without importing StreamingContext._?
Also, it would be great if all the StreamingContext._ imports in the different files could be removed, which would make this more reliable. I am not planning to backport this to 1.2-, so this change should be fine.

@zsxwing (Member, Author) commented on Nov 26, 2014

Could you add a test to verify that this actually works without importing StreamingContext._?

Do you mean streaming/src/test/scala/org/apache/spark/streamingtest/ImplicitSuite.scala in this PR?

@zsxwing (Member, Author) commented on Nov 26, 2014

Also, it would be great if all the StreamingContext._ imports in the different files could be removed, which would make this more reliable.

Done.

@@ -66,7 +66,6 @@ main entry point for all streaming functionality. We create a local StreamingCon
{% highlight scala %}
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

Contributor commented on this diff:

Actually, on second thought, let's not remove it from the programming guide. People might read the programming guide for one version and try it on an older version. I would say, keep it, but add a comment that it is not necessary in Spark 1.3+.
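
For example, the guide snippet could keep the import and flag it with a version note along these lines (the exact comment wording here is only an illustration):

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3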

Member (Author) replied:

How about now?

Contributor replied:

Looks good.

@SparkQA commented on Nov 26, 2014

Test build #23871 has started for PR 3464 at commit 27833bb.

  • This patch merges cleanly.

@SparkQA commented on Nov 26, 2014

Test build #23872 has started for PR 3464 at commit e6f9cc9.

  • This patch merges cleanly.

@SparkQA commented on Nov 26, 2014

Test build #23864 has finished for PR 3464 at commit c15162c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CompressedSerializer(FramedSerializer):

@AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23864/

@zsxwing (Member, Author) commented on Nov 26, 2014

This patch adds the following public classes (experimental):
class CompressedSerializer(FramedSerializer):

Is there a bug in the class-detection code? This PR doesn't add any class.

@SparkQA commented on Nov 26, 2014

Test build #23871 has finished for PR 3464 at commit 27833bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23871/

@SparkQA commented on Nov 26, 2014

Test build #23872 has finished for PR 3464 at commit e6f9cc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23872/

Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
	streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
@SparkQA commented on Nov 26, 2014

Test build #23887 has started for PR 3464 at commit f74c190.

  • This patch merges cleanly.

@SparkQA commented on Nov 26, 2014

Test build #23887 has finished for PR 3464 at commit f74c190.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23887/

@zsxwing (Member, Author) commented on Dec 3, 2014

ping @tdas

@zsxwing (Member, Author) commented on Dec 25, 2014

ping @tdas

def mockDStream[T]: org.apache.spark.streaming.dstream.DStream[T] = null

def testToPairDStreamFunctions(): Unit = {
val rdd: org.apache.spark.streaming.dstream.DStream[(Int, Int)] = mockDStream
Contributor commented on this diff:

I think this is a copy-paste error ;) It should be named dstream instead of rdd.

Member (Author) replied:

Good catch. Done.
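
For reference, a rough sketch of how the renamed test method might read (the method body below, including the groupByKey call, is an illustrative assumption; the test only needs to compile, which is the point of ImplicitSuite):

def mockDStream[T]: org.apache.spark.streaming.dstream.DStream[T] = null

def testToPairDStreamFunctions(): Unit = {
  // This compiles only if the DStream -> PairDStreamFunctions conversion is
  // found through implicit scope, i.e. without importing StreamingContext._
  val dstream: org.apache.spark.streaming.dstream.DStream[(Int, Int)] = mockDStream
  dstream.groupByKey()
}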

@tdas (Contributor) commented on Dec 26, 2014

LGTM, except one comment.

@SparkQA commented on Dec 26, 2014

Test build #24826 has started for PR 3464 at commit aa6d44a.

  • This patch merges cleanly.

@SparkQA commented on Dec 26, 2014

Test build #24826 has finished for PR 3464 at commit aa6d44a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24826/

@tdas (Contributor) commented on Dec 26, 2014

Merging this. Thanks @zsxwing!

@asfgit closed this in f9ed2b6 on Dec 26, 2014
@zsxwing deleted the SPARK-4608 branch on December 26, 2014 at 03:48