Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-22896 Improvement in String interpolation #20070

Closed
wants to merge 12 commits into from

Conversation

chetkhatri
Copy link
Contributor

What changes were proposed in this pull request?

  • String interpolation in ml pipeline example has been corrected as per scala standard.

How was this patch tested?

  • manually tested.

@srowen
Copy link
Member

srowen commented Dec 26, 2017

OK, we can do this for consistency. In general we don't spend a ton of time on changes like this, because there is already way too much else to review. Please make one change only (this one) that updates all examples in one go.

@chetkhatri
Copy link
Contributor Author

@srowen Absolutely correct, this all in one shot. I did changes in all.

@chetkhatri
Copy link
Contributor Author

@srowen also i did merge another similiar PR with graphx to here. so Just FYI - we are good.

@srowen
Copy link
Member

srowen commented Dec 26, 2017

I think there are many more examples of this; a quick search suggests there are about 40 example files that have some string concatenation.

@chetkhatri
Copy link
Contributor Author

In scala ? I don't think so. I am re-iterating and doing double check.

@chetkhatri
Copy link
Contributor Author

You're correct - I missed other packages. I will re-confirm soon. Thanks.

@chetkhatri
Copy link
Contributor Author

@srowen I rechecked all scala examples and this is commulative PR for the same.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments here but looking fine as an improvement. Also have a look for more old .format() calls if you like; we could update those.

@@ -145,9 +145,9 @@ object Analytics extends Logging {
// TriangleCount requires the graph to be partitioned
.partitionBy(partitionStrategy.getOrElse(RandomVertexCut)).cache()
val triangles = TriangleCount.run(graph)
println("Triangles: " + triangles.vertices.map {
println(s"Triangles: ${triangles.vertices.map {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably more readable if the expression is stored in a local val first. Anything nontrivial like this gets hard to parse in an interpolated string. While we're here, worth fixing up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

println("pValues = " + chi.getAs[Vector](0))
println("degreesOfFreedom = " + chi.getSeq[Int](1).mkString("[", ",", "]"))
println("statistics = " + chi.getAs[Vector](2))
println(s"pValues = ${chi.getAs[Vector](0)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK; anything more complex I might suggest breaking out the expression into a val.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -51,10 +51,10 @@ object CorrelationExample {

val df = data.map(Tuple1.apply).toDF("features")
val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
println("Pearson correlation matrix:\n" + coeff1.toString)
println(s"Pearson correlation matrix:\n ${coeff1.toString}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we could improve: .toString is redundant here I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -169,10 +169,10 @@ private class MyLogisticRegressionModel(
Vectors.dense(-margin, margin)
}

/** Number of classes the label can take. 2 indicates binary classification. */
// Number of classes the label can take. 2 indicates binary classification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good to make this a standard comment, not scaladoc style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -31,12 +31,11 @@ object QuantileDiscretizerExample {

// $example on$
val data = Array((0, 18.0), (1, 19.0), (2, 8.0), (3, 5.0), (4, 2.2))
val df = spark.createDataFrame(data).toDF("id", "hour")
val df = spark.createDataFrame(data).toDF("id", "hour").repartition(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it looks weird, I think the author intended the repartition(1) to not appear in the body of the example that's copied into the docs. I wouldn't change this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -82,9 +82,9 @@ class CustomReceiver(host: String, port: Int)
var socket: Socket = null
var userInput: String = null
try {
logInfo("Connecting to " + host + ":" + port)
logInfo(s"Connecting to $host $port")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could make the string consistent with the one two lines below by adding a colon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, addressed

@SparkQA
Copy link

SparkQA commented Dec 28, 2017

Test build #4026 has finished for PR 20070 at commit 5507cad.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
// $example off$

sc.stop()
}

}
// scalastyle:on println
// scalastyle:on println
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't like the lack of newline at the end of this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, addressed.

Copy link
Contributor Author

@chetkhatri chetkhatri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen apologize for delay :) I have addressed suggestion. thanks

}
// $example off$

sc.stop()
}

}
// scalastyle:on println
// scalastyle:on println
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, addressed.

@@ -82,9 +82,9 @@ class CustomReceiver(host: String, port: Int)
var socket: Socket = null
var userInput: String = null
try {
logInfo("Connecting to " + host + ":" + port)
logInfo(s"Connecting to $host $port")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, addressed

@SparkQA
Copy link

SparkQA commented Dec 29, 2017

Test build #4028 has finished for PR 20070 at commit 79e6789.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chetkhatri
Copy link
Contributor Author

@srowen please do re-run the build.

@SparkQA
Copy link

SparkQA commented Dec 29, 2017

Test build #4029 has finished for PR 20070 at commit 70ce734.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

println(s"Chose ${categoricalFeatures.size} categorical features: " +
categoricalFeatures.mkString(", "))
println(s"Chose ${categoricalFeatures.size} " +
s"categorical features: {$categoricalFeatures.mkString(", ")}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here. Please make sure build/test/style passes locally before pushing again, or else this takes a lot of work to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did fixed this. Can you please give me steps as a check list before commit for test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

./dev/run-tests

@SparkQA
Copy link

SparkQA commented Dec 30, 2017

Test build #4031 has finished for PR 20070 at commit 0321faf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chetkhatri
Copy link
Contributor Author

@srowen Okey. current status looks good

@@ -45,7 +45,7 @@ object QuantileDiscretizerExample {
.setNumBuckets(3)

val result = discretizer.fit(df).transform(df)
result.show()
result.show(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question - is it necessary to make this not truncate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're following same style in other examples so it is good to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which other examples? most do not set this, and the Java equivalent doesn't either. If there's a good reason that the output needs to be untruncated, that's fine, just also change the Java example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen correct either way it works for ex. examples/ml/LDAExamples.scala

@@ -68,7 +68,7 @@ object HypothesisTestingExample {
// against the label.
val featureTestResults: Array[ChiSqTestResult] = Statistics.chiSqTest(obs)
featureTestResults.zipWithIndex.foreach { case (k, v) =>
println("Column " + (v + 1).toString + ":")
println(s"Column ${(v + 1).toString} :")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.toString is redundant here and elsewhere with interpolation. I think that should be simplified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Thanks , Changes addressed

print("Topic " + topic + ":")
for (word <- Range(0, ldaModel.vocabSize)) { print(" " + topics(word, topic)); }
print(s"Topic $topic :")
for (word <- Range(0, ldaModel.vocabSize)) { print(s" ${topics(word, topic)}") }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and put the print on a new line (I know it wasn't before)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Thanks for suggestion, it has been addressed.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still more instances of .toString. Just search on the changes page, even, to see them.

@@ -45,7 +45,7 @@ object QuantileDiscretizerExample {
.setNumBuckets(3)

val result = discretizer.fit(df).transform(df)
result.show()
result.show(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which other examples? most do not set this, and the Java equivalent doesn't either. If there's a good reason that the output needs to be untruncated, that's fine, just also change the Java example.

@@ -46,7 +46,10 @@ object LatentDirichletAllocationExample {
val topics = ldaModel.topicsMatrix
for (topic <- Range(0, 3)) {
print(s"Topic $topic :")
for (word <- Range(0, ldaModel.vocabSize)) { print(s" ${topics(word, topic)}") }
for (word <- Range(0, ldaModel.vocabSize))
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we put the open brace on the preceding line with for, and don't triple indent. See any other for loop in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen sure done.

@chetkhatri
Copy link
Contributor Author

@srowen Request for review when you get on this.

@SparkQA
Copy link

SparkQA commented Jan 3, 2018

Test build #4033 has finished for PR 20070 at commit e891f53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jan 3, 2018

Merged to master/2.3

@asfgit asfgit closed this in 9a2b65a Jan 3, 2018
asfgit pushed a commit that referenced this pull request Jan 3, 2018
## What changes were proposed in this pull request?

* String interpolation in ml pipeline example has been corrected as per scala standard.

## How was this patch tested?
* manually tested.

Author: chetkhatri <[email protected]>

Closes #20070 from chetkhatri/mllib-chetan-contrib.

(cherry picked from commit 9a2b65a)
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants