SPARK-1462: Examples of ML algorithms are using deprecated APIs #416
Can one of the admins verify this patch?
@techaddict Thanks for working on this JIRA! Since we try to hide breeze types in MLlib, I'm not sure whether we should use breeze vectors directly in examples. We could either use breeze vectors in the examples and leave a note about their usage in MLlib, or implement the necessary operations in MLlib's vectors so the examples can use them. I prefer the former given the time frame. @mateiz what do you think?
@mengxr Yes, I was thinking that too, since eventually we'll need functions like squaredDist (in the KMeans examples) implemented in MLlib.
@techaddict Oh, this is good. I was just starting work on a very similar PR. For what it's worth, my changes looked very similar, so +1. (What about classes like …
@mengxr The issue with continuing to use the MLlib …
@srowen Sorry for the confusion.
@srowen I think we need to add implementations of some additional functions to …
@mengxr Ah right, I understand. Yes, I think it's best to take the opportunity to add those methods to the façade, since there will undoubtedly be other needs for them. @techaddict looks to have this well in hand, but let me know if I can pitch in anywhere.
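A minimal sketch of the façade approach discussed above, assuming the examples only need element-wise addition and subtraction, scaling, dot products, and squared distances. `DenseVec` is a hypothetical name used for illustration; it is not MLlib's `Vector` API, and the methods actually added to MLlib may look different.

```scala
// Hypothetical façade: a small immutable dense vector exposing only the
// operations the ML examples need, so the examples never touch breeze types.
// DenseVec is an illustrative name, NOT part of MLlib.
final class DenseVec(private val data: Array[Double]) {
  def size: Int = data.length
  def apply(i: Int): Double = data(i)

  def +(other: DenseVec): DenseVec = zipWith(other)(_ + _)
  def -(other: DenseVec): DenseVec = zipWith(other)(_ - _)
  def *(scale: Double): DenseVec = new DenseVec(data.map(_ * scale))

  // Inner product: sum_i this(i) * other(i)
  def dot(other: DenseVec): Double = {
    checkSize(other)
    var sum = 0.0
    var i = 0
    while (i < data.length) {
      sum += data(i) * other.data(i)
      i += 1
    }
    sum
  }

  // Squared Euclidean distance, the operation the KMeans examples need
  def squaredDist(other: DenseVec): Double = {
    val diff = this - other
    diff.dot(diff)
  }

  // Defensive copy so callers cannot mutate the underlying storage
  def toArray: Array[Double] = data.clone()

  override def toString: String = data.mkString("DenseVec(", ", ", ")")

  private def checkSize(other: DenseVec): Unit =
    require(size == other.size, s"size mismatch: $size vs ${other.size}")

  // Element-wise combination of two equal-length vectors
  private def zipWith(other: DenseVec)(f: (Double, Double) => Double): DenseVec = {
    checkSize(other)
    new DenseVec(Array.tabulate(data.length)(i => f(data(i), other.data(i))))
  }
}

object DenseVec {
  def apply(values: Double*): DenseVec = new DenseVec(values.toArray)
}
```

Keeping the storage private and returning copies is what lets a façade like this swap its internals (for example, to breeze) later without changing the examples.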
@@ -18,17 +18,17 @@
package org.apache.spark.examples

import java.util.Random
import org.apache.spark.util.Vector
import breeze.linalg.{Vector => BV, DenseVector => BDV}
Let's not rename them in this PR; it might just confuse people. If you import Vector, it should take precedence over the scala.Vector class.
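To illustrate the precedence rule mentioned in this review comment: `scala.Vector` is visible only through the implicit `import scala._`, which has the lowest binding precedence, so an explicit import of another `Vector` shadows it without any renaming. The object `linalgstub` below is a hypothetical stand-in for `breeze.linalg`, used only so the sketch runs without breeze on the classpath.

```scala
// Hypothetical stand-in for a library package such as breeze.linalg
object linalgstub {
  final case class Vector(values: Array[Double]) {
    def sum: Double = values.sum
  }
}

object ImportPrecedenceDemo {
  // The explicit import shadows scala.Vector (from the implicit `import scala._`)
  import linalgstub.Vector

  // `Vector` resolves to linalgstub.Vector here, not the Scala collection
  def total(v: Vector): Double = v.sum

  def main(args: Array[String]): Unit =
    println(total(Vector(Array(1.0, 2.0, 3.0))))
}
```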
Yes, right. Fixed.
@mengxr Is this OK for now?
@@ -58,10 +59,21 @@ object LocalKMeans {
bestIndex
}

// TODO: Make this a part of the Vector
def squaredDist(a: Vector[Double], b: Vector[Double]): Double = {
breeze has a squaredDistance operator.
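A minimal pure-Scala sketch of the two LocalKMeans helpers visible in this hunk, written over `Array[Double]` so it runs without breeze. It assumes `squaredDist` computes the squared Euclidean distance (the quantity breeze's `squaredDistance` computes), and it simplifies `closestPoint` to take an indexed sequence of centers; the real example's signatures and center storage may differ. `KMeansHelpers` is a hypothetical name.

```scala
// Hypothetical helper object mirroring the LocalKMeans example's helpers
object KMeansHelpers {
  // Squared Euclidean distance: sum_i (a(i) - b(i))^2
  def squaredDist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Index of the center with the smallest squared distance to p
  // (simplified: centers held in an IndexedSeq rather than a map)
  def closestPoint(p: Array[Double], centers: IndexedSeq[Array[Double]]): Int = {
    require(centers.nonEmpty, "need at least one center")
    var bestIndex = 0
    var closest = Double.PositiveInfinity
    for (i <- centers.indices) {
      val dist = squaredDist(p, centers(i))
      if (dist < closest) {
        closest = dist
        bestIndex = i
      }
    }
    bestIndex
  }
}
```

With breeze on the classpath, the review comment suggests the hand-written helper can be replaced by breeze's `squaredDistance`, e.g. `squaredDistance(a, b)` on two breeze `DenseVector[Double]` values.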
@techaddict Most of the comments are not related to the change here, but it would be great if you could update them in the next pass. Also, the imports in the examples are not organized. Could you fix them as well? Thanks a lot!
@mengxr I fixed the imports. For the comments not related to this PR, should I create a new PR?
No, those are minor. It is okay to leave them untouched if you don't have the bandwidth.
@mengxr OK, is there anything else to do on this PR?
No. Just tested with …
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
@mateiz This is ready to be merged.
Merged this, thanks!
This will also fix SPARK-1464: Update MLLib Examples to Use Breeze.

Author: Sandeep <[email protected]>

Closes #416 from techaddict/1462 and squashes the following commits:

a43638e [Sandeep] Some Style Changes
3ce69c3 [Sandeep] Fix Ordering and Naming of Imports in Examples
6c7e543 [Sandeep] SPARK-1462: Examples of ML algorithms are using deprecated APIs

(cherry picked from commit 6ad4c54)
Signed-off-by: Matei Zaharia <[email protected]>
@techaddict Thanks for doing this!
@pwendell 😄 I think this also closes SPARK-1464?
Removed unnecessary DStream operations and updated docs. Removed StreamingContext.registerInputStream and registerOutputStream, as they were useless: InputDStream now registers itself, and merely registering a DStream as an output stream causes RDD objects to be created, but those RDDs are never computed. Also made DStream.register() private[streaming] for the same reasons. Updated the docs, especially by adding package documentation for the streaming package. Also changed NetworkWordCount's input storage level to MEMORY_ONLY, because replication on the local machine causes warning messages (as replication fails), which is scary for a new user trying out their first example.