Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-1462: Examples of ML algorithms are using deprecated APIs #416

Closed
wants to merge 3 commits into from

Conversation

techaddict
Copy link
Contributor

This will also fix SPARK-1464: Update MLLib Examples to Use Breeze.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@mengxr
Copy link
Contributor

mengxr commented Apr 15, 2014

@techaddict Thanks for work on this JIRA! Since we try to hide breeze types in MLlib, I'm not sure whether we should use breeze vectors directly in examples. We might choose either using breeze vectors in examples and leaving a note about their usage in MLlib, or implementing necessary operations in MLlib's vectors to be used in examples. I prefer the former given the time frame. @mateiz what do you think?

@techaddict
Copy link
Contributor Author

@mengxr Ya i was thinking that too, as eventually we'll need function's like squaredDist(in KMeans Examples) implemented in mllib.
btw i would love to work on improving the current state of vectors.

@srowen
Copy link
Member

srowen commented Apr 15, 2014

@techaddict Oh, this is good. I was just starting work on a very similar PR. For what it's worth, my changes looked very similar, so +1. (What about classes like LocalKMeans though? I think there are more to change)

@mengxr The issue with continuing to use the MLlib Vector is that the whole thing is deprecated, no? Even if you pushed these operations into it, it would still generate warnings. The deprecation comment implied that breeze vectors should be used directly.

@mengxr
Copy link
Contributor

mengxr commented Apr 15, 2014

@srowen Sorry for the confusion. util.Vector is deprecated, but linalg.Vector is new. I meant adding necessary operations to linalg.Vector and using it in examples instead of breeze vectors.

@techaddict
Copy link
Contributor Author

@srowen i think we need to add implementation some additional function's to linalg.Vector like squaredDist (supported by util.Vector)

@srowen
Copy link
Member

srowen commented Apr 15, 2014

@mengxr Ah right, I understand. Yeah best to take the opportunity to add those methods to the façade I think, since there will be other needs for those methods undoubtedly. @techaddict looks to have this well in hand but let me know if I can pitch in anywhere.

@@ -18,17 +18,17 @@
package org.apache.spark.examples

import java.util.Random
import org.apache.spark.util.Vector
import breeze.linalg.{Vector => BV, DenseVector => BDV}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not rename them in this PR, it might just confuse people. If you import Vector it should take precedence over the scala.Vector class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya right. Fixed

@techaddict
Copy link
Contributor Author

@mengxr for now is this ok ?

@@ -58,10 +59,21 @@ object LocalKMeans {
bestIndex
}

// TODO: Make this a part of the Vector
def squaredDist(a: Vector[Double], b: Vector[Double]): Double = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breeze has squaredDistance operator.

@mengxr
Copy link
Contributor

mengxr commented Apr 16, 2014

@techaddict Most of the comments are not related to the change here. But it would be great if you can update them in the next pass. Also, the imports are not organized in the examples. Could you fix them as well? Thanks a lot!

@techaddict
Copy link
Contributor Author

@mengxr fixed imports, for those not related to this PR, should i create a new PR ?

@mengxr
Copy link
Contributor

mengxr commented Apr 16, 2014

No, those are minors. It is okay to leave them untouched if you don't have bandwidth.

@techaddict
Copy link
Contributor Author

@mengxr ok, is there anything else to do on this PR ?

@mengxr
Copy link
Contributor

mengxr commented Apr 16, 2014

No. Just tested with sbt/sbt examples/compile and didn't see any warning messages. Great job!

@mengxr
Copy link
Contributor

mengxr commented Apr 16, 2014

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14168/

@techaddict
Copy link
Contributor Author

@mateiz this is ready to be merged.

@mateiz
Copy link
Contributor

mateiz commented Apr 17, 2014

Merged this, thanks!

@asfgit asfgit closed this in 6ad4c54 Apr 17, 2014
asfgit pushed a commit that referenced this pull request Apr 17, 2014
This will also fix SPARK-1464: Update MLLib Examples to Use Breeze.

Author: Sandeep <[email protected]>

Closes #416 from techaddict/1462 and squashes the following commits:

a43638e [Sandeep] Some Style Changes
3ce69c3 [Sandeep] Fix Ordering and Naming of Imports in Examples
6c7e543 [Sandeep] SPARK-1462: Examples of ML algorithms are using deprecated APIs

(cherry picked from commit 6ad4c54)
Signed-off-by: Matei Zaharia <[email protected]>
@pwendell
Copy link
Contributor

@techaddict thanks for doing this!

@techaddict
Copy link
Contributor Author

@pwendell 😄 i think this also closes SPARK-1464 ?

pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
Removed unnecessary DStream operations and updated docs

Removed StreamingContext.registerInputStream and registerOutputStream - they were useless. InputDStream has been made to register itself, and just registering a DStream as output stream cause RDD objects to be created but the RDDs will not be computed at all.. Also made DStream.register() private[streaming] for the same reasons.

Updated docs, specially added package documentation for streaming package.

Also, changed NetworkWordCount's input storage level to use MEMORY_ONLY, replication on the local machine causes warning messages (as replication fails) which is scary for a new user trying out his/her first example.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This will also fix SPARK-1464: Update MLLib Examples to Use Breeze.

Author: Sandeep <[email protected]>

Closes apache#416 from techaddict/1462 and squashes the following commits:

a43638e [Sandeep] Some Style Changes
3ce69c3 [Sandeep] Fix Ordering and Naming of Imports in Examples
6c7e543 [Sandeep] SPARK-1462: Examples of ML algorithms are using deprecated APIs
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
"make bin" to avoid "make dev" error in released packer version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants