-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-1782: svd for sparse matrix using ARPACK #964
Conversation
copy ARPACK dsaupd/dseupd code from latest breeze change RowMatrix to use sparse SVD change tests for sparse SVD
Can one of the admins verify this patch? |
Jenkins, this is ok to test |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15436/ |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15438/ |
@vrilleup It didn't pass the tests. Please check the Jenkins page. |
*/ | ||
private[mllib] def multiplyGramianMatrix(v: Vector): Vector = { | ||
val bv = rows.map{ | ||
row => row.toBreeze * row.toBreeze.dot(v.toBreeze) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would create many temp objects. Please use RDD#aggregate
and axpy
. The initial vector should be a dense vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the code in computeGramianMatrix
as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to use aggregrate and axpy (from breeze). There are still some boxing/unboxing overhead from/to breeze objects. I can further improve if needed.
Merged build triggered. |
Merged build started. |
* | ||
* The decomposition is computed by providing a function that multiples a vector with A'A to | ||
* ARPACK, and iteratively invoking ARPACK-dsaupd on master node, from which we recover S and V. | ||
* Then we compute U via easy matrix multiplication as U = A * (V * S-1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"S-1" -> "S^-1" or "S^{-1}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. Somehow my intellij treated ^ as a tag and matched it with a ^ in another line. They both got deleted when I was editing the other line...
@@ -494,4 +519,4 @@ object RowMatrix { | |||
|
|||
Matrices.dense(n, n, G.data) | |||
} | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a newline at the end.
Merged build triggered. |
Merged build started. |
* @return a dense vector of eigenvalues in descending order and a dense matrix of eigenvectors | ||
* (columns of the matrix). The number of computed eigenvalues might be smaller than k. | ||
*/ | ||
private[mllib] def symmetricEigs(mul: Vector => Vector, n: Int, k: Int, tol: Double) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it implemented in breeze-0.8? If so, please add a TODO here so we will switch to the breeze implementation if we upgrade breeze.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current implementation in breeze-0.8 is for svd, it computes both U and V in memory. I can refactor that part to move the code block calling ARPACK to eig so only V is computed. TODO added
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15440/ |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15441/ |
Merged build triggered. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16449/ |
Merged build triggered. |
Merged build started. |
@mengxr I tested axpy, there is no visible change in running time. Should be good to merge :) |
Jenkins, retest this please. |
@vrilleup Great! I'll merge it if Jenkins is happy. Thank you for the contribution! |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16470/ |
Jenkins, retest this please. |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Thanks @vrilleup ! |
Merged. Thanks! |
@mengxr @rezazadeh Thank you! Excited to see this merged :) |
Merged build finished. All automated tests passed. |
All automated tests passed. |
@mengxr I ran one of the large size jobs again, per multiplication time increased to 8 seconds from 2 seconds (the version before yesterday). I don't know which part was the cause (shouldn't be the axpy part because I already compared this). Will submit a patch after identifying the cause. |
copy ARPACK dsaupd/dseupd code from latest breeze change RowMatrix to use sparse SVD change tests for sparse SVD All tests passed. I will run it against some large matrices. Author: Li Pu <[email protected]> Author: Xiangrui Meng <[email protected]> Author: Li Pu <[email protected]> Closes apache#964 from vrilleup/master and squashes the following commits: 7312ec1 [Li Pu] very minor comment fix 4c618e9 [Li Pu] Merge pull request #1 from mengxr/vrilleup-master a461082 [Xiangrui Meng] make superscript show up correctly in doc 861ec48 [Xiangrui Meng] simplify axpy 62969fa [Xiangrui Meng] use BDV directly in symmetricEigs change the computation mode to local-svd, local-eigs, and dist-eigs update tests and docs c273771 [Li Pu] automatically determine SVD compute mode and parameters 7148426 [Li Pu] improve RowMatrix multiply 5543cce [Li Pu] improve svd api 819824b [Li Pu] add flag for dense svd or sparse svd eb15100 [Li Pu] fix binary compatibility 4c7aec3 [Li Pu] improve comments e7850ed [Li Pu] use aggregate and axpy 827411b [Li Pu] fix EOF new line 9c80515 [Li Pu] use non-sparse implementation when k = n fe983b0 [Li Pu] improve scala style 96d2ecb [Li Pu] improve eigenvalue sorting e1db950 [Li Pu] SPARK-1782: svd for sparse matrix using ARPACK
copy ARPACK dsaupd/dseupd code from latest breeze change RowMatrix to use sparse SVD change tests for sparse SVD All tests passed. I will run it against some large matrices. Author: Li Pu <[email protected]> Author: Xiangrui Meng <[email protected]> Author: Li Pu <[email protected]> Closes apache#964 from vrilleup/master and squashes the following commits: 7312ec1 [Li Pu] very minor comment fix 4c618e9 [Li Pu] Merge pull request apache#1 from mengxr/vrilleup-master a461082 [Xiangrui Meng] make superscript show up correctly in doc 861ec48 [Xiangrui Meng] simplify axpy 62969fa [Xiangrui Meng] use BDV directly in symmetricEigs change the computation mode to local-svd, local-eigs, and dist-eigs update tests and docs c273771 [Li Pu] automatically determine SVD compute mode and parameters 7148426 [Li Pu] improve RowMatrix multiply 5543cce [Li Pu] improve svd api 819824b [Li Pu] add flag for dense svd or sparse svd eb15100 [Li Pu] fix binary compatibility 4c7aec3 [Li Pu] improve comments e7850ed [Li Pu] use aggregate and axpy 827411b [Li Pu] fix EOF new line 9c80515 [Li Pu] use non-sparse implementation when k = n fe983b0 [Li Pu] improve scala style 96d2ecb [Li Pu] improve eigenvalue sorting e1db950 [Li Pu] SPARK-1782: svd for sparse matrix using ARPACK
copy ARPACK dsaupd/dseupd code from latest breeze
change RowMatrix to use sparse SVD
change tests for sparse SVD
All tests passed. I will run it against some large matrices.