[SPARK-1157][MLlib] L-BFGS Optimizer based on Breeze's implementation. #353
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13872/
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13873/
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
@dbtsai Did you compare L-BFGS with MLlib's implementation of GD on some real data sets?
val miniBatchSize = nexamples * miniBatchFraction
var i = 0

val costFun = new DiffFunction[BDV[Double]] {
Better create a private class for the cost function.
I tested the optimizer with several real datasets, for example, small ones from the UCI Machine Learning Repository and some big ones like mnist8m (although the properties and stability of the optimizer don't depend on the size of the dataset). L-BFGS gives the same or better results compared with GD. For some datasets, GD converges really slowly after 40~50 iterations.
For the cost function, I intended to do it this way because inside the cost function I want to access and modify variables defined outside it, for example "i" and "lossHistory"; if I create a private class for this, it will take extra effort to achieve that without changing Breeze's DiffFunction signature.
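As a hedged illustration of that closure-based approach (a toy quadratic loss, not the PR's actual code), an anonymous DiffFunction can mutate variables defined in the enclosing scope directly:

```scala
// Hedged sketch of the closure-based approach described above: the anonymous
// DiffFunction mutates "i" and "lossHistory", which live outside it, without
// changing Breeze's DiffFunction signature.
import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.DiffFunction
import scala.collection.mutable.ArrayBuffer

object ClosureCostFunSketch {
  def main(args: Array[String]): Unit = {
    var i = 0
    val lossHistory = new ArrayBuffer[Double]()

    val costFun = new DiffFunction[BDV[Double]] {
      override def calculate(weights: BDV[Double]): (Double, BDV[Double]) = {
        val loss = 0.5 * (weights dot weights)   // stand-in for the data-dependent loss
        i += 1                                   // closure updates the outer iteration counter
        lossHistory += loss                      // and the outer loss history
        (loss, weights.copy)                     // gradient of 0.5 * ||w||^2 is w
      }
    }

    println(costFun.calculate(BDV(1.0, 2.0)))    // (2.5, DenseVector(1.0, 2.0))
    println(s"evaluations = $i, history = $lossHistory")
  }
}
```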
@mengxr As you suggested, I moved the costFun to a private CostFun class.
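For reference, here is a simplified sketch (not the exact merged code) of what such a private CostFun class can look like: the loss history is passed in explicitly instead of being captured from an enclosing scope, and the in-memory data, least-squares gradient, and L2 term are stand-ins for MLlib's RDD, Gradient, and Updater.

```scala
// Simplified sketch of a private cost-function class for Breeze's L-BFGS.
import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.DiffFunction
import scala.collection.mutable.ArrayBuffer

private class CostFun(
    data: Seq[(Double, BDV[Double])],   // (label, features)
    regParam: Double,
    lossHistory: ArrayBuffer[Double]) extends DiffFunction[BDV[Double]] {

  override def calculate(weights: BDV[Double]): (Double, BDV[Double]) = {
    var loss = 0.0
    val gradient = BDV.zeros[Double](weights.length)
    data.foreach { case (label, features) =>
      val diff = (features dot weights) - label
      loss += 0.5 * diff * diff
      gradient += features * diff
    }
    val n = data.size.toDouble
    val regVal = 0.5 * regParam * (weights dot weights)
    val totalLoss = loss / n + regVal
    lossHistory += totalLoss
    (totalLoss, gradient / n + weights * regParam)
  }
}
```

With a class like this, Breeze's optimizer, e.g. new LBFGS[BDV[Double]](maxIter, m, tolerance).minimize(costFun, initialWeights), can drive the optimization without any change to the DiffFunction signature.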
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
package org.apache.spark.mllib.optimization

import scala.Array
Scala imports Array by default.
Jenkins, retest this please.
Timeout for the latest Jenkins run. It seems that CI is not stable now.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14126/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
Thanks - merged this!
This PR uses Breeze's L-BFGS implementation; the Breeze dependency has already been introduced by Xiangrui's sparse input format work in SPARK-1212. Nice work, @mengxr !
When used with a regularized updater, we need to compute regVal and regGradient (the gradient of the regularization term in the cost function), and with the current updater design we can compute those two values in the following way.
Let's review how the updater works when returning newWeights given the input parameters.
w' = w - thisIterStepSize * (gradient + regGradient(w))
Note that regGradient is a function of w.
If we set gradient = 0 and thisIterStepSize = 1, then w' = w - regGradient(w), so
regGradient(w) = w - w'
As a result, regVal can be computed by
val regVal = updater.compute(
  weights,
  new DoubleMatrix(initialWeights.length, 1), 0, 1, regParam)._2
and regGradient can be obtained by
val regGradient = weights.sub(
  updater.compute(weights, new DoubleMatrix(initialWeights.length, 1), 1, 1, regParam)._1)
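Below is a minimal, self-contained Scala sketch of this trick using a hypothetical L2 updater (plain arrays, not Spark's actual Updater or its DoubleMatrix-based signature); it only illustrates that calling compute with a zero gradient recovers regVal (stepSize = 0) and regGradient = w - w' (stepSize = 1).

```scala
// Sketch of extracting regVal and regGradient from a hypothetical updater.
object RegExtractionSketch {
  // Hypothetical L2 updater: w' = w - stepSize * (gradient + regParam * w),
  // and the second return value is regVal = 0.5 * regParam * ||w'||^2.
  def compute(w: Array[Double], gradient: Array[Double], stepSize: Double,
              iter: Int, regParam: Double): (Array[Double], Double) = {
    val newW = w.indices.map(i => w(i) - stepSize * (gradient(i) + regParam * w(i))).toArray
    val regVal = 0.5 * regParam * newW.map(x => x * x).sum
    (newW, regVal)
  }

  def main(args: Array[String]): Unit = {
    val w = Array(1.0, -2.0, 3.0)
    val regParam = 0.1
    val zeroGradient = Array.fill(w.length)(0.0)

    // regVal of the current weights: zero gradient and stepSize = 0 keep w unchanged.
    val regVal = compute(w, zeroGradient, 0.0, 1, regParam)._2

    // regGradient(w) = w - w': zero gradient and stepSize = 1.
    val wPrime = compute(w, zeroGradient, 1.0, 1, regParam)._1
    val regGradient = w.indices.map(i => w(i) - wPrime(i))

    println(s"regVal = $regVal")                              // 0.5 * 0.1 * (1 + 4 + 9) ≈ 0.7
    println(s"regGradient = ${regGradient.mkString(", ")}")   // regParam * w ≈ (0.1, -0.2, 0.3)
  }
}
```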
The PR includes tests that compare the results with SGD, with and without regularization.
We did a comparison between L-BFGS and SGD, and we often saw 10x fewer
steps with L-BFGS while the cost per step is the same (just computing
the gradient).
The following paper by Prof. Ng's group at Stanford compares different
optimizers, including L-BFGS and SGD. They use them in the context of
deep learning, but it is worth referencing.
http://cs.stanford.edu/~jngiam/papers/LeNgiamCoatesLahiriProchnowNg2011.pdf