Fix lbfgsb linesearch out of bound and findAlpha method #633
Conversation
```scala
def minimize(f: DiffFunction[Double], init: Double = 1.0): Double = {
  minimizeWithBound(f, init = 1.0, bound = Double.PositiveInfinity)
}

/**
 * Performs a line search on the function f, returning a point satisfying
 * the Strong Wolfe conditions. Based on the line search detailed in
 * Nocedal & Wright Numerical Optimization p58.
 */
```
Should we update the annotation? It looks out of date.
looks great! thanks so much! So sorry for the long delay. work and life have been extra busy
```scala
state.x + (dir *:* stepSize)
val newX = state.x + (dir :* stepSize)
var i = 0
while (i < newX.length) {
```
i prefer cforRange these days
cc @dlwh @yanboliang @dbtsai
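The loop under review clamps each component of `newX` back into the bound box after a step. A minimal standalone sketch of the same clamping on plain arrays (the name `clampToBounds` and the bounds arrays are hypothetical; Breeze's own code operates on `DenseVector`):

```scala
// Clamp each component of newX into [lowerBounds(i), upperBounds(i)].
// Mirrors the while loop in the diff above, on plain arrays for illustration.
def clampToBounds(newX: Array[Double],
                  lowerBounds: Array[Double],
                  upperBounds: Array[Double]): Array[Double] = {
  val out = newX.clone()
  var i = 0
  while (i < out.length) {
    if (out(i) > upperBounds(i)) out(i) = upperBounds(i) // pull back below upper bound
    if (out(i) < lowerBounds(i)) out(i) = lowerBounds(i) // push back above lower bound
    i += 1
  }
  out
}
```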
I have run tests for bound constrained LiR/LoR and Huber regression against this fix; all passed. This looks good to me. Thanks!
LGTM. Ping @dlwh to cut a new release for our usage in Spark if this looks okay. Thanks.
thanks!
i'll cut a release this week
## What changes were proposed in this pull request?

MLlib `LogisticRegression` should support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce a solution in the specified range. Under the hood, we call Breeze [`L-BFGS-B`](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound constrained optimization. But the current Breeze implementation of L-BFGS-B has some bugs, which scalanlp/breeze#633 fixed. We need to upgrade the Breeze dependency later; for now this PR temporarily uses the workaround L-BFGS-B for review.

## How was this patch tested?

Unit tests.

Author: Yanbo Liang <[email protected]>
Closes #17715 from yanboliang/spark-20047.
(cherry picked from commit 606432a)
Signed-off-by: DB Tsai <[email protected]>
```diff
@@ -58,12 +58,18 @@ class StrongWolfeLineSearch(maxZoomIter: Int, maxLineSearchIter: Int) extends Cu
   val c1 = 1e-4
   val c2 = 0.9

+  def minimize(f: DiffFunction[Double], init: Double = 1.0): Double = {
+    minimizeWithBound(f, init = 1.0, bound = Double.PositiveInfinity)
```
LBFGS passes an init value to this method that is not always equal to 1. Is it right to ignore it here?
In LBFGS-B, we can use 1 as the init value, according to the paper; or do you have a better init value?
Personally, I don't have a better init value, but LBFGS has: https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGS.scala#L76 :)
Let me describe my problem: I updated Spark in my project to the latest version (2.2) and some tests on logistic regression started failing.
Spark 2.2 uses Breeze 0.13.1, and when I explicitly downgrade it to 0.13 the tests stop failing.
It seems that in my case the regression could not be successfully trained using LBFGS with an init value equal to 1.
oh! you're right. we should fix it. like:

```scala
def minimize(f: DiffFunction[Double], init: Double = 1.0): Double = {
  minimizeWithBound(f, init, bound = Double.PositiveInfinity)
}
```
Thanks for finding this bug!
i think it's more that you're ignoring the passed-in value of init
## What to solve

After a deep check of the L-BFGS-B implementation in Breeze, I found two serious bugs:

- The line search can run out of the bound box.
- The `LBFGSB.findAlpha` method is also wrong.

## Fix line search with bound

According to the L-BFGS-B paper (http://users.iems.northwestern.edu/~nocedal/PDFfiles/limited.pdf), the line search in L-BFGS-B should be restricted to the bound box, but the implementation in Breeze does not do this. The optimizer can therefore run out of bound, which in some cases produces wrong results or makes the line search fail.

This is the root cause of issue #572, and a series of features is blocked by this bug, so it should be fixed ASAP.

cc @yanboliang
## Strong Wolfe line search with bound restriction

As mentioned in the paper, the best line search method for L-BFGS-B is a strong Wolfe line search with bound restriction. This requires some modification of `StrongWolfeLineSearch` in Breeze.

### Modified strong Wolfe condition with bound

We know the strong Wolfe condition consists of the sufficient decrease condition and the curvature condition, BUT according to the paper, for the strong Wolfe line search in L-BFGS-B the condition should be modified as follows (the modified condition was given as an image in the original PR description).

### Algorithm for strong Wolfe condition with bound

Without bound, we already have the following algorithm (Nocedal & Wright, Numerical Optimization, p. 58; shown as an image in the original PR description). With bound, we can modify this algorithm as follows (also shown as an image):
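For reference, the two acceptance conditions named above can be sketched as a predicate in plain Scala. This is the unbounded textbook form only; the bounded variant from the paper additionally caps the step at the box boundary, which this sketch does not model. The constants `c1` and `c2` match those in the diff above; the function name and parameter layout are hypothetical:

```scala
// Textbook strong Wolfe test for a candidate step alpha along direction d,
// where phi(a) = f(x + a*d) and phi'(a) = grad f(x + a*d) dot d.
def strongWolfe(phi0: Double, dPhi0: Double,  // value and slope at alpha = 0
                phiA: Double, dPhiA: Double,  // value and slope at the candidate alpha
                alpha: Double,
                c1: Double = 1e-4, c2: Double = 0.9): Boolean = {
  val sufficientDecrease = phiA <= phi0 + c1 * alpha * dPhi0 // Armijo condition
  val curvature = math.abs(dPhiA) <= c2 * math.abs(dPhi0)    // strong curvature condition
  sufficientDecrease && curvature
}
```

For example, with phi(a) = (a - 1)^2 the full step alpha = 1 satisfies both conditions, while a tiny step satisfies sufficient decrease but fails the curvature condition.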
### Modification in L-BFGS-B

- In `determineStepSize`, first calculate the maximum step size that will not exceed the bound box, then use it as the step bound when calling `StrongWolfeLineSearch.minimizeWithBound`.
- In the `takeStep` method, check whether `newX` exceeds the bound and correct it. (This is used to avoid numerical error causing `newX` to run out of bound.)

## Fix `findAlpha` method

Unfortunately, the `findAlpha` method here is also wrong. The wrong implementation causes the `subspaceMin` point to walk out of bound, and in some cases it also makes the algorithm crash. I traced to this code through several failed test cases with Huber loss.

In summary, there are at least 2 mistakes in this method. Let me first explain what the method should do: find the maximum `alpha` satisfying `0.0 <= alpha <= 1.0` such that, for each dimension `i`, `lb_i <= xCauchy_i + alpha * du_i <= ub_i` holds. Here `xCauchy` is the Cauchy point inside the bound box and `du` is the direction vector (for details please refer to the paper). The key point is that the condition above must be satisfied for each `i`, so that the computed subspace minimum point is restricted to the bound box.

So the algorithm to find the maximum alpha should be: for each `i` with `du_i > 0` take the ratio `(ub_i - xCauchy_i) / du_i`, for each `i` with `du_i < 0` take `(lb_i - xCauchy_i) / du_i`, and return the minimum of these ratios and `1.0` (the original PR description gave this as an image).
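A corrected standalone sketch of this search on plain arrays (the exact signature is hypothetical; Breeze's method works on `DenseVector`s): it uses `math.min`, the properly parenthesized `(ub_i - xc_i) / du_i`, and skips zero components of `du`:

```scala
// Largest alpha in [0, 1] with lb(i) <= xCauchy(i) + alpha * du(i) <= ub(i) for all i.
def findAlpha(xCauchy: Array[Double], du: Array[Double],
              lb: Array[Double], ub: Array[Double]): Double = {
  var alpha = 1.0
  var i = 0
  while (i < du.length) {
    if (du(i) > 0) {
      alpha = math.min(alpha, (ub(i) - xCauchy(i)) / du(i)) // moving toward the upper bound
    } else if (du(i) < 0) {
      alpha = math.min(alpha, (lb(i) - xCauchy(i)) / du(i)) // moving toward the lower bound
    }
    // du(i) == 0: this dimension never leaves the box; skip it to avoid 0/0 = NaN
    i += 1
  }
  math.max(alpha, 0.0)
}
```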
Note that we should handle the case `du_i == 0` carefully, otherwise it may generate `NaN` in the computation and crash the whole algorithm.

Now we can check the implementation in Breeze: the logic in `findAlpha` is wrong. Where it should use `math.min` it uses `math.max`, and where it should compute `(ub_i - xc_i) / du_i` it computes the wrong expression `ub_i - xc_i / du_i`.

The second mistake in `findAlpha` is that it does not handle the case where components of `du` are zero. This may cause the computation to run into `NaN`. We should skip the zero components of `du`.

## Numerical error handling

In theory, the Cauchy point, the `subspaceMin` point, and the `X` point should all stay within the bound box throughout the computation. BUT because of floating-point error they may slightly exceed the bound, so I added an `adjustWithinBound` method to correct them. This prevents a point from stepping out of bound, which could cause other bugs.

## Test

The following typical algos have been tested:
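As a quick sanity check of what these fixes are supposed to guarantee, here is a toy, self-contained sketch (a hypothetical projected gradient descent in plain Scala, not Breeze's L-BFGS-B): when the unconstrained minimum lies outside the box, a bound-respecting optimizer must land on the boundary rather than run out of bound.

```scala
// Toy illustration (NOT Breeze's L-BFGS-B): projected gradient descent on a
// 1-D objective with a box constraint. For f(x) = (x - 2)^2 with 0 <= x <= 1,
// the optimizer must return x = 1, the boundary point closest to the
// unconstrained minimum at x = 2.
def projectedGradientDescent(grad: Double => Double,
                             lb: Double, ub: Double,
                             x0: Double, lr: Double = 0.1,
                             iters: Int = 200): Double = {
  var x = x0
  var k = 0
  while (k < iters) {
    x = x - lr * grad(x)
    if (x > ub) x = ub // project back into the box,
    if (x < lb) x = lb // the same correction adjustWithinBound guards against
    k += 1
  }
  x
}
```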