exit early when nan in grad or hessian #1046
Conversation
I agree that this is a good idea, because the current behavior will keep iterating until the maximum number of iterations is reached.
Tests fail because TwiceDifferentiableHV does not have an H field, but you could check for NaNs in the Hessian-vector product. I know it should be a different counter, but that can be another PR.
I excluded TwiceDifferentiableHV from the check.
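A minimal sketch of how such an exclusion can be expressed via dispatch; the helper name is hypothetical and this is not the PR's actual code, only an illustration of skipping the Hessian check for Hessian-vector-product objectives:

```julia
using NLSolversBase

# Hypothetical helper: TwiceDifferentiableHV has no H field, so it is excluded;
# TwiceDifferentiable stores the Hessian in its H field and can be checked.
hessian_has_nonfinite(d::TwiceDifferentiableHV) = false
hessian_has_nonfinite(d::TwiceDifferentiable)   = any(!isfinite, d.H)
```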
Codecov Report

@@            Coverage Diff             @@
##           master    #1046      +/-   ##
==========================================
+ Coverage   85.26%   85.29%   +0.02%
==========================================
  Files          43       43
  Lines        3210     3216       +6
==========================================
+ Hits         2737     2743       +6
  Misses        473      473
Is there a way to hide this warning? PySR and SymbolicRegression.jl have started generating TONS of these warnings during the expression search (which is fine; they will just skip such an expression).
Does this interfere with the behavior of [...]?
Yeah, I think this is a breaking change. See:

```julia
# Hard-coded backtrack until we find a finite function value
iterfinite = 0
while !isfinite(ϕx_1) && iterfinite < iterfinitemax
    iterfinite += 1
    α_1 = α_2
    α_2 = α_1/2
    ϕx_1 = ϕ(α_2)
end
```

i.e., it will backtrack when it hits a NaN, whereas this PR makes it exit when it hits a NaN, which is fundamentally different behavior. Could you please roll this change back and make it optional? It has changed the algorithmic behavior of PySR and SymbolicRegression.jl.
Can you share an example that is broken now but used to work?
Gradient-based optimisers in Optim work roughly as follows:
1. Compute the gradient (and Hessian, if applicable) at the current point.
2. Compute a search direction from the gradient/Hessian.
3. Run a line search along that direction to find the next point, then repeat.
The check here is in step 1: it checks whether the gradient/Hessian at the current point contains NaNs. For the current point in step 1 to have a NaN gradient, it must not have had a NaN objective, because it was the output of the line search in step 3. None of the line search algorithms in step 3 will terminate at a point with a NaN objective if the initial point had a finite objective value, but they can terminate at a point with a NaN gradient and/or Hessian.

So if the gradient/Hessian in step 1 has NaNs, the search direction will have NaNs, and the line search will fail because all steps along a NaN search direction lead to a NaN x and then a NaN objective value. The rest of the iterations would therefore all be spent failing line searches and doing unnecessary function evaluations.

Given the above reasoning, I am curious what exactly this PR broke.
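A small, self-contained illustration of the propagation argument above (the objective and values are made up for the example): once the search direction contains a NaN, every trial point along it, and hence every objective value the line search evaluates, is NaN.

```julia
f(x) = sum(abs2, x)            # hypothetical smooth objective
x = [1.0, 2.0]                 # current point with a finite objective
d = [NaN, -4.0]                # search direction built from a NaN gradient

for α in (1.0, 0.5, 0.25)      # any step size along d gives a point with a NaN...
    @show f(x .+ α .* d)       # ...and therefore a NaN objective value
end
```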
Hm, I wonder if it was due to the special handling of NaNs in my package - I treat NaNs as a signal. I will have a look this weekend and see what the issue is. I have pinned the version to 1.7.6 in SymbolicRegression.jl for now. In any case, it would be nice to be able to turn the warnings off; is that possible?
It should be possible to turn the warnings off with a flag, but that requires a new option in Optim, so the decision will be Patrick's. It's also possible to turn off the warnings from your end as a user.
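A user-side sketch of that workaround, assuming the warnings go through Julia's standard logging system (the objective and call below are just an example, not the exact PySR/SymbolicRegression.jl setup): route everything below Error level to a logger that keeps errors but drops warnings for this call only.

```julia
using Logging, Optim

rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Silence warnings (but not errors) emitted during this optimize call.
result = with_logger(ConsoleLogger(stderr, Logging.Error)) do
    optimize(rosenbrock, zeros(2), BFGS())
end
```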
Thanks @mohamed82008. I made a PR in #1049 to turn off the warning messages when [...].
I think this is false. This check happens after the backtracking-out-of-non-finite-values loop.
Thanks. Could you please see #1049? That is blocking me from updating to the latest Optim.
This PR checks that the gradient and Hessian are all finite; otherwise it exits early with a warning.
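A self-contained sketch of that behavior, under assumed names (this is not Optim's implementation, and it checks only the gradient for brevity): the derivatives are tested for finiteness each iteration, and the loop exits with a warning instead of spending the remaining iterations on line searches that cannot succeed.

```julia
# Sketch only: a steepest-descent loop with the early-exit check.
function optimize_sketch(f, grad!, x; maxiter = 100)
    g = similar(x)
    for _ in 1:maxiter
        grad!(g, x)                            # step 1: derivatives at the current point
        if any(!isfinite, g)                   # the finiteness check described above
            @warn "Terminating early: gradient has non-finite values"
            break                              # exit instead of wasting the remaining iterations
        end
        sqrt(sum(abs2, g)) < 1e-8 && break     # converged
        d = -g                                 # step 2: search direction (steepest descent)
        α, fx = 1.0, f(x)                      # step 3: crude backtracking line search
        while !isfinite(f(x + α * d)) || f(x + α * d) >= fx
            α /= 2
            α < 1e-12 && break
        end
        x += α * d                             # accept the new point
    end
    return x
end

optimize_sketch(x -> sum(abs2, x), (g, x) -> (g .= 2 .* x), [1.0, -2.0])  # ≈ [0.0, 0.0]
```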