Avoid overflow in generic_vecnorm() #11789
Conversation
I think there is a significant speed penalty with the proposed solution. It might be better to keep the original version but add something like … cc: @stevengj
Ah, OK, now I see why it was written that way. We could look at how OpenBLAS does this, as it returns the correct result.
@andreasnoack With your solution, the test case would return … An intermediate solution would be to compute …
Maybe we could enable the fast code with …
I would say accuracy is more important than speed here. On OS X, the default BLAS (Accelerate) returns 7.00649232E-44 in double precision, which is a strange value. Code: …

The Matlab result is the one that seems to make the most sense.
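To make the accuracy concern concrete, here is a small Python sketch (the function names `naive_norm` and `scaled_norm` are mine, not from the patch) of why the straightforward `sqrt(sum(x.^2))` loses the answer for very small or very large entries, and why scaling by the largest magnitude first avoids the problem:

```python
import math

def naive_norm(x):
    # Squares the entries directly: (1e-170)**2 underflows to 0.0 and
    # (1e170)**2 overflows to inf, even though the true norm is
    # perfectly representable in both cases.
    return math.sqrt(sum(v * v for v in x))

def scaled_norm(x):
    # Scale by the largest magnitude first: every ratio v/scale is <= 1
    # in magnitude, so the squares can neither overflow nor vanish.
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))

print(naive_norm([1e-170, 1e-170]))   # 0.0 (underflow)
print(naive_norm([1e170, 1e170]))     # inf (overflow)
print(scaled_norm([1e-170, 1e-170]))  # ~1.414e-170
print(scaled_norm([1e170, 1e170]))    # ~1.414e170
```

The price of the scaled version is the extra pass over the data to find the maximum, which is where the speed discussion below comes from.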
@dpo We could use the same logic as reference BLAS. See http://www.netlib.org/lapack/explore-html/da/d7f/dnrm2_8f_source.html. Our present implementation doesn't seem to be that fast anyway, and in most performance-critical cases we'll use BLAS.
@andreasnoack Yes, thanks. That was a silly error! I played around with

```julia
function twonorm(x :: Vector{Float64})
    scale = maximum(abs(x))
    scale == 0.0 && return 0.0
    return scale * sqrt(sum((x ./ scale).^2))
end
```

which returns the correct result. Here's how the BLAS does it:

```julia
function twonormblas(x :: Vector{Float64})
    scale = 0.0
    ssq = 1.0
    n = length(x)
    @inbounds for i = 1 : n
        if (x[i] != 0.0)
            absxi = abs(x[i])
            if (scale < absxi)
                ssq = 1.0 + ssq * (scale / absxi)^2
                scale = absxi
            else
                ssq = ssq + (absxi / scale)^2
            end
        end
    end
    return scale * sqrt(ssq)
end
```

but it's not very fast.
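For readers following along in another language, the dnrm2-style logic quoted above transcribes almost line for line. This Python sketch (the name `dnrm2_style` is mine) maintains the invariant that the sum of squares seen so far equals `scale^2 * ssq`, which is why it needs only a single pass:

```python
import math

def dnrm2_style(x):
    # One-pass scaled sum of squares, following the reference BLAS
    # dnrm2 logic. Invariant after each element:
    #   (sum of squares seen so far) == scale**2 * ssq
    scale, ssq = 0.0, 1.0
    for v in x:
        if v != 0.0:
            absv = abs(v)
            if scale < absv:
                # A new largest magnitude: rescale the accumulated
                # sum of squares to the new scale.
                ssq = 1.0 + ssq * (scale / absv) ** 2
                scale = absv
            else:
                ssq += (absv / scale) ** 2
    return scale * math.sqrt(ssq)

print(dnrm2_style([3.0, 4.0]))        # 5.0
print(dnrm2_style([1e-170, 1e-170]))  # ~1.414e-170, no underflow
```

The per-element branch and division are what make this noticeably slower than an unguarded `sqrt(sum(abs2, x))` loop, which is the trade-off being debated in this thread.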
I just timed our present implementation in …
Interesting. I wonder what I did wrong then: …

This is OS X 10.9 with OpenBLAS.
Not sure why you get such slow results. My implementation (…):

```julia
julia> for k = 2:8
           x = rand(10^k);
           norm2a(x)
           t1 = @elapsed norm2a(x)
           t2 = @elapsed norm(x)
           t3 = @elapsed LinAlg.generic_vecnorm2(x)
           @printf("%2d %8.2e %8.2e %8.2e\n", k, t1, t2, t3)
       end
 2 7.52e-07 2.27e-06 2.43e-06
 3 4.15e-06 7.22e-07 8.30e-06
 4 7.06e-05 3.52e-06 9.48e-05
 5 3.83e-04 3.29e-05 7.35e-04
 6 4.24e-03 5.75e-04 8.17e-03
 7 3.90e-02 5.93e-03 7.43e-02
 8 4.03e-01 6.25e-02 7.66e-01
```
@andreasnoack The difference comes from the fact that @dpo compared his implementation to …
@nalimilan Could you rebase this one?
No need to take the inverse too early. Fixes #11788.
I've rebased and will merge when lights are green.
I thought you were going to switch to the …
Yes, I reread this too quickly. I'll prepare a pull request with the BLAS algorithm so that we can close #11788.
I've kept the existing type promotion code, although I'm not sure how it works.
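The commit message's "inverse too early" point can be sketched in Python (the function names here are hypothetical, for illustration only): precomputing `1.0/scale` overflows to `inf` when the largest entry is subnormal, whereas dividing by `scale` directly stays finite.

```python
import math

def norm_inv_too_early(x):
    # Sketch of the problematic pattern: precompute the reciprocal of
    # the scale. When the largest entry is subnormal, 1.0/scale
    # overflows (inf for scale below ~5.6e-309) and poisons the result.
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return 0.0
    inv = 1.0 / scale
    return scale * math.sqrt(sum((v * inv) ** 2 for v in x))

def norm_divide_late(x):
    # The fixed pattern: divide by scale only where it is needed.
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))

x = [5e-310, 5e-310]            # subnormal entries
print(norm_inv_too_early(x))    # inf: 1/5e-310 overflows
print(norm_divide_late(x))      # ~7.07e-310, the correct norm
```

Trading the multiply-by-reciprocal for a division costs a little speed per element, which is consistent with the performance discussion above.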