
Avoid overflow in generic_vecnorm() #11789

Closed
wants to merge 1 commit into from

Conversation

@nalimilan (Member)

No need to take the inverse too early. Fixes #11788.

I've kept the existing type promotion code, although I'm not sure how it works.

@andreasnoack (Member)

I think there is a significant speed penalty with the proposed solution. Might be better to keep the original version but add something like scale=min(one(scale),scale).

cc: @stevengj

@nalimilan (Member, Author)

Ah, OK, now I see why it was written that way. We could look at how OpenBLAS does this, as it returns the correct result.

@nalimilan (Member, Author)

@andreasnoack With your solution, the test case would return 0, which is less accurate than BLAS. That would be unfortunate, as the BLAS version is called for larger matrices. FWIW, the reference BLAS documentation explicitly says that nrm2 is provided not for speed, but for accuracy -- which is usually the hardest part to get right.

An intermediate solution would be to compute 1/scale beforehand as currently, but take the slow path if it's non-finite. That would not improve the accuracy in non-overflow cases, though, so the inconsistency with BLAS would remain.
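The intermediate approach described above can be sketched as follows (a Python illustration; the function name `norm2_intermediate` and its structure are hypothetical, not Julia's actual `generic_vecnorm2`):

```python
import math

def norm2_intermediate(x):
    # Hypothetical sketch: precompute 1/scale for the fast path, but fall
    # back to dividing by scale directly if the reciprocal overflows.
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    inv = 1.0 / scale
    if math.isfinite(inv):
        # Fast path: one division up front, then multiplications only.
        return scale * math.sqrt(sum((v * inv) ** 2 for v in x))
    # Slow path: scale is subnormal, so 1/scale overflowed to inf;
    # divide each element by scale instead.
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))
```

For the subnormal test case from #11788 (e.g. `[2.4e-322, 4.4e-323]`), `1/scale` overflows to `inf`, so the slow path runs and a nonzero norm is returned.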

@nalimilan (Member, Author)

Maybe we could enable the fast code with @fastmath?

@dpo (Contributor) commented Jun 21, 2015

I would say accuracy is more important than speed here. On OS X, the default BLAS (Accelerate) returns 7.00649232E-44 in double precision, which is a strange value.

Code:

      program testnorm
      real dnrm2
      double precision x(2)
      x(1) = 2.4D-322
      x(2) = 4.4D-323
      write(*,*) dnrm2(2, x, 1)
      end

The Matlab result is the one that seems to make the most sense.

@andreasnoack (Member)

@dpo dnrm2 should be declared double precision. That will give you the right result.

We could use the same logic as reference BLAS. See http://www.netlib.org/lapack/explore-html/da/d7f/dnrm2_8f_source.html. Our present implementation doesn't appear to be that fast anyway, and in most performance-critical cases we'll use BLAS.

@dpo (Contributor) commented Jun 21, 2015

@andreasnoack Yes, thanks. That was a silly error!

I played around with

function twonorm(x::Vector{Float64})
  scale = maximum(abs(x))       # largest magnitude, used as the scaling factor
  scale == 0.0 && return 0.0    # all-zero vector
  return scale * sqrt(sum((x ./ scale).^2))
end

which returns the correct result. Here's how the BLAS does it:

function twonormblas(x::Vector{Float64})
  scale = 0.0   # running maximum magnitude
  ssq = 1.0     # scaled sum of squares: sum((x[i]/scale)^2)
  n = length(x)
  @inbounds for i = 1:n
    if x[i] != 0.0
      absxi = abs(x[i])
      if scale < absxi
        # New maximum: rescale the accumulated sum to the new scale
        ssq = 1.0 + ssq * (scale / absxi)^2
        scale = absxi
      else
        ssq = ssq + (absxi / scale)^2
      end
    end
  end
  return scale * sqrt(ssq)
end

but it's not very fast.

@andreasnoack (Member)

I just timed our present implementation in generic_vecnorm2 against a translation of the BLAS version, and the latter is almost twice as fast. It also has the advantage of only traversing the array once.

@dpo (Contributor) commented Jun 21, 2015

Interesting. I wonder what I did wrong then:

julia> for k = 2:8
       x = rand(10^k);
       twonormblas(x); t1 = @elapsed twonormblas(x);
       t2=@elapsed norm(x); @printf("%2d  %8.2e  %8.2e\n", k, t1, t2);
       end
 2  3.84e-06  3.72e-06
 3  5.47e-05  1.23e-06
 4  3.84e-04  3.16e-06
 5  3.81e-03  5.87e-05
 6  3.65e-02  4.61e-04
 7  3.63e-01  4.48e-03
 8  3.72e+00  5.04e-02

This is OS X 10.9 with OpenBLAS.

@andreasnoack (Member)

Not sure why you get such slow results. My implementation (norm2a) is almost identical to yours, and I get

julia> for k = 2:8
              x = rand(10^k);
              norm2a(x)
              t1 = @elapsed norm2a(x)
              t2 = @elapsed norm(x)
              t3 = @elapsed LinAlg.generic_vecnorm2(x)
              @printf("%2d  %8.2e  %8.2e  %8.2e\n", k, t1, t2, t3)
         end
 2  7.52e-07  2.27e-06  2.43e-06
 3  4.15e-06  7.22e-07  8.30e-06
 4  7.06e-05  3.52e-06  9.48e-05
 5  3.83e-04  3.29e-05  7.35e-04
 6  4.24e-03  5.75e-04  8.17e-03
 7  3.90e-02  5.93e-03  7.43e-02
 8  4.03e-01  6.25e-02  7.66e-01

@nalimilan (Member, Author)

@andreasnoack The difference comes from the fact that @dpo compared his implementation to vecnorm (which calls BLAS), while you compared it to generic_vecnorm2. Since the BLAS-style loop is actually faster than the current version, let's go with the same algorithm as BLAS, and leave possibly faster code to @fastmath.

@andreasnoack (Member)

@nalimilan Could you rebase this one?

Commit: No need to take the inverse too early. Fixes #11788.
@andreasnoack (Member)

I've rebased and will merge when lights are green

@stevengj (Member)

I thought you were going to switch to the dnrm2 algorithm?

@andreasnoack (Member)

Yes. I reread this too fast. I'll prepare a pull request with the BLAS algorithm such that we can close #11788.

@stevengj (Member)

@andreasnoack, make sure that in doing so you avoid max and scalarmax, and instead adopt something like the strategy in #12564. Actually, it's even simpler: just a scale < absxi check is fine here for determining the scale factor. If there are NaNs, they will propagate in the sum, so there's no need to fold them into the scale factor.
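The comparison strategy described above can be sketched as follows (a Python illustration of the reference-BLAS loop shown earlier, not Julia's actual implementation; note that `scale < absxi` is false when absxi is NaN, so no max/scalarmax machinery is needed to determine the scale factor):

```python
import math

def norm2_blas_style(x):
    # One-pass scaled sum of squares, in the style of reference-BLAS dnrm2.
    # The scale update uses a plain `scale < absxi` comparison, which is
    # false for NaN; under IEEE semantics a NaN then propagates through ssq.
    # This Python sketch assumes NaN-free input, since Python raises on
    # float division by zero instead of producing NaN.
    scale, ssq = 0.0, 1.0
    for v in x:
        if v != 0.0:
            absxi = abs(v)
            if scale < absxi:
                # New maximum: rescale the accumulated sum to the new scale
                ssq = 1.0 + ssq * (scale / absxi) ** 2
                scale = absxi
            else:
                ssq += (absxi / scale) ** 2
    return scale * math.sqrt(ssq)
```

Because each element is divided by the running maximum before squaring, the accumulator stays near 1 and neither overflows nor underflows, even for the subnormal inputs from #11788.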
