
Avoid overflow in generic_vecnorm() #11789

Closed
wants to merge 1 commit into from

Conversation

@nalimilan (Member)

No need to take the inverse too early. Fixes #11788.

I've kept the existing type promotion code, although I'm not sure how it works.

@andreasnoack (Member)

I think there is a significant speed penalty with the proposed solution. Might be better to keep the original version but add something like scale=min(one(scale),scale).

cc: @stevengj

@nalimilan (Member, Author)

Ah, OK, now I see why it was written that way. We could look at how OpenBLAS does this, as it returns the correct result.

@nalimilan (Member, Author)

@andreasnoack With your solution, the test case would return 0, which is less accurate than BLAS. That would be unfortunate, as the BLAS version is called for larger matrices. FWIW, the reference BLAS documentation explicitly says that nrm2 is provided not for speed, but for accuracy -- which is usually the hardest part to get right.

An intermediate solution would be to compute 1/scale beforehand as currently, but take the slow path if it's non-finite. That would not improve the accuracy in non-overflow cases, though, so the inconsistency with BLAS would remain.
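The intermediate approach described above can be sketched as follows (a Python illustration; the function name `norm2_intermediate` and its structure are hypothetical, not Julia's actual `generic_vecnorm2`):

```python
import math

def norm2_intermediate(x):
    # Hypothetical sketch: precompute 1/scale for the fast path, but fall
    # back to dividing by scale directly if the reciprocal overflows.
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    inv = 1.0 / scale
    if math.isfinite(inv):
        # Fast path: one division up front, then multiplications only.
        return scale * math.sqrt(sum((v * inv) ** 2 for v in x))
    # Slow path: scale is subnormal, so 1/scale overflowed to inf;
    # divide each element by scale instead.
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))
```

For the subnormal test case from #11788 (e.g. `[2.4e-322, 4.4e-323]`), `1/scale` overflows to `inf`, so the slow path runs and a nonzero norm is returned.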

@nalimilan (Member, Author)

Maybe we could enable the fast code with @fastmath?

@dpo (Contributor) commented Jun 21, 2015

I would say accuracy is more important than speed here. On OS X, the default BLAS (Accelerate) returns 7.00649232E-44 in double precision, which is a strange value.

Code:

      program testnorm
      real dnrm2
      double precision x(2)
      x(1) = 2.4D-322
      x(2) = 4.4D-323
      write(*,*) dnrm2(2, x, 1)
      end

The Matlab result is the one that seems to make the most sense.

@andreasnoack (Member)

@dpo dnrm2 should be declared double precision. That will give you the right result.

We could use the same logic as reference BLAS. See http://www.netlib.org/lapack/explore-html/da/d7f/dnrm2_8f_source.html. Our present implementation doesn't appear to be that fast anyway, and in most performance-critical cases we'll use BLAS.

@dpo (Contributor) commented Jun 21, 2015

@andreasnoack Yes, thanks. That was a silly error!

I played around with

function twonorm(x::Vector{Float64})
  scale = maximum(abs(x))       # largest magnitude, used as the scaling factor
  scale == 0.0 && return 0.0    # all-zero vector
  return scale * sqrt(sum((x ./ scale).^2))
end

which returns the correct result. Here's how the BLAS does it:

function twonormblas(x::Vector{Float64})
  scale = 0.0   # running maximum magnitude
  ssq = 1.0     # scaled sum of squares: sum((x[i]/scale)^2)
  n = length(x)
  @inbounds for i = 1:n
    if x[i] != 0.0
      absxi = abs(x[i])
      if scale < absxi
        # New maximum: rescale the accumulated sum to the new scale
        ssq = 1.0 + ssq * (scale / absxi)^2
        scale = absxi
      else
        ssq = ssq + (absxi / scale)^2
      end
    end
  end
  return scale * sqrt(ssq)
end

but it's not very fast.

@andreasnoack (Member)

I just timed our present implementation in generic_vecnorm2 against a translation of the BLAS version, and the latter is almost twice as fast. It also has the advantage of only traversing the array once.

@dpo (Contributor) commented Jun 21, 2015

Interesting. I wonder what I did wrong then:

julia> for k = 2:8
       x = rand(10^k);
       twonormblas(x); t1 = @elapsed twonormblas(x);
       t2=@elapsed norm(x); @printf("%2d  %8.2e  %8.2e\n", k, t1, t2);
       end
 2  3.84e-06  3.72e-06
 3  5.47e-05  1.23e-06
 4  3.84e-04  3.16e-06
 5  3.81e-03  5.87e-05
 6  3.65e-02  4.61e-04
 7  3.63e-01  4.48e-03
 8  3.72e+00  5.04e-02

This is OS X 10.9 with OpenBLAS.

@andreasnoack (Member)

Not sure why you get such slow results. My implementation (norm2a) is almost identical to yours, and I get

julia> for k = 2:8
              x = rand(10^k);
              norm2a(x)
              t1 = @elapsed norm2a(x)
              t2 = @elapsed norm(x)
              t3 = @elapsed LinAlg.generic_vecnorm2(x)
              @printf("%2d  %8.2e  %8.2e  %8.2e\n", k, t1, t2, t3)
         end
 2  7.52e-07  2.27e-06  2.43e-06
 3  4.15e-06  7.22e-07  8.30e-06
 4  7.06e-05  3.52e-06  9.48e-05
 5  3.83e-04  3.29e-05  7.35e-04
 6  4.24e-03  5.75e-04  8.17e-03
 7  3.90e-02  5.93e-03  7.43e-02
 8  4.03e-01  6.25e-02  7.66e-01

@nalimilan (Member, Author)

@andreasnoack The difference comes from the fact that @dpo compared his implementation to vecnorm (which calls BLAS), while you compared it to generic_vecnorm2. Since the BLAS-style loop is actually faster than the current version, let's go with the same algorithm as BLAS, and leave possibly faster code to @fastmath.

@andreasnoack (Member)

@nalimilan Could you rebase this one?

Commit: No need to take the inverse too early. Fixes #11788.
@andreasnoack (Member)

I've rebased and will merge when lights are green

@stevengj (Member)

I thought you were going to switch to the dnrm2 algorithm?

@andreasnoack (Member)

Yes. I reread this too fast. I'll prepare a pull request with the BLAS algorithm such that we can close #11788.

@stevengj (Member)

@andreasnoack, make sure that in doing so you avoid max and scalarmax, and instead adopt something like the strategy in #12564. Actually, it's even simpler: just a scale < absxi check is fine here for determining the scale factor. If there are NaNs, they will propagate in the sum, so there's no need to fold them into the scale factor.
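The comparison strategy described above can be sketched as follows (a Python illustration of the reference-BLAS loop shown earlier, not Julia's actual implementation; note that `scale < absxi` is false when absxi is NaN, so no max/scalarmax machinery is needed to determine the scale factor):

```python
import math

def norm2_blas_style(x):
    # One-pass scaled sum of squares, in the style of reference-BLAS dnrm2.
    # The scale update uses a plain `scale < absxi` comparison, which is
    # false for NaN; under IEEE semantics a NaN then propagates through ssq.
    # This Python sketch assumes NaN-free input, since Python raises on
    # float division by zero instead of producing NaN.
    scale, ssq = 0.0, 1.0
    for v in x:
        if v != 0.0:
            absxi = abs(v)
            if scale < absxi:
                # New maximum: rescale the accumulated sum to the new scale
                ssq = 1.0 + ssq * (scale / absxi) ** 2
                scale = absxi
            else:
                ssq += (absxi / scale) ** 2
    return scale * math.sqrt(ssq)
```

Because each element is divided by the running maximum before squaring, the accumulator stays near 1 and neither overflows nor underflows, even for the subnormal inputs from #11788.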
