
only use accurate powf function #19890

Merged Feb 24, 2017 (4 commits)
Conversation

vtjnash
Member

@vtjnash vtjnash commented Jan 6, 2017

The only optimization the powi intrinsic offers over calling powf is that it is allowed to be inaccurate.
We don't need that.

When it is equally accurate (e.g. tiny constant powers),
LLVM will already recognize and optimize any call to a function named powf,
and produce the same speedup.

fix #19872
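The tradeoff can be sketched numerically (an illustration only, not the compiler's actual lowering): powi-style exponentiation by squaring rounds after every multiply, while an accurate pow rounds essentially once.

```julia
# powi-style evaluation: exponentiation by squaring, rounding at each step.
function pow_by_squaring(x::Float64, n::Int)
    r = 1.0
    b = x
    while n > 0
        isodd(n) && (r *= b)
        b *= b
        n >>= 1
    end
    return r
end

x, n = 1.1, 100
exact = Float64(big(x)^n)               # BigFloat reference, rounded once
println(pow_by_squaring(x, n) - exact)  # may differ by a few ulps
println(x^n - exact)                    # an accurate pow should stay within ~1 ulp
```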

@vtjnash
Member Author

vtjnash commented Jan 6, 2017

Note that this reverts #2741 (comment) (e.g. restores f2b3192), since being correct is more important than being a few % faster.

test/math.jl Outdated
@@ -959,9 +959,13 @@ end
end

@testset "issue #13748" begin
let A = [1 2; 3 4]; B = [5 6; 7 8]; C = [9 10; 11 12]
@test muladd(A,B,C) == A*B + C
end
Member

Did you mean to delete this?

src/codegen.cpp Outdated
@@ -5964,7 +5961,7 @@ static void init_julia_llvm_env(Module *m)
&pow,
Contributor

This breaks compilation here.

What's the reason again that we can't do the static_cast unconditionally? Including <math.h> in C++ code causes using std::pow; which provides additional overloads for float and long double.

Contributor

@yuyichao yuyichao Jan 6, 2017

More precisely, what glibc + libstdc++/libc++ do when <math.h> is directly included by C++ code is roughly

double pow(double, double);

namespace std {
using ::pow;
float pow(float, float);
long double pow(long double, long double);
// template version for promotion...
}

using std::pow;

It shouldn't be possible to overwrite a previously defined ::pow(double, double), so I expect MSVC to do something similar, and it should be safe in general to specify the overload type using a static_cast.

Member Author

My main concern would be whether those are the same function. I would expect that the C++ version, being name-mangled, might not end up picking up the openlibm version.

Contributor

I don't think that is an issue.

  1. These should be inline functions that call the C ::pow, ::powf, ::powl.
  2. For C++ implementations that provide these, it shouldn't be allowed to overwrite the existing ::pow(double, double) symbol, so it should always be the C one.
  3. At least in the generic Linux binary, cglobal(:pow) and ccall(:pow, ..) point to the libm version.
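Point 3 can be sanity-checked from the REPL (a quick sketch; which libm the symbol resolves to is platform-dependent):

```julia
# Look up the C symbol `pow` in the running process and call it via ccall.
p = cglobal(:pow)                  # pointer to whatever pow the process linked
@assert p != C_NULL
y = ccall(:pow, Float64, (Float64, Float64), 2.0, 10.0)
@assert y == 1024.0                # exact case, so any correct libm agrees
```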

@kshyatt kshyatt added the maths Mathematical functions label Jan 6, 2017
@simonbyrne simonbyrne added this to the 0.6.0 milestone Jan 12, 2017
@StefanKarpinski
Member

Are we moving ahead with this? If not, we should remove the 0.6.0 label. If so, it would be good to get the tests passing...

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

That's the plan, just made an error in rebasing. But it also now made a good excuse to try out the new llvmcall support.

@vchuravy
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@tkelman
Contributor

tkelman commented Jan 24, 2017

replutil test failure on win32 is real - probably related to #17251, will see if locally running with --precompiled=no makes it pass

edit: yep, get to choose either decent startup time or decent backtraces

$ usr/bin/julia -e 'versioninfo()'
Julia Version 0.6.0-dev.2275
Commit 8994702* (2017-01-24 04:04 UTC)
Platform Info:
  OS: Windows (i686-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

$ time(usr/bin/julia -e '(-1)^(-0.25)')
ERROR: DomainError:
Stacktrace:
 [1] nan_dom_err; at .\math.jl:311 [inlined]
 [2] ^ at .\math.jl:680
 [3] ^(::Int32, ::Float64) at .\promotion.jl:245

real    0m1.554s
user    0m0.000s
sys     0m0.015s

$ time(usr/bin/julia --precompiled=no -e '(-1)^(-0.25)')
ERROR: DomainError:
Exponentiation yielding a complex result requires a complex argument.
Replace x^y with (x+0im)^y, Complex(x)^y, or similar.
Stacktrace:
 [1] nan_dom_err at .\math.jl:311 [inlined]
 [2] ^(::Float64, ::Float64) at .\math.jl:680
 [3] ^(::Int32, ::Float64) at .\promotion.jl:245

real    0m11.287s
user    0m0.000s
sys     0m0.015s

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@StefanKarpinski
Member

Another failure in test/replutil.jl...?

@JeffBezanson
Member

test/replutil tends to generate action-at-a-distance test failures, since it looks at e.g. what is printed for method tables of Base functions. Change some methods and the output changes.

@tkelman
Contributor

tkelman commented Jan 24, 2017

This failure is very specific to debug info in backtraces for the pow function. Not action at a distance at all, for this PR.

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

[1] nan_dom_err; at .\math.jl:311 [inlined]

looks like the name-demangler is getting confused here, nothing to do with #17251

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

possible performance regressions were detected

Just a note that these performance reductions are both expected and intentional. It's good that we have a record of them though, and will probably want to link back here / note this on the nightly as well.

@tkelman
Contributor

tkelman commented Jan 24, 2017

We should probably revert exp back to using openlibm then, if changing the pow calculation used by Julia is going to be this expensive. Or change its code to opt in to the other pow version, if accuracy isn't needed in the innards of the exp implementation? cc @musm

@KristofferC
Member

The Julia code for exp seemed very well tested for accuracy, so it should be possible to keep the performance for exp, which is where a lot of the slowdowns are being reported.

@tkelman
Contributor

tkelman commented Jan 24, 2017

Could @fastmath be used to let users annotate whether they prefer accuracy or speed for this?

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

yes, FastMath.pow implements the fast (previous) version
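A sketch of that opt-in (@fastmath rewrites ^ to the FastMath variant mentioned above; for exactly representable cases the two agree):

```julia
accurate_pow(x, y) = x ^ y
fast_pow(x, y)     = @fastmath x ^ y   # lowered to Base.FastMath.pow_fast

@assert accurate_pow(2.0, 3.0) == 8.0
@assert fast_pow(2.0, 3.0) == 8.0      # exact case: both agree
# For non-exact cases, the fast version may differ in the last bits.
```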

@vtjnash
Member Author

vtjnash commented Jan 27, 2017

It looks like exp was just computing 2 ^ constant, and my change was stopping it from doing constant propagation / inlining. Easy fix.

@tkelman
Contributor

tkelman commented Jan 27, 2017

Great. @nanosoldier runbenchmarks(ALL, vs=":master")

base/math.jl Outdated
^(x::Float32, y::Int32) = powi_llvm(x, y)
^(x::Float16, y::Integer) = Float16(Float32(x)^y)
@inline ^(x::Float64, y::Float64) = nan_dom_err(ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y), x + y)
@inline ^(x::Float32, y::Float32) = nan_dom_err(ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, y), x + y)
Contributor

I might have missed something, but why use LLVM for ^ now when both arguments are floats? The libm pow function is accurate.

Member Author

It resolves to the same function, but gives LLVM an optimization hint that we mean that pow, not just some random function named pow.

@musm
Contributor

musm commented Jan 27, 2017

function ldexp{T<:AbstractFloat}(x::T, e::Integer)

Also needs ^ = Base.FastMath.pow_fast for a similar reason. Should trigger a regression.

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@StefanKarpinski
Member

It seems like there ought to be more efficient ways to compute x^p precisely for floating-point x and fixed p than calling the general pow function.

@simonbyrne
Contributor

Probably not for arbitrary x: pow(x,y) is typically implemented as a slightly-extended precision version of exp(y*log(x)).
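A naive sketch of that strategy, without the extended internal precision, shows why a general shortcut is hard: the rounding error in y*log(x) is amplified by exp (illustrative only; actual ulp errors depend on the libm):

```julia
naive_pow(x, y) = exp(y * log(x))     # no extra precision carried internally

x, y = 1.5, 40.0
reference = Float64(big(x)^big(y))    # BigFloat result, rounded once at the end
println(naive_pow(x, y) - reference)  # typically off by several ulps
println(x^y - reference)              # libm pow stays much closer
```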

@tkelman
Contributor

tkelman commented Feb 22, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman
Contributor

tkelman commented Feb 23, 2017

so the simple version regressed - is that the Val literal lowering change making inlining or constant prop less predictable?

@vtjnash
Member Author

vtjnash commented Feb 23, 2017

Moving the value into the type domain can potentially thwart constant folding. In this case, it's this constant-value function that got broken:

exp_small_thres(::Type{Float64}) = 2.0^-28
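For reference, the value this should fold to (a quick check using only Base arithmetic; 2.0^-28 is a power of two, so the result is exact):

```julia
exp_small_thres(::Type{Float64}) = 2.0^-28

# ldexp(1.0, -28) constructs 2^-28 directly: 3.725290298461914e-9.
@assert exp_small_thres(Float64) == ldexp(1.0, -28)
```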

@vtjnash
Member Author

vtjnash commented Feb 23, 2017

We should really stop dispatching on Type{Val{...}}. Working with types is much harder on inference and the runtime than Val{...}(). Constructing the value is effectively free (it's a singleton, so it's already constructed), and it's much more likely to be properly constant folded and dispatched. Plus there are a lot of fast paths in the runtime that can be accessed by a singleton but cannot exist for types (they instead often get handed off to the very slow fallback code).
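The two styles side by side (foo and bar are illustrative names, not Base functions):

```julia
# Dispatch on the type object (the style being discouraged):
foo(::Type{Val{true}})  = "yes"
foo(::Type{Val{false}}) = "no"

# Dispatch on the singleton instance (the preferred style):
bar(::Val{true})  = "yes"
bar(::Val{false}) = "no"

@assert foo(Val{true}) == "yes"    # called with the type itself
@assert bar(Val{false}()) == "no"  # called with the (free) singleton instance
```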

@tkelman
Contributor

tkelman commented Feb 24, 2017

were you waiting for something before running @nanosoldier runbenchmarks(ALL, vs=":master") again?

@JeffBezanson
Member

+100 #19890 (comment)

The only real problem is that Val{x}() is a bit long. I believe we could define

@pure Val(x) = Val{x}()

which would allow it to be constant-folded and inferred just as well, so you could write Val(3).
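Sketched on a stand-in type to avoid clashing with Base.Val (later Julia versions ship essentially this constructor as Base.Val):

```julia
struct MyVal{x} end                 # hypothetical stand-in for Val
Base.@pure MyVal(x) = MyVal{x}()    # the proposed @pure convenience constructor

getparam(::MyVal{N}) where {N} = N
@assert getparam(MyVal(3)) == 3     # the Val(3)-style call site
```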

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash
Member Author

vtjnash commented Feb 24, 2017

@pure Val(x) = Val{x}()

True. That then just has the problem that calling the Val(x) constructor requires the slow path :P

I'll also just reiterate that the fact that there are a number of methods foo(::Type{Val{true}}) is just completely awful. Just call it foo_true() if you really hate your code's readers that much.

@JeffBezanson
Member

Agreed; we should not have public APIs with Val{true} and Val{false}. A rare case where a name with an underscore is less ugly.

@vtjnash
Member Author

vtjnash commented Feb 24, 2017

~/julia/base$ grep -RI 'Val{true}\|Val{false}' . | wc -l
      68

anyways, enough OT - I think we can go ahead and merge this?

@JeffBezanson
Member

Yes, seems like it's time.

@stevengj
Member

The constant-folding of powers is fixed by #20783, following the suggestion to use Val{p}().

@vtjnash vtjnash merged commit ddd40d9 into master Feb 24, 2017
@vtjnash vtjnash deleted the jn/19872 branch February 24, 2017 16:57
@eval exponent_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)) - exponent_bias(T))
# maximum float exponent without bias
@eval exponent_raw_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)))
end
Contributor

@vtjnash Out of curiosity, what was the reason behind moving these functions from float.jl to math.jl? I had some code that imported Base.significand_bits, and now it needs Base.Math.significand_bits to work on master.

Member Author

They were incorrectly marked @pure, and are only used here. Old code can also import them from Base.Math, to stay compatible.
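For reference, the definitions in the diff above work out as follows for Float64 (plain constants here, not the Base internals):

```julia
significand_bits = 52                    # Float64 fraction bits
exponent_mask    = 0x7ff0000000000000    # bits 52..62 of the IEEE 754 layout
exponent_bias    = 1023

raw_max = Int(exponent_mask >> significand_bits)
@assert raw_max == 2047                  # exponent_raw_max(Float64)
@assert raw_max - exponent_bias == 1024  # exponent_max(Float64)
```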


Successfully merging this pull request may close these issues.

^(::Float64, ::Integer) incorrect subnormal results