
only use accurate powf function #19890

Merged Feb 24, 2017 (4 commits)
Conversation

vtjnash
Member

@vtjnash vtjnash commented Jan 6, 2017

The only optimization the powi intrinsic offers over calling powf is that it is allowed to be inaccurate.
We don't need that.

When it is equally accurate (e.g. tiny constant powers),
LLVM will already recognize and optimize any call to a function named powf,
and produce the same speedup.

fix #19872
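The tradeoff can be sketched numerically (an illustration only, not the compiler's actual lowering): powi-style exponentiation by squaring rounds after every multiply, while an accurate pow rounds essentially once.

```julia
# powi-style evaluation: exponentiation by squaring, rounding at each step.
function pow_by_squaring(x::Float64, n::Int)
    r = 1.0
    b = x
    while n > 0
        isodd(n) && (r *= b)
        b *= b
        n >>= 1
    end
    return r
end

x, n = 1.1, 100
exact = Float64(big(x)^n)               # BigFloat reference, rounded once
println(pow_by_squaring(x, n) - exact)  # may differ by a few ulps
println(x^n - exact)                    # an accurate pow should stay within ~1 ulp
```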

@vtjnash
Member Author

vtjnash commented Jan 6, 2017

Note that this reverts #2741 (comment) (e.g. restores f2b3192), since being correct is more important than being a few % faster.

test/math.jl Outdated
@@ -959,9 +959,13 @@ end
end

@testset "issue #13748" begin
let A = [1 2; 3 4]; B = [5 6; 7 8]; C = [9 10; 11 12]
@test muladd(A,B,C) == A*B + C
end
Member

Did you mean to delete this?

src/codegen.cpp Outdated
@@ -5964,7 +5961,7 @@ static void init_julia_llvm_env(Module *m)
&pow,
Contributor

This breaks compilation here.

What's the reason again that we can't do the static_cast unconditionally? Including <math.h> in C++ code causes using std::pow; which provides additional overloads for float and long double.

Contributor

@yuyichao yuyichao Jan 6, 2017

More precisely, what glibc + libstdc++/libc++ do when <math.h> is directly included by C++ code is roughly

double pow(double, double);

namespace std {
using ::pow;
float pow(float, float);
long double pow(long double, long double);
// template version for promotion...
}

using std::pow;

It shouldn't be possible to overwrite a previously defined ::pow(double, double), so I expect MSVC to do something similar, and it should be safe in general to specify the overload type using a static_cast.

Member Author

My main concern would be whether those are the same function. I would expect that the C++ version, being name-mangled, might not end up picking up the openlibm version.

Contributor

I don't think that is an issue.

  1. These should be inline functions that call the C ::pow, ::powf, ::powl.
  2. For C++ implementations that provide these, it shouldn't be allowed to overwrite the existing ::pow(double, double) symbol, so it should always be the C one.
  3. At least in the generic Linux binary, cglobal(:pow) and ccall(:pow, ..) point to the libm version.
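Point 3 can be sanity-checked from the REPL (a quick sketch; which libm the symbol resolves to is platform-dependent):

```julia
# Look up the C symbol `pow` in the running process and call it via ccall.
p = cglobal(:pow)                  # pointer to whatever pow the process linked
@assert p != C_NULL
y = ccall(:pow, Float64, (Float64, Float64), 2.0, 10.0)
@assert y == 1024.0                # exact case, so any correct libm agrees
```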

@kshyatt kshyatt added the maths Mathematical functions label Jan 6, 2017
@simonbyrne simonbyrne added this to the 0.6.0 milestone Jan 12, 2017
@StefanKarpinski
Member

Are we moving ahead with this? If not, we should remove the 0.6.0 label. If so, it would be good to get the tests passing...

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

That's the plan, just made an error in rebasing. But it also now made a good excuse to try out the new llvmcall support.

@vchuravy
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@tkelman
Contributor

tkelman commented Jan 24, 2017

replutil test failure on win32 is real - probably related to #17251, will see if locally running with --precompiled=no makes it pass

edit: yep, get to choose either decent startup time or decent backtraces

$ usr/bin/julia -e 'versioninfo()'
Julia Version 0.6.0-dev.2275
Commit 8994702* (2017-01-24 04:04 UTC)
Platform Info:
  OS: Windows (i686-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

$ time(usr/bin/julia -e '(-1)^(-0.25)')
ERROR: DomainError:
Stacktrace:
 [1] nan_dom_err; at .\math.jl:311 [inlined]
 [2] ^ at .\math.jl:680
 [3] ^(::Int32, ::Float64) at .\promotion.jl:245

real    0m1.554s
user    0m0.000s
sys     0m0.015s

$ time(usr/bin/julia --precompiled=no -e '(-1)^(-0.25)')
ERROR: DomainError:
Exponentiation yielding a complex result requires a complex argument.
Replace x^y with (x+0im)^y, Complex(x)^y, or similar.
Stacktrace:
 [1] nan_dom_err at .\math.jl:311 [inlined]
 [2] ^(::Float64, ::Float64) at .\math.jl:680
 [3] ^(::Int32, ::Float64) at .\promotion.jl:245

real    0m11.287s
user    0m0.000s
sys     0m0.015s

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@StefanKarpinski
Member

Another failure in test/replutil.jl...?

@JeffBezanson
Member

test/replutil tends to generate action-at-a-distance test failures, since it looks at e.g. what is printed for method tables of Base functions. Change some methods and the output changes.

@tkelman
Contributor

tkelman commented Jan 24, 2017

This failure is very specific to debug info in backtraces for the pow function. Not action at a distance at all, for this PR.

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

[1] nan_dom_err; at .\math.jl:311 [inlined]

looks like the name-demangler is getting confused here, nothing to do with #17251

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

possible performance regressions were detected

Just a note that these performance reductions are both expected and intentional. It's good that we have a record of them though, and will probably want to link back here / note this on the nightly as well.

@tkelman
Contributor

tkelman commented Jan 24, 2017

We should probably revert exp back to using openlibm then, if changing the pow calculation used by Julia is going to be this expensive. Or change its code to opt in to the other pow version, if accuracy isn't needed in the innards of the exp implementation? cc @musm

@KristofferC
Member

The Julia code for exp seemed very well tested for accuracy, so it should be possible to keep the performance for exp, which is where a lot of the slowdowns are being reported.

@tkelman
Contributor

tkelman commented Jan 24, 2017

Could @fastmath be used to let users annotate whether they prefer accuracy or speed for this?

@vtjnash
Member Author

vtjnash commented Jan 24, 2017

yes, FastMath.pow implements the fast (previous) version
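A sketch of that opt-in (@fastmath rewrites ^ to the FastMath variant mentioned above; for exactly representable cases the two agree):

```julia
accurate_pow(x, y) = x ^ y
fast_pow(x, y)     = @fastmath x ^ y   # lowered to Base.FastMath.pow_fast

@assert accurate_pow(2.0, 3.0) == 8.0
@assert fast_pow(2.0, 3.0) == 8.0      # exact case: both agree
# For non-exact cases, the fast version may differ in the last bits.
```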

@vtjnash
Member Author

vtjnash commented Jan 27, 2017

It looks like exp was just computing 2 ^ constant, and my change was stopping it from doing constant propagation / inlining. Easy fix.

@tkelman
Contributor

tkelman commented Jan 27, 2017

Great. @nanosoldier runbenchmarks(ALL, vs=":master")

base/math.jl Outdated
^(x::Float32, y::Int32) = powi_llvm(x, y)
^(x::Float16, y::Integer) = Float16(Float32(x)^y)
@inline ^(x::Float64, y::Float64) = nan_dom_err(ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, y), x + y)
@inline ^(x::Float32, y::Float32) = nan_dom_err(ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, y), x + y)
Contributor

I might have missed something, but why use LLVM for ^ now when both arguments are floats? The libm pow function is accurate.

Member Author

It resolves to the same function, but gives LLVM an optimization hint that we mean that pow, not just some random function named pow.

@musm
Contributor

musm commented Jan 27, 2017

function ldexp{T<:AbstractFloat}(x::T, e::Integer)

Also needs ^ = Base.FastMath.pow_fast for a similar reason. Should trigger a regression.

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@StefanKarpinski
Member

It seems like there ought to be more efficient ways to compute x^p precisely for floating-point x and fixed p than calling the general pow function.

@simonbyrne
Contributor

Probably not for arbitrary x: pow(x,y) is typically implemented as a slightly-extended precision version of exp(y*log(x)).
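A naive sketch of that strategy, without the extended internal precision, shows why a general shortcut is hard: the rounding error in y*log(x) is amplified by exp (illustrative only; actual ulp errors depend on the libm):

```julia
naive_pow(x, y) = exp(y * log(x))     # no extra precision carried internally

x, y = 1.5, 40.0
reference = Float64(big(x)^big(y))    # BigFloat result, rounded once at the end
println(naive_pow(x, y) - reference)  # typically off by several ulps
println(x^y - reference)              # libm pow stays much closer
```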

@tkelman
Contributor

tkelman commented Feb 22, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman
Contributor

tkelman commented Feb 23, 2017

so the simple version regressed - is that the Val literal lowering change making inlining or constant prop less predictable?

@vtjnash
Member Author

vtjnash commented Feb 23, 2017

Moving the value into the type domain can potentially thwart constant folding. In this case, it's this constant-value function that got broken:

exp_small_thres(::Type{Float64}) = 2.0^-28
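For reference, the value this should fold to (a quick check using only Base arithmetic; 2.0^-28 is a power of two, so the result is exact):

```julia
exp_small_thres(::Type{Float64}) = 2.0^-28

# ldexp(1.0, -28) constructs 2^-28 directly: 3.725290298461914e-9.
@assert exp_small_thres(Float64) == ldexp(1.0, -28)
```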

@vtjnash
Member Author

vtjnash commented Feb 23, 2017

We should really stop dispatching on Type{Val{...}}. Working with types is much harder on inference and the runtime than Val{...}(). Constructing the value is effectively free (it's a singleton, so it's already constructed), and it's much more likely to be properly constant folded and dispatched. Plus there are a lot of fast paths in the runtime that can be accessed by a singleton but cannot exist for types (they instead often get handed off to the very slow fallback code).
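The two styles side by side (foo and bar are illustrative names, not Base functions):

```julia
# Dispatch on the type object (the style being discouraged):
foo(::Type{Val{true}})  = "yes"
foo(::Type{Val{false}}) = "no"

# Dispatch on the singleton instance (the preferred style):
bar(::Val{true})  = "yes"
bar(::Val{false}) = "no"

@assert foo(Val{true}) == "yes"    # called with the type itself
@assert bar(Val{false}()) == "no"  # called with the (free) singleton instance
```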

@tkelman
Contributor

tkelman commented Feb 24, 2017

were you waiting for something before running @nanosoldier runbenchmarks(ALL, vs=":master") again?

@JeffBezanson
Member

+100 #19890 (comment)

The only real problem is that Val{x}() is a bit long. I believe we could define

@pure Val(x) = Val{x}()

which would allow it to be constant-folded and inferred just as well, so you could write Val(3).
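Sketched on a stand-in type to avoid clashing with Base.Val (later Julia versions ship essentially this constructor as Base.Val):

```julia
struct MyVal{x} end                 # hypothetical stand-in for Val
Base.@pure MyVal(x) = MyVal{x}()    # the proposed @pure convenience constructor

getparam(::MyVal{N}) where {N} = N
@assert getparam(MyVal(3)) == 3     # the Val(3)-style call site
```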

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash
Member Author

vtjnash commented Feb 24, 2017

@pure Val(x) = Val{x}()

True. That then just has the problem that calling the Val(x) constructor requires the slow path :P

I'll also just reiterate that the fact that there are a number of methods foo(::Type{Val{true}}) is just completely awful. Just call it foo_true() if you really hate your code's readers that much.

@JeffBezanson
Member

Agreed; we should not have public APIs with Val{true} and Val{false}. A rare case where a name with an underscore is less ugly.

@vtjnash
Member Author

vtjnash commented Feb 24, 2017

~/julia/base$ grep -RI 'Val{true}\|Val{false}' . | wc -l
      68

anyways, enough OT - I think we can go ahead and merge this?

@JeffBezanson
Member

Yes, seems like it's time.

@stevengj
Member

The constant-folding of powers is fixed by #20783, following the suggestion to use Val{p}().

@vtjnash vtjnash merged commit ddd40d9 into master Feb 24, 2017
@vtjnash vtjnash deleted the jn/19872 branch February 24, 2017 16:57
@eval exponent_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)) - exponent_bias(T))
# maximum float exponent without bias
@eval exponent_raw_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)))
end
Contributor

@vtjnash Out of curiosity, what was the reason behind moving these functions from float.jl to math.jl? I had some code that imported Base.significand_bits, and now it needs Base.Math.significand_bits to work on master.

Member Author

They were incorrectly marked @pure, and are only used here. Old code can also import them from Base.Math, to stay compatible.
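For reference, the definitions in the diff above work out as follows for Float64 (plain constants here, not the Base internals):

```julia
significand_bits = 52                    # Float64 fraction bits
exponent_mask    = 0x7ff0000000000000    # bits 52..62 of the IEEE 754 layout
exponent_bias    = 1023

raw_max = Int(exponent_mask >> significand_bits)
@assert raw_max == 2047                  # exponent_raw_max(Float64)
@assert raw_max - exponent_bias == 1024  # exponent_max(Float64)
```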


Successfully merging this pull request may close these issues.

^(::Float64, ::Integer) incorrect subnormal results