
indicator for fast FMA #9855

Closed
simonbyrne opened this issue Jan 20, 2015 · 16 comments
Labels
maths Mathematical functions

Comments

@simonbyrne
Contributor

Now that we have an fma function, it would be useful to have some way of determining whether it is more efficient than the naive x*y + z, particularly in algorithms that use double-double-style arithmetic (C provides the FP_FAST_FMA macro for this purpose).

From #8112 (comment), it seems that the best option is to expose TargetLowering::isFMAFasterThanFMulAndFAdd (presumably in base/sysinfo.jl?)

@ivarne
Member

ivarne commented Jan 20, 2015

Isn't this the problem muladd from #9840 is intended to solve?

@simonbyrne
Contributor Author

Not quite: they are related but distinct.

  • The purpose of muladd is to compute x*y+z in the fastest way possible (e.g. in a Horner evaluation scheme for a polynomial).
  • The purpose of this is to determine whether fma is fast (i.e. implemented in hardware) or slow (i.e. emulated in software), in which case you might want to use a different approach altogether. An example is here: if you have a hardware fma, the computation reduces to 5 operations; otherwise you need 12 (using a software fma would most likely be even slower).
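To illustrate why the branch matters (a sketch of the standard error-free product transformations, not code from this issue): with a hardware fma the exact rounding error of a product costs one extra instruction, while the fma-free alternative (Dekker's splitting) needs many more flops.

```julia
# With fma: the error of x*y in two operations.
function two_prod_fma(x::Float64, y::Float64)
    p = x * y
    e = fma(x, y, -p)   # exact rounding error of the product
    return p, e
end

# Without fma: Dekker's algorithm, splitting each operand into high/low halves.
function two_prod_dekker(x::Float64, y::Float64)
    p = x * y
    s = 2.0^27 + 1.0                     # splitting constant for Float64
    cx = s * x; hx = cx - (cx - x); lx = x - hx
    cy = s * y; hy = cy - (cy - y); ly = y - hy
    e = ((hx * hy - p) + hx * ly + lx * hy) + lx * ly
    return p, e
end
```

Both return the same (p, e) pair in the absence of overflow; the difference is purely operation count.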

@simonbyrne
Contributor Author

As a rough point of reference, a software fma is about 10x slower than a (non-fused) multiply-add on my computer.

@nalimilan
Member

Couldn't a function be designed so that it would automatically use the fastest solution? Maybe I'm being naive though.

@simonbyrne
Contributor Author

@nalimilan I'm not sure what you mean: do you mean something like specifying two code paths and letting the compiler pick the one it likes the most?

@nalimilan
Member

@simonbyrne No, I wonder whether u+muladd(fma(-u,f,2(f-u)),g,q) couldn't be automatically translated to the detailed steps you show here if fma is known to be slow. But I guess it's more complex than that.

@simonbyrne
Contributor Author

Ah I see: you could do that, but it would probably be around twice as slow as the alternative: a decent software fma (e.g. openlibm) does a few extra operations to avoid things like overflow and double rounding, which are not needed in this particular case.

@eschnett
Contributor

Another point: fma is guaranteed to avoid rounding the intermediate result, which improves accuracy, and this is what allows alternative algorithms to be used. muladd, on the other hand, may be fast, but may still round the intermediate result. For example, ARM has both vmla and fma instructions -- the former rounds the intermediate result, the latter doesn't.
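The single-rounding guarantee is easy to observe (my example, not from the thread): pick an x whose square needs more than 53 significant bits.

```julia
x = 1.0 + 2.0^-27        # exactly representable in Float64
# Exact square: 1 + 2^-26 + 2^-54; the 2^-54 term is below half an ulp of the
# result, so a plain multiply rounds it away, while fma keeps it.
x * x - (1.0 + 2.0^-26)      # 0.0: the intermediate product was rounded
fma(x, x, -(1.0 + 2.0^-26))  # 2^-54: fma uses the exact product
```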

@JeffreySarnoff
Contributor

When using double-double algorithms for extended-precision math, or to get Float64 results from Float32-friendly GPUs, fma is essential (it is possible to do without it, but everything takes much, much longer and gets more complicated). It is the fact that fma rounds only once that matters here.

After trying it both ways, I am using fma everywhere possible -- whether or not the fma is slow (software-emulated); the alternative is backwards-facing and confounding for careful numerics.

@musm
Contributor

musm commented Sep 20, 2016

How can TargetLowering::isFMAFasterThanFMulAndFAdd be exposed? It's a C++ function.

@eschnett
Contributor

The canonical way would be to either introduce an intrinsic function that returns a Bool, or to provide a C wrapper that can be called via ccall. Both require changes to Julia's core.

@simonbyrne
Contributor Author

There is currently an attempt to do this in the code by comparing muladd to fma; however, it doesn't seem to work:

const FMA_NATIVE = muladd(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) == -4.930380657631324e-32

as on my machine (which does have FMA) I get:

julia> Base.Math.FMA_NATIVE
false

@JeffreySarnoff
Contributor

The sign is wrong, try this:

const FMA_NATIVE = muladd(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) == 4.930380657631324e-32
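The positive sign can be checked by hand (my arithmetic, not part of the thread): with ε = eps(1.0) = 2^-52, nextfloat(1.0) = 1 + ε and nextfloat(1.0, 2) = 1 + 2ε, so the exact residual is (1+ε)² - (1+2ε) = ε² = 2^-104 ≈ +4.93e-32. A fused multiply-add returns exactly that positive value, while an unfused multiply rounds (1+ε)² to 1 + 2ε and yields 0.0.

```julia
# Sketch: why the corrected constant has a positive sign.
ε = eps(1.0)                        # 2^-52; nextfloat(1.0) == 1 + ε
r = fma(1 + ε, 1 + ε, -(1 + 2ε))    # exact: (1+ε)^2 - (1+2ε) = ε^2
r == 2.0^-104                       # true; 2^-104 == 4.930380657631324e-32
```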

@simonbyrne
Contributor Author

🤦‍♂

@simonbyrne
Contributor Author

(fixed in #32318)

@simonbyrne
Contributor Author

I'll close this for now; please move all discussion to #33011.
