Consider FMA #5

jfbastien · 2017-04-25T04:43:17Z

Fused multiply-add is being discussed in WebAssembly/simd#8. Let's move the discussion here so we don't drop it.

I'd consider FMA after the first shot at SIMD because, similar to #4, it doesn't have universal platform support. Specific benchmarks may benefit from its presence. As with other ops, maybe we should consider adding support for scalar and vector FMA at the same time.

FMA is part of IEEE 754-2008, but isn't universally supported and can't be polyfilled efficiently. Maybe implementations should simply not advertise FMA if the hardware doesn't support it, forcing users to feature-detect, as opposed to offering a non-fused fallback, which yields different numerical results.

FMA offers multiply+add with one fewer rounding step, sometimes better runtime performance (sometimes because the hardware itself is magical: it avoids register issues, has its own execution unit, etc.), and may have better instruction-cache impact.
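To make the "one fewer rounding" point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of the proposal). `fma_emulated` is a hypothetical helper that emulates a fused operation with exact rational arithmetic, on the assumption that no hardware FMA is exposed to the script. It only handles finite inputs.

```python
from fractions import Fraction

def fma_emulated(a: float, b: float, c: float) -> float:
    """Hypothetical helper: a * b + c with a single rounding (finite inputs only)."""
    # Fractions hold the product and the sum exactly; float() then rounds once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Unfused: a * b is exactly 1 - 2**-60, which rounds to 1.0, so the sum is 0.0.
unfused = a * b + c
# Fused: the exact value -2**-60 is representable and survives the single rounding.
fused = fma_emulated(a, b, c)

print(unfused == 0.0)        # True
print(fused == -2.0**-60)    # True
```

The two results differ in every bit that matters here, which is why a silent non-fused fallback is not equivalent to the fused operation.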

It's supported in languages like C or C++ through opt-in with pragma fp_contract, the explicit math.h fma function and friends, or fast-math flags. I would disallow such optimizations by WebAssembly compilers; only the developer-side compiler should be allowed to emit that opcode.

It's supported in recent x86, and all ARMv8.

Maratyszcza · 2017-11-05T07:18:05Z

It would help performance to just have a multiply-add instruction, which could optionally be implemented with FMA.

lemaitre · 2018-04-18T15:01:27Z

I think 3 operations are actually needed:

addmul(a, b, c) -> implement exactly a + (b * c) with intermediate rounding
fusedaddmul(a, b, c) -> implement exactly a + b * c without any intermediate rounding
fastaddmul(a, b, c) -> picks the fastest between addmul and fusedaddmul

The first operation is actually not needed at all: it can easily be detected by the VM as a sequence of a mul and an add. Note that some hardware has special instructions for this very case.

I think it would be really important here to be able to detect the presence of a hardware instruction for FMA.
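The semantics of the three operations could be sketched as follows. This is an illustration only, in Python rather than WebAssembly: the function names come from the list above, while the `hw_has_fma` parameter and the exact-rational emulation of the fused case are assumptions made for this sketch (finite inputs only).

```python
from fractions import Fraction

def addmul(a: float, b: float, c: float) -> float:
    """a + (b * c) with intermediate rounding of the product."""
    return a + (b * c)

def fusedaddmul(a: float, b: float, c: float) -> float:
    """a + b * c with a single final rounding (emulated, finite inputs only)."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))

def fastaddmul(a: float, b: float, c: float, hw_has_fma: bool) -> float:
    """Whichever variant is assumed fastest; hw_has_fma is a hypothetical flag."""
    return fusedaddmul(a, b, c) if hw_has_fma else addmul(a, b, c)

a = -1.0
b = c = 1.0 + 2.0**-27   # b * c is exactly 1 + 2**-26 + 2**-54

print(addmul(a, b, c) == 2.0**-26)                  # True: 2**-54 is rounded away
print(fusedaddmul(a, b, c) == 2.0**-26 + 2.0**-54)  # True: nothing is lost
print(fastaddmul(a, b, c, True) == fastaddmul(a, b, c, False))  # False
```

The last line shows the cost of the "fast" variant: its result depends on the platform, which is why being able to detect the hardware instruction matters.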

gnzlbg · 2019-03-02T15:02:02Z

Performing the operation with intermediate rounding can already be expressed using a + b * c. In my experience, it is rare for applications to actually require an FMA without any intermediate rounding, particularly when that might be much slower in practice. If there is a strong need for this, we can always add this later.

What most apps want is "perform a + b * c as fast as possible", so maybe we should start by adding that.

munrocket · 2019-12-09T23:34:34Z

Here is my example: without FMA, Dekker's error-free multiplication algorithm requires 17 floating-point operations, but if FMA becomes available in WASM it will require only 2. [1] This algorithm probably gets one of the biggest performance boosts, because its whole purpose is precision. Theoretically FMA can give 17/2 * 100% - 100% = +750% to this algorithm, but here you can find a real benchmark. You can find other applications and papers where it is useful.

References:

[1] Mioara Joldes, Jean-Michel Muller, Valentina Popescu. Tight and rigourous error bounds for basic building blocks of double-word arithmetic, 2017
[2] Sylvie Boldo, Guillaume Melquiond. Emulation of a FMA and correctly-rounded sums: proved algorithms using rounding to odd, 2010
[3] Sylvie Boldo, Guillaume Melquiond. When double rounding is odd, 2005
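The 17-versus-2 operation count can be sketched as below. This is an illustrative Python version, not code from the cited papers: `fma_emulated` is a hypothetical stand-in for a hardware FMA, the function names are made up for this sketch, and both variants assume binary64 inputs with no overflow or underflow.

```python
from fractions import Fraction

def fma_emulated(a: float, b: float, c: float) -> float:
    """Hypothetical stand-in for hardware FMA (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_product_fma(a: float, b: float):
    """Error-free product with FMA: 2 operations."""
    p = a * b
    e = fma_emulated(a, b, -p)   # exact rounding error of the product
    return p, e

def split(a: float):
    """Veltkamp split into high and low halves: 4 operations."""
    t = 134217729.0 * a          # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo

def two_product_dekker(a: float, b: float):
    """Error-free product without FMA: 1 + 4 + 4 + 8 = 17 operations."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err1 = p - ah * bh
    err2 = err1 - al * bh
    err3 = err2 - ah * bl
    e = al * bl - err3
    return p, e

p1, e1 = two_product_fma(0.1, 0.3)
p2, e2 = two_product_dekker(0.1, 0.3)
print((p1, e1) == (p2, e2))                                     # True
print(Fraction(p1) + Fraction(e1) == Fraction(0.1) * Fraction(0.3))  # True
```

Both variants return a pair `(p, e)` whose exact sum equals the exact product; the FMA version simply gets there in far fewer operations.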

ngzhian · 2021-03-18T19:57:42Z

WebAssembly/simd#79 was an initial proposal for qfma; linking it here since we can't transfer PRs.

jrus · 2022-05-16T00:00:18Z

> it is rare for applications to actually require an FMA without any intermediate rounding [...] What most apps want is "perform a + b * c as fast as possible"

There are a whole bunch of fundamental computational building blocks where people care about the precise bits of a floating point number, and where careful error analysis has been done to make guarantees about maximum errors. In these, using FMA greatly simplifies and speeds up the algorithm, but naively performing "a + b * c" with intermediate rounding totally wrecks the result, and a more complicated algorithm must be used instead if the mul and add operations are separate.

These are especially common in computational geometry where slight rounding errors can break invariants that cause higher-level algorithms to give hilariously incorrect results, go into an infinite loop, or crash. For context see https://www.cs.cmu.edu/~quake/robust.html or for a list of relevant papers, try https://scholar.google.com/scholar?q=geometric+predicate+FMA – as munrocket points out correct double-double arithmetic is another (closely related) example.

Likewise, substituting FMA for a + b * c will often wreck computations which expect (rely on) intermediate rounding. Opting people into an FMA they didn’t explicitly ask for will break their code.
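A small sketch of how an unrequested FMA can break code that relies on intermediate rounding. It is written in Python for illustration; `fma_emulated` and the two function names are hypothetical, and the "contracted" variant imitates what a compiler might produce if it silently fused one multiply with the subtraction.

```python
from fractions import Fraction

def fma_emulated(a: float, b: float, c: float) -> float:
    """Hypothetical stand-in for hardware FMA (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def diff_of_products(a: float, b: float, c: float, d: float) -> float:
    """a*b - c*d as written: both products are rounded before subtracting."""
    return a * b - c * d

def diff_of_products_contracted(a: float, b: float, c: float, d: float) -> float:
    """Same expression after contraction: only c*d is rounded."""
    return fma_emulated(a, b, -(c * d))

x = 0.1
plain = diff_of_products(x, x, x, x)
contracted = diff_of_products_contracted(x, x, x, x)

print(plain == 0.0)       # True: identical rounded products cancel exactly
print(contracted == 0.0)  # False: the rounding error of x * x leaks through
```

Code that assumes `x * x - y * y` is exactly zero when `x == y` (for example before a division or a square root) loses that guarantee once the expression is contracted.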

FMA is widely supported by hardware. It would be great to have some syntax/function for an explicit FMA in wasm (and every other programming language). In contexts where it is unavailable the code author can write a separate fall-back algorithm.

ngzhian transferred this issue from WebAssembly/simd Mar 17, 2021

ngzhian added the instruction-proposal label Mar 19, 2021

ngzhian added the in-overview Instruction has been added to Overview.md label Feb 18, 2022

penzn mentioned this issue Oct 14, 2022

Poll on list or set determinism and deterministic FMA #92

Closed

BrushXue mentioned this issue Aug 6, 2023

Tutorial case backwardFacingStep2D hangs forever after compiling with Xcode CLT >= 14.3 BrushXue/OpenFOAM-AppleM1#8

Closed
