Trap floating point exceptions #27705

antoine-levitt · 2018-06-21T02:25:16Z

Some languages and compilers allow trapping of floating point exceptions, e.g. gfortran -ffpe-trap https://gcc.gnu.org/onlinedocs/gfortran/Debugging-Options.html

Is it possible to have a similar functionality in julia? That would be very useful to debug a NaN or Inf suddenly appearing in a program.

#6170 looks related

c42f · 2018-08-16T05:12:21Z

I'd say it would be relatively easy to get this working on linux as there's the feenableexcept() function which we can use to change the floating point exception mask. This should generate SIGFPE which we can turn into an exception using the same machinery as DivideError.

It will require minor changes to the runtime (see, e.g., https://github.com/JuliaLang/julia/blob/master/src/signals-unix.c#L743 ) so that the SIGFPE is turned into something other than a DivideError.

As to the correct julia API for calling feenableexcept, perhaps we'd want a context-manager-style approach there:

with_fpe(FE_OVERFLOW) do
   some_code_generating_nans()
end

See also #5234 for somewhat related discussion.

c42f · 2018-08-16T22:12:27Z

As it turns out, we can do the following (at least on linux x86_64 with julia >= 0.6) without changing the runtime:

# Bits for x86 FPU control word
const FE_INVALID    = 0x1
const FE_DIVBYZERO  = 0x4
const FE_OVERFLOW   = 0x8
const FE_UNDERFLOW  = 0x10
const FE_INEXACT    = 0x20

fpexceptions() = ccall(:fegetexcept, Cint, ())

function setfpexceptions(f, mode)
    prev = ccall(:feenableexcept, Cint, (Cint,), mode)
    try
        f()
    finally
        ccall(:fedisableexcept, Cint, (Cint,), mode & ~prev)
    end
end

[edit: fixed some brokenness]

Thence,

julia> x = 0.0
0.0

julia> 1.0/x
Inf

julia> setfpexceptions(FE_DIVBYZERO) do
           1.0/x 
       end
ERROR: DivideError: integer division error
Stacktrace:
 [1] /(::Float64, ::Float64) at ./float.jl:0
 [2] setfpexceptions(::##1#2, ::UInt8) at /home/tcfoster/sigfpe.jl:13

Unfortunately the system throws an integer division error, but at least you get a backtrace.
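Until the runtime reports these traps properly, one stopgap is to relabel the DivideError at the call site. Below is a minimal sketch, assuming Linux/glibc and the behaviour shown above (every SIGFPE surfaces as DivideError). FloatingPointTrap and trapfpe are hypothetical names, not Julia APIs, and the FE_DIVBYZERO values are glibc's for aarch64 and x86_64.

```julia
# Hypothetical helper, not a Julia API. Assumes Linux/glibc (feenableexcept and
# fedisableexcept exist) and relies on the runtime turning SIGFPE into DivideError.
const FE_DIVBYZERO = Sys.ARCH === :aarch64 ? 0x02 : 0x04  # glibc <fenv.h>; x86_64 otherwise

# Hypothetical exception type carrying the trap mask that was enabled.
struct FloatingPointTrap <: Exception
    mask::Cint
end

Base.showerror(io::IO, e::FloatingPointTrap) =
    print(io, "FloatingPointTrap: hardware floating point trap (mask 0x",
          string(e.mask, base = 16), ")")

function trapfpe(f, mask::Integer)
    prev = ccall(:feenableexcept, Cint, (Cint,), mask)
    prev == -1 && error("feenableexcept($mask) failed")
    try
        return f()
    catch e
        # Every SIGFPE currently surfaces as DivideError, so relabel it here.
        # Caveat: a genuine integer division by zero inside f is relabelled too.
        e isa DivideError ? throw(FloatingPointTrap(mask)) : rethrow()
    finally
        # Only disable the traps that this call turned on.
        ccall(:fedisableexcept, Cint, (Cint,), mask & ~prev)
    end
end
```

For example, with x = 0.0, trapfpe(() -> 1.0/x, FE_DIVBYZERO) should throw FloatingPointTrap rather than DivideError. The weak spot is the catch clause: an integer division by zero inside f gets the same relabelling, which is one reason the runtime itself would need to tell the two apart.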

StefanKarpinski · 2018-08-17T17:13:29Z

Seems like a good thing to have official support for and throw the right exception.

c42f · 2018-08-18T12:01:35Z

Yep. I wonder how these are best mapped to exceptions. The IEEE 754 standard defines five standard exceptions. We could just map these to our existing exceptions where possible:

Invalid operation -> DomainError? (normally gives qNaN instead)
Division by zero -> DivideError (currently we document that this is for integer division by zero. But I'm not sure there's a reason to distinguish 1/0 from 1.0/0.0?)
Overflow -> OverflowError
Underflow -> A new UnderflowError?
Inexact -> InexactError? Difficult to make useful: if this is enabled, any inexact floating point operation in the runtime will trap, which seems to make LLVM go boom (probably among other things).

Alternatively we could just define a new FloatingPointError(reason) and map them all to that? That might be simpler and more useful as this is more likely to be a debugging tool than anything else.
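To make the single-type option concrete, here is a sketch of a FloatingPointError that carries the decoded reasons rather than a raw code. The name comes from the comment above and is hypothetical; the bit values are glibc's <fenv.h> FE_* macros, and anything that is not aarch64 is assumed to be x86_64.

```julia
# Hypothetical single exception type; not part of Base.
# Bit values are glibc's <fenv.h> FE_* macros.
const FE_BITS = if Sys.ARCH === :aarch64
    (invalid = 0x01, divbyzero = 0x02, overflow = 0x04, underflow = 0x08, inexact = 0x10)
else  # assumes x86_64
    (invalid = 0x01, divbyzero = 0x04, overflow = 0x08, underflow = 0x10, inexact = 0x20)
end

struct FloatingPointError <: Exception
    reasons::Vector{Symbol}   # e.g. [:overflow]
end

# Decode an FE_* bitmask into the IEEE 754 exception names it contains.
fpe_reasons(mask::Integer) = Symbol[name for (name, bit) in pairs(FE_BITS) if mask & bit != 0]

# Convenience constructor from a raw bitmask, e.g. the code reported by a signal handler.
FloatingPointError(mask::Integer) = FloatingPointError(fpe_reasons(mask))

Base.showerror(io::IO, e::FloatingPointError) =
    print(io, "FloatingPointError: ", join(e.reasons, ", "))
```

Decoding to names keeps the raw bitmask an implementation detail while still letting a debugging session see which condition fired.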

antoine-levitt · 2018-08-22T13:13:31Z

> Alternatively we could just define a new FloatingPointError(reason) and map them all to that? That might be simpler and more useful as this is more likely to be a debugging tool than anything else.

That seems like the best option since, as you say, trapping FPE is basically a debugging tool, and it would be annoying to have them caught by code that is not expecting them. Maybe an abstract HardwareFPException with more specific exceptions inheriting from it?
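For comparison, a sketch of the hierarchy variant: one abstract type plus a concrete subtype per IEEE 754 exception, so handler code can catch the whole family by type. All type and function names here are hypothetical, and the bit-to-type table assumes glibc's FE_* values.

```julia
# Hypothetical hierarchy; none of these names exist in Base.
abstract type HardwareFPException <: Exception end

struct FPInvalidError   <: HardwareFPException end
struct FPDivByZeroError <: HardwareFPException end
struct FPOverflowError  <: HardwareFPException end
struct FPUnderflowError <: HardwareFPException end
struct FPInexactError   <: HardwareFPException end

# Map a single glibc FE_* bit to the matching exception type.
const FE_BIT_TO_EXCEPTION = if Sys.ARCH === :aarch64
    Dict(0x01 => FPInvalidError, 0x02 => FPDivByZeroError, 0x04 => FPOverflowError,
         0x08 => FPUnderflowError, 0x10 => FPInexactError)
else  # assumes x86_64
    Dict(0x01 => FPInvalidError, 0x04 => FPDivByZeroError, 0x08 => FPOverflowError,
         0x10 => FPUnderflowError, 0x20 => FPInexactError)
end

hardware_fpe(bit::Integer) = FE_BIT_TO_EXCEPTION[UInt8(bit)]()

# Handle hardware FP traps only; every other exception propagates unchanged.
function catch_hardware_fpe(f)
    try
        return f()
    catch e
        e isa HardwareFPException || rethrow()
        return e
    end
end
```

The cost relative to a single type is five extra types and a lookup table; the benefit is that e isa HardwareFPException filters out ordinary exceptions without inspecting a code.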

c42f · 2018-08-24T09:56:00Z

> HardwareFPException with more specific exceptions inheriting from it

That would also work and is easy to implement. On balance, though, I'm inclined to have a single type for simplicity, given that it's probably a debugging tool and that we don't catch exceptions by type in any case.

c42f · 2018-10-24T23:54:50Z

I just discovered significant prior discussion related to these issues, particularly at

#2976
#5234 (comment)

@simonbyrne are you still interested in thinking about floating point exceptions? This issue is slightly different from the previous ones, in that it asks whether we should have a way to turn SIGFPEs into julia exceptions immediately via the signal handler. That should be fairly easy, but I'm not completely sure about the correct API. Currently I think it should be a debugging tool only, perhaps emitting a single FloatingPointException type with an internal error code.

I do think the prior discussion (eg, #5234 (comment)) shows that using dynamically scoped FPE masks leads to inherently non-composable code, and should not be used for "real work". This is also my experience in trying to turn on Inexact -> InexactError, which breaks the assumptions of pretty much every piece of code which ever dreams of using a floating point number. In my opinion only a statically scoped solution (applying to floating point operations strictly within the current function) would allow floating point exception flags to be used in a composable way for real production use. But I think that would be a separate issue, and much more difficult to implement.

simonbyrne · 2018-10-25T01:11:39Z

The other issue is that LLVM isn't aware of exceptions, so it may reorder operations or propagate constants in a way that prevents exceptions from being triggered. The situation has changed somewhat with the addition of LLVM constrained intrinsics, but we need to figure out how to integrate them.

My current thinking is that floating point exceptions and rounding should be done using Cassette.jl, as this would let you overload the necessary intrinsics and allow users to add custom hooks.

c42f · 2018-10-25T01:46:46Z

That's interesting, thanks. I figured having a solid general solution for FPEs would require some fairly deep integration with the compiler.

Would that subsume the feature request in this issue (ie, the ability to do simple fail-fast SIGFPE trapping for debugging)? To me these seem like they might be somewhat orthogonal features.

simonbyrne · 2018-10-25T04:32:36Z

My comment was specifically referring to your concerns about the dynamic scoping, but you're right they are somewhat orthogonal.

I actually did try this out on a branch 4 years ago, and I was surprised how well it worked given my scant knowledge of C, but there were a few issues that would need to be figured out.

c42f · 2018-10-25T05:02:52Z

Hah, I had an extremely similar branch with the following relevant commit: adeaa4b

Enabling and disabling the FPE processor flags seemed pretty ugly and system dependent when I looked into it.

c42f · 2018-10-27T02:32:05Z

So, if we were going to implement a version of this for debugging purposes, how about the following concrete and minimal proposal:

Add a single concrete type FloatingPointException(code)
Add a function setfpe(code1 | code2 | ...) which enables floating point exceptions with the given bitmask codes, and returns the previously set fpe bits.
Do not supply dynamic scoping (ie, avoid a with_fpe style interface), so as to not hide the global fpe state that this manipulates, nor pretend that it is composable.
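A minimal sketch of the proposed setfpe, assuming Linux/glibc (fegetexcept, feenableexcept and fedisableexcept are GNU extensions; feclearexcept is C99). It replaces the whole set of enabled traps with mask and returns the previous set, with no do-block form, in line with the proposal. The implementation details are an assumption, not an existing API.

```julia
# Hypothetical setfpe, assuming Linux/glibc. mask uses the platform's FE_* bits,
# e.g. FE_OVERFLOW = 0x08 on x86_64 or 0x04 on aarch64.
function setfpe(mask::Integer)
    prev = ccall(:fegetexcept, Cint, ())           # traps enabled right now
    ccall(:fedisableexcept, Cint, (Cint,), prev)   # turn them all off
    ccall(:feclearexcept, Cint, (Cint,), mask)     # precaution: drop stale sticky flags
    ccall(:feenableexcept, Cint, (Cint,), mask) == -1 &&
        error("feenableexcept($mask) failed")
    return prev                                    # so the caller can restore it later
end
```

Because nothing here is scoped, restoring is the caller's job: keep the returned mask and pass it back to setfpe once the code under investigation has run.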

antoine-levitt · 2018-10-27T07:30:48Z

The interface is a bit low-level; adding an exception type for every exception would allow for an interface like setfpe(FloatOverFlowError, FloatUnderFlowError), which feels cleaner. But then you wouldn't be able to do setfpe(setfpe() | code), and this is pretty low-level anyway, so your proposal looks good! It would also be useful to add an FPE_ALL_BUT_INEXACT code, which is the one likely to be used in practice.
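A sketch of how FPE_ALL_BUT_INEXACT could be built from glibc's FE_* bit values (x86_64 unless running on aarch64). The constant names are hypothetical; the point is that plain bitmasks compose with | and & ~, which is what makes an idiom like setfpe(setfpe() | code) workable.

```julia
# glibc <fenv.h> FE_* values; the grouping into a NamedTuple is only for illustration.
const FE = if Sys.ARCH === :aarch64
    (INVALID = 0x01, DIVBYZERO = 0x02, OVERFLOW = 0x04, UNDERFLOW = 0x08, INEXACT = 0x10)
else  # assumes x86_64
    (INVALID = 0x01, DIVBYZERO = 0x04, OVERFLOW = 0x08, UNDERFLOW = 0x10, INEXACT = 0x20)
end

# All five IEEE 754 exceptions (same bits as glibc's FE_ALL_EXCEPT on these targets).
const FE_ALL_EXCEPT = reduce(|, Tuple(FE))

# Everything except inexact, the combination most likely to be useful for debugging.
const FPE_ALL_BUT_INEXACT = FE_ALL_EXCEPT & ~FE.INEXACT
```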

c42f · 2018-10-27T10:04:41Z

Yes, I'm not sure about the bitmasks. But I think you want setfpe to be able to return the current flags in some form so that if really necessary you can simulate dynamic scoping with

old_fpes = setfpe(new_fpes)
some_code_to_be_debugged()
setfpe(old_fpes)

and this seems like the simplest way to achieve it with the least number of new functions and types. I guess setfpe could also in principle take a Function so that it works with the do syntax

setfpe(new_fpes) do
    some_code_to_be_debugged()
end

though I'm not sure we should encourage that!

simonbyrne · 2018-10-27T19:30:56Z

Given all the trouble we had with setrounding (#27166), I would suggest we put the minimal necessary internal changes in Julia (basically, figuring out which error is triggered), and everything else in a package.

c42f · 2018-10-28T06:38:47Z

Ok, so the single exception type and support for recognizing SIGFPE is the minimal possible change, though testing this properly will also require setfpe (or equivalent) so I think that's also required. I won't do the version that takes the Function... that's just asking for a recurrence of the issues which led to setrounding's demise :-)

johnomotani · 2021-10-24T17:55:11Z

This issue has been quiet for a long time - but this would be a very useful debugging tool! Just arrived here searching for the ability to use SIGFPE...

StefanKarpinski · 2021-11-05T16:16:39Z

One thing that may make setfpe more tractable than setrounding is that it seems naively more reasonable to ask for FPE globally, whereas with rounding, you'll always want to switch back to other rounding modes in a dynamically scoped way in order to compute things like transcendental functions.

mvsoom · 2022-10-09T14:37:19Z

This would be a killer feature. Especially in the age of machine learning.

chriselrod · 2023-03-14T03:16:05Z

> As it turns out, we can do the following (at least on linux x86_64 with julia >= 0.6) without changing the runtime:

This is really cool.

I'm imagining defining a debug mode where Array{T}(undef, sz...) with T<:Union{Float32,Float64}, and other such operations, fill memory with signaling NaNs by default, to capture accidental uses of uninitialized memory.
I'd also extend it to arrays of aggregates in the obvious way (e.g. duals with the value and partials all being sNaNs).
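A rough sketch of the allocation half of that debug mode, for plain Float32/Float64 arrays only. debug_array, snan and issignaling are hypothetical helper names; the bit patterns are the standard IEEE 754 binary64/binary32 signaling NaN encodings, and the Float64 one matches the reinterpret(Float64, 8189<<50) value used in the demo below.

```julia
# Hypothetical debug helpers, not part of Base.
# A signaling NaN has an all-ones exponent, the quiet bit (top mantissa bit) clear,
# and at least one other mantissa bit set.
snan(::Type{Float64}) = reinterpret(Float64, 0x7ff4000000000000)  # same bits as 8189 << 50
snan(::Type{Float32}) = reinterpret(Float32, 0x7fa00000)

issignaling(x::Float64) = isnan(x) && reinterpret(UInt64, x) & 0x0008000000000000 == 0
issignaling(x::Float32) = isnan(x) && reinterpret(UInt32, x) & 0x00400000 == 0

# Allocate like Array{T}(undef, dims...), but poison every element with an sNaN.
function debug_array(::Type{T}, dims::Integer...) where {T<:Union{Float32,Float64}}
    return fill!(Array{T}(undef, dims...), snan(T))
end
```

With FE_INVALID trapping enabled, using such an element in arithmetic before it has been overwritten should then trap the same way the sNaN multiplication does.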

sNaN demo, first a normal qNaN and then the sNaN:

julia> x = NaN
NaN

julia> setfpexceptions(FE_INVALID) do
           2.0*x
       end
NaN

julia> x = reinterpret(Float64,8189<<50)
NaN

julia> setfpexceptions(FE_INVALID) do
           2.0*x
       end
ERROR: DivideError: integer division error
Stacktrace:
 [1] *(x::Float64, y::Float64)
   @ Base ./float.jl:410
 [2] (::var"#19#20")()
   @ Main ./REPL[43]:2
 [3] setfpexceptions(f::var"#19#20", mode::UInt8)
   @ Main ./REPL[18]:4
 [4] top-level scope
   @ REPL[43]:1

Another fun use case is to use Float64 for your exact integer math, while supporting SIMD and efficiently checking for "overflow".

julia> x = Float64.(1:256);

julia> function mysum(x) # simd
           s = zero(eltype(x))
           for i ∈ eachindex(x)
               @fastmath s += x[i]
           end
           s
       end
mysum (generic function with 1 method)

julia> setfpexceptions(FE_INEXACT) do
           mysum(x)
       end
32896.0

julia> x[5] = 1e18 # too big for exact
1.0e18

julia> x[101] = -1e18 # cancels
-1.0e18

julia> mysum(x) # intermediate rounding inside SIMD code
32812.0

julia> Float64(mysum(big.(x)))
32790.0

julia> setfpexceptions(FE_INEXACT) do
           mysum(x)
       end
ERROR: DivideError: integer division error
Stacktrace:
 [1] add_fast
   @ ./fastmath.jl:172 [inlined]
 [2] mysum(x::Vector{Float64})
   @ Main ./REPL[59]:4
 [3] (::var"#35#36")()
   @ Main ./REPL[69]:2
 [4] setfpexceptions(f::var"#35#36", mode::UInt8)
   @ Main ./REPL[18]:4
 [5] top-level scope
   @ REPL[69]:1

In theory, you could try/catch: if it fails, try again with BigInt.
Or, if you're doing something like running a performance optimization pass, you could simply bail out without transforming anything, to save on compile time.
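A minimal sketch of that try/catch fallback, assuming Linux/glibc on x86_64 or aarch64 and that the trap still surfaces as DivideError, as in the transcripts above. with_fpe_trap, floatsum and exact_sum are hypothetical names; the loop is deliberately plain (no @fastmath) so the example does not depend on SIMD reassociation.

```julia
# Assumes Linux/glibc; FE_INEXACT is glibc's <fenv.h> value (x86_64 unless aarch64).
const FE_INEXACT = Sys.ARCH === :aarch64 ? 0x10 : 0x20

function with_fpe_trap(f, mask)
    prev = ccall(:feenableexcept, Cint, (Cint,), mask)
    prev == -1 && error("feenableexcept($mask) failed")  # e.g. no trap support
    try
        return f()
    finally
        ccall(:fedisableexcept, Cint, (Cint,), mask & ~prev)
    end
end

# Every partial sum must be exactly representable, otherwise FE_INEXACT fires.
function floatsum(x::AbstractVector{Float64})
    s = 0.0
    for v in x
        s += v
    end
    return s
end

function exact_sum(x::AbstractVector{Float64})
    try
        return with_fpe_trap(() -> floatsum(x), FE_INEXACT)
    catch e
        e isa DivideError || rethrow()  # the trap currently surfaces as DivideError
        return sum(BigInt, x)           # slow but exact; x must hold integer values
    end
end
```

With x modified as above (x[5] = 1e18, x[101] = -1e18), exact_sum(x) should return the BigInt 32790 instead of a rounded Float64. Catching DivideError is blunt: an integer division by zero inside the protected code would also take the fallback path.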

chriselrod · 2023-03-16T16:48:02Z

Unfortunately, this doesn't seem to work on my M1/ARM Linux.

seleneonowe · 2024-08-04T14:18:55Z

> Unfortunately, this doesn't seem to work on my M1/ARM Linux.

For anyone reading this thread looking for a temporary solution while there remains no supported way of trapping these exceptions, if you want to replicate c42f's code snippet from earlier in this thread on other architectures, you can modify it thus:

# Values of glibc's FE_* macros from <fenv.h>
if Sys.ARCH == :x86_64
    const FE_INVALID   = 0x1
    const FE_DIVBYZERO = 0x4
    const FE_OVERFLOW  = 0x8
    const FE_UNDERFLOW = 0x10
    const FE_INEXACT   = 0x20
elseif Sys.ARCH == :aarch64
    const FE_INVALID   = 0x1
    const FE_DIVBYZERO = 0x2
    const FE_OVERFLOW  = 0x4
    const FE_UNDERFLOW = 0x8
    const FE_INEXACT   = 0x10
else
    error("You need to look up the corresponding values for FE exceptions in your architecture, which is: $(Sys.ARCH)")
end

fpexceptions() = ccall(:fegetexcept, Cint, ())

function setfpexceptions(f, modes...)
    mode = foldl(|, modes)
    prev = ccall(:feenableexcept, Cint, (Cint,), mode)
    try
        f()
    finally
        ccall(:fedisableexcept, Cint, (Cint,), mode & ~prev)
    end
end

I also modified it so you can pass multiple flags as arguments:

setfpexceptions(FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW) do
    # your code here
end

Tested on Julia 1.10.4 on two linux machines, one with x86 and the other M1 arm architecture.

antoine-levitt mentioned this issue Jun 16, 2021: [WIP] Autodiff stresses JuliaMolSim/DFTK.jl#443 (closed)

mcabbott mentioned this issue Jan 17, 2022: NaN gradients for sqrt FluxML/Zygote.jl#1101 (open)

brenhinkeller added the "error handling" label (Handling of exceptions by Julia or the user) Nov 21, 2022

simonbyrne linked a pull request Dec 19, 2022 that will close this issue: Add support for floating point exceptions #47930 (draft)

simonbyrne mentioned this issue Sep 12, 2023: add floating point environment to task state #51277 (closed)
