Fixed floating point exception in adiabatic conversion `initialize!` #569

seleneonowe · 2024-08-04T13:54:58Z

Description

I was running some simulations where I saw the NaR warning at runtime. To find exactly where it first appears, I switched on trapping for floating point exceptions.

In this calculation in adiabatic_conversion.jl:

    σ_lnp_B .= 1 .- σ_levels_half[1:end-1]./σ_levels_thick .*
                    log.(σ_levels_half[2:end]./σ_levels_half[1:end-1])
    σ_lnp_B[1] = σ_levels_half[1] <= 0 ? log(2) : σ_lnp_B[1]    # set α₁ = log(2), eq. 3.19

if σ_levels_half[1] is less than or equal to zero, then we would have a division by zero, or invalid arguments to log.

This doesn't normally cause any bugs: the expression simply yields NaN in the first element, and in all of those cases the next line immediately discards the NaN and replaces it with log(2).
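A minimal sketch of this behaviour outside the model, assuming three illustrative half levels with the top boundary at σ = 0 and layer thicknesses taken as their differences (these values are made up for the example, not taken from SpeedyWeather's defaults):

```julia
# Hypothetical sigma half levels with the top boundary at σ = 0 (illustrative only)
σ_levels_half = [0.0, 0.5, 1.0]
σ_levels_thick = diff(σ_levels_half)      # assumed layer thicknesses, here [0.5, 0.5]
σ_lnp_B = zeros(length(σ_levels_thick))

# Original ordering: evaluate the formula for every layer, including the top one
σ_lnp_B .= 1 .- σ_levels_half[1:end-1] ./ σ_levels_thick .*
                log.(σ_levels_half[2:end] ./ σ_levels_half[1:end-1])

# Top layer: 0.5/0 = Inf, log(Inf) = Inf, 0 * Inf = NaN
nan_before_fixup = isnan(σ_lnp_B[1])

# The next line then discards the NaN and stores log(2) instead
σ_lnp_B[1] = σ_levels_half[1] <= 0 ? log(2) : σ_lnp_B[1]
```

Without trapping, the intermediate NaN is silent and the final array is finite; with trapping enabled, the broadcast line is where the exception would be raised.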

However, if we are trapping all floating point exceptions for debug purposes, the first expression would always raise e.g. FE_DIVBYZERO where we divide by zero, and FE_INVALID where we take the logarithm of a negative number; even though any problems caused by these exceptions are already being treated in the next line.

This means floating point exceptions caused by genuine bugs in model initialization that happen after AdiabaticConversion is initialized will be masked by this benign error. It is better if these lines are reordered so that no invalid operations are intentionally performed.

To reproduce

In the REPL, run the following:

using SpeedyWeather

begin
    if Sys.ARCH == :x86_64
        const FE_INVALID   = 0x1
        const FE_DIVBYZERO = 0x4
        const FE_OVERFLOW  = 0x8
        const FE_UNDERFLOW = 0x10
        const FE_INEXACT   = 0x20
    elseif Sys.ARCH == :aarch64
        const FE_INVALID   = 0x1
        const FE_DIVBYZERO = 0x2
        const FE_OVERFLOW  = 0x4
        const FE_UNDERFLOW = 0x8
        const FE_INEXACT   = 0x10
    else
        # let me know if you use some other architecture.
        error("Unsupported architecture: $(Sys.ARCH)")
    end

    fpexceptions() = ccall(:fegetexcept, Cint, ())

    function setfpexceptions(f, modes...)
        mode = foldl(|, modes)
        prev = ccall(:feenableexcept, Cint, (Cint,), mode)
        try
            f()
        finally
            ccall(:fedisableexcept, Cint, (Cint,), mode & ~prev)
        end
    end
end

spectral_grid = SpectralGrid(trunc=41, nlev=8)

model = PrimitiveWetModel(; spectral_grid)

setfpexceptions(FE_DIVBYZERO, FE_INVALID) do
    simulation = initialize!(model)
end

This throws an error with a stacktrace pointing to the changed lines.
The expected behaviour is that FE_DIVBYZERO and FE_INVALID are never triggered intentionally where that can be avoided; otherwise it becomes very hard to debug cases where they are triggered unintentionally.

Fix

Reorder the calculation so that rather than replacing the NaN after it is generated, it is never generated in the first place.
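As a rough sketch of that reordering (not the exact diff in this PR), the guard can be checked before the formula is evaluated. The function name `safe_σ_lnp_B` and the use of `diff` for the layer thicknesses are assumptions made for this example only:

```julia
# Hypothetical helper for illustration; not the code merged in this PR
function safe_σ_lnp_B(σ_levels_half::Vector{T}) where {T<:AbstractFloat}
    σ_levels_thick = diff(σ_levels_half)          # assumed layer thicknesses
    σ_lnp_B = zeros(T, length(σ_levels_thick))
    for k in 1:length(σ_lnp_B)
        if k == 1 && σ_levels_half[1] <= 0
            # Guard first: the formula is never evaluated with σ ≤ 0
            σ_lnp_B[1] = log(T(2))                # α₁ = log(2), eq. 3.19
        else
            σ_lnp_B[k] = 1 - σ_levels_half[k] / σ_levels_thick[k] *
                             log(σ_levels_half[k+1] / σ_levels_half[k])
        end
    end
    return σ_lnp_B
end

σ_lnp_B_top_zero = safe_σ_lnp_B([0.0, 0.5, 1.0])
σ_lnp_B_top_nonzero = safe_σ_lnp_B([0.1, 0.5, 1.0])
```

The trade-off is that the guard and the formula now sit in separate branches, but no division by zero or invalid `log` argument is ever computed for the top layer.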

…lf sigma value was less than or equal to zero.

milankl · 2024-09-05T12:34:52Z

Hi @seleneonowe, sorry for the late response, I've been travelling/vacationing for most of August and am back at the desk now. Thanks for your pull request, this is very much appreciated. Indeed, this code was initially written to produce an intermediate NaN, which is then supposed to be replaced with a finite value. Very happy to change that to what you suggest!

I had originally written it this way for clarity: you only have to write the actual formula once, which makes it less error prone...

milankl · 2024-09-05T12:38:53Z

src/dynamics/adiabatic_conversion.jl

+    if σ_levels_half[1] <= 0
+        σ_lnp_B[1] = log(2)   # set α₁ = log(2), eq. 3.19
+    else
+        σ_lnp_B[1] *= 1 - σ_levels_half[1] / σ_levels_thick[1] * log(σ_levels_half[2] / σ_levels_half[1])


What's the *= for? I believe this array is initialized with zeros, so it should be set with =?
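A minimal illustration of the concern, assuming the array really is zero-initialized (the length and the stand-in value below are arbitrary):

```julia
σ_lnp_B = zeros(3)       # assumption: zero-initialized, as suggested above
value = 0.75             # arbitrary stand-in for the right-hand side of the formula

σ_lnp_B[1] *= value      # 0.0 * 0.75: the element stays 0.0
σ_lnp_B[2] = value       # plain assignment stores the value
```

With `*=` the computed value would be lost whenever the element starts at zero, whereas `=` stores it.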

seleneonowe and others added 2 commits August 4, 2024 11:54

reordered a calculation that otherwise guaranteed FPE if the first ha…

1341a28

…lf sigma value was less than or equal to zero.

Merge branch 'main' into hr/fpe-fixes

b0c9e40

milankl reviewed Sep 5, 2024

milankl added the precision 🎯 Number formats, rounding, NaNs, Infs, ... label Sep 5, 2024

milankl added 2 commits September 18, 2024 18:06

Merge branch 'main' into hr/fpe-fixes

9bedf32

Merge branch 'main' into hr/fpe-fixes

0064d3f
