Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed floating point exception in adiabatic conversion initialize! #569

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

seleneonowe
Copy link

Description

I was running some simulations where I saw the NaR warning during runtime. To debug exactly where this first appears, I switched on trapping for floating point exceptions.

In this calculation in adiabatic_conversion.jl:

    σ_lnp_B .= 1 .- σ_levels_half[1:end-1]./σ_levels_thick .*
                    log.(σ_levels_half[2:end]./σ_levels_half[1:end-1])
    σ_lnp_B[1] = σ_levels_half[1] <= 0 ? log(2) : σ_lnp_B[1]    # set α₁ = log(2), eq. 3.19

if σ_levels_half[1] is less than or equal to zero, then we would have a division by zero, or invalid arguments to log.

This doesn't normally cause any bugs, because this simply returns NaN in the first element and in any of those cases the next line immediately throws away the NaN and swaps it for log(2).

However, if we are trapping all floating point exceptions for debug purposes, the first expression would always raise e.g. FE_DIVBYZERO where we divide by zero, and FE_INVALID where we take the logarithm of a negative number; even though any problems caused by these exceptions are already being treated in the next line.

This means floating point exceptions caused by genuine bugs in model initialization that happen after AdiabaticConversion is initialized will be masked by this benign error. It is better if these lines are reordered so that no invalid operations are intentionally performed.

To reproduce

In the REPL, run the following :

using SpeedyWeather
begin
    if Sys.ARCH == :x86_64
        const FE_INVALID  = 0x1
        const FE_DIVBYZERO  = 0x4
        const FE_OVERFLOW   = 0x8
        const FE_UNDERFLOW  = 0x10
        const FE_INEXACT  = 0x20
    elseif Sys.ARCH == :aarch64
        const FE_INVALID  = 0x1
        const FE_DIVBYZERO  = 0x2
        const FE_OVERFLOW   = 0x4
        const FE_UNDERFLOW  = 0x8
        const FE_INEXACT  = 0x10
     else
        # let me know if you use some other architecture.
        error("Unsupported architecture: $(Sys.ARCH)")
end
    
    fpexceptions() = ccall(:fegetexcept, Cint, ())

    function setfpexceptions(f, modes...)
        mode = foldl(|, modes)
        prev = ccall(:feenableexcept, Cint, (Cint,), mode)
        try
            f()
        finally
            ccall(:fedisableexcept, Cint, (Cint,), mode & ~prev)
        end
    end
end
spectral_grid = SpectralGrid(trunc=41, nlev=8)
model = PrimitiveWetModel(; spectral_grid)
setfpexceptions(FE_DIVBYZERO, FE_INVALID) do
           simulation = initialize!(model)
end

This throws an error with a stacktrace pointing to the changed lines.
Expected behaviour would be FE_DIVBYZERO and FE_INVALID never being triggered intentionally where possible (otherwise it becomes very hard to debug occurrences of this happening unintentionally).

Fix

Reorder the calculation so that rather than replacing the NaN after it is generated, it is never generated in the first place.

@milankl
Copy link
Member

milankl commented Sep 5, 2024

Hi @seleneonowe, sorry for late response, I've been travelling/vacationing mostly in August, back at the desk now. Thanks for your pull request, this is very much appreciated. Indeed, this code was initially written to have a NaN intermediately which is then supposed to be replaced with a finite value. Very happy to change that to what you suggest!

I had originally written it this way for clarity, that you only have to write the actual formula once, meaning less error prone...

if σ_levels_half[1] <= 0
σ_lnp_B[1] = log(2) # set α₁ = log(2), eq. 3.19
else
σ_lnp_B[1] *= 1 - σ_levels_half[1] / σ_levels_thick[1] * log(σ_levels_half[2] / σ_levels_half[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the *= for? I believe this array is intialized with zeros, so it should be set with =?

@milankl milankl added the precision 🎯 Number formats, rounding, NaNs, Infs, ... label Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
precision 🎯 Number formats, rounding, NaNs, Infs, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants