
Non-standard function throws an error when used in a KernelFunctionOperation on the GPU #3438

Closed
tomchor opened this issue Jan 22, 2024 · 10 comments

Comments

@tomchor
Collaborator

tomchor commented Jan 22, 2024

I'd like to use the Lambert W function in a KernelFunctionOperation but it doesn't seem to work on the GPU. Here's a MWE that works on the CPU:

using Oceananigans
using Oceananigans.Grids: xnode, ynode
using CUDA: has_cuda_gpu
using LambertW: lambertw

arch = has_cuda_gpu() ? GPU() : CPU()
grid = RectilinearGrid(arch, size = (4, 4, 4), extent = (1,1,1))

@inline W(x, y) = lambertw((y/x)^2)
@inline W(i, j, k, grid) = W(xnode(i, grid, Center()), ynode(j, grid, Center()))
op = KernelFunctionOperation{Center, Center, Center}(W, grid)
compute!(Field(op))

When running on a GPU this throws a huge error message, which you can check in full here; these are the first few lines:

ERROR: LoadError: InvalidIRError: compiling MethodInstance for Oceananigans.AbstractOperations.gpu__compute!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(4, 4, 4)}, KernelAbstractions.NDIteration.DynamicCheck, Nothing, Nothing, KernelAbstractions.NDIteration.NDRange{3, KernelAbstractions.NDIteration.StaticSize{(1, 1, 4)}, KernelAbstractions.NDIteration.StaticSize{(16, 16, 1)}, Nothing, Nothing}}, ::OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, ::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, Nothing}, Float64, typeof(W), Tuple{}}, ::Tuple{Colon, Colon, Colon}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
Reason: unsupported call through a literal pointer (call to jl_gc_run_pending_finalizers)
Stacktrace:
 [1] enable_finalizers
   @ ./gcutils.jl:126
 [2] _trylock
   @ ./lock.jl:133
 [3] multiple call sites
   @ unknown:0
Reason: unsupported dynamic function invocation (call to kwcall(::Any, ::typeof(sprint), f::Function, args...) @ Base strings/io.jl:107)

I imagine lambertw isn't implemented by CUDA, but that doesn't seem to be what the error says, so I wonder if there's something else going on here. I also saw that ClimaAtmos seems to be one of the packages that use LambertW.jl, so maybe the Clima people have encountered (and hopefully solved) this before?

@glwagner
Member

I suggest starting a thread on LambertW.jl and asking whether the code is GPU compatible. It will probably be more straightforward to test this first with CUDA.jl or KernelAbstractions.jl directly and then, if that works, get it working in Oceananigans.
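
For example, a minimal (untested) sketch like the following would exercise lambertw in a bare CUDA.jl kernel, independent of Oceananigans; the kernel and array names here are just illustrative:

using CUDA
using LambertW: lambertw

# Each thread applies lambertw to one element; any GPU-incompatible code
# inside lambertw should trigger the same compilation error here.
function lambertw_kernel!(out, x)
    i = threadIdx().x
    @inbounds out[i] = lambertw(x[i])
    return nothing
end

x = CUDA.rand(Float64, 32) .+ 1   # positive inputs on the principal branch
out = similar(x)
@cuda threads=length(x) lambertw_kernel!(out, x)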

@glwagner
Member

glwagner commented Jan 22, 2024

Also, you will never get an error of the form "X function is not implemented by CUDA". The way GPU compilation works, functions with known CUDA analogues are converted appropriately at compile time, and the rest are compiled directly. The error you're getting probably means that lambertw cannot be completely type-inferred and compiled. For example, the error involves a function invoked from strings/io.jl, suggesting that lambertw may be trying to print something from your GPU kernel...

@glwagner
Member

You might be hitting this warning, which I don't think is valid to include in a GPU kernel:

https://github.com/JuliaMath/LambertW.jl/blob/6b6e2c81d920c4791f1c5204acf2eded84d8922d/src/LambertW.jl#L189

@glwagner
Member

Probably the easiest thing to do is to fork LambertW.jl and remove that warning. The rest seems ok, though a maximum of 1000 iterations seems a bit high if you want performance. It depends what you want, but as a hack you could return a NaN upon non-convergence rather than issuing a warning.
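
A minimal sketch of that idea (this is not LambertW.jl's actual implementation; the function name, the log1p starting guess, and the tolerances are just illustrative), using a Halley iteration for w*exp(w) = z on the principal branch (z >= 0) that returns NaN instead of warning:

@inline function lambertw_nowarn(z, maxiter=100, tol=1e-12)
    w = log1p(z)                       # rough starting guess for z >= 0
    for _ in 1:maxiter
        ew = exp(w)
        f = w * ew - z                 # residual of w*exp(w) = z
        # Halley update for f(w) = w*exp(w) - z
        w_new = w - f / (ew * (w + 1) - (w + 2) * f / (2w + 2))
        abs(w_new - w) < tol && return w_new
        w = w_new
    end
    return oftype(w, NaN)              # flag non-convergence with no I/O
end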

@glwagner
Member

Do you get a warning during CPU execution ?

@tomchor
Collaborator Author

tomchor commented Jan 23, 2024

Do you get a warning during CPU execution ?

Nope. Everything seems to run pretty smoothly:

julia> using Oceananigans

julia> using Oceananigans.Grids: xnode, ynode

julia> using CUDA: has_cuda_gpu

julia> using LambertW: lambertw

julia> arch = has_cuda_gpu() ? GPU() : CPU()
CPU()

julia> grid = RectilinearGrid(arch, size = (4, 4, 4), extent = (1,1,1))
4×4×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 3×3×3 halo
├── Periodic x  [0.0, 1.0)  regularly spaced with Δx=0.25
├── Periodic y  [0.0, 1.0)  regularly spaced with Δy=0.25
└── Bounded  z  [-1.0, 0.0] regularly spaced with Δz=0.25

julia> @inline W(x, y) = lambertw((y/x)^2)
W (generic function with 1 method)

julia> @inline W(i, j, k, grid) = W(xnode(i, grid, Center()), ynode(j, grid, Center()))
W (generic function with 2 methods)

julia> op = KernelFunctionOperation{Center, Center, Center}(W, grid)
KernelFunctionOperation at (Center, Center, Center)
├── grid: 4×4×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 3×3×3 halo
├── kernel_function: W (generic function with 2 methods)
└── arguments: ()

julia> compute!(Field(op))
4×4×4 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 4×4×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
├── operand: KernelFunctionOperation at (Center, Center, Center)
├── status: time=0.0
└── data: 10×10×10 OffsetArray(::Array{Float64, 3}, -2:7, -2:7, -2:7) with eltype Float64 with indices -2:7×-2:7×-2:7
    └── max=2.84593, min=0.020004, mean=0.833143

@jlapeyre

jlapeyre commented Jan 25, 2024

EDIT: This has nothing to do with Oceananigans.jl per se; it is better pursued on LambertW.jl.

I agree with #3438 (comment).

The part of the stack trace quoted at the end of this comment suggests that it is the @warn that is causing the problem. It should be possible to remove it somehow.

Better would be to remove the @warn entirely and instead return the result along with information on convergence, and maybe a convenience interface for people who want to ignore it. That's a more robust interface for other reasons as well.

I don't know anything about running on GPUs. Does @warn cause failure if it is anywhere in the package being compiled? Anywhere in the function being called? Or does execution have to hit the @warn so that I/O is attempted at run time?

EDIT: I missed this above:

Do you get a warning during CPU execution ?

Nope. Everything seems to run pretty smoothly:

So it seems that execution does not have to hit the @warn for the reported failure to occur. Also, the stack trace indicates that the error happens when the macro is expanded.

EDIT: So the following comment may be relevant, but perhaps not.

It would be nice if there were a way to redirect I/O, send it to /dev/null, or otherwise disable it everywhere when running on a GPU.

Reason: unsupported call to an unknown function (call to jl_f__call_latest)
Stacktrace:
  [1] #invokelatest#2
    @ ./essentials.jl:816
  [2] invokelatest
    @ ./essentials.jl:813
  [3] macro expansion
    @ ./logging.jl:381
  [4] lambertw_root_finding
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:188
  [5] lambertw_branch_zero
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:117
  [6] _lambertw
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:93
  [7] lambertw (repeats 2 times)
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:73
  [8] W
    @ /glade/derecho/scratch/tomasc/twake4/headland_simulations/mwe.jl:9

@glwagner
Member

I tested it and can confirm that if the line with the @warn is commented out, the code runs without erroring.
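
For anyone who wants to reproduce that, a minimal sketch using the standard Pkg workflow (assuming the default ~/.julia/dev path):

using Pkg
Pkg.develop("LambertW")   # check out an editable copy of the package
# edit ~/.julia/dev/LambertW/src/LambertW.jl to comment out the @warn line,
# then rerun the MWE from the top of this issue.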

@tomchor
Collaborator Author

tomchor commented Jan 26, 2024

Thanks @glwagner. Sorry I didn't test that before, but I assumed that since we never reached that warning it couldn't cause problems.

I'll close this issue since it's clearly out of scope for Oceananigans.

@tomchor tomchor closed this as completed Jan 26, 2024
@glwagner
Member

So it seems that execution does not have to hit the @warn for the reported failure. Also the stack trace indicates that the error happens when the macro is expanded.

Precisely.

And maybe convenience interface for people who want to ignore it. That's a more robust interface for other reasons as well.

Perhaps a positional argument to lambertw could control the behavior, either a boolean or, perhaps even better, a type (to use multiple dispatch), with options like:

  • WarnFailures() (warn on failures)
  • MarkFailures(value=NaN) (mark failures with a specific value, no warning)
  • IgnoreFailures() ?
  • WithSolverInfo() (return a type that contains the root, a converged boolean, and possibly also the number of iterations)
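
A hypothetical sketch of that interface (the type and function names are illustrative, not part of LambertW.jl):

abstract type FailureBehavior end

struct WarnFailures <: FailureBehavior end

struct MarkFailures{T} <: FailureBehavior
    value :: T
end
MarkFailures(; value = NaN) = MarkFailures(value)

# Called only when the iteration fails to converge on the root w.
handle_failure(::WarnFailures, w)  = (@warn "lambertw did not converge"; w)
handle_failure(b::MarkFailures, w) = oftype(w, b.value)

# e.g. lambertw(z, MarkFailures()) would then return NaN on non-convergence,
# with no I/O in the kernel.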

It would be nice if there were a way to redirect io or send it to dev null or otherwise disable everywhere when running on a GPU.

It is interesting to consider auto-sanitization of GPU code...
