
Non-standard function throws an error when used in a KernelFunctionOperation on the GPU #3438

Closed
tomchor opened this issue Jan 22, 2024 · 10 comments

Comments

@tomchor
Collaborator

tomchor commented Jan 22, 2024

I'd like to use the Lambert W function in a KernelFunctionOperation but it doesn't seem to work on the GPU. Here's a MWE that works on the CPU:

using Oceananigans
using Oceananigans.Grids: xnode, ynode
using CUDA: has_cuda_gpu
using LambertW: lambertw

arch = has_cuda_gpu() ? GPU() : CPU()
grid = RectilinearGrid(arch, size = (4, 4, 4), extent = (1,1,1))

@inline W(x, y) = lambertw((y/x)^2)
@inline W(i, j, k, grid) = W(xnode(i, grid, Center()), ynode(j, grid, Center()))
op = KernelFunctionOperation{Center, Center, Center}(W, grid)
compute!(Field(op))

When running on a GPU this throws a huge error message, which you can check in full here; these are the first few lines:

ERROR: LoadError: InvalidIRError: compiling MethodInstance for Oceananigans.AbstractOperations.gpu__compute!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(4, 4, 4)}, KernelAbstractions.NDIteration.DynamicCheck, Nothing, Nothing, KernelAbstractions.NDIteration.NDRange{3, KernelAbstractions.NDIteration.StaticSize{(1, 1, 4)}, KernelAbstractions.NDIteration.StaticSize{(16, 16, 1)}, Nothing, Nothing}}, ::OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, ::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, Nothing}, Float64, typeof(W), Tuple{}}, ::Tuple{Colon, Colon, Colon}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
Reason: unsupported call through a literal pointer (call to jl_gc_run_pending_finalizers)
Stacktrace:
 [1] enable_finalizers
   @ ./gcutils.jl:126
 [2] _trylock
   @ ./lock.jl:133
 [3] multiple call sites
   @ unknown:0
Reason: unsupported dynamic function invocation (call to kwcall(::Any, ::typeof(sprint), f::Function, args...) @ Base strings/io.jl:107)

I imagine lambertw isn't implemented by CUDA, but that doesn't seem to be what the error says, so I wonder if there's something else going on here. I also saw that ClimaAtmos seems to be one of the packages that use LambertW.jl, so maybe the Clima people have encountered (and hopefully solved) this before?

@glwagner
Member

I suggest starting a thread on LambertW.jl and asking whether the code is GPU compatible. It will probably be more straightforward to test this first with CUDA.jl or KernelAbstractions.jl directly and then, if that works, get it working in Oceananigans.
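
For example, a minimal (untested) sketch like the following would exercise lambertw in a bare CUDA.jl kernel, independent of Oceananigans; the kernel and array names here are just illustrative:

using CUDA
using LambertW: lambertw

# Each thread applies lambertw to one element; any GPU-incompatible code
# inside lambertw should trigger the same compilation error here.
function lambertw_kernel!(out, x)
    i = threadIdx().x
    @inbounds out[i] = lambertw(x[i])
    return nothing
end

x = CUDA.rand(Float64, 32) .+ 1   # positive inputs on the principal branch
out = similar(x)
@cuda threads=length(x) lambertw_kernel!(out, x)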

@glwagner
Member

glwagner commented Jan 22, 2024

Also, you will never get an error of the form "X function is not implemented by CUDA". The way GPU compilation works, functions with known CUDA analogues are converted appropriately at compile time, and the rest are compiled directly. The error you're getting probably means that lambertw cannot be completely type-inferred and compiled. For example, the error involves a function invoked from strings/io.jl, suggesting that lambertw may be trying to print something from your GPU kernel...

@glwagner
Member

You might be hitting this warning, which I don't think is valid to include in a GPU kernel:

https://github.com/JuliaMath/LambertW.jl/blob/6b6e2c81d920c4791f1c5204acf2eded84d8922d/src/LambertW.jl#L189

@glwagner
Member

Probably the easiest thing to do is to fork LambertW.jl and remove that warning. The rest seems ok, though a maximum of 1000 iterations seems a bit high if you want performance. It depends what you want, but as a hack you could return a NaN upon non-convergence rather than issuing a warning.
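
A minimal sketch of that idea (this is not LambertW.jl's actual implementation; the function name, the log1p starting guess, and the tolerances are just illustrative), using a Halley iteration for w*exp(w) = z on the principal branch (z >= 0) that returns NaN instead of warning:

@inline function lambertw_nowarn(z, maxiter=100, tol=1e-12)
    w = log1p(z)                       # rough starting guess for z >= 0
    for _ in 1:maxiter
        ew = exp(w)
        f = w * ew - z                 # residual of w*exp(w) = z
        # Halley update for f(w) = w*exp(w) - z
        w_new = w - f / (ew * (w + 1) - (w + 2) * f / (2w + 2))
        abs(w_new - w) < tol && return w_new
        w = w_new
    end
    return oftype(w, NaN)              # flag non-convergence with no I/O
end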

@glwagner
Member

Do you get a warning during CPU execution ?

@tomchor
Collaborator Author

tomchor commented Jan 23, 2024

Do you get a warning during CPU execution ?

Nope. Everything seems to run pretty smoothly:

julia> using Oceananigans

julia> using Oceananigans.Grids: xnode, ynode

julia> using CUDA: has_cuda_gpu

julia> using LambertW: lambertw

julia> arch = has_cuda_gpu() ? GPU() : CPU()
CPU()

julia> grid = RectilinearGrid(arch, size = (4, 4, 4), extent = (1,1,1))
4×4×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 3×3×3 halo
├── Periodic x  [0.0, 1.0)  regularly spaced with Δx=0.25
├── Periodic y  [0.0, 1.0)  regularly spaced with Δy=0.25
└── Bounded  z  [-1.0, 0.0] regularly spaced with Δz=0.25

julia> @inline W(x, y) = lambertw((y/x)^2)
W (generic function with 1 method)

julia> @inline W(i, j, k, grid) = W(xnode(i, grid, Center()), ynode(j, grid, Center()))
W (generic function with 2 methods)

julia> op = KernelFunctionOperation{Center, Center, Center}(W, grid)
KernelFunctionOperation at (Center, Center, Center)
├── grid: 4×4×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 3×3×3 halo
├── kernel_function: W (generic function with 2 methods)
└── arguments: ()

julia> compute!(Field(op))
4×4×4 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 4×4×4 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
├── operand: KernelFunctionOperation at (Center, Center, Center)
├── status: time=0.0
└── data: 10×10×10 OffsetArray(::Array{Float64, 3}, -2:7, -2:7, -2:7) with eltype Float64 with indices -2:7×-2:7×-2:7
    └── max=2.84593, min=0.020004, mean=0.833143

@jlapeyre

jlapeyre commented Jan 25, 2024

EDIT: This has nothing to do with Oceananigans.jl per se; it is better pursued on LambertW.jl.

I agree with #3438 (comment).

The part of the stack trace quoted at the end of this comment suggests that it is the @warn that is causing the problem. It should be possible to remove it somehow.

Better would be to remove the @warn entirely and instead return the result along with information on convergence, and maybe a convenience interface for people who want to ignore it. That's a more robust interface for other reasons as well.

I don't know anything about running on GPUs. Does @warn cause failure if it is anywhere in the package being compiled? Anywhere in the function being called? Or does execution have to hit the @warn so that I/O is attempted at run time?

EDIT: I missed this above:

Do you get a warning during CPU execution ?

Nope. Everything seems to run pretty smoothly:

So it seems that execution does not have to hit the @warn for the reported failure to occur. Also, the stack trace indicates that the error happens when the macro is expanded.

EDIT: So the following comment may be relevant, but perhaps not.

It would be nice if there were a way to redirect I/O, send it to /dev/null, or otherwise disable it everywhere when running on a GPU.

Reason: unsupported call to an unknown function (call to jl_f__call_latest)
Stacktrace:
  [1] #invokelatest#2
    @ ./essentials.jl:816
  [2] invokelatest
    @ ./essentials.jl:813
  [3] macro expansion
    @ ./logging.jl:381
  [4] lambertw_root_finding
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:188
  [5] lambertw_branch_zero
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:117
  [6] _lambertw
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:93
  [7] lambertw (repeats 2 times)
    @ /glade/work/tomasc/.julia/packages/LambertW/tom9a/src/LambertW.jl:73
  [8] W
    @ /glade/derecho/scratch/tomasc/twake4/headland_simulations/mwe.jl:9

@glwagner
Member

I tested it and can confirm that if the line with the @warn is commented out, the code runs without erroring.
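
For anyone who wants to reproduce that, a minimal sketch using the standard Pkg workflow (assuming the default ~/.julia/dev path):

using Pkg
Pkg.develop("LambertW")   # check out an editable copy of the package
# edit ~/.julia/dev/LambertW/src/LambertW.jl to comment out the @warn line,
# then rerun the MWE from the top of this issue.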

@tomchor
Collaborator Author

tomchor commented Jan 26, 2024

Thanks @glwagner. Sorry I didn't test that before, but I assumed that since we never reached that warning it couldn't cause problems.

I'll close this issue since it's clearly out of scope for Oceananigans.

@tomchor tomchor closed this as completed Jan 26, 2024
@glwagner
Member

So it seems that execution does not have to hit the @warn for the reported failure. Also the stack trace indicates that the error happens when the macro is expanded.

Precisely.

And maybe convenience interface for people who want to ignore it. That's a more robust interface for other reasons as well.

Perhaps a positional argument to lambertw could control the behavior, either a boolean or, perhaps even better, a type (to use multiple dispatch), with options like:

  • WarnFailures() (warn on failures)
  • MarkFailures(value=NaN) (mark failures with a specific value, no warning)
  • IgnoreFailures() ?
  • WithSolverInfo() (return a type that contains the root, a converged boolean, and possibly also the number of iterations)
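
A hypothetical sketch of that interface (the type and function names are illustrative, not part of LambertW.jl):

abstract type FailureBehavior end

struct WarnFailures <: FailureBehavior end

struct MarkFailures{T} <: FailureBehavior
    value :: T
end
MarkFailures(; value = NaN) = MarkFailures(value)

# Called only when the iteration fails to converge on the root w.
handle_failure(::WarnFailures, w)  = (@warn "lambertw did not converge"; w)
handle_failure(b::MarkFailures, w) = oftype(w, b.value)

# e.g. lambertw(z, MarkFailures()) would then return NaN on non-convergence,
# with no I/O in the kernel.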

It would be nice if there were a way to redirect io or send it to dev null or otherwise disable everywhere when running on a GPU.

It is interesting to consider auto-sanitization of GPU code...
