maximum(abs, v) doesn't work on GPU in Julia 1.10.0 with grid size larger than (10, 10, 10) #3427

Closed
xkykai opened this issue Jan 12, 2024 · 16 comments · Fixed by #3403
Labels: bug 🐞 (Even a perfect program still has bugs), GPU 👾 (Where Oceananigans gets its powers from)

Comments

@xkykai
Collaborator

xkykai commented Jan 12, 2024

(As discussed with @simone-silvestri.)
I encountered this bug when trying to upgrade to Julia 1.10.0: maximum(abs, v) doesn't work for grids larger than (10, 10, 10), whereas maximum(abs, u), maximum(abs, w), maximum(abs, b), maximum(u), maximum(v), maximum(w), and maximum(b) all work just fine.

Here's an MWE tested on Supercloud and Tartarus:

using Oceananigans

grid = RectilinearGrid(GPU(),
                       size = (16, 16, 16),
                       x = (0, 1),
                       y = (0, 1),
                       z = (-1, 0),
                       topology = (Periodic, Periodic, Bounded))

model = NonhydrostaticModel(; grid)

u, v, w = model.velocities

maximum(u)
maximum(w)
maximum(v)

maximum(abs, u)
maximum(abs, w)
maximum(abs, v)
ERROR: LoadError: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/libcuda.jl:27
  [2] check
    @ ~/.julia/packages/CUDA/35NC6/lib/cudadrv/libcuda.jl:34 [inlined]
  [3] cuLaunchKernel
    @ ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:26 [inlined]
  [4] (::CUDA.var"#863#864"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3})(kernelParams::Vector{Ptr{Nothing}})
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:69
  [5] macro expansion
    @ ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#863#864"{…}, ::CUDA.KernelState, ::CartesianIndices{…}, ::CartesianIndices{…}, ::CUDA.CuDeviceArray{…}, ::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ CUDA ./none:0
  [8] launch(f::CUDA.CuFunction, args::Vararg{…}; blocks::Union{…}, threads::Union{…}, cooperative::Bool, shmem::Integer, stream::CUDA.CuStream) where N
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:62 [inlined]
  [9] #868
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:136 [inlined]
 [10] macro expansion
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:95 [inlined]
 [11] macro expansion
    @ CUDA ./none:0 [inlined]
 [12] convert_arguments
    @ CUDA ./none:0 [inlined]
 [13] #cudacall#867
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:135 [inlined]
 [14] cudacall
    @ CUDA ~/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:134 [inlined]
 [15] macro expansion
    @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:219 [inlined]
 [16] macro expansion
    @ CUDA ./none:0 [inlined]
 [17] call(::CUDA.HostKernel{…}, ::typeof(identity), ::typeof(max), ::Nothing, ::CartesianIndices{…}, ::CartesianIndices{…}, ::Val{…}, ::CUDA.CuDeviceArray{…}, ::Oceananigans.AbstractOperations.ConditionalOperation{…}; call_kwargs::@Kwargs{…})
    @ CUDA ./none:0
 [18] (::CUDA.HostKernel{…})(::Function, ::Vararg{…}; threads::Int64, blocks::Int64, kwargs::@Kwargs{…})
    @ CUDA ~/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:340
 [19] macro expansion
    @ ~/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:106 [inlined]
 [20] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…}; init::Nothing)
    @ CUDA ~/.julia/packages/CUDA/35NC6/src/mapreduce.jl:271
 [21] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ CUDA ~/.julia/packages/CUDA/35NC6/src/mapreduce.jl:169
 [22] mapreducedim!(f::Function, op::Function, R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/mapreduce.jl:10
 [23] #maximum!#860
    @ Base ./reducedim.jl:1034 [inlined]
 [24] maximum!(f::Function, r::Field{…}, a::Oceananigans.AbstractOperations.ConditionalOperation{…}; condition::Nothing, mask::Float64, kwargs::@Kwargs{…})
    @ Oceananigans.Fields ~/.julia/packages/Oceananigans/r28zw/src/Fields/field.jl:618
 [25] maximum(f::Function, c::Field{…}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields ~/.julia/packages/Oceananigans/r28zw/src/Fields/field.jl:648
 [26] maximum(f::Function, c::Field{…})
    @ Oceananigans.Fields ~/.julia/packages/Oceananigans/r28zw/src/Fields/field.jl:637
 [27] top-level scope
    @ ~/SaltyOceanParameterizations.jl/CUDA_MWE.jl:20
 [28] include(fname::String)
    @ Base.MainInclude ./client.jl:489
 [29] top-level scope
    @ REPL[19]:1
 [30] top-level scope
    @ ~/.julia/packages/CUDA/35NC6/src/initialization.jl:190
in expression starting at /home/xinkai/SaltyOceanParameterizations.jl/CUDA_MWE.jl:20
Some type information was truncated. Use `show(err)` to see complete types.

Note that line 20 in the stack trace refers to the last line of the code snippet above (maximum(abs, v)).

Here's the Julia version info:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
  Threads: 1 on 48 virtual cores

Here's the CUDA version info:

CUDA runtime 11.8, artifact installation
CUDA driver 11.8
NVIDIA driver 520.61.5

CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+520.61.5

Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA TITAN V (sm_70, 9.027 GiB / 12.000 GiB available)

In Julia 1.9 this does not seem to be a problem.

@xkykai added the bug 🐞 and GPU 👾 labels Jan 12, 2024
@navidcy
Collaborator

navidcy commented Jan 13, 2024

Can you try using the branch ncc/use-julia-v1.9.4, which, despite its original name, uses Julia v1.10.0?

@navidcy
Collaborator

navidcy commented Jan 13, 2024

On Tartarus, with the above-mentioned branch, things seem OK:

navidcy:Oceananigans.jl/  |ncc/use-julia-v1.9.4|$ julia-1.10 --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using Oceananigans
[ Info: Oceananigans will use 48 threads

julia> grid = RectilinearGrid(GPU(),
                              size = (16, 16, 16),
                              x = (0, 1),
                              y = (0, 1),
                              z = (-1, 0),
                              topology = (Periodic, Periodic, Bounded))
16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0625
├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.0625
└── Bounded  z ∈ [-1.0, 0.0] regularly spaced with Δz=0.0625

julia> model = NonhydrostaticModel(; grid)
NonhydrostaticModel{GPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
├── grid: 16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── timestepper: QuasiAdamsBashforth2TimeStepper
├── tracers: ()
├── closure: Nothing
├── buoyancy: Nothing
└── coriolis: Nothing

julia> u, v, w = model.velocities
NamedTuple with 3 Fields on 16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo:
├── u: 16×16×16 Field{Face, Center, Center} on RectilinearGrid on GPU
├── v: 16×16×16 Field{Center, Face, Center} on RectilinearGrid on GPU
└── w: 16×16×17 Field{Center, Center, Face} on RectilinearGrid on GPU

julia> maximum(u)
0.0

julia> maximum(w)
0.0

julia> maximum(v)
0.0

julia> maximum(abs, u)
0.0

julia> maximum(abs, w)
0.0

julia> maximum(abs, v)
0.0

@navidcy
Collaborator

navidcy commented Jan 13, 2024

With main, however, I can indeed reproduce the error above:

navidcy:Oceananigans.jl/  |main ✓|$ julia-1.10 --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using Oceananigans
┌ Warning: The active manifest file has dependencies that were resolved with a different julia version (1.9.3). Unexpected behavior may occur.
└ @ ~/Oceananigans.jl/Manifest.toml:0
┌ Warning: The project dependencies or compat requirements have changed since the manifest was last resolved.
│ It is recommended to `Pkg.resolve()` or consider `Pkg.update()` if necessary.
└ @ Pkg.API ~/julia-1.10/usr/share/julia/stdlib/v1.10/Pkg/src/API.jl:1800
Precompiling Oceananigans
  1 dependency successfully precompiled in 21 seconds. 143 already precompiled.
[ Info: Oceananigans will use 48 threads

julia> grid = RectilinearGrid(GPU(),
                              size = (16, 16, 16),
                              x = (0, 1),
                              y = (0, 1),
                              z = (-1, 0),
                              topology = (Periodic, Periodic, Bounded))
16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0625
├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.0625
└── Bounded  z ∈ [-1.0, 0.0] regularly spaced with Δz=0.0625

julia> model = NonhydrostaticModel(; grid)
NonhydrostaticModel{GPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
├── grid: 16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── timestepper: QuasiAdamsBashforth2TimeStepper
├── tracers: ()
├── closure: Nothing
├── buoyancy: Nothing
└── coriolis: Nothing

julia> u, v, w = model.velocities
NamedTuple with 3 Fields on 16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo:
├── u: 16×16×16 Field{Face, Center, Center} on RectilinearGrid on GPU
├── v: 16×16×16 Field{Center, Face, Center} on RectilinearGrid on GPU
└── w: 16×16×17 Field{Center, Center, Face} on RectilinearGrid on GPU

julia> maximum(u)
0.0

julia> maximum(w)
0.0

julia> maximum(v)
0.0

julia> maximum(abs, u)
0.0

julia> maximum(abs, w)
ERROR: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/libcuda.jl:27
  [2] check
    @ ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/libcuda.jl:34 [inlined]
  [3] cuLaunchKernel
    @ ~/.julia/packages/CUDA/nbRJk/lib/utils/call.jl:26 [inlined]
  [4] (::CUDA.var"#867#868"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3})(kernelParams::Vector{Ptr{Nothing}})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:69
  [5] macro expansion
    @ ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#867#868"{…}, ::CUDA.KernelState, ::CartesianIndices{…}, ::CartesianIndices{…}, ::CUDA.CuDeviceArray{…}, ::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ CUDA ./none:0
  [8] launch(f::CUDA.CuFunction, args::Vararg{…}; blocks::Union{…}, threads::Union{…}, cooperative::Bool, shmem::Integer, stream::CUDA.CuStream) where N
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:62 [inlined]
  [9] #872
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:136 [inlined]
 [10] macro expansion
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:95 [inlined]
 [11] macro expansion
    @ CUDA ./none:0 [inlined]
 [12] convert_arguments
    @ CUDA ./none:0 [inlined]
 [13] #cudacall#871
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:135 [inlined]
 [14] cudacall
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:134 [inlined]
 [15] macro expansion
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/compiler/execution.jl:223 [inlined]
 [16] macro expansion
    @ CUDA ./none:0 [inlined]
 [17] call(::CUDA.HostKernel{…}, ::typeof(identity), ::typeof(max), ::Nothing, ::CartesianIndices{…}, ::CartesianIndices{…}, ::Val{…}, ::CUDA.CuDeviceArray{…}, ::Oceananigans.AbstractOperations.ConditionalOperation{…}; call_kwargs::@Kwargs{…})
    @ CUDA ./none:0
 [18] (::CUDA.HostKernel{…})(::Function, ::Vararg{…}; threads::Int64, blocks::Int64, kwargs::@Kwargs{…})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/compiler/execution.jl:345
 [19] macro expansion
    @ ~/.julia/packages/CUDA/nbRJk/src/compiler/execution.jl:106 [inlined]
 [20] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…}; init::Nothing)
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/mapreduce.jl:271
 [21] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/mapreduce.jl:169
 [22] mapreducedim!(f::Function, op::Function, R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ GPUArrays ~/.julia/packages/GPUArrays/EZkix/src/host/mapreduce.jl:10
 [23] #maximum!#860
    @ Base ./reducedim.jl:1034 [inlined]
 [24] maximum!(f::Function, r::Field{…}, a::Oceananigans.AbstractOperations.ConditionalOperation{…}; condition::Nothing, mask::Float64, kwargs::@Kwargs{…})
    @ Oceananigans.Fields ~/Oceananigans.jl/src/Fields/field.jl:618
 [25] maximum(f::Function, c::Field{…}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields ~/Oceananigans.jl/src/Fields/field.jl:648
 [26] maximum(f::Function, c::Field{…})
    @ Oceananigans.Fields ~/Oceananigans.jl/src/Fields/field.jl:637
 [27] top-level scope
    @ REPL[9]:1
 [28] top-level scope
    @ ~/.julia/packages/CUDA/nbRJk/src/initialization.jl:205
Some type information was truncated. Use `show(err)` to see complete types.

julia> maximum(abs, v)
ERROR: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/libcuda.jl:27
  [2] check
    @ ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/libcuda.jl:34 [inlined]
  [3] cuLaunchKernel
    @ ~/.julia/packages/CUDA/nbRJk/lib/utils/call.jl:26 [inlined]
  [4] (::CUDA.var"#867#868"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3})(kernelParams::Vector{Ptr{Nothing}})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:69
  [5] macro expansion
    @ ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#867#868"{…}, ::CUDA.KernelState, ::CartesianIndices{…}, ::CartesianIndices{…}, ::CUDA.CuDeviceArray{…}, ::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ CUDA ./none:0
  [8] launch(f::CUDA.CuFunction, args::Vararg{…}; blocks::Union{…}, threads::Union{…}, cooperative::Bool, shmem::Integer, stream::CUDA.CuStream) where N
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:62 [inlined]
  [9] #872
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:136 [inlined]
 [10] macro expansion
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:95 [inlined]
 [11] macro expansion
    @ CUDA ./none:0 [inlined]
 [12] convert_arguments
    @ CUDA ./none:0 [inlined]
 [13] #cudacall#871
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:135 [inlined]
 [14] cudacall
    @ CUDA ~/.julia/packages/CUDA/nbRJk/lib/cudadrv/execution.jl:134 [inlined]
 [15] macro expansion
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/compiler/execution.jl:223 [inlined]
 [16] macro expansion
    @ CUDA ./none:0 [inlined]
 [17] call(::CUDA.HostKernel{…}, ::typeof(identity), ::typeof(max), ::Nothing, ::CartesianIndices{…}, ::CartesianIndices{…}, ::Val{…}, ::CUDA.CuDeviceArray{…}, ::Oceananigans.AbstractOperations.ConditionalOperation{…}; call_kwargs::@Kwargs{…})
    @ CUDA ./none:0
 [18] (::CUDA.HostKernel{…})(::Function, ::Vararg{…}; threads::Int64, blocks::Int64, kwargs::@Kwargs{…})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/compiler/execution.jl:345
 [19] macro expansion
    @ ~/.julia/packages/CUDA/nbRJk/src/compiler/execution.jl:106 [inlined]
 [20] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…}; init::Nothing)
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/mapreduce.jl:271
 [21] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/mapreduce.jl:169
 [22] mapreducedim!(f::Function, op::Function, R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})
    @ GPUArrays ~/.julia/packages/GPUArrays/EZkix/src/host/mapreduce.jl:10
 [23] #maximum!#860
    @ Base ./reducedim.jl:1034 [inlined]
 [24] maximum!(f::Function, r::Field{…}, a::Oceananigans.AbstractOperations.ConditionalOperation{…}; condition::Nothing, mask::Float64, kwargs::@Kwargs{…})
    @ Oceananigans.Fields ~/Oceananigans.jl/src/Fields/field.jl:618
 [25] maximum(f::Function, c::Field{…}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields ~/Oceananigans.jl/src/Fields/field.jl:648
 [26] maximum(f::Function, c::Field{…})
    @ Oceananigans.Fields ~/Oceananigans.jl/src/Fields/field.jl:637
 [27] top-level scope
    @ REPL[10]:1
 [28] top-level scope
    @ ~/.julia/packages/CUDA/nbRJk/src/initialization.jl:205
Some type information was truncated. Use `show(err)` to see complete types.

That suggests the problem is that the package dependencies on main were resolved with Julia v1.9.3:

┌ Warning: The active manifest file has dependencies that were resolved with a different julia version (1.9.3). Unexpected behavior may occur.

This issue will be resolved when #3403 is merged.

@glwagner
Member

It looks like the conditional reduction is too heavy for mapreduce. Perhaps @simone-silvestri has ideas to resolve this.

@simone-silvestri
Collaborator

The operation should not be too large, since the grid is very small. This is probably a symptom of a bug that does not affect the results but wastes computational resources somewhere in the conditional operation. I'll have a look.

@glwagner
Member

I think the size dependence has to do with how mapreduce works; it breaks the reduction into chunks and (10, 10, 10) might be just one chunk.
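
For concreteness, here is a minimal sketch that probes this size dependence with the same setup as the MWE above (the list of sizes is arbitrary, and the try/catch is only there to record whether the reduction launches):

using Oceananigans

for N in (8, 10, 12, 16)
    grid = RectilinearGrid(GPU(),
                           size = (N, N, N),
                           x = (0, 1),
                           y = (0, 1),
                           z = (-1, 0),
                           topology = (Periodic, Periodic, Bounded))
    model = NonhydrostaticModel(; grid)
    u, v, w = model.velocities
    try
        maximum(abs, v)  # the failing conditional reduction from the MWE
        println("N = $N: maximum(abs, v) ran fine")
    catch err
        println("N = $N: failed with $(typeof(err))")
    end
end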

@josuemtzmo
Collaborator

josuemtzmo commented Mar 12, 2024

I also had this issue. As someone new to running on GPUs, I was quite confused by this error. If the issue is not fixable, it would be helpful to at least point it out in the documentation.

I encountered this error while running a simulation based on the Langmuir turbulence tutorial on GPUs. Note that the progress function prints maximum(abs, u), maximum(abs, v), and maximum(abs, w):

     msg = @sprintf("i: %04d, t: %s, Δt: %s, umax = (%.1e, %.1e, %.1e) ms⁻¹, wall time: %s\n",
                   iteration(simulation),
                   prettytime(time(simulation)),
                   prettytime(simulation.Δt),
                   maximum(abs, u), maximum(abs, v), maximum(abs, w),
                   prettytime(simulation.run_wall_time))

thus resulting in the error:

LoadError: CUDA error: too many resources requested for launch

For reference, the code works once the maximum functions are removed:

     msg = @sprintf("i: %04d, t: %s, �~Tt: %s, wall time: %s\n",
                   iteration(simulation),
                   prettytime(time(simulation)),
                   prettytime(simulation.�~Tt),
                   prettytime(simulation.run_wall_time))
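
For context, here is a minimal sketch of how such a progress message is typically hooked into a simulation, loosely following the Langmuir turbulence tutorial (the callback name, the interval, and the @info call are assumptions, not the exact tutorial code):

using Printf

function progress(simulation)
    u, v, w = simulation.model.velocities
    # The maximum(abs, ...) calls below are what trigger the launch failure described above.
    msg = @sprintf("i: %04d, t: %s, Δt: %s, umax = (%.1e, %.1e, %.1e) ms⁻¹, wall time: %s\n",
                   iteration(simulation),
                   prettytime(time(simulation)),
                   prettytime(simulation.Δt),
                   maximum(abs, u), maximum(abs, v), maximum(abs, w),
                   prettytime(simulation.run_wall_time))
    @info msg
    return nothing
end

simulation.callbacks[:progress] = Callback(progress, IterationInterval(100))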

@navidcy reopened this Mar 12, 2024
@navidcy
Collaborator

navidcy commented Mar 12, 2024

Reopening this.

@glwagner
Member

@simone-silvestri has declared an interest in fixing this

@simone-silvestri
Collaborator

Can you try maximum without abs?

@glwagner
Member

I think it's the abs (probably any function) that's the main issue.

@josuemtzmo
Collaborator

@simone-silvestri, indeed, if I try maximum without abs the printing function works fine. @glwagner is right: any function within the maximum creates the same issue (I tested with sum).

@glwagner
Member

glwagner commented Mar 13, 2024

Well, sum definitely won't work (it has to be a simple single-argument transformation), but you could try a function like

square(x) = x * x

or log if you want to be adventurous.
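
For concreteness, a minimal version of that test, reusing the u field from the MWE above (square is just an illustrative helper):

square(x) = x * x

maximum(square, u)  # exercises the same conditional mapreduce path as maximum(abs, u)
maximum(abs2, u)    # Base's abs2 is another simple single-argument transformation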

@ali-ramadhan
Member

Is this still an issue? @xkykai's MWE runs fine for me (I went up to 256x256x256), and I've been doing maximum(abs, u) on the GPU for a few versions.

Out of curiosity, @josuemtzmo, are you able to reproduce the error on the latest versions of Julia, CUDA.jl, and Oceananigans.jl?


I'm using Oceananigans v0.91.7 with

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)

and

julia> Oceananigans.CUDA.versioninfo()
CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 556.12.0

CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+556.12

Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 5.794 GiB / 10.000 GiB available)

@josuemtzmo
Collaborator

josuemtzmo commented Aug 20, 2024

Hello,

I've tested it in Oceananigans v0.91.8 with:

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:
  JULIA_CUDA_MEMORY_POOL = none

julia> Oceananigans.CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.1
NVIDIA driver 530.30.2

CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 2023.1.1 (API 18.0.0)
- NVML: 12.0.0+530.30.2

Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

Environment:
- JULIA_CUDA_MEMORY_POOL: none

Preferences:
- CUDA_Runtime_jll.version: 12.1

1 device:
  0: Tesla V100-PCIE-32GB (sm_70, 30.884 GiB / 32.000 GiB available)

and the issue seems solved.
I agree with @ali-ramadhan: it seems this issue was fixed at some point. I haven't managed to pinpoint the version, but I think I had the issue when I was using CUDA v5.1.2.
