Complex AbstractOperations cannot be computed on GPU #1241

tomchor · 2020-12-02T17:55:48Z

I have the following output writer set-up in my simulation:

import Oceananigans.Fields: ComputedField
u, v, w = model.velocities.u, model.velocities.v, model.velocities.w
b, pHY′, pNHS = model.tracers.b, model.pressures.pHY′, model.pressures.pNHS
ν, νₑ = model.closure.ν, model.diffusivities.νₑ
#-----

#-----
import Oceananigans.AbstractOperations: ∂x, ∂y, ∂z

ddx = ∂x(u)^2 + ∂x(v)^2 + ∂x(w)^2
ddy = ∂y(u)^2 + ∂y(v)^2 + ∂y(w)^2
ddz = ∂z(u)^2 + ∂z(v)^2 + ∂z(w)^2
#avg_ε = AveragedField(ComputedField(ddx + ddy + ddz), dims=(1,))
avg_ε = AveragedField(ν*(ddx + ddy + ddz), dims=(1,))

outputs = (ε = avg_ε,)
simulation.output_writers[:avg_field_writer] =
    NetCDFOutputWriter(model, outputs,
                       filepath = "avg.jd15_3dbounded.nc",
                       schedule = TimeInterval(2minutes),
                       mode = "c")

This works successfully on CPUs, but running on GPUs I get a huge amount of error lines with some

 [16] compute! at /glade/u/home/tomasc/.julia/packages/Oceananigans/6JcUu/src/Fields/averaged_field.jl:86 [inlined]

Running the simulation without that output works for both GPUs and CPUs.

Am I doing something wrong here?

The text was updated successfully, but these errors were encountered:

glwagner · 2020-12-02T18:08:21Z

Ah, you may be running into some hidden limitations of AbstractOperations --- they seem to fail when the operations are too complex.

It's a shifty problem, because it appears to depend on the julia compiler. It's good that you opened this issue because it would be nice to document our efforts for solving this tricky problem.

I don't think this is a problem of AveragedFields specifically. Rather it's an issue with ComputedFields, are more specifically, evaluating complex AbstractOperations on the GPU.

Over at LESbrary, we are circumventing this issue by hand-writing particularly important complicated kernels. I think ViscousDissipation may be the object you're looking for:

https://github.com/CliMA/LESbrary.jl/blob/master/src/TurbulenceStatistics/viscous_dissipation.jl

tomchor · 2020-12-02T18:30:11Z

Thanks! Yeah, that's what I was trying to do. I see you already have some SGS stuff there as well.

Do you have an example of how to use LESbrary? I'm not that familiar with Julia and just from reading the README I can' quite understand how it's supposed to work.

Thanks!

glwagner · 2020-12-02T18:43:19Z

There's some LESbrary examples but the package definitely needs some love (wink wink):

https://github.com/CliMA/LESbrary.jl/blob/6cfe60cf44031f9357936058f08eed0f1bb5f9fc/examples/langmuir_turbulence.jl#L240

To get LESbrary.jl into your environment you use the package manager, eg:

using Pkg
Pkg.add("https://github.com/CliMA/LESbrary.jl.git")

using LESbrary.TurbulenceStatistics: ViscousDissipation

glwagner · 2020-12-02T19:20:35Z

One strategy that might help with complex AbstractOperations is to define intermediate ComputedFields, and then reuse these ComputedFields in higher-order AbstractOperations. This incurs extra computation but might be a stopgap if a low-level kernel like the one that's been written for ViscousDissipation is not available.

glwagner · 2020-12-02T20:21:18Z

This is probably related to some of the struggles in #870

tomchor · 2020-12-02T21:38:28Z

One strategy that might help with complex AbstractOperations is to define intermediate ComputedFields, and then reuse these ComputedFields in higher-order AbstractOperations. This incurs extra computation but might be a stopgap if a low-level kernel like the one that's been written for ViscousDissipation is not available.

I've tried a version of that already and it didn't work, but there are probably more things I could try. For now it's best not to reinvent the wheel and use LESbrary! But I'll keep this in mind when doing stuff like this in the future.

glwagner · 2020-12-02T21:44:00Z

Ultimately @tomchor we would definitely rather use AbstractOperations to diagnose ViscousDissipation rather than hard-coding it, as we had to resort to in LESbrary.jl. But this is tough issue to solve I think so yes, I think LESbrary.jl stuff is good for now. It'd be nice also to add other diagnostics that cannot be specified by AbstractOperations to LESbrary.jl. And we need documentation and tests, etc, etc...

glwagner · 2020-12-03T20:48:33Z

Another issue is that I believe one cannot embed AveragedFields into AbstractOperations on the GPU, despite that we have developed the abstraction and machinery so that everything does work as expected on the CPU. Embedding AveragedFields in AbstractOperations is needed to calculate things like turbulent kinetic energy and so is important for LES. Currently there's a custom field TurbulentKineticEnergy in the LESbrary.jl for this purpose:

https://github.com/CliMA/LESbrary.jl/blob/master/src/TurbulenceStatistics/turbulent_kinetic_energy.jl

It might be good to come up with a list of AbstractOperations that we would like to work on the GPU eventually, and then write tests for them and add them in a PR. We can then @test_skip them until either we figure out how to solve the issue or, perhaps, the julia compiler changes in a way that makes the tests pass... :-D

francispoulin · 2020-12-04T02:37:47Z

I have not used GPUs and don't appreciate the difficulty here at all but would be happy to discuss this sometime if people wanted to have a brainstorming session. Certainly starting simple is what I would recommend.

ali-ramadhan · 2020-12-04T02:55:17Z

Yeah I think it's just an unknown GPU compilation problem/failure. Unclear to me whether it's the fault of Oceananigans.AbstractOperations or something in CUDA.jl (or even KernelAbstractions.jl).

Certainly starting simple is what I would recommend.

Maybe it makes sense to try and condense some abstract operations into an isolated minimal working example which we can use to open an issue on CUDA.jl if that turns out to be the problem? Might help isolate the problem, and would certainly be much easier to debug.

I recall we encounter a limitation of the GPU compiler when trying to construct a GPU model with too many arbitrary tracers. I think in this case the type information was too large to even fit into the argument of a CUDA kernel or something so maybe this was a hard GPU limitation? Would be unfortunate if something similar is happening for complex abstract operations.

glwagner · 2020-12-04T12:26:58Z

Here's a starting list:

ViscousDissipation, which @tomchor attempts to compute above, also being computed here (and note that there's an open issue about the expression).
TurbulentKineticEnergy, which requires embedding AveragedFields in an operation
ShearProduction, similar to above.

@ali-ramadhan not sure if this is what you mean but when trying to adapt Field to the GPU we did encounter an error that was something like "ptx arguments consume too much parameter space". I believe this is an issue compiling very large objects (Field is large because it has boundary conditions). I don't think that this is the error we get for too-complex AbstractOperations though. I think for abstract operations its a type-inference issue. We should dump the whole error into this issue so that we know. I also think the error might change as the julia compiler evolves, or when we upgrade CUDA.jl.

glwagner · 2020-12-04T12:52:30Z

Here's the specific error we got when we tried to get Field, including all its glorious boundary conditions, to compile on the GPU:

Entry function 'ptxcall_calculate_Gu__66' uses too much parameter space (0x16c8 bytes, 0x1100 max)

dredged up from #746 . Some workarounds were suggested there, but I think our solution is actually better / simpler (adapt fields by unwrapping the underlying data and throwing away boundary conditions, rather than wrestling to get all the field info onto the poor GPU).

francispoulin · 2020-12-04T14:46:45Z

Does someone have a minimal example that reproduces this error? I'm just curious to learn more about the problem.

glwagner · 2020-12-04T15:05:47Z

using Oceananigans
using Oceananigans.AbstractOperations
using Oceananigans.Fields

grid = RegularCartesianGrid(size=(1, 1, 1), extent=(1, 1, 1)) 

model = IncompressibleModel(architecture=GPU(), grid=grid)

u, v, w = model.velocities

Σˣˣ = ∂x(u)
Σʸʸ = ∂y(v)
Σᶻᶻ = ∂z(w)

Σˣʸ = (∂y(u) + ∂x(v)) / 2 
Σˣᶻ = (∂z(u) + ∂x(w)) / 2 
Σʸᶻ = (∂z(v) + ∂y(w)) / 2 

ϵ = model.closure.ν * 2 * (Σˣˣ^2 + Σʸʸ^2 + Σᶻᶻ^2 + 2 * (Σˣʸ^2 + Σˣᶻ^2 + Σʸᶻ^2))

ϵ_field = ComputedField(ϵ)

compute!(ϵ_field)

produces

julia> include("complex_output.jl")
[ Info: Precompiling Oceananigans [9e8cae18-63c1-5223-a75c-80ca9d6e9a09]
[ Info: Oceananigans will use 24 threads
ERROR: LoadError: InvalidIRError: compiling kernel gpu__compute!(Cassette.Context{nametype(CUDACtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},KernelAbstractions.NDIteration.DynamicCheck,Nothing,Nothing,KernelAbstractions.NDIteration.NDRange{3,KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},Nothing,Nothing}},Nothing,KernelAbstractions.var"##PassType#253",Nothing,Cassette.DisableHooks}, typeof(Oceananigans.Fields.gpu__compute!), OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}}, Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(*),Float64,Oceananigans.AbstractOperations.MultiaryOperation{Cell,Cell,Cell,4,typeof(+),Tuple{Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂xᶜᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂yᵃᶜᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂zᵃᵃᶜ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(*),Int64,Oceananigans.AbstractOperations.MultiaryOperation{Face,Face,Cell,3,typeof(+),Tuple{Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(+),Oceananigans.AbstractOperations.Derivative{Face,Face,Cell,typeof(Oceananigans.Operators.∂yᵃᶠᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Face,Face,Cell,typeof(Oceananigans.Operators.∂xᶠᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(+),Oceananigans.AbstractOperations.Derivative{Face,Cell,Face,typeof(Oceananigans.Operators.∂zᵃᵃᶠ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Face,Cell,Face,typeof(Oceananigans.Operators.∂xᶠᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(+),Oceananigans.AbstractOperations.Derivative{Cell,Face,Face,typeof(Oceananigans.Operators.∂zᵃᵃᶠ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Cell,Face,Face,typeof(Oceananigans.Operators.∂yᵃᶠᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}},Tuple{typeof(identity),typeof(Oceananigans.Operators.ℑyzᵃᶠᶜ),typeof(Oceananigans.Operators.ℑxzᶠᵃᶜ)},RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}},Tuple{typeof(identity),typeof(identity),typeof(identity),typeof(Oceananigans.Operators.ℑxyᶜᶜᵃ)},RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub(overdub_context::Cassette.Context, overdub_arguments...) in Cassette at /home/glwagner/.julia/packages/Cassette/158rp/src/overdub.jl:586)
Stacktrace:
 [1] getindex at /home/glwagner/.julia/packages/Oceananigans/cLFd3/src/AbstractOperations/binary_operations.jl:34
 [2] macro expansion at /home/glwagner/.julia/packages/Oceananigans/cLFd3/src/Fields/computed_field.jl:86
 [3] gpu__compute! at /home/glwagner/.julia/packages/KernelAbstractions/jAutM/src/macros.jl:80
 [4] overdub at /home/glwagner/.julia/packages/Cassette/158rp/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] macro expansion at /home/glwagner/.julia/packages/Oceananigans/cLFd3/src/Fields/computed_field.jl:86
 [2] gpu__compute! at /home/glwagner/.julia/packages/KernelAbstractions/jAutM/src/macros.jl:80
 [3] overdub at /home/glwagner/.julia/packages/Cassette/158rp/src/overdub.jl:0
Stacktrace:
 [1] check_ir(::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget,CUDA.CUDACompilerParams}, ::LLVM.Module) at /home/glwagner/.julia/packages/GPUCompiler/uTpNx/src/validation.jl:123
 [2] macro expansion at /home/glwagner/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:239 [inlined]
 [3] macro expansion at /home/glwagner/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [4] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/glwagner/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:237
 [5] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/glwagner/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
 [6] compile at /home/glwagner/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [7] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxthreads,),Tuple{Int64}}}) at /home/glwagner/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:310
 [8] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{typeof(Cassette.overdub),Tuple{Cassette.Context{nametype(CUDACtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},KernelAbstractions.NDIteration.DynamicCheck,Nothing,Nothing,KernelAbstractions.NDIteration.NDRange{3,KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},Nothing,Nothing}},Nothing,KernelAbstractions.var"##PassType#253",Nothing,Cassette.DisableHooks},typeof(Oceananigans.Fields.gpu__compute!),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(*),Float64,Oceananigans.AbstractOperations.MultiaryOperation{Cell,Cell,Cell,4,typeof(+),Tuple{Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂xᶜᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂yᵃᶜᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂zᵃᵃᶜ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(*),Int64,Oceananigans.AbstractOperations.MultiaryOperation{Face,Face,Cell,3,typeof(+),Tuple{Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(+),Oceananigans.AbstractOperations.Derivative{Face,Face,Cell,typeof(Oceananigans.Operators.∂yᵃᶠᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Face,Face,Cell,typeof(Oceananigans.Operators.∂xᶠᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(+),Oceananigans.AbstractOperations.Derivative{Face,Cell,Face,typeof(Oceananigans.Operators.∂zᵃᵃᶠ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Face,Cell,Face,typeof(Oceananigans.Operators.∂xᶠᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(+),Oceananigans.AbstractOperations.Derivative{Cell,Face,Face,typeof(Oceananigans.Operators.∂zᵃᵃᶠ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Cell,Face,Face,typeof(Oceananigans.Operators.∂yᵃᶠᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}},Tuple{typeof(identity),typeof(Oceananigans.Operators.ℑyzᵃᶠᶜ),typeof(Oceananigans.Operators.ℑxzᶠᵃᶜ)},RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}},Tuple{typeof(identity),typeof(identity),typeof(identity),typeof(Oceananigans.Operators.ℑxyᶜᶜᵃ)},RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxthreads,),Tuple{Int64}}}) at /home/glwagner/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [9] gpu__compute! at ./array.jl:0 [inlined]
 [10] cufunction(::typeof(Cassette.overdub), ::Type{Tuple{Cassette.Context{nametype(CUDACtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},KernelAbstractions.NDIteration.DynamicCheck,Nothing,Nothing,KernelAbstractions.NDIteration.NDRange{3,KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},Nothing,Nothing}},Nothing,KernelAbstractions.var"##PassType#253",Nothing,Cassette.DisableHooks},typeof(Oceananigans.Fields.gpu__compute!),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(*),Float64,Oceananigans.AbstractOperations.MultiaryOperation{Cell,Cell,Cell,4,typeof(+),Tuple{Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂xᶜᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂yᵃᶜᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Cell,Cell,typeof(^),Oceananigans.AbstractOperations.Derivative{Cell,Cell,Cell,typeof(Oceananigans.Operators.∂zᵃᵃᶜ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(*),Int64,Oceananigans.AbstractOperations.MultiaryOperation{Face,Face,Cell,3,typeof(+),Tuple{Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Face,Face,Cell,typeof(+),Oceananigans.AbstractOperations.Derivative{Face,Face,Cell,typeof(Oceananigans.Operators.∂yᵃᶠᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Face,Face,Cell,typeof(Oceananigans.Operators.∂xᶠᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Face,Cell,Face,typeof(+),Oceananigans.AbstractOperations.Derivative{Face,Cell,Face,typeof(Oceananigans.Operators.∂zᵃᵃᶠ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Face,Cell,Face,typeof(Oceananigans.Operators.∂xᶠᵃᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(^),Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(/),Oceananigans.AbstractOperations.BinaryOperation{Cell,Face,Face,typeof(+),Oceananigans.AbstractOperations.Derivative{Cell,Face,Face,typeof(Oceananigans.Operators.∂zᵃᵃᶠ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Oceananigans.AbstractOperations.Derivative{Cell,Face,Face,typeof(Oceananigans.Operators.∂yᵃᶠᵃ),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},Int64,typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}},Tuple{typeof(identity),typeof(Oceananigans.Operators.ℑyzᵃᶠᶜ),typeof(Oceananigans.Operators.ℑxzᶠᵃᶜ)},RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}},Tuple{typeof(identity),typeof(identity),typeof(identity),typeof(Oceananigans.Operators.ℑxyᶜᶜᵃ)},RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularCartesianGrid{Float64,Periodic,Periodic,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}}}; name::String, kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxthreads,),Tuple{Int64}}}) at /home/glwagner/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:297
 [11] macro expansion at /home/glwagner/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:109 [inlined]
 [12] (::KernelAbstractions.Kernel{KernelAbstractions.CUDADevice,KernelAbstractions.NDIteration.StaticSize{(1, 1)},KernelAbstractions.NDIteration.StaticSize{(1, 1, 1)},typeof(Oceananigans.Fields.gpu__compute!)})(::OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, ::Vararg{Any,N} where N; ndrange::Nothing, dependencies::KernelAbstractions.CudaEvent, workgroupsize::Nothing, progress::Function) at /home/glwagner/.julia/packages/KernelAbstractions/jAutM/src/backends/cuda.jl:185
 [13] compute!(::ComputedField{Cell,Cell,Cell

ali-ramadhan · 2020-12-04T15:08:35Z

Hmmm, looks like a Cassette issue. Maybe it's just being overly sensitive to the contents of the kernel (or kernel arguments) like was the case with #828?

glwagner · 2020-12-04T15:13:19Z

A "dynamic function invocation" means that the compiler thinks a function is being called whose scope can change "dynamically" (I think). This is the error one gets when a function depends on a global variable that is not const (for example). In this case, the error tells us that the types of the objects involved in calling getindex defined on BinaryOperation:

Oceananigans.jl/src/AbstractOperations/binary_operations.jl

Line 34 in c3b688f

    
           @inline Base.getindex(β::BinaryOperation, i, j, k) = β.▶op(i, j, k, β.grid, β.op, β.▶a, β.▶b, β.a, β.b)

are not correctly inferred.

The way getindex comes into play is in the kernel function _compute! that evaluates the AbstractOperation:

Oceananigans.jl/src/Fields/computed_field.jl

Lines 84 to 87 in c3b688f

    
           @kernel function _compute!(data, operand) 
        
               i, j, k = @index(Global, NTuple) 
        
               @inbounds data[i, j, k] = operand[i, j, k] 
        
           end

calling getindex(operand, i, j, k) (or equivalently operand[i, j, k]) triggers recursive getindex calls that traverse the AbstractOperation tree. It seems that when the tree is too large, this traversal cannot be entirely compiled.

Perhaps there are tricks we might use to help the compiler parse this kind of operation, like putting some type annotations / hints into getindex(b::BinaryOperation, ...). Not sure. Another possibility is to figure out how to simplify the object BinaryOperation, MultiaryOperation, so that the compiler is less stressed trying to compile them... ?

ali-ramadhan · 2020-12-04T15:21:44Z

Hmmm, if it's indeed incapable of compiling the entire operation might be helpful to be able to find out exactly at which size it fails.

I'm still thinking that it might be something to be fixed/improved in CUDA.jl.

I can try to do some test as I update #870.

francispoulin · 2020-12-04T15:35:23Z

I tried the code above and do get the same error, so I can confirm that.

I tried to trim it down and found that the following, slightly more minimal example, produces the same error. It seems that squaring and multiplying togther is too much. But if you remove one or the other, it seems to work fine.

using Oceananigans
using Oceananigans.AbstractOperations
using Oceananigans.Fields

grid = RegularCartesianGrid(size=(1, 1, 1), extent=(1, 1, 1)) 

model = IncompressibleModel(architecture=GPU(), grid=grid)

u, v, w = model.velocities

ϵ = 2 * (∂x(u)^2)

ϵ_field = ComputedField(ϵ)

compute!(ϵ_field)

glwagner · 2020-12-04T15:47:48Z

Hmm that's good to know the limit of complexity. I can confirm that behavior, eg:

julia> compute!(ComputedField(2 * Σˣˣ))

julia> compute!(ComputedField(Σˣˣ^2))

julia> compute!(ComputedField(2 * Σˣˣ^2))
ERROR: InvalidIRError: compiling kernel gpu__compute!...

Here's the tree for 2 * Σˣˣ^2:

julia> 2 * Σˣˣ^2
BinaryOperation at (Cell, Cell, Cell)
├── grid: RegularCartesianGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    * at (Cell, Cell, Cell) via identity
    ├── 2
    └── ^ at (Cell, Cell, Cell) via identity
        ├── ∂xᶜᵃᵃ at (Cell, Cell, Cell) via identity
        │   └── Field located at (Face, Cell, Cell)
        └── 2

glwagner · 2020-12-04T15:49:31Z

Also

julia> compute!(ComputedField(Σˣˣ^2 + Σʸʸ^2))
ERROR: InvalidIRError: compiling kernel gpu__compute!...

glwagner · 2020-12-04T15:51:38Z

It's not strictly a nesting issue, since:

julia> compute!(ComputedField(∂x(∂x(u))))

julia> compute!(ComputedField(∂x(∂x(∂x(u)))))

julia> compute!(ComputedField(∂x(∂x(∂x(∂x(u))))))

is fine.

glwagner · 2020-12-04T15:53:08Z

Whoa. Here's a hint.

julia> compute!(ComputedField(u^2 + v^2 + w^2))

julia> compute!(ComputedField(u^2 + v^2))
ERROR: InvalidIRError: compiling kernel gpu__compute!

!!

Note:

julia> u^2 + v^2
BinaryOperation at (Face, Cell, Cell)
├── grid: RegularCartesianGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    + at (Face, Cell, Cell) via identity
    ├── ^ at (Face, Cell, Cell) via identity
    │   ├── Field located at (Face, Cell, Cell)
    │   └── 2
    └── ^ at (Cell, Face, Cell) via identity
        ├── Field located at (Cell, Face, Cell)
        └── 2

julia> u^2 + v^2 + w^2
MultiaryOperation at (Face, Cell, Cell)
├── grid: RegularCartesianGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    + at (Face, Cell, Cell)
    ├── ^ at (Face, Cell, Cell) via identity
    │   ├── Field located at (Face, Cell, Cell)
    │   └── 2
    ├── ^ at (Cell, Face, Cell) via identity
    │   ├── Field located at (Cell, Face, Cell)
    │   └── 2
    └── ^ at (Cell, Cell, Face) via identity
        ├── Field located at (Cell, Cell, Face)
        └── 2

Surprisingly, MultiaryOperations are better behaved.

ali-ramadhan · 2020-12-04T15:54:41Z

I can confirm that I was able to get away with √(u^2 + v^2 + w^2) on GPUs in one of the particle tracking tests (PR #1091).

glwagner · 2020-12-04T15:55:41Z

While getindex for a MultiaryOperation is complicated (unrolls a loop!):

Oceananigans.jl/src/AbstractOperations/multiary_operations.jl

Lines 14 to 15 in c3b688f

    
           @inline Base.getindex(Π::MultiaryOperation{X, Y, Z, N}, i, j, k)  where {X, Y, Z, N} = 
        
               Π.op(ntuple(γ -> Π.▶[γ](i, j, k, Π.grid, Π.args[γ]), Val(N))...)

the MultiaryOperation object is simpler than a BinaryOperation because it stores fewer interpolation functions (we only support the case that every member of a multiary operation is first interpolated to a common location before being op'd on):

Oceananigans.jl/src/AbstractOperations/multiary_operations.jl

Lines 3 to 12 in c3b688f

    
           struct MultiaryOperation{X, Y, Z, N, O, A, I, G} <: AbstractOperation{X, Y, Z, G} 
        
                 op :: O 
        
               args :: A 
        
                  ▶ :: I 
        
               grid :: G 
        
               function MultiaryOperation{X, Y, Z}(op, args, ▶, grid) where {X, Y, Z} 
        
                   return new{X, Y, Z, length(args), typeof(op), typeof(args), typeof(▶), typeof(grid)}(op, args, ▶, grid) 
        
               end 
        
           end

compare with

Oceananigans.jl/src/AbstractOperations/binary_operations.jl

Lines 8 to 32 in c3b688f

    
           struct BinaryOperation{X, Y, Z, O, A, B, IA, IB, IΩ, G} <: AbstractOperation{X, Y, Z, G} 
        
                 op :: O 
        
                  a :: A 
        
                  b :: B 
        
                 ▶a :: IA 
        
                 ▶b :: IB 
        
                ▶op :: IΩ 
        
               grid :: G 
        
               """ 
        
                   BinaryOperation{X, Y, Z}(op, a, b, ▶a, ▶b, ▶op, grid) 
        
               Returns an abstract representation of the binary operation `op(▶a(a), ▶b(b))`, 
        
               followed by interpolation by `▶op` to `(X, Y, Z)`, where `▶a` and `▶b` interpolate 
        
               `a` and `b` to a common location. 
        
               """ 
        
               function BinaryOperation{X, Y, Z}(op, a, b, ▶a, ▶b, ▶op, grid) where {X, Y, Z} 
        
                   any((X, Y, Z) .=== Nothing) && throw(ArgumentError("Nothing locations are invalid! " * 
        
                                                                      "Cannot construct BinaryOperation at ($X, $Y, $Z).")) 
        
                   return new{X, Y, Z, typeof(op), typeof(a), typeof(b), typeof(▶a), typeof(▶b), 
        
                              typeof(▶op), typeof(grid)}(op, a, b, ▶a, ▶b, ▶op, grid) 
        
               end 
        
           end

glwagner · 2020-12-04T17:41:04Z

On problem 1: meeting with @vchuravy we think there is a "recursive call cycle with levels of indirection". In other words, calling getindex on a BinaryOperation:

Oceananigans.jl/src/AbstractOperations/binary_operations.jl

Line 34 in c3b688f

    
           @inline Base.getindex(β::BinaryOperation, i, j, k) = β.▶op(i, j, k, β.grid, β.op, β.▶a, β.▶b, β.a, β.b)

calls β.▶op = identity:

Oceananigans.jl/src/Operators/interpolation_utils.jl

Line 11 in c3b688f

    
           @inline identity(i, j, k, grid, F::TF, args...) where TF<:Function = F(i, j, k, grid, args...)

which calls op:

Oceananigans.jl/src/AbstractOperations/binary_operations.jl

Lines 59 to 60 in c3b688f

    
           @inline $op(i, j, k, grid::AbstractGrid, ▶a, ▶b, a, b) = 
        
               @inbounds $op(▶a(i, j, k, grid, a), ▶b(i, j, k, grid, b))

which may invoke another call to either ▶a=identity or ▶b=identity (which might subsequently go back to getindex...) Due to some aspect of this process, the compiler throws up its hands because of something like "you wouldn't want unbound recursion in your compiler, right?"

A possible solution, which is also a hilarious hack, is to define multiple identity functions and use an internal counter to use the different yet identical copies of this function when constructing BinaryOperations. Then we wouldn't be recursing over identity (from the compiler's point of view), since we'd be calling identity1, identity2, etc. If we also need different flavors of BinaryOperation, we can do that too...

vchuravy · 2020-12-04T17:55:17Z

Alternatively we could think about linearizing the recursion before we go off and compile it.

tomchor · 2020-12-16T16:22:18Z

I'm not sure if it matters too much, but I'm going over some examples posted here and I can't reproduce some of the examples by @glwagner. I can compute some stuff that he can't, while the opposite is true for others

julia> compute!(ComputedField(∂x(∂x(u)))) # works as expected

julia> compute!(ComputedField(∂x(∂x(∂x(u))))) # works as expected

julia> compute!(ComputedField(∂x(∂x(∂x(∂x(u)))))) # doesn't work (not expected)

julia> compute!(ComputedField(u^2 + v^2)) # works (unexpected)

julia> compute!(ComputedField(u^2 + v^2 + w^2)) # works (expected)

Hopefully that'll give you guys more hints on what's going on. I'm running this on a Tesla V100 at NCAR's Casper cluster btw.

Julia version 1.5.2
Oceananigans version 0.45.2.
For some reason CUDA doesn't appear when I type ]st, but I think it's 2.2.1 (sorry for the unfamiliarity with Julia)

EDIT:

The results above were obtained initializing a minimum model as posted above. I tried the same examples I just posted initializing a model differently and many more things are failing now (for example, compute!(ComputedField(u^2 + v^2)) fails).

glwagner · 2020-12-16T17:19:21Z

Can you please post the errors that you obtain in each case?

tomchor · 2020-12-16T17:28:44Z

Hmm, I can't reproduce the same results exactly. All I did before was honestly open a Julia session and just paste the examples you guys posted one by one.

Here's a pastebin with my whole session testing the commands I got in the previous post. (The comments of course don't reflect the outcome anymore.)

glwagner · 2020-12-16T23:16:18Z

Awesome, thank you, that's helpful. Stochastic errors are troubling.

tomchor · 2021-02-26T17:13:23Z

I'm not sure if people are still thinking about this, but I may have some relevant information (Good news!) that I'd appreciate some feedback on. Consider the following MWE:

using Oceananigans
using Oceananigans.Utils
using Oceananigans.Fields

Lx = 150; Ly = 6000; Lz = 80
topology = (Periodic, Bounded, Bounded)
grid = RegularRectilinearGrid(size=(1, 512, 8), x=(0, Lx), y=(0, Ly), z=(-Lz, 0),
                              topology=(Periodic, Bounded, Bounded))


model = IncompressibleModel(architecture = GPU(),
                            grid = grid,
                            )

w_ic(x, y, z) = 0.01*y
v_ic(x, y, z) = 0.01*x
set!(model, w=w_ic, v=v_ic)


import Oceananigans.Fields: ComputedField, KernelComputedField
using Oceananigans.AbstractOperations: @at, ∂x, ∂y, ∂z
using Oceananigans.Grids: Center, Face


u, v, w = model.velocities

function naive_calc()
    p = sum(model.pressures)
    wp = @at (Center, Center, Face) w*p
    dwpdz = (1/1024) * ∂z(wp)
    println(dwpdz)
    return ComputedField(dwpdz)
end

function nested_calc()
    p = ComputedField(sum(model.pressures))
    wp = ComputedField(@at (Center, Center, Face) w*p)
    dwpdz = (1/1024) * ∂z(wp)
    println(dwpdz)
    return ComputedField(dwpdz)
end

I can include this script in the REPL after which I get the following results. First, when trying to compute the naive calculation using a GPU I get an error, which is expected at this point:

julia> dwpdz_naive = naive_calc()
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
│   └── domain: x ∈ [0.0, 150.0], y ∈ [0.0, 6000.0], z ∈ [-80.0, 0.0]
└── tree: 
    * at (Center, Center, Center) via identity
    ├── 0.0009765625
    └── ∂zᵃᵃᶜ at (Center, Center, Center) via identity
        └── * at (Center, Center, Face) via identity
            ├── Field located at (Center, Center, Face)
            └── + at (Center, Center, Center) via identity
                ├── Field located at (Center, Center, Center)
                └── Field located at (Center, Center, Center)
ComputedField located at (Center, Center, Center) of BinaryOperation at (Center, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 514, 10)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
├── operand: BinaryOperation at (Center, Center, Center)
└── status: time=0.0


julia> compute!(dwpdz_naive)
ERROR: InvalidIRError: compiling kernel gpu__compute!(Cassette.Context{nametype(CUDACtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(1, 512, 8)},KernelAbstractions.NDIteration.DynamicCheck,Nothing,Nothing,KernelAbstractions.NDIteration.NDRange{3,KernelAbstractions.NDIteration.StaticSize{(1, 2, 8)},KernelAbstractions.NDIteration.StaticSize{(1, 256, 1)},Nothing,Nothing}},Nothing,KernelAbstractions.var"##PassType#253",Nothing,Cassette.DisableHooks}, typeof(Oceananigans.Fields.gpu__compute!), OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}}, Oceananigans.AbstractOperations.BinaryOperation{Center,Center,Center,typeof(*),Float64,Oceananigans.AbstractOperations.Derivative{Center,Center,Center,typeof(Oceananigans.Operators.∂zᵃᵃᶜ),Oceananigans.AbstractOperations.BinaryOperation{Center,Center,Face,typeof(*),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},Oceananigans.AbstractOperations.BinaryOperation{Center,Center,Center,typeof(+),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),typeof(identity),typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(Oceananigans.Operators.ℑzᵃᵃᶠ),typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub(overdub_context::Cassette.Context, overdub_arguments...) in Cassette at /glade/u/home/tomasc/.julia/packages/Cassette/Wjztv/src/overdub.jl:595)
Stacktrace:
 [1] getindex at /glade/u/home/tomasc/.julia/packages/Oceananigans/WSSHu/src/AbstractOperations/binary_operations.jl:34
 [2] macro expansion at /glade/u/home/tomasc/.julia/packages/Oceananigans/WSSHu/src/Fields/computed_field.jl:114
 [3] gpu__compute! at /glade/u/home/tomasc/.julia/packages/KernelAbstractions/mKsXc/src/macros.jl:80
 [4] overdub at /glade/u/home/tomasc/.julia/packages/Cassette/Wjztv/src/overdub.jl:0
# I truncated the huge error message here

However, the nested calculation appears to work!:

julia> dwpdz_nested = nested_calc()
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
│   └── domain: x ∈ [0.0, 150.0], y ∈ [0.0, 6000.0], z ∈ [-80.0, 0.0]
└── tree: 
    * at (Center, Center, Center) via identity
    ├── 0.0009765625
    └── ∂zᵃᵃᶜ at (Center, Center, Center) via identity
        └── ComputedField located at (Center, Center, Face) of BinaryOperation at (Center, Center, Face)
ComputedField located at (Center, Center, Center) of BinaryOperation at (Center, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 514, 10)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
├── operand: BinaryOperation at (Center, Center, Center)
└── status: time=0.0


julia> compute!(dwpdz_nested)

julia> using Adapt

julia> adapt(Array, interior(dwpdz_nested))
1×512×8 view(OffsetArray(::Array{Float64,3}, 0:2, 0:513, 0:9), [1], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  503, 504, 505, 506, 507, 508, 509, 510, 511, 512], [1, 2, 3, 4, 5, 6, 7, 8]) with eltype Float64:
[:, :, 1] =
 0.0262775  0.014948  0.00902569  0.00559831  0.00351427  0.00221792  0.00140318  0.000888719  …  0.000263472  0.000420913  0.000674128  0.00108572  0.00177041  0.00296775  0.00528696

[:, :, 2] =
 0.0156731  0.0110235  0.00720451  0.00461103  0.00293129  0.00185902  0.00117792  0.00074609  …  0.000915116  0.00144732  0.00229007  0.00362638  0.0057484  0.00911035  0.0143023

[:, :, 3] =
 0.00844536  0.00652392  0.00451369  0.00297425  0.00191676  0.00122263  0.000776137  0.000491594  …  0.00126331  0.00199147  0.00313408  0.00491472  0.00764863  0.0117137  0.0173591

[:, :, 4] =
 0.00263363  0.0021138  0.00150181  0.00100316  0.000649705  0.000414395  0.000262323  0.00016544  …  0.00117766  0.00185426  0.00291374  0.00455993  0.00707827  0.0108175  0.0160812

[:, :, 5] =
 -0.00276553  -0.00220933  -0.00157462  -0.00105926  -0.000692536  -0.000446557  -0.00028606  …  0.00062928  0.000993307  0.0015715  0.00249751  0.00400184  0.0064944  0.0106821

[:, :, 6] =
 -0.00852329  -0.00657626  -0.00455259  -0.00300401  -0.00193944  -0.00123966  -0.000788713  …  -0.000470821  -0.000722127  -0.00106354  -0.00141766  -0.00138651  0.000390487

[:, :, 7] =
 -0.0156445  -0.0109965  -0.0071831  -0.00459454  -0.00291877  -0.00184967  -0.00117105  …  -0.00143385  -0.00226136  -0.00355998  -0.00557919  -0.00863921  -0.0129097  -0.0170153

[:, :, 8] =
 -0.0260963  -0.0148272  -0.00893539  -0.00552894  -0.00346128  -0.00217808  -0.00137374  …  -0.00251345  -0.00397508  -0.00630142  -0.0100415  -0.0161907  -0.0268075  -0.0470868

(Btw, the example above obviously works fine with architecture=CPU(), where I can check that dwpdz_nested actually produces the correct result.)

I even ran some other tests with even increased complexity. And they all appear to work on GPUs. For example this one:

function crazy_calc()
    p = ComputedField(sum(model.pressures))
    wp = ComputedField(@at (Center, Center, Face) w*p)
    dwpdz = (1/1024) * ∂z(wp)
    println(dwpdz)
    dwpdz = ComputedField(dwpdz)
    dwpdz2 = ComputedField(dwpdz^2)
    return ComputedField(dwpdz2+dwpdz)
end

I'd appreciate if some of you could try to reproduce this result on other machines. I ran this in one of NCAR's Tesla V100s. If you can reproduce this behavior, then this kinda makes KernelComputedFields obsolete, no?

glwagner · 2021-02-26T18:19:55Z

I think there's still a use for KernelComputedField! It's more expensive to construct a calculation by nesting ComputedFields inside it. The difference is in the number of kernel launches: when expressing a calculation with a single kernel, the computation is computed with a single loop over all grid points. When expressing a calculation using intermediate ComputedField, then first we launch kernels to calculate the intermediate ComputedFields, and launch a final kernel to compute the quantity of interest. It can also be more memory intensive since the intermediate ComputedFields need to be stored.

For some applications, the "optimization" of avoiding intermediate kernel launches / calculations may be unimportant (for example, if plenty of memory is available and computations are made very rarely). I think nesting ComputedFields is a nice solution that avoids having to hand-write kernels for those cases. But sometimes I do think that users want to optimize diagnostics calculations.

There are also some other nice applications of KernelComputedField; for example I think it would be nice to make the LES eddy diffusivities into KernelComputedField (which are computed every tendency evaluation so performance is crucial), and perhaps some other auxiliary variables.

I think it would be fun to try the compiler hack-around that I suggested in my comment above (defining multiple identity functions and using an internal counter to scroll through them during compilation). It's just a matter of finding the time to do it.

tomchor · 2021-04-17T18:36:33Z

@glwagner Should we keep this open though, since we still haven't solved all the issues? (I know it's unlikely we'll ever solve all of them, but we might be able to make more progress).

glwagner · 2021-04-17T19:07:52Z

I'm fine with that. It makes sense if we think there's Oceananigans.jl development we might to do to solve the issue. I'm not entirely sure that's true right now. But here's where we stand on master currently:

julia> using Oceananigans
[ Info: Precompiling Oceananigans [9e8cae18-63c1-5223-a75c-80ca9d6e9a09]
[ Info: Oceananigans will use 24 threads

julia> grid = RegularRectilinearGrid(size=(1, 1, 1), extent=(1, 1, 1)); model = IncompressibleModel(architecture=GPU(), grid=grid)
IncompressibleModel{GPU, Float64}(time = 0 seconds, iteration = 0) 
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
├── tracers: (:T, :S)
├── closure: IsotropicDiffusivity{Float64,NamedTuple{(:T, :S),Tuple{Float64,Float64}}}
├── buoyancy: SeawaterBuoyancy{Float64,LinearEquationOfState{Float64},Nothing,Nothing}
└── coriolis: Nothing

julia> u, v, w = model.velocities
(u = Field located at (Face, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 3, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
└── boundary conditions: x=(west=Periodic, east=Periodic), y=(south=Periodic, north=Periodic), z=(bottom=ZeroFlux, top=ZeroFlux), v = Field located at (Center, Face, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 3, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
└── boundary conditions: x=(west=Periodic, east=Periodic), y=(south=Periodic, north=Periodic), z=(bottom=ZeroFlux, top=ZeroFlux), w = Field located at (Center, Center, Face)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 3, 4)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
└── boundary conditions: x=(west=Periodic, east=Periodic), y=(south=Periodic, north=Periodic), z=(bottom=NormalFlow, top=NormalFlow))

julia> compute!(ComputedField(u + v - w)) # Now this works

julia> compute!(ComputedField(∂x(u)^2 + ∂y(v)^2 + ∂z(w)^2 + ∂x(w)^2)) # still doesnt work
ERROR: CUDA error: device kernel image is invalid (code 200, ERROR_INVALID_IMAGE)

julia> compute!(ComputedField(∂x(u)^2 + ∂y(v)^2 + ∂z(w)^2 + ∂x(w)^2 + ∂y(w)^2)) # still doesn't work
ERROR: CUDA error: a PTX JIT compilation failed (code 218, ERROR_INVALID_PTX)
ptxas application ptx input, line 802; error   : Entry function '_Z19julia_gpu__compute_7ContextI14__CUDACtx_Name16CompilerMetadataI10StaticSizeI9_1__1__1_E12DynamicCheckvv7NDRangeILi3ES2_I9_1__1__1_ES2_I9_1__1__1_EvvEEv14__PassType_253v12DisableHooksE14_gpu__compute_11OffsetArrayI7Float64Li3E13CuDeviceArrayIS9_Li3ELi1EEE17MultiaryOperationI6CenterS12_S12_Li5E2__5TupleI15BinaryOperationIS12_S12_S12_S13_10DerivativeIS12_S12_S12_6__x___S8_IS9_Li3ES10_IS9_Li3ELi1EEE10_identity222RegularRectilinearGridIS9_8PeriodicS20_7BoundedS8_IS9_Li1E12StepRangeLenIS9_14TwicePrecisionIS9_ES23_IS9_EEEEE5Int6410_identity310_identity410_identity5S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_IS12_S12_S12_S13_S16_IS12_S12_S12_6__y___S8_IS9_Li3ES10_IS9_Li3ELi1EEE10_identity1S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S18_S25_S26_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_IS12_S12_S12_S13_S16_IS12_S12_S12_6__z___S8_IS9_Li3ES10_IS9_Li3ELi1EEES27_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S29_S18_S25_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_I4FaceS12_S31_S13_S16_IS31_S12_S31_S17_S8_IS9_Li3ES10_IS9_Li3ELi1EEES26_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S27_S29_S18_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_IS12_S31_S31_S13_S16_IS12_S31_S31_S28_S8_IS9_Li3ES10_IS9_Li3ELi1EEES25_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S26_S27_S29_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEEES14_IS18_S25_S26_7__xz___7__yz___ES19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEE' uses too much parameter space (0x1408 bytes, 0x1100 max).
ptxas fatal   : Ptx assembly aborted due to errors

We haven't discussed the problem with AveragedField on this thread. I think we need a new issue for that (can't remember if there already is one). It actually looks like some, but not all of the issue there have been solved. A minimal example is:

julia> U = AveragedField(u, dims=(1, 2))
AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (1, 1, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
├── dims: (1, 2)
├── operand: Field located at (Face, Center, Center)
└── status: time=0.0

julia> compute!(ComputedField(u - U))

julia> compute!(ComputedField((u - U)^2))

julia> V = AveragedField(v, dims=(1, 2))
AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (1, 1, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
├── dims: (1, 2)
├── operand: Field located at (Center, Face, Center)
└── status: time=0.0

julia> tke = 1/2 * ((u - U)^2 + (v - V)^2 + w^2)
BinaryOperation at (Face, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    * at (Face, Center, Center) via identity
    ├── 0.5
    └── + at (Face, Center, Center)
        ├── ^ at (Face, Center, Center) via identity
        │   ├── - at (Face, Center, Center) via identity
        │   │   ├── Field located at (Face, Center, Center)
        │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
        │   └── 2
        ├── ^ at (Center, Face, Center) via identity
        │   ├── - at (Center, Face, Center) via identity
        │   │   ├── Field located at (Center, Face, Center)
        │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
        │   └── 2
        └── ^ at (Center, Center, Face) via identity
            ├── Field located at (Center, Center, Face)
            └── 2

julia> compute!(ComputedField(tke))
ERROR: InvalidIRError: compiling kernel gpu__compute!

Interestingly, this works:

julia> tke = ((u - U)^2 + (v - V)^2 + w^2) / 2
BinaryOperation at (Face, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    / at (Face, Center, Center) via identity
    ├── + at (Face, Center, Center)
    │   ├── ^ at (Face, Center, Center) via identity
    │   │   ├── - at (Face, Center, Center) via identity
    │   │   │   ├── Field located at (Face, Center, Center)
    │   │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
    │   │   └── 2
    │   ├── ^ at (Center, Face, Center) via identity
    │   │   ├── - at (Center, Face, Center) via identity
    │   │   │   ├── Field located at (Center, Face, Center)
    │   │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
    │   │   └── 2
    │   └── ^ at (Center, Center, Face) via identity
        │   ├── Field located at (Center, Center, Face)
        │   └── 2
    └── 2

julia> compute!(ComputedField(tke))

So I guess we are almost there with AveragedField. I think the order or arguments could actually matter for compilation due to heuristics that are invoked during compilation... ?

We might update the test for computations with AveragedField because they might pass with a different ordering. We could also change how AbstractOperations are built and put Number as the second argument regardless of how the expression is written.

glwagner · 2021-04-17T19:10:11Z

This doesn't work sadly:

julia> tke = @at (Center, Center, Center) ((u - U)^2 + (v - V)^2 + w^2) / 2
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    / at (Center, Center, Center) via identity
    ├── + at (Center, Center, Center)
    │   ├── ^ at (Center, Center, Center) via identity
    │   │   ├── - at (Center, Center, Center) via identity
    │   │   │   ├── Field located at (Face, Center, Center)
    │   │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
    │   │   └── 2
    │   ├── ^ at (Center, Center, Center) via identity
    │   │   ├── - at (Center, Center, Center) via identity
    │   │   │   ├── Field located at (Center, Face, Center)
    │   │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
    │   │   └── 2
    │   └── ^ at (Center, Center, Center) via ℑzᵃᵃᶜ
        │   ├── Field located at (Center, Center, Face)
        │   └── 2
    └── 2

julia> compute!(ComputedField(tke))
ERROR: InvalidIRError: compiling kernel gpu__compute!

There's something fishy there, worth looking into. We can change the topic of this issue to focus on AveragedField or open a new one. Happy to do either.

EDIT: I see the issue. While we ensure that binary operations always occur at the common location for fields, we haven't ensured that operations between fields and ReducedFields occur at the location of the 3D field. We can fix this by defining a few more functions.

tomchor · 2021-04-17T20:28:43Z

There's something fishy there, worth looking into. We can change the topic of this issue to focus on AveragedField or open a new one. Happy to do either.

Thanks for all the work! I think I'll reopen this thread then since the issue is not resolved and we may still be able to make progress. Also there's a chance that someone will see this thread and have some good ideas now that we specifically link this thread in the GPU docs.

EDIT: I see the issue. While we ensure that binary operations always occur at the common location for fields, we haven't ensured that operations between fields and ReducedFields occur at the location of the 3D field. We can fix this by defining a few more functions.

That's awesome! I vote for another separate issue for the averaged fields in the name of organization. But I'll let you decide that one since you already have insights into how to solve it.

tomchor · 2021-04-20T00:45:32Z

@glwagner Just FYI, some of the things that did not work for you, actually worked for me. Most notably:

julia> tke = @at (Center, Center, Center) ((u - U)^2 + (v - V)^2 + w^2) / 2
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    / at (Center, Center, Center) via identity
    ├── + at (Center, Center, Center)
    │   ├── ^ at (Center, Center, Center) via identity
    │   │   ├── - at (Center, Center, Center) via identity
    │   │   │   ├── Field located at (Face, Center, Center)
    │   │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
    │   │   └── 2
    │   ├── ^ at (Center, Center, Center) via identity
    │   │   ├── - at (Center, Center, Center) via identity
    │   │   │   ├── Field located at (Center, Face, Center)
    │   │   │   └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
    │   │   └── 2
    │   └── ^ at (Center, Center, Center) via ℑzᵃᵃᶜ
        │   ├── Field located at (Center, Center, Face)
        │   └── 2
    └── 2

julia> compute!(tke)

julia>

So it appears to be machine-dependent at least to some extent.

glwagner · 2021-07-22T12:10:14Z

The last lingering issue mentioned in #1241 (comment) was closed by #1599 ! See

#1599 (comment)

tomchor · 2021-07-22T13:05:12Z

The last lingering issue mentioned in #1241 (comment) was closed by #1599 ! See

#1599 (comment)

It did, but the overall issue still remains, no?

I'd vote for us to keep this issue open until we can compile at least dissipation rate calculations with AbstractOperations. It's a a high bar I think, but having the issue open (which we actually reference in the docs) might motivate someone to take a look.

glwagner · 2021-07-22T13:15:23Z

Interesting. I will open a new issue that's specific to the current issue with a concise summary. It's hard to parse the conversation in this long issue. I guess we will also have to update the docs for this.

tomchor · 2021-07-22T13:29:49Z

Sounds good. I'll change the docs after you open the new issue.

glwagner added the abstractions 🎨 Whatever that means label Dec 2, 2020

glwagner changed the title ~~Calculation of averaged field fails for GPUs but is successful or CPUs~~ Complex AbstractOperations cannot be computed on GPU Dec 2, 2020

tomchor mentioned this issue Dec 3, 2020

Roadmap to version 1.0 #1234

Closed

8 tasks

ali-ramadhan mentioned this issue Dec 4, 2020

Upper limit to operation complexity on the GPU? CliMA/Impero.jl#24

Closed

glwagner mentioned this issue Dec 4, 2020

Abstraction for using "custom" kernels to compute fields (plus an example)? #1246

Closed

ali-ramadhan added the 🚨 high priority 🚨 label Dec 4, 2020

glwagner mentioned this issue Dec 5, 2020

Factor of two missing in ViscousDissipation CliMA/LESbrary.jl#38

Closed

glwagner mentioned this issue Feb 26, 2021

Zero result when using pressure derivatives in KernelComputedFields #1401

Closed

ali-ramadhan mentioned this issue Mar 11, 2021

Grid metrics and differences as AbstractOperations #1452

Closed

ali-ramadhan mentioned this issue Mar 18, 2021

One stop shop: Oceananigans.DataAnalysis through abstract operations #1493

Closed

This was referenced Mar 31, 2021

Benchmark abstract operations #870

Merged

Includes simulation tips section in the docs #1543

Merged

glwagner mentioned this issue Apr 16, 2021

Defines many identity's to avoid recursion when compiling AbstractOperations #1595

Merged

glwagner closed this as completed in #1595 Apr 17, 2021

tomchor reopened this Apr 17, 2021

glwagner mentioned this issue Apr 23, 2021

Added information about CuArrays #1609

Merged

glwagner closed this as completed Jul 22, 2021

glwagner mentioned this issue Jul 22, 2021

PTX compilation errors with GPU AbstractOperations #1886

Closed

tomchor mentioned this issue Jul 22, 2021

Update issue referenced in simulation_tips.md #1887

Merged

Complex AbstractOperations cannot be computed on GPU #1241

Complex AbstractOperations cannot be computed on GPU #1241

Comments

tomchor commented Dec 2, 2020

glwagner commented Dec 2, 2020

tomchor commented Dec 2, 2020

glwagner commented Dec 2, 2020

glwagner commented Dec 2, 2020

glwagner commented Dec 2, 2020 • edited Loading

tomchor commented Dec 2, 2020

glwagner commented Dec 2, 2020

glwagner commented Dec 3, 2020 • edited Loading

francispoulin commented Dec 4, 2020

ali-ramadhan commented Dec 4, 2020 • edited Loading

glwagner commented Dec 4, 2020

glwagner commented Dec 4, 2020 • edited Loading

francispoulin commented Dec 4, 2020

glwagner commented Dec 4, 2020

ali-ramadhan commented Dec 4, 2020

glwagner commented Dec 4, 2020 • edited Loading

ali-ramadhan commented Dec 4, 2020

francispoulin commented Dec 4, 2020

glwagner commented Dec 4, 2020

glwagner commented Dec 4, 2020

glwagner commented Dec 4, 2020

glwagner commented Dec 4, 2020

ali-ramadhan commented Dec 4, 2020 • edited Loading

glwagner commented Dec 4, 2020

glwagner commented Dec 4, 2020 • edited Loading

vchuravy commented Dec 4, 2020

tomchor commented Dec 16, 2020 • edited Loading

glwagner commented Dec 16, 2020

tomchor commented Dec 16, 2020

glwagner commented Dec 16, 2020

tomchor commented Feb 26, 2021

glwagner commented Feb 26, 2021

tomchor commented Apr 17, 2021

glwagner commented Apr 17, 2021

glwagner commented Apr 17, 2021 • edited Loading

tomchor commented Apr 17, 2021

tomchor commented Apr 20, 2021

glwagner commented Jul 22, 2021

tomchor commented Jul 22, 2021

glwagner commented Jul 22, 2021

tomchor commented Jul 22, 2021

glwagner commented Dec 2, 2020 •

edited

Loading

glwagner commented Dec 3, 2020 •

edited

Loading

ali-ramadhan commented Dec 4, 2020 •

edited

Loading

glwagner commented Dec 4, 2020 •

edited

Loading

glwagner commented Dec 4, 2020 •

edited

Loading

ali-ramadhan commented Dec 4, 2020 •

edited

Loading

glwagner commented Dec 4, 2020 •

edited

Loading

tomchor commented Dec 16, 2020 •

edited

Loading

glwagner commented Apr 17, 2021 •

edited

Loading