Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizes fill_halo_regions_open.jl #3792

Closed
wants to merge 6 commits into from
Closed

Optimizes fill_halo_regions_open.jl #3792

wants to merge 6 commits into from

Conversation

jagoosw
Copy link
Collaborator

@jagoosw jagoosw commented Sep 25, 2024

I was doing some profiling on a model with no open boundaries and discovered that this function was causing a big slow down. I guess this is because the compiler isn't managing to work out its just a load of nothing operations but this change appears to make it completely go away.

I was doing some profiling on a model with no open boundaries and discovered that this function was causing a big slow down. I guess this is because the compiler isn't managing to work out its just a load of nothing operations but this change appears to make it completely go away.
@jagoosw jagoosw added performance 🏍️ So we can get the wrong answer even faster GPU 👾 Where Oceananigans gets its powers from labels Sep 25, 2024
@simone-silvestri
Copy link
Collaborator

what was the slowdown?

@glwagner
Copy link
Member

Might be good to profile with fill_open_boundary_regions!(fields::NTuple, boundary_conditions, indices, loc, grid, args...; kwargs...) = nothing also to check this

Comment on lines 26 to 27
fill_open_boundary_regions!(fields::NTuple, boundary_conditions, indices, loc, grid, args...; kwargs...) =
ntuple(n->fill_open_boundary_regions!(fields[n], boundary_conditions[n], indices, loc[n], grid, args...; kwargs...), Val(length(fields)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fill_open_boundary_regions!(fields::NTuple, boundary_conditions, indices, loc, grid, args...; kwargs...) =
ntuple(n->fill_open_boundary_regions!(fields[n], boundary_conditions[n], indices, loc[n], grid, args...; kwargs...), Val(length(fields)))
function fill_open_boundary_regions!(fields::NTuple, bcs, indices, loc, args...; kw...)
N = length(fields)
return ntuple(n -> fill_open_boundary_regions!(fields[n], bcs[n], indices, loc[n], args...; kw...), Val(N))
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't read the code, we've little to show for all our hard work

@glwagner
Copy link
Member

glwagner commented Sep 26, 2024

The profiles should be documented here. I'll tentatively approve but the documentation is important

@jagoosw
Copy link
Collaborator Author

jagoosw commented Sep 26, 2024

Ah so I've realised this isn't the fix we needed, and I was just hiding it from myself in the profile because I replaced the function by writing it in the REPL. I made an MWE:

using Oceananigans

grid = RectilinearGrid(GPU(), topology = (Flat, Flat, Bounded), size = (100, ), extent = (400, ))

model = HydrostaticFreeSurfaceModel(; grid, velocities = PrescribedVelocityFields(), momentum_advection=nothing, buoyancy=nothing, tracers = ntuple(n->Symbol(:T, n), Val(30)))
Screenshot 2024-09-26 at 12 00 29 You can see from this profile that `fill_open_boundary_regions!` takes a lot longer than `fill_halo_event!`, even though there are no velocity open boundaries. This is because it is launching a load of zero size kernels where as `fill_halo_event!` just returns nothing instead.

I've fixed this now and get this from the profile instead:
Screenshot 2024-09-26 at 12 02 11

In numbers, the original version benchmarks time_step! at around 4.074 ms ± 581.472 μs and the new version 2.438 ms ± 501.642 μs

@@ -17,13 +17,19 @@ function fill_open_boundary_regions!(field, boundary_conditions, indices, loc, g
open_fill, regular_fill = get_open_halo_filling_functions(loc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little convoluted. Why do you need a function to determine the size? Why can't we just use loc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from normal halo filling and I think its main purpose is for windowed fields:

@inline fill_halo_size(::Tuple, ::WEB, args...) = :yz
@inline fill_halo_size(::Tuple, ::SNB, args...) = :xz
@inline fill_halo_size(::Tuple, ::TBB, args...) = :xy
# If indices are colon, and locations are _not_ Nothing, fill the whole boundary plane!
# If locations are _Nothing_, then the kwarg `reduced_dimensions` will allow the size `:xz`
# to be correctly interpreted inside `launch!`.
@inline fill_halo_size(::OffsetArray, ::WEB, ::Tuple{<:Any, <:Colon, <:Colon}, args...) = :yz
@inline fill_halo_size(::OffsetArray, ::SNB, ::Tuple{<:Colon, <:Any, <:Colon}, args...) = :xz
@inline fill_halo_size(::OffsetArray, ::TBB, ::Tuple{<:Colon, <:Colon, <:Any}, args...) = :xy
# If the index is a Colon and the location is _NOT_ a `Nothing` (i.e. not a `ReducedField`),
# then fill the whole boundary, otherwise fill the size of the corresponding array
@inline whole_halo(idx, loc) = false
@inline whole_halo(idx, ::Nothing) = false
@inline whole_halo(::Colon, ::Nothing) = false
@inline whole_halo(::Colon, loc) = true
# Calculate kernel size for windowed fields. This code is only called when
# one or more of the elements of `idx` is not Colon in the two direction perpendicular
# to the halo region and `bc` is not `PeriodicBoundaryCondition`.
@inline function fill_halo_size(c::OffsetArray, ::WEB, idx, bc, loc, grid)
@inbounds begin
whole_y_halo = whole_halo(idx[2], loc[2])
whole_z_halo = whole_halo(idx[3], loc[3])
end
_, Ny, Nz = size(grid)
_, Cy, Cz = size(c)
Sy = ifelse(whole_y_halo, Ny, Cy)
Sz = ifelse(whole_z_halo, Nz, Cz)
return (Sy, Sz)
end
@inline function fill_halo_size(c::OffsetArray, ::SNB, idx, bc, loc, grid)
@inbounds begin
whole_x_halo = whole_halo(idx[1], loc[1])
whole_z_halo = whole_halo(idx[3], loc[3])
end
Nx, _, Nz = size(grid)
Cx, _, Cz = size(c)
Sx = ifelse(whole_x_halo, Nx, Cx)
Sz = ifelse(whole_z_halo, Nz, Cz)
return (Sx, Sz)
end
@inline function fill_halo_size(c::OffsetArray, ::TBB, idx, bc, loc, grid)
@inbounds begin
whole_x_halo = whole_halo(idx[1], loc[1])
whole_y_halo = whole_halo(idx[2], loc[2])
end
Nx, Ny, _ = size(grid)
Cx, Cy, _ = size(c)
Sx = ifelse(whole_x_halo, Nx, Cx)
Sy = ifelse(whole_y_halo, Ny, Cy)
return (Sx, Sy)
end
# Remember that Periodic BCs also fill halo points!
@inline fill_halo_size(c::OffsetArray, ::WEB, idx, ::PBC, args...) = tuple(size(c, 2), size(c, 3))
@inline fill_halo_size(c::OffsetArray, ::SNB, idx, ::PBC, args...) = tuple(size(c, 1), size(c, 3))
@inline fill_halo_size(c::OffsetArray, ::TBB, idx, ::PBC, args...) = tuple(size(c, 1), size(c, 2))
@inline function fill_halo_size(c::OffsetArray, ::WEB, ::Tuple{<:Any, <:Colon, <:Colon}, ::PBC, args...)
_, Cy, Cz = size(c)
return (Cy, Cz)
end
@inline function fill_halo_size(c::OffsetArray, ::SNB, ::Tuple{<:Colon, <:Any, <:Colon}, ::PBC, args...)
Cx, _, Cz = size(c)
return (Cx, Cz)
end
@inline function fill_halo_size(c::OffsetArray, ::TBB, ::Tuple{<:Colon, <:Colon, <:Any}, ::PBC, args...)
Cx, Cy, _ = size(c)
return (Cx, Cy)
end

@@ -17,13 +17,19 @@ function fill_open_boundary_regions!(field, boundary_conditions, indices, loc, g
open_fill, regular_fill = get_open_halo_filling_functions(loc)
fill_size = fill_halo_size(field, regular_fill, indices, boundary_conditions, loc, grid)

launch!(arch, grid, fill_size, open_fill, field, left_bc, right_bc, loc, grid, args)
fill_open_halo_event!(open_fill, field, left_bc, right_bc, fill_size, loc, arch, grid, args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fill_open_halo_event!(open_fill, field, left_bc, right_bc, fill_size, loc, arch, grid, args)
fill_open_halo!(filling_function, field, left_bc, right_bc, fill_size, loc, arch, grid, args)

@jagoosw jagoosw changed the title Updates fill_halo_regions_open.jl Optimises fill_halo_regions_open.jl Sep 27, 2024
@glwagner
Copy link
Member

I think it's good to go.

The only thing that doesn't quite make sense to me is why

fill_size = fill_halo_size(field, regular_fill_function, indices, boundary_conditions, loc, grid)

depends on regular_fill_function, since

fill_function, regular_fill_function = get_open_halo_filling_functions(loc) 

and loc is an argument to both functions. It doesn't seem that from a purely logical point of view we need regular_fill_function at all here.

@glwagner glwagner changed the title Optimises fill_halo_regions_open.jl Optimizes fill_halo_regions_open.jl Sep 27, 2024
@jagoosw
Copy link
Collaborator Author

jagoosw commented Sep 30, 2024

I'll stream line those two line before I merge

@simone-silvestri
Copy link
Collaborator

Is this PR close to merging?

@tomchor
Copy link
Collaborator

tomchor commented Nov 14, 2024

cc @ali-ramadhan

@simone-silvestri
Copy link
Collaborator

I think this has been superseded by #3834. @jagoosw let us know if it is the case or not.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Nov 14, 2024

I think this has been superseded by #3834. @jagoosw let us know if it is the case or not.

Yeah definitely superseded!

@jagoosw jagoosw closed this Nov 14, 2024
@simone-silvestri simone-silvestri deleted the jagoosw-patch-1 branch November 14, 2024 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GPU 👾 Where Oceananigans gets its powers from performance 🏍️ So we can get the wrong answer even faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants