
Halo regions need to be filled prior to computations and abstract operations #1063

Closed
glwagner opened this issue Oct 14, 2020 · 2 comments · Fixed by #1083
Labels
bug 🐞 Even a perfect program still has bugs

Comments

@glwagner (Member)

Currently, halo regions are filled prior to performing a time-step, which means that after the time-step they are stale. We therefore cannot output fields with correct halo regions, since output is written after a time-step is taken.

But it gets worse. If the average of a field is taken, we zero out the halo regions.

Zeroing out the halo regions corrupts near-boundary data for all subsequent computations with the fields. Currently, abstract operations cannot be trusted in boundary-adjacent cells.
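For intuition, a boundary-adjacent stencil reads halo points directly, so zeroed halos silently poison the result. A standalone sketch using OffsetArrays (not Oceananigans code):

using OffsetArrays

f  = OffsetArray(rand(6), 0:5)         # index 0 plays the role of a halo point
Δx = 0.1
∂x(f, i, Δx) = (f[i+1] - f[i-1]) / 2Δx
∂x(f, 1, Δx)   # correct only if f[0] holds valid halo data; wrong if it was zeroed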

To remedy this we need to fill halo regions on fields prior to performing computations. One way we might do this is to write a compute! method for fields:

compute!(field::Field) = fill_halo_regions!(field)

We can also define a conditional_compute! method for Fields and add a status property, so that halo regions are not filled redundantly. For this to work, we also need to invalidate field.status when halo regions are zeroed out by compute!(averaged_field::AveragedField) (for example, by setting field.status.time = NaN). A sketch of this idea follows.
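A minimal, self-contained sketch of the status mechanism (MockField, fill_halo_regions!, and invalidate! here are stand-ins for illustration, not the real Oceananigans implementation):

mutable struct FieldStatus{T}
    time :: T                           # simulation time of the last halo fill
end

struct MockField{S}
    status :: S
end

fill_halo_regions!(field) = nothing     # placeholder for the real halo fill

compute!(field) = fill_halo_regions!(field)

function conditional_compute!(field, time)
    # Skip the halo fill if this field was already computed at `time`.
    if field.status.time != time
        compute!(field)
        field.status.time = time
    end
    return nothing
end

# Setting the status time to NaN invalidates it: NaN != t for every t,
# so an invalidated field is always recomputed on the next call.
invalidate!(status::FieldStatus) = (status.time = NaN)

field = MockField(FieldStatus(0.0))
conditional_compute!(field, 1.0)   # fills halos and records time = 1.0
conditional_compute!(field, 1.0)   # no-op: already computed at this time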

This won't work currently, of course, due to #971, so this issue cannot be resolved until #971 is.

glwagner added the bug 🐞 label on Oct 14, 2020
@ali-ramadhan (Member) commented Oct 14, 2020

> But it gets worse. If the average of a field is taken, we zero out the halo regions.

Looks like we can use mean! on views to fix this issue, since it doesn't perform scalar operations (interestingly, mean does). Not sure whether this only recently started working or whether I overlooked mean! when we first started taking horizontal averages, but since we are using sum! we must have considered mean!...

julia> using Statistics, BenchmarkTools, CUDA

julia> CUDA.allowscalar(false)

julia> N = 512;

julia> Rgpu = randn(N+2, N+2, N+2) |> CuArray;

julia> Vgpu = @views Rgpu[2:N+1, 2:N+1, 2:N+1];

julia> vgpu = zeros(1, 1, N) |> CuArray;

julia> mean!(vgpu, Vgpu);

julia> @benchmark CUDA.@sync mean!(vgpu, Vgpu)
BenchmarkTools.Trial: 
  memory estimate:  2.08 KiB
  allocs estimate:  85
  --------------
  minimum time:     2.427 ms (0.00% GC)
  median time:      2.567 ms (0.00% GC)
  mean time:        2.584 ms (0.00% GC)
  maximum time:     8.747 ms (0.00% GC)
  --------------
  samples:          1930
  evals/sample:     1

which is basically the same speed as sum!

julia> @benchmark CUDA.@sync sum!(vgpu, Vgpu)
BenchmarkTools.Trial: 
  memory estimate:  1.48 KiB
  allocs estimate:  62
  --------------
  minimum time:     2.428 ms (0.00% GC)
  median time:      2.564 ms (0.00% GC)
  mean time:        2.566 ms (0.00% GC)
  maximum time:     3.228 ms (0.00% GC)
  --------------
  samples:          1944
  evals/sample:     1

and ~34x faster than the CPU version (we might usually expect a larger speedup, but reduction operations aren't the best fit for GPUs)

julia> using Statistics, BenchmarkTools

julia> N = 512;

julia> Rcpu = randn(N+2, N+2, N+2);

julia> Vcpu = @views Rcpu[2:N+1, 2:N+1, 2:N+1];

julia> vcpu = zeros(1, 1, N);

julia> mean!(vcpu, Vcpu);

julia> @benchmark mean!(vcpu, Vcpu)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     85.751 ms (0.00% GC)
  median time:      86.201 ms (0.00% GC)
  mean time:        86.316 ms (0.00% GC)
  maximum time:     87.483 ms (0.00% GC)
  --------------
  samples:          58
  evals/sample:     1

@glwagner (Member, Author) commented
That goes a really long way. Then the first point raised in the issue can be resolved either by filling halo regions after a time-step is complete but before diagnostics and output are computed, or by filling halos within write_output. If we do that, we don't need to fill halo regions before every computation (provided users haven't tinkered with them). A sketch of the write_output option follows.
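A minimal sketch of the write_output option (the writer.outputs and save! names here are assumptions for illustration, not the actual Oceananigans API):

function write_output(model, writer)
    for field in writer.outputs
        fill_halo_regions!(field)   # repair halos corrupted by the time-step
    end
    save!(writer, model)            # hypothetical write step
end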
