
Halo regions need to be filled prior to computations and abstract operations #1063

Closed
glwagner opened this issue Oct 14, 2020 · 2 comments · Fixed by #1083
Labels
bug 🐞 Even a perfect program still has bugs

Comments

@glwagner (Member)

Currently, halo regions are filled prior to performing a time-step, which means that after the time-step they are stale. We therefore cannot output fields with correct halo regions, since output is written after a time-step is taken.

But it gets worse. If the average of a field is taken, we zero out the halo regions.

Zeroing out the halo regions corrupts near-boundary data for all subsequent computations with the fields. Currently, abstract operations cannot be trusted in boundary-adjacent cells.
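For intuition, a boundary-adjacent stencil reads halo points directly, so zeroed halos silently poison the result. A standalone sketch using OffsetArrays (not Oceananigans code):

using OffsetArrays

f  = OffsetArray(rand(6), 0:5)         # index 0 plays the role of a halo point
Δx = 0.1
∂x(f, i, Δx) = (f[i+1] - f[i-1]) / 2Δx
∂x(f, 1, Δx)   # correct only if f[0] holds valid halo data; wrong if it was zeroed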

To remedy this we need to fill halo regions on fields prior to performing computations. One way we might do this is to write a compute! method for fields:

compute!(field::Field) = fill_halo_regions!(field)

We can also define a conditional_compute! method for Fields and add a status property, so that halo regions are not filled redundantly. For this to work, we also need to invalidate field.status when halo regions are zeroed out by compute!(averaged_field::AveragedField) (for example, by setting field.status.time = NaN). A sketch of this idea follows.
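A minimal, self-contained sketch of the status mechanism (MockField, fill_halo_regions!, and invalidate! here are stand-ins for illustration, not the real Oceananigans implementation):

mutable struct FieldStatus{T}
    time :: T                           # simulation time of the last halo fill
end

struct MockField{S}
    status :: S
end

fill_halo_regions!(field) = nothing     # placeholder for the real halo fill

compute!(field) = fill_halo_regions!(field)

function conditional_compute!(field, time)
    # Skip the halo fill if this field was already computed at `time`.
    if field.status.time != time
        compute!(field)
        field.status.time = time
    end
    return nothing
end

# Setting the status time to NaN invalidates it: NaN != t for every t,
# so an invalidated field is always recomputed on the next call.
invalidate!(status::FieldStatus) = (status.time = NaN)

field = MockField(FieldStatus(0.0))
conditional_compute!(field, 1.0)   # fills halos and records time = 1.0
conditional_compute!(field, 1.0)   # no-op: already computed at this time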

This won't work currently, of course, due to #971, so this issue cannot be resolved until #971 is.

glwagner added the bug 🐞 label on Oct 14, 2020
@ali-ramadhan (Member) commented Oct 14, 2020

> But it gets worse. If the average of a field is taken, we zero out the halo regions.

Looks like we can use mean! on views to fix this issue, since it doesn't perform scalar operations (interestingly, mean does). Not sure whether this only recently started working or whether I overlooked mean! when we first started taking horizontal averages, but since we are using sum! we must have considered mean!...

julia> using Statistics, BenchmarkTools, CUDA

julia> CUDA.allowscalar(false)

julia> N = 512;

julia> Rgpu = randn(N+2, N+2, N+2) |> CuArray;

julia> Vgpu = @views Rgpu[2:N+1, 2:N+1, 2:N+1];

julia> vgpu = zeros(1, 1, N) |> CuArray;

julia> mean!(vgpu, Vgpu);

julia> @benchmark CUDA.@sync mean!(vgpu, Vgpu)
BenchmarkTools.Trial: 
  memory estimate:  2.08 KiB
  allocs estimate:  85
  --------------
  minimum time:     2.427 ms (0.00% GC)
  median time:      2.567 ms (0.00% GC)
  mean time:        2.584 ms (0.00% GC)
  maximum time:     8.747 ms (0.00% GC)
  --------------
  samples:          1930
  evals/sample:     1

which is basically the same speed as sum!

julia> @benchmark CUDA.@sync sum!(vgpu, Vgpu)
BenchmarkTools.Trial: 
  memory estimate:  1.48 KiB
  allocs estimate:  62
  --------------
  minimum time:     2.428 ms (0.00% GC)
  median time:      2.564 ms (0.00% GC)
  mean time:        2.566 ms (0.00% GC)
  maximum time:     3.228 ms (0.00% GC)
  --------------
  samples:          1944
  evals/sample:     1

and ~34x faster than the CPU version (we might usually expect a larger speedup, but reduction operations aren't the best fit for GPUs)

julia> using Statistics, BenchmarkTools

julia> N = 512;

julia> Rcpu = randn(N+2, N+2, N+2);

julia> Vcpu = @views Rcpu[2:N+1, 2:N+1, 2:N+1];

julia> vcpu = zeros(1, 1, N);

julia> mean!(vcpu, Vcpu);

julia> @benchmark mean!(vcpu, Vcpu)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     85.751 ms (0.00% GC)
  median time:      86.201 ms (0.00% GC)
  mean time:        86.316 ms (0.00% GC)
  maximum time:     87.483 ms (0.00% GC)
  --------------
  samples:          58
  evals/sample:     1

@glwagner (Member, Author) commented
That goes a really long way. Then the first point raised in the issue can be resolved either by filling halo regions after a time-step is complete but before diagnostics and output are computed, or by filling halos within write_output. If we do that, we don't need to fill halo regions before every computation (provided users haven't tinkered with them). A sketch of the write_output option follows.
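A minimal sketch of the write_output option (the writer.outputs and save! names here are assumptions for illustration, not the actual Oceananigans API):

function write_output(model, writer)
    for field in writer.outputs
        fill_halo_regions!(field)   # repair halos corrupted by the time-step
    end
    save!(writer, model)            # hypothetical write step
end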
