Halo regions need to be filled prior to computations and abstract operations #1063
Looks like we can use `mean!`:

```julia
julia> using Statistics, BenchmarkTools, CUDA

julia> CUDA.allowscalar(false)

julia> N = 512;

julia> Rgpu = randn(N+2, N+2, N+2) |> CuArray;

julia> Vgpu = @views Rgpu[2:N+1, 2:N+1, 2:N+1];

julia> vgpu = zeros(1, 1, N) |> CuArray;

julia> mean!(vgpu, Vgpu);

julia> @benchmark CUDA.@sync mean!(vgpu, Vgpu)
BenchmarkTools.Trial:
  memory estimate:  2.08 KiB
  allocs estimate:  85
  --------------
  minimum time:     2.427 ms (0.00% GC)
  median time:      2.567 ms (0.00% GC)
  mean time:        2.584 ms (0.00% GC)
  maximum time:     8.747 ms (0.00% GC)
  --------------
  samples:          1930
  evals/sample:     1
```

which is basically the same speed as `sum!`:

```julia
julia> @benchmark CUDA.@sync sum!(vgpu, Vgpu)
BenchmarkTools.Trial:
  memory estimate:  1.48 KiB
  allocs estimate:  62
  --------------
  minimum time:     2.428 ms (0.00% GC)
  median time:      2.564 ms (0.00% GC)
  mean time:        2.566 ms (0.00% GC)
  maximum time:     3.228 ms (0.00% GC)
  --------------
  samples:          1944
  evals/sample:     1
```

and ~34x faster than the CPU version (maybe we usually expect more, but reduction operations aren't the best fit for GPUs):

```julia
julia> using Statistics, BenchmarkTools

julia> N = 512;

julia> Rcpu = randn(N+2, N+2, N+2);

julia> Vcpu = @views Rcpu[2:N+1, 2:N+1, 2:N+1];

julia> vcpu = zeros(1, 1, N);

julia> mean!(vcpu, Vcpu);

julia> @benchmark mean!(vcpu, Vcpu)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     85.751 ms (0.00% GC)
  median time:      86.201 ms (0.00% GC)
  mean time:        86.316 ms (0.00% GC)
  maximum time:     87.483 ms (0.00% GC)
  --------------
  samples:          58
  evals/sample:     1
```
That goes a really long way. Then the first point raised in the issue can be resolved either by filling halo regions after a time-step is complete but before diagnostics / output are calculated, or by filling halos within `compute!`.
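A schematic of the first option, with stub functions standing in for the real model interface (all names here are illustrative, not Oceananigans' actual API):

```julia
# Stub functions standing in for the real interfaces (illustrative only):
time_step!(model, Δt)     = nothing  # advance the solution by Δt
fill_halo_regions!(model) = nothing  # refresh halos to match the new state
write_output!(model)      = nothing  # diagnostics / output

model, Δt = nothing, 0.1

for n in 1:10
    time_step!(model, Δt)
    fill_halo_regions!(model)  # after the step completes...
    write_output!(model)       # ...so output sees correct halos
end
```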
Currently, halo regions are filled prior to performing a time-step. This means that after the time-step they are incorrect. We therefore cannot output fields with correct halo regions, since data is output after a time-step is taken.
But it gets worse. If the average of a field is taken, we zero out the halo regions.
Zeroing out the halo regions corrupts near-boundary data for all subsequent computations with the fields. Currently, abstract operations cannot be trusted in boundary-adjacent cells.
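A toy illustration of the problem in one dimension (plain arrays, nothing Oceananigans-specific): a centered difference at the first interior point reads the halo, so zeroing the halo silently changes the answer.

```julia
# 1D field with one halo point on each side and periodic boundaries.
N = 4
data = zeros(N + 2)
data[2:N+1] .= [1.0, 2.0, 3.0, 4.0]  # interior values
data[1]   = data[N+1]  # left halo  <- rightmost interior value
data[N+2] = data[2]    # right halo <- leftmost interior value

# Centered difference; at the first interior point (i = 2) it reads the halo.
δc(d, i) = (d[i+1] - d[i-1]) / 2

δc(data, 2)    # correct: (2.0 - 4.0) / 2 = -1.0

data[1] = 0.0  # a reduction that zeroes the halo regions...
δc(data, 2)    # ...corrupts the stencil: (2.0 - 0.0) / 2 = 1.0
```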
To remedy this, we need to fill halo regions on fields prior to performing computations. One way we might do this is to write a `compute!` method for fields.
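For instance, here is a minimal, self-contained sketch of the idea; the toy field type, the periodic `fill_halo_regions!`, and the difference "operation" are illustrative stand-ins, not Oceananigans' actual implementation:

```julia
# Toy field: interior data plus one halo point on each side.
struct ToyField
    data :: Vector{Float64}
end

# Periodic halo fill, standing in for a real fill_halo_regions!:
function fill_halo_regions!(f::ToyField)
    f.data[1]   = f.data[end-1]  # left halo  <- rightmost interior point
    f.data[end] = f.data[2]      # right halo <- leftmost interior point
    return nothing
end

# A toy "abstract operation": the centered difference of a source field.
struct CenteredDifference
    source :: ToyField
end

# compute!: fill the operand's halos *before* evaluating the stencil,
# so that results in boundary-adjacent cells are correct.
function compute!(result::ToyField, op::CenteredDifference)
    fill_halo_regions!(op.source)
    d = op.source.data
    for i in 2:length(d)-1
        result.data[i] = (d[i+1] - d[i-1]) / 2
    end
    return nothing
end

# Usage:
u  = ToyField([0.0, 1.0, 2.0, 3.0, 4.0, 0.0])  # halos not yet filled
∂u = ToyField(zeros(6))
compute!(∂u, CenteredDifference(u))             # halos filled before use
```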
We can also define a `conditional_compute!` method for `Field`s and add a `status` property, so that halo regions are not filled "redundantly". For this to work, we also need to invalidate `field.status` when halo regions are zeroed out by `compute!(averaged_field::AveragedField)` (for example, by setting `field.status.time = NaN`).
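A sketch of the status-tracking idea; `FieldStatus`, the stub `compute!`, and `invalidate!` are hypothetical names, while the `NaN` invalidation trick is as described above:

```julia
# Hypothetical status record: the simulation time at which the field's
# data (and halos) were last brought up to date.
mutable struct FieldStatus
    time :: Float64
end

compute!(field) = nothing  # stand-in for a real compute! method

# Recompute only if the field is stale, avoiding "redundant" halo fills:
function conditional_compute!(field, status::FieldStatus, current_time)
    status.time == current_time && return nothing  # already up to date
    compute!(field)
    status.time = current_time
    return nothing
end

# Invalidation by setting the time to NaN: since NaN == t is false for
# every t (including NaN), the next conditional_compute! always recomputes.
invalidate!(status::FieldStatus) = (status.time = NaN; nothing)
```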
This won't work currently, of course, due to #971, so this issue cannot be resolved until #971 is resolved.