Adapt Field, AveragedField, and ComputedField for GPU, round 2 #1057
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1057      +/-   ##
==========================================
- Coverage   57.70%   57.49%   -0.22%
==========================================
  Files         158      161       +3
  Lines        3807     3807
==========================================
- Hits         2197     2189       -8
- Misses       1610     1618       +8
Continue to review full report at Codecov.
Static ocean benchmarks (see below) show no performance regression on GPUs. In fact, CPU models seem to be ~30% faster now 🎉

Side note: some potential performance regressions may not be caught by `benchmark_static_ocean.jl`. I still think we should merge this PR, as the static ocean benchmarks do test whether adapting `Field` itself introduces performance regressions.

I'm hoping to refactor the benchmarks to reduce boilerplate and produce more useful statistics/tables. As part of that I'll add a more comprehensive benchmark that exercises an LES closure, output writing, time averaging, etc.
Environment:
Oceananigans v0.42.0 (DEVELOPMENT BRANCH)
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
GPU: TITAN V
Static ocean benchmarks from master branch:
Static ocean benchmarks Time Allocations
────────────────────── ───────────────────────
Tot / % measured: 448s / 28.2% 31.2GiB / 0.40%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────────
16× 16× 16 [CPU, Float32] 10 25.9ms 0.02% 2.59ms 3.40MiB 2.68% 348KiB
16× 16× 16 [CPU, Float64] 10 32.1ms 0.03% 3.21ms 3.40MiB 2.68% 348KiB
16× 16× 16 [GPU, Float32] 10 41.2ms 0.03% 4.12ms 9.28MiB 7.32% 950KiB
16× 16× 16 [GPU, Float64] 10 45.3ms 0.04% 4.53ms 9.28MiB 7.32% 950KiB
32× 32× 32 [CPU, Float32] 10 120ms 0.09% 12.0ms 3.40MiB 2.68% 348KiB
32× 32× 32 [CPU, Float64] 10 117ms 0.09% 11.7ms 3.40MiB 2.68% 348KiB
32× 32× 32 [GPU, Float32] 10 63.0ms 0.05% 6.30ms 9.28MiB 7.32% 950KiB
32× 32× 32 [GPU, Float64] 10 41.1ms 0.03% 4.11ms 9.29MiB 7.32% 951KiB
64× 64× 64 [CPU, Float32] 10 675ms 0.53% 67.5ms 3.40MiB 2.68% 348KiB
64× 64× 64 [CPU, Float64] 10 705ms 0.56% 70.5ms 3.40MiB 2.68% 348KiB
64× 64× 64 [GPU, Float32] 10 42.7ms 0.03% 4.27ms 9.28MiB 7.32% 950KiB
64× 64× 64 [GPU, Float64] 10 43.7ms 0.03% 4.37ms 9.29MiB 7.32% 951KiB
128×128×128 [CPU, Float32] 10 5.85s 4.64% 585ms 3.40MiB 2.68% 348KiB
128×128×128 [CPU, Float64] 10 5.23s 4.14% 523ms 3.40MiB 2.68% 348KiB
128×128×128 [GPU, Float32] 10 57.8ms 0.05% 5.78ms 9.28MiB 7.32% 951KiB
128×128×128 [GPU, Float64] 10 53.5ms 0.04% 5.35ms 9.29MiB 7.32% 951KiB
256×256×256 [CPU, Float32] 10 58.5s 46.4% 5.85s 3.40MiB 2.68% 348KiB
256×256×256 [CPU, Float64] 10 53.9s 42.7% 5.39s 3.40MiB 2.68% 348KiB
256×256×256 [GPU, Float32] 10 317ms 0.25% 31.7ms 9.32MiB 7.35% 955KiB
256×256×256 [GPU, Float64] 10 321ms 0.25% 32.1ms 9.29MiB 7.32% 951KiB
──────────────────────────────────────────────────────────────────────────────────────
Static ocean benchmarks from glw/adapt-field-round-2 branch:
──────────────────────────────────────────────────────────────────────────────────────
Static ocean benchmarks Time Allocations
────────────────────── ───────────────────────
Tot / % measured: 369s / 25.7% 31.0GiB / 0.36%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────────
16× 16× 16 [CPU, Float32] 10 24.4ms 0.03% 2.44ms 2.87MiB 2.49% 293KiB
16× 16× 16 [CPU, Float64] 10 25.3ms 0.03% 2.53ms 2.87MiB 2.49% 293KiB
16× 16× 16 [GPU, Float32] 10 40.3ms 0.04% 4.03ms 8.63MiB 7.50% 884KiB
16× 16× 16 [GPU, Float64] 10 38.3ms 0.04% 3.83ms 8.63MiB 7.50% 884KiB
32× 32× 32 [CPU, Float32] 10 74.6ms 0.08% 7.46ms 2.87MiB 2.49% 293KiB
32× 32× 32 [CPU, Float64] 10 72.4ms 0.08% 7.24ms 2.87MiB 2.49% 293KiB
32× 32× 32 [GPU, Float32] 10 63.5ms 0.07% 6.35ms 8.64MiB 7.50% 884KiB
32× 32× 32 [GPU, Float64] 10 44.6ms 0.05% 4.46ms 8.64MiB 7.51% 885KiB
64× 64× 64 [CPU, Float32] 10 527ms 0.56% 52.7ms 2.87MiB 2.49% 293KiB
64× 64× 64 [CPU, Float64] 10 648ms 0.68% 64.8ms 2.87MiB 2.49% 293KiB
64× 64× 64 [GPU, Float32] 10 40.5ms 0.04% 4.05ms 8.64MiB 7.50% 884KiB
64× 64× 64 [GPU, Float64] 10 50.8ms 0.05% 5.08ms 8.64MiB 7.51% 885KiB
128×128×128 [CPU, Float32] 10 4.86s 5.13% 486ms 2.87MiB 2.49% 293KiB
128×128×128 [CPU, Float64] 10 3.93s 4.15% 393ms 2.87MiB 2.49% 293KiB
128×128×128 [GPU, Float32] 10 128ms 0.13% 12.8ms 8.65MiB 7.52% 886KiB
128×128×128 [GPU, Float64] 10 46.8ms 0.05% 4.68ms 8.64MiB 7.51% 885KiB
256×256×256 [CPU, Float32] 10 43.0s 45.3% 4.30s 2.87MiB 2.49% 293KiB
256×256×256 [CPU, Float64] 10 40.6s 42.8% 4.06s 2.87MiB 2.49% 293KiB
256×256×256 [GPU, Float32] 10 317ms 0.33% 31.7ms 8.68MiB 7.54% 889KiB
256×256×256 [GPU, Float64] 10 322ms 0.34% 32.2ms 8.65MiB 7.51% 885KiB
──────────────────────────────────────────────────────────────────────────────────────
│ │ └── OffsetArrays.OffsetArray{Float64,3,Array{Float64,3}}
│ └── / at (Cell, Cell, Cell) via Oceananigans.AbstractOperations.identity
* at (Cell, Cell, Cell) via identity
├── 0.3333333333333333
Just an idea: we could pretty-print rational numbers that show up in abstract operations,

```julia
julia> rationalize(0.3333333333333333)
1//3
```

but perhaps this is misleading, as Julia is actually multiplying by `0.3333333333333333` and not `1//3`. So probably the best thing to do is just print with `eltype(model)`.
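For illustration (my own REPL example, not from the thread), the rational and the stored `Float64` compare unequal, which is why printing `1//3` could mislead:

```julia
julia> 1//3 == 0.3333333333333333  # Rational-vs-Float comparison is exact in Julia
false
```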
Hmm, we can also truncate floating point numbers to fewer significant digits by redefining `tree_show(a::Number, depth, nesting)`:

```julia
tree_show(a::Union{Number, Function}, depth, nesting) = string(a)
```
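A sketch of that idea (the `%.4g` format and the 4-digit choice are my own, assuming the `tree_show` signature quoted above):

```julia
using Printf

# Hypothetical: print numbers in operation trees with 4 significant digits;
# a separate method keeps the existing string(a) behavior for functions.
tree_show(a::Number, depth, nesting) = @sprintf("%.4g", a)
tree_show(a::Function, depth, nesting) = string(a)
```

With this, `0.3333333333333333` would print as `0.3333`.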
This PR writes new `Adapt.adapt_structure` methods for `Field`, `AveragedField`, and `ComputedField`:

- `Field` and `ComputedField` are adapted to their data (thus shedding location information, the grid, and boundary conditions). This is fine because we don't reference location information or boundary conditions inside GPU kernels.
- `AveragedField` sheds `operand` and `grid` when adapted to the GPU. `AveragedField` still needs location information for `getindex` to work correctly.

This obviates the need for `datatuple` (we still keep the function around, however, because it's useful for tests). It also obviates the need for `gpufriendly`.
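To illustrate the pattern (a minimal sketch with hypothetical type names, not the actual Oceananigans definitions):

```julia
using Adapt

# Hypothetical field type: location (X, Y, Z) lives in the type parameters;
# data, grid, and boundary conditions are stored as fields.
struct ExampleField{X, Y, Z, D, G, B}
    data :: D
    grid :: G
    boundary_conditions :: B
end

# Adapt a field to its data: GPU kernels receive a bare array, shedding the
# location, grid, and boundary conditions that kernels never reference.
Adapt.adapt_structure(to, f::ExampleField) = Adapt.adapt(to, f.data)

# A hypothetical averaged field keeps its location parameters (which getindex
# needs) but sheds operand and grid when adapted.
struct ExampleAveragedField{X, Y, Z, D, O, G}
    data :: D
    operand :: O
    grid :: G
end

ExampleAveragedField{X, Y, Z}(data::D, operand::O, grid::G) where {X, Y, Z, D, O, G} =
    ExampleAveragedField{X, Y, Z, D, O, G}(data, operand, grid)

Adapt.adapt_structure(to, a::ExampleAveragedField{X, Y, Z}) where {X, Y, Z} =
    ExampleAveragedField{X, Y, Z}(Adapt.adapt(to, a.data), nothing, nothing)
```

On kernel launch, CUDA's `cudaconvert` calls `Adapt.adapt` on every argument, applying these methods recursively, which is what lets fields be passed to kernels directly instead of being unwrapped with `datatuple` first.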
~~We can now use `AveragedField` and `ComputedField` inside kernels.~~ This still doesn't work; we need to open an issue once this PR is merged.

This PR supersedes #746.

Finally, we can dramatically simplify the time-stepping routine since we don't need to "unwrap" fields anymore.

It's probably worthwhile running a benchmark before merging, but hopefully there's no issue.

Resolves #722.