Benchmark abstract operations #870

Merged (5 commits) on Mar 31, 2021

ali-ramadhan
Member

Had some time to burn while waiting for stuff to train so I benchmarked some abstract operations:

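There are some outliers, like α * β - γ * δ / ζ and (u^2 + v^2 + w^2) / 2, which run a few hundred times slower than similar-looking expressions such as α * β * γ * δ and u^2 + v^2, so there should be some useful info in here.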

Tried to run on GPU but it wouldn't compile α + β even though it worked for me in the REPL 🤷 Worth trying again after #860.

Oceananigans v0.34.1 (DEVELOPMENT BRANCH)
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                 Abstract operations benchmarks                                                         Time                   Allocations      
                                                                                                                                ──────────────────────   ───────────────────────
                                                        Tot / % measured:                                                            7.66s / 69.8%           6.85GiB / 89.5%    
 Section                                                                                                                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  32× 32× 32 [01] -α [CPU]                                                                                                  10    526μs  0.01%  52.6μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [02] √ζ [CPU]                                                                                                  10    987μs  0.02%  98.7μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [03] sin(β) [CPU]                                                                                              10   6.97ms  0.13%   697μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [04] cos(γ) [CPU]                                                                                              10   8.24ms  0.15%   824μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [05] exp(δ) [CPU]                                                                                              10   7.06ms  0.13%   706μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [06] tanh(ζ) [CPU]                                                                                             10   13.9ms  0.26%  1.39ms   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [07] α + β [CPU]                                                                                               10    737μs  0.01%  73.7μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [08] α + β - γ [CPU]                                                                                           10    191ms  3.58%  19.1ms    370MiB  5.90%  37.0MiB
  32× 32× 32 [09] α * β * γ * δ [CPU]                                                                                       10    857μs  0.02%  85.7μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [10] α * β - γ * δ / ζ [CPU]                                                                                   10    299ms  5.59%  29.9ms    340MiB  5.42%  34.0MiB
  32× 32× 32 [11] u^2 + v^2 [CPU]                                                                                           10   1.30ms  0.02%   130μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [12] (u^2 + v^2 + w^2) / 2 [CPU]                                                                               10    278ms  5.20%  27.8ms    340MiB  5.42%  34.0MiB
  32× 32× 32 [13] ∂x(α) [CPU]                                                                                               10    697μs  0.01%  69.7μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [14] ∂y(∂y(β)) [CPU]                                                                                           10    274ms  5.13%  27.4ms    340MiB  5.42%  34.0MiB
  32× 32× 32 [15] ∂z(∂z(∂z(∂z(γ)))) [CPU]                                                                                   10    805ms  15.1%  80.5ms   0.97GiB  15.8%  99.0MiB
  32× 32× 32 [16] ∂x(δ + ζ) [CPU]                                                                                           10    898μs  0.02%  89.8μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [17] ∂x(v) - δy(u) [CPU]                                                                                       10    891μs  0.02%  89.1μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [18] ∂z(α * β + γ) [CPU]                                                                                       10   1.37ms  0.03%   137μs   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [19] ∂x(u) * ∂y(v) + ∂z(w) [CPU]                                                                               10    279ms  5.22%  27.9ms    340MiB  5.42%  34.0MiB
  32× 32× 32 [20] ∂x(α)^2 + ∂y(α)^2 + ∂z(α)^2 [CPU]                                                                         10   64.8ms  1.21%  6.48ms   29.2KiB  0.00%  2.92KiB
  32× 32× 32 [21] ∂x(ζ)^4 + ∂y(ζ)^4 + ∂z(ζ)^4 + 2*∂x(∂x(∂y(∂y(ζ)))) + 2*∂x(∂x(∂z(∂z(ζ)))) + 2*∂y(∂y(∂z(∂z(ζ)))) [CPU]       10    3.11s  58.2%   311ms   3.47GiB  56.6%   355MiB
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
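
For readers who want to reproduce the shape of this table without the package, here is a minimal sketch of a per-case benchmark loop using only Base Julia. The cases are hypothetical stand-ins (plain broadcasts on 32×32×32 arrays), not Oceananigans abstract operations, and the `benchmark` helper is an illustrative name, not part of the PR's script. It only shows how `ncalls`, total time, and per-call averages relate to each other.

```julia
# Hypothetical stand-ins for the benchmark cases: plain broadcasts on arrays,
# not Oceananigans abstract operations.
N = 32
α, β, γ = rand(N, N, N), rand(N, N, N), rand(N, N, N)
out = similar(α)

cases = [
    "-α"        => () -> (out .= .-α),
    "α + β"     => () -> (out .= α .+ β),
    "α + β - γ" => () -> (out .= α .+ β .- γ),
]

# Illustrative helper: time `ncalls` calls of `f` and accumulate time and bytes.
function benchmark(f; ncalls=10)
    f()  # warm-up call so that compilation is not included in the timings
    total_time, total_bytes = 0.0, 0
    for _ in 1:ncalls
        stats = @timed f()          # named tuple with .time (s) and .bytes
        total_time  += stats.time
        total_bytes += stats.bytes
    end
    return (ncalls=ncalls, time=total_time, avg=total_time / ncalls,
            alloc=total_bytes, avg_alloc=total_bytes / ncalls)
end

results = [label => benchmark(f) for (label, f) in cases]

for (label, r) in results
    println(rpad(label, 12), " ncalls=", r.ncalls,
            "  avg time=", r.avg, " s  avg alloc=", r.avg_alloc, " bytes")
end
```

The `avg` columns in the table are the totals divided by `ncalls`, e.g. 526μs over 10 calls gives 52.6μs for case [01].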

@codecov

codecov bot commented Aug 27, 2020

Codecov Report

Merging #870 (9f5c6e3) into master (6112c6c) will increase coverage by 12.89%.
The diff coverage is n/a.

❗ Current head 9f5c6e3 differs from pull request most recent head 19f23f1. Consider uploading reports for the commit 19f23f1 to get more accurate results

@@             Coverage Diff             @@
##           master     #870       +/-   ##
===========================================
+ Coverage   55.78%   68.67%   +12.89%     
===========================================
  Files         171      126       -45     
  Lines        4005     2678     -1327     
===========================================
- Hits         2234     1839      -395     
+ Misses       1771      839      -932     
Impacted Files Coverage Δ
src/Coriolis/no_rotation.jl 0.00% <0.00%> (-100.00%) ⬇️
...undaryConditions/coordinate_boundary_conditions.jl 33.33% <0.00%> (-66.67%) ⬇️
src/SurfaceWaves.jl 6.25% <0.00%> (-40.81%) ⬇️
src/BoundaryConditions/apply_flux_bcs.jl 21.21% <0.00%> (-34.35%) ⬇️
src/Coriolis/f_plane.jl 56.00% <0.00%> (-30.67%) ⬇️
src/Utils/pretty_time.jl 75.00% <0.00%> (-21.56%) ⬇️
src/Grids/Grids.jl 71.42% <0.00%> (-20.24%) ⬇️
src/Solvers/index_permutations.jl 0.00% <0.00%> (-20.00%) ⬇️
src/Buoyancy/Buoyancy.jl 63.15% <0.00%> (-16.85%) ⬇️
src/Fields/field.jl 65.62% <0.00%> (-16.73%) ⬇️
... and 172 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17f8cc6...19f23f1. Read the comment docs.

Member

@glwagner glwagner left a comment

Neat, thank you. I can't puzzle out the logic that underlies the timings. It looks like the timing scales strongly nonlinearly with the complexity of the operation object?

@ali-ramadhan
Member Author

Same here, although it seems that the slow abstract operations are the ones with lots of memory allocations. So if we could figure out why those allocations happen, we could probably speed some of them up. Running the GPU version could be useful as well, since the GPU tends to forcefully @inline more stuff, which may lead to fewer allocations and fewer slow operations.
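
A quick way to check this against the table above is to split the cases by allocation size and compare their timings. The sketch below uses a subset of rows transcribed from the table (per-call averages); the 1 MiB threshold is an arbitrary assumption chosen to separate the 2.92 KiB baseline from the tens-of-MiB cases.

```julia
KiB, MiB = 2^10, 2^20

# (label, average time per call in seconds, average allocation per call in bytes),
# transcribed from a subset of rows of the benchmark table.
rows = [
    ("α + β",                       73.7e-6, 2.92 * KiB),
    ("α + β - γ",                   19.1e-3, 37.0 * MiB),
    ("α * β * γ * δ",               85.7e-6, 2.92 * KiB),
    ("α * β - γ * δ / ζ",           29.9e-3, 34.0 * MiB),
    ("u^2 + v^2",                   130e-6,  2.92 * KiB),
    ("(u^2 + v^2 + w^2) / 2",       27.8e-3, 34.0 * MiB),
    ("∂x(α)^2 + ∂y(α)^2 + ∂z(α)^2", 6.48e-3, 2.92 * KiB),
]

# Assumed threshold: anything above 1 MiB per call counts as "allocating".
heavy = filter(r -> r[3] > MiB, rows)
light = filter(r -> r[3] <= MiB, rows)

for (label, t, bytes) in rows
    flag = bytes > MiB ? "allocating" : "baseline"
    println(rpad(label, 32), rpad(flag, 12), round(t * 1e3, digits=3), " ms")
end

fastest_heavy = minimum(r[2] for r in heavy)
slowest_light = maximum(r[2] for r in light)
println("fastest allocating case: ", fastest_heavy * 1e3, " ms")
println("slowest baseline case:   ", slowest_light * 1e3, " ms")
```

In this subset, every case that allocates tens of MiB per call is slower than every case at the 2.92 KiB baseline. Case [20] still takes about 6.5 ms per call without extra allocations, so allocations may not be the only factor.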

@ali-ramadhan
Member Author

Should probably try to get it working on GPU before merging. Will try again after #860.

@glwagner
Member

Hmm, we could be missing some @inline...
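
As a rough illustration of what a missing `@inline` check could look like, here is a minimal sketch with a hypothetical lazy operation tree. `LazyBinary` and `evaluate!` are made-up stand-ins, not Oceananigans types or functions; the sketch only shows where an `@inline` annotation would go and how `@allocated` can be used to look for allocations during evaluation.

```julia
# Hypothetical stand-in for a lazy binary operation tree; not Oceananigans' types.
struct LazyBinary{F, A, B}
    op :: F
    a  :: A
    b  :: B
end

# @inline hints to the compiler that this small indexing call is worth inlining
# at every level of a nested tree.
@inline Base.getindex(o::LazyBinary, i) = o.op(o.a[i], o.b[i])

# Evaluate the tree pointwise into a preallocated output array.
function evaluate!(out, tree)
    for i in eachindex(out)
        out[i] = tree[i]
    end
    return out
end

α, β, γ = rand(32^3), rand(32^3), rand(32^3)
tree = LazyBinary(-, LazyBinary(+, α, β), γ)   # represents α + β - γ
out = similar(α)

evaluate!(out, tree)                     # warm up so compilation is excluded
bytes = @allocated evaluate!(out, tree)  # bytes allocated by one evaluation
println("allocated ", bytes, " bytes")
```

A large value of `bytes` for a deeper tree, compared with a shallow one, would point at the evaluation path rather than the output array as the source of allocations.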

@ali-ramadhan ali-ramadhan force-pushed the ar/benchmark-abstract-operations branch from b6a61dd to 19f23f1 Compare March 31, 2021 23:23
@ali-ramadhan
Member Author

I'll merge this PR since the script is fine and the real issue is being discussed in #1241.

@ali-ramadhan ali-ramadhan merged commit 860e4ae into master Mar 31, 2021
@ali-ramadhan ali-ramadhan deleted the ar/benchmark-abstract-operations branch March 31, 2021 23:24
Labels
help wanted 🦮 plz halp (guide dog provided)