Start towards caching and perf optimizations #45

Merged
merged 10 commits into from
Mar 22, 2024

Conversation

utkarsh530
Member

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes was updated
  • The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
  • Any new documentation only uses public API

@utkarsh530
Member Author

Before:

julia> @benchmark sol = solve(prob,
           ParallelSyncPSOKernel(1000, backend = CUDA.CUDABackend()),
           maxiters = 500)
BenchmarkTools.Trial: 87 samples with 1 evaluation.
 Range (min … max):  49.353 ms … 187.970 ms  ┊ GC (min … max): 0.00% … 38.72%
 Time  (median):     53.320 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   57.881 ms ±  23.307 ms  ┊ GC (mean ± σ):  3.74% ±  6.52%

  ▇█▃                                                           
  ███▇▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃ ▁
  49.4 ms         Histogram: frequency by time          179 ms <

 Memory estimate: 3.00 MiB, allocs estimate: 71083.

After:

julia> @benchmark solve(prob,
           ParallelSyncPSOKernel(1000, backend = CUDA.CUDABackend()),
           maxiters = 500)
BenchmarkTools.Trial: 132 samples with 1 evaluation.
 Range (min … max):  34.288 ms … 166.877 ms  ┊ GC (min … max): 0.00% … 22.80%
 Time  (median):     34.847 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.072 ms ±  19.359 ms  ┊ GC (mean ± σ):  2.52% ±  3.89%

  █                                                             
  █▆▄▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▄ ▄
  34.3 ms       Histogram: log(frequency) by time       166 ms <

 Memory estimate: 1.71 MiB, allocs estimate: 41525.

@utkarsh530
Member Author

SciML/QuasiMonteCarlo.jl#115

Generating samples with QMC always allocates. The benchmarks below avoid QMC by setting lb = nothing and ub = nothing:

Before:

julia> @benchmark sol = solve(prob, ParallelPSOKernel(1000, backend = CUDA.CUDABackend()))
BenchmarkTools.Trial: 274 samples with 1 evaluation.
 Range (min … max):  17.822 ms … 34.874 ms  ┊ GC (min … max): 0.00% … 45.44%
 Time  (median):     17.937 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.247 ms ±  2.227 ms  ┊ GC (mean ± σ):  1.56% ±  6.04%

  █                                                            
  █▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
  17.8 ms      Histogram: log(frequency) by time      34.4 ms <

 Memory estimate: 1.63 MiB, allocs estimate: 35643.

After:

julia> @benchmark sol = solve(prob, ParallelPSOKernel(1000, backend = CUDA.CUDABackend()))
BenchmarkTools.Trial: 1376 samples with 1 evaluation.
 Range (min … max):  3.425 ms … 31.323 ms  ┊ GC (min … max): 0.00% … 77.35%
 Time  (median):     3.521 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.612 ms ±  1.283 ms  ┊ GC (mean ± σ):  1.68% ±  4.15%

                     ▃█▇▄▃▂▁ ▂ ▁                              
  ▂▂▂▂▂▂▂▂▂▁▂▃▂▃▃▃▃▅▇███████████▇▇▇▅▆▆▅▆▅▅▄▄▄▄▃▃▂▃▂▂▂▂▂▁▁▁▁▂ ▄
  3.43 ms        Histogram: frequency by time        3.64 ms <

 Memory estimate: 265.19 KiB, allocs estimate: 3106.
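
For a quick read of the two before/after pairs above, the medians and allocation counts can be turned into ratios. This is a small sketch in plain Julia: the numbers are copied from the reports, and the names (`sync_before`, `improvement`, and so on) are hypothetical and exist only for this illustration.

```julia
# Median times (ms) and allocation counts copied from the BenchmarkTools reports above.
sync_before = (median_ms = 53.320, allocs = 71083)  # ParallelSyncPSOKernel, before
sync_after  = (median_ms = 34.847, allocs = 41525)  # ParallelSyncPSOKernel, after
pso_before  = (median_ms = 17.937, allocs = 35643)  # ParallelPSOKernel (no QMC), before
pso_after   = (median_ms = 3.521,  allocs = 3106)   # ParallelPSOKernel (no QMC), after

# Hypothetical helper: ratio of median times and ratio of allocation counts.
improvement(b, a) = (speedup = b.median_ms / a.median_ms,
                     alloc_ratio = b.allocs / a.allocs)

sync = improvement(sync_before, sync_after)
pso = improvement(pso_before, pso_after)

println("ParallelSyncPSOKernel: ", round(sync.speedup, digits = 2), "x faster, ",
        round(sync.alloc_ratio, digits = 2), "x fewer allocations")
println("ParallelPSOKernel:     ", round(pso.speedup, digits = 2), "x faster, ",
        round(pso.alloc_ratio, digits = 2), "x fewer allocations")
```

On these numbers the median improves by about 1.5x for ParallelSyncPSOKernel and about 5.1x for ParallelPSOKernel, with allocation counts dropping by about 1.7x and 11.5x respectively.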

Note: This reduces the overall solve call time by cutting initialization overhead; the GPU solve time itself does not improve.
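
As a general illustration of why caching helps initialization, here is a minimal CPU-only sketch. It is not the code from this PR: `init_positions` and `init_positions!` are hypothetical stand-ins for a per-solve setup step, and the sizes are arbitrary. It only shows the pattern of refilling a preallocated buffer instead of allocating a new one on every call.

```julia
using Random

# Hypothetical setup step: draw `n` particle positions of dimension `dim`.
# This version allocates a fresh matrix on every call.
init_positions(dim, n) = rand(Float32, dim, n)

# Cached variant: the caller owns the buffer, which is refilled in place.
init_positions!(buf) = rand!(buf)

buf = Matrix{Float32}(undef, 2, 1000)

# Warm up both paths so compilation is not counted as allocation.
init_positions(2, 1000)
init_positions!(buf)

alloc_fresh = @allocated init_positions(2, 1000)
alloc_cached = @allocated init_positions!(buf)

println("fresh:  ", alloc_fresh, " bytes")
println("cached: ", alloc_cached, " bytes")
```

The fresh path has to allocate at least the 2 × 1000 × 4 bytes of the matrix each time, while the cached path reuses memory that already exists. This is the kind of saving that shows up in the allocation estimates of a benchmark without changing the time spent in the solver kernels.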

@utkarsh530 utkarsh530 changed the title [WIP] Start towards caching and perf optimizations Start towards caching and perf optimizations Mar 22, 2024
@utkarsh530 utkarsh530 merged commit 3a3fc70 into main Mar 22, 2024
3 checks passed