OMP threading in CICE #114
Let's look to see whether @mhrib addressed the ice_dyn_* loops in his refactoring.
See also #128.
I did find issues with the same OMP loops (and a few more), but no solution other than commenting them out, as here. See also #252.
In addition to the ones that Mads (MHRI) found, I found OMP issues in ice_history and ice_grid. I commented out all OMP directives in these two files, which saved the model from crashing when running with the Intel and GNU compilers. I have not found solutions nor the specific locations of these bugs within the files.
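To illustrate the kind of bug usually behind this (a minimal sketch with made-up array names, not the actual ice_history or ice_grid code): a work variable left off the PRIVATE list is shared across all threads, so results depend on thread count and the run can crash or silently change answers. Compile with e.g. `gfortran -fopenmp`.

```fortran
! Hypothetical sketch of the failure mode, not actual CICE code.  If "worka" is
! dropped from the PRIVATE list it is shared by default, and threads overwrite
! each other's values; listing it (or commenting out the directive) removes the race.
program omp_private_demo
   implicit none
   integer, parameter :: nx = 100, ny = 100, nblocks = 8
   real(8) :: aice(nx,ny,nblocks), vice(nx,ny,nblocks), work1(nx,ny,nblocks)
   real(8) :: worka
   integer :: i, j, iblk

   call random_number(aice)
   call random_number(vice)

   !$OMP PARALLEL DO PRIVATE(iblk,i,j,worka)   ! worka must be private per thread
   do iblk = 1, nblocks
      do j = 1, ny
         do i = 1, nx
            worka = aice(i,j,iblk)*vice(i,j,iblk)
            work1(i,j,iblk) = worka
         enddo
      enddo
   enddo
   !$OMP END PARALLEL DO

   print *, 'sum(work1) =', sum(work1)
end program omp_private_demo
```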
I am uploading a set of slides here from a LANL training course on OpenMP profiling and debugging that I attended last week. Most of it is old news, but the profiling and debugging info at the end might be useful as we move forward with this task.
I have created a perf_suite that will be PR'ed soon. This runs a fixed suite of tests that attempt to assess CICE performance at different task and thread counts. It basically does three things.
This is all done with the gx1 grid, roundrobin decomposition, 2-day runs, and the basic out-of-the-box configuration. The idea is not to optimize the performance of CICE but to compare the performance of CICE on different hardware, different compilers, and different task/thread counts for a very fixed problem. This is, in part, a starting point for further OMP tuning. I attach an Excel spreadsheet, CICE_OMP_perf.xlsx, that shows the results from testing on Narwhal with 4 compilers and Cheyenne with 3 compilers in table and graph form. This is for hash 9fb518e of CICE dated Dec 21, 2021, but also includes the Narwhal port and the perf_suite (which will be PR'ed soon).

There are lots of interesting insights. With regard to OMP, we see that in this version of CICE (which has lots of OMP loops turned off that still need debugging), OMP is still doing something. In these tests, OMP is never faster than just using all MPI for the same total PE count. But for a given MPI task count, threaded runs are faster than the same MPI task count run single threaded (e.g. 16x4 vs 16x1), at least on Narwhal. Cheyenne shows less benefit from threading. This establishes a performance baseline and provides a starting point to improve OMP performance, probably using Narwhal gnu or cray to continue OMP tuning efforts.
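For reference, the MxN notation (e.g. 16x4) means 16 MPI tasks each running 4 OpenMP threads. The sketch below shows schematically where the thread-level parallelism sits in such a hybrid run: each MPI task threads over its locally owned blocks. The names (nblocks_local, step_block) are illustrative, not CICE's actual decomposition code.

```fortran
! Schematic of hybrid MPI+OpenMP work distribution assumed by the 16x4-style runs:
! the MPI decomposition assigns blocks to each task (not shown), and OpenMP then
! threads over the task's local block loop.  Illustrative names only.
program hybrid_layout_demo
   use omp_lib, only: omp_get_max_threads
   implicit none
   integer, parameter :: nblocks_local = 16   ! blocks owned by this MPI task
   integer :: iblk

   print *, 'threads available on this task:', omp_get_max_threads()

   !$OMP PARALLEL DO PRIVATE(iblk)
   do iblk = 1, nblocks_local
      call step_block(iblk)        ! per-block physics; independent across blocks
   enddo
   !$OMP END PARALLEL DO

contains

   subroutine step_block(iblk)
      integer, intent(in) :: iblk
      ! placeholder for the per-block work done each timestep
      print *, 'advancing block', iblk
   end subroutine step_block

end program hybrid_layout_demo
```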
Note that CICE_OMP_perf.xlsx has an error: the 4x16 run is actually 8x16. I've fixed the error in perf_suite in my sandbox for future use. Ignore the 4x16 results for now.
I attach an updated OMP results table and graphs, CICE_OMP_perf.xlsx. This also has a second sheet that shows all timing info for the threaded and unthreaded tests. If you look closely, you can see that Advection is just about the only section that threads reasonably. Column and Dynamics do not thread well, and maybe not at all. I'll try to understand this better.
For the dynamics part, most of the OMP directives have been commented out, including the one in the subcycling iteration.
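Schematically, the subcycling structure looks like the sketch below (illustrative names and loop counts, not the actual ice_dyn_evp code). With the inner directive commented out, the per-block stress and momentum work runs serially inside every subcycle, which is where much of the dynamics threading loss comes from.

```fortran
! Schematic of the EVP subcycling loop.  The inner block loop is the natural OpenMP
! target; when its directive is commented out, that work runs serially inside every
! subcycle.  The subcycle count and routine names here are assumptions.
program evp_subcycle_demo
   implicit none
   integer, parameter :: ndte = 120        ! number of EVP subcycles (assumed value)
   integer, parameter :: nblocks = 8
   integer :: ksub, iblk

   do ksub = 1, ndte                        ! subcycling loop: inherently serial
      !$OMP PARALLEL DO PRIVATE(iblk)       ! block loop: parallel across threads
      do iblk = 1, nblocks
         call stress_and_stepu(iblk)        ! placeholder for stress + momentum update
      enddo
      !$OMP END PARALLEL DO
      ! halo updates between subcycles would go here (serial / MPI)
   enddo

contains

   subroutine stress_and_stepu(iblk)
      integer, intent(in) :: iblk
      ! placeholder for per-block stress tensor and velocity updates
   end subroutine stress_and_stepu

end program evp_subcycle_demo
```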
This has largely been addressed in #680 and apcraig#64. There are still some known issues in VP and 1d EVP.
I will close this; VP and 1d EVP have their own issues. FYI, I added the omp_suite and perf_suite to check OpenMP and evaluate performance.
A few problematic OMP loops were unthreaded due to reproducibility problems found during testing; grep for TCXOMP to find them. These are in ice_dyn_eap, ice_dyn_evp, and ice_transport_remap. One issue may be thread safety in icepack_ice_strength, but that requires additional debugging.
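As an illustration of the suspected failure mode (a guess at the mechanism, not the actual Icepack code): if a routine called from inside a threaded block loop writes to module-level or SAVEd scratch storage, all threads race on it, and making the scratch local to the routine (or passing it in) restores thread safety.

```fortran
! Illustration of the suspected thread-safety issue, not actual icepack_ice_strength
! code: a module-level work array written by a routine called from a threaded block
! loop is shared by all threads and therefore racy.  The sketched fix is to make the
! scratch storage local to the routine so each call (and thread) gets its own copy.
module strength_demo
   implicit none
   real(8) :: scratch_shared(100)            ! module-level: UNSAFE under threading
contains
   subroutine strength_unsafe(iblk, result)
      integer, intent(in)  :: iblk
      real(8), intent(out) :: result
      scratch_shared = real(iblk,8)          ! all threads write the same array: race
      result = sum(scratch_shared)
   end subroutine strength_unsafe

   subroutine strength_safe(iblk, result)
      integer, intent(in)  :: iblk
      real(8), intent(out) :: result
      real(8) :: scratch(100)                ! local per call, so private per thread
      scratch = real(iblk,8)
      result = sum(scratch)
   end subroutine strength_safe
end module strength_demo

program strength_race_demo
   use strength_demo
   implicit none
   integer, parameter :: nblocks = 8
   real(8) :: strength(nblocks)
   integer :: iblk

   !$OMP PARALLEL DO PRIVATE(iblk)
   do iblk = 1, nblocks
      call strength_safe(iblk, strength(iblk))   ! swap in strength_unsafe to see the race
   enddo
   !$OMP END PARALLEL DO

   print *, strength
end program strength_race_demo
```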
More generally, we need to review and validate that threading is working properly in CICE and Icepack.
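A generic way to validate an individual loop is to run it with 1 thread and with N threads and require bit-for-bit identical output, which is the same kind of check the omp_suite mentioned above is meant to automate at the model level. A minimal sketch of the idea (not a Consortium tool):

```fortran
! Generic sketch of a threading-validation check: run the same kernel
! single-threaded and multi-threaded and require bit-for-bit agreement.
program omp_bfb_check
   use omp_lib, only: omp_set_num_threads, omp_get_max_threads
   implicit none
   integer, parameter :: n = 100000
   real(8) :: a(n), out1(n), outn(n)
   integer :: nthreads

   call random_number(a)
   nthreads = omp_get_max_threads()          ! remember the full thread count

   call omp_set_num_threads(1)               ! serial reference
   call kernel(a, out1)

   call omp_set_num_threads(nthreads)        ! threaded run
   call kernel(a, outn)

   if (all(out1 == outn)) then
      print *, 'bit-for-bit: PASS'
   else
      print *, 'bit-for-bit: FAIL, max diff =', maxval(abs(out1 - outn))
   endif

contains

   subroutine kernel(x, y)
      real(8), intent(in)  :: x(:)
      real(8), intent(out) :: y(:)
      integer :: i
      !$OMP PARALLEL DO PRIVATE(i)
      do i = 1, size(x)
         y(i) = x(i)**2 + 1.0d0
      enddo
      !$OMP END PARALLEL DO
   end subroutine kernel

end program omp_bfb_check
```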