Evp kernel #160
Conversation
…RI: CHECK THIS OMP"
Thanks for all the work here. I have a modified version of the boundary updates for the tripole grids. I am happy to share this with you; I'm not sure if it will help with the above. What is the tool you use for tracking down the OMP issues? I know we've had a few for a while, and it would be good to fix them. My email is [email protected].
I do not have a specific tool.
I use our three compilers (Cray, Intel, GNU) and test them all with different block splittings (1..36 blocks, split in one direction only) and different numbers of threads (1..36). All runs are done on one node using our 1213x1228 grid and only two timesteps.
I use an allocatable version of CICE, so I do not have to recompile and can use the same binary. A small script starts all the cases and makes an individual folder for each.
If some of the runs fail, I check the errors. Usually a SIG error or the PBS log points directly to a loop in a file. When I rerun a failing case it sometimes runs fine, which I take as a sign of race conditions.
I must admit that I did not carefully cross-check the 10 OMP blocks. Perhaps an upstream error leads to errors in a later OMP loop, so not all of them necessarily have problems.
However, as I recall, ice_init always leads to errors.
I also suspect the two finite-difference loops of being unsafe because of their FD stencils (ice_dyn_evp / ice_transport).
Rgs
Mads
Forgot to mention that my test grid is regular, i.e. nothing to do with tripoles or circular boundaries.
Rgs
Mads
I will close this now. @mhrib, if you are able to keep your evp_kernel branch around for a while in case we need to reference it, that would be helpful. That just means not manually deleting the branch just yet. Thanks.
A number of items:
1) EVP kernel
1b) OpenMP issues??
2) Revised EVP parameters
3) Splitting of namelist
4) DMI specific
5) Allocatable version
All from: https://github.com/mhrib/CICE/tree/evp_kernel
ad1) EVP-kernel
New EVP kernels made on the full grid in order to avoid expensive halo updates and keep everything in memory. Vectorized and suited for a single node using OpenMP threads.
CPPDEF:
#ifdef DMI_EVP
Affected files:
./cicecore/cicedynB/dynamics/evp_kernel1d.F90 (The core)
./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 (switch between CICE/new kernels)
./cicecore/cicedynB/dynamics/ice_dyn_shared.F90 (for namelist)
./cicecore/cicedynB/general/ice_init.F90 (for namelist)
Namelist:
&dynamics_nml
evp_kernel_ver = 0
...
0: CICE (default), 1: kernel_v1, 2: kernel_v2 (calculates distances inline)
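A minimal sketch of how this could look in the ice_in namelist file, assuming nothing else in &dynamics_nml needs to change (kdyn is an existing CICE setting shown only for context):

  &dynamics_nml
    kdyn           = 1    ! existing CICE setting: EVP dynamics (shown for context)
    evp_kernel_ver = 2    ! 0: standard CICE EVP, 1: kernel_v1, 2: kernel_v2
  /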
The good:
A) EVP kernel is working and implemented into CICE. Kernel 0 (CICE) or kernels 1+2 are selectable via the namelist variable "evp_kernel_ver" (integer).
B) Kernel runs threaded on a single node through calls added to ice_dyn_evp.F90 (see the sketch after this list).
C) Works fine, with binary-identical results for (uvel,vvel) compared to the CICE EVP version on "normal" grids when IEEE flags are used.
D) Seems to work fine (normal grids) with threading for both CICE and the evp-kernel ... AFTER commenting out 10 OMP blocks (see ad 1b below).
E) Works for "normal" circular grids using the vectorized halo_update. (Updated after the June 2018 WebEx.)
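The call sequence behind item B is not reproduced in this description. As a rough sketch of the pattern only (the program, routine and array names below are placeholders; only evp_copyin is named elsewhere in this PR), the idea is: copy the blocked fields into full-grid 1D arrays, run the threaded kernel on them, and copy the velocities back:

  ! Toy illustration of the copy-in / threaded-kernel / copy-out pattern.
  ! Names and shapes are placeholders, not the interface of evp_kernel1d.F90.
  program kernel_pattern_sketch
    implicit none
    integer, parameter :: nx = 8, ny = 8, nblk = 4
    real*8 :: blocked(nx,ny,nblk)               ! per-block field (e.g. uvel)
    real*8 :: flat(nx*ny*nblk)                  ! full-grid 1D working array
    integer :: i

    blocked = 1.0d0
    flat = reshape(blocked, (/ nx*ny*nblk /))   ! "evp_copyin" step: gather into memory

    !$OMP PARALLEL DO PRIVATE(i)
    do i = 1, size(flat)                        ! "kernel" step: threaded work, no halo updates
       flat(i) = 0.5d0 * flat(i)
    end do
    !$OMP END PARALLEL DO

    blocked = reshape(flat, (/ nx, ny, nblk /)) ! "evp_copyout" step: scatter back to blocks
    print *, 'done', blocked(1,1,1)
  end program kernel_pattern_sketch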
To Do and nice to have:
A) Issues with tri-pole grids. Assume that a correct halo update can fix this???
B) Issues with CICE threading: 10 out of 135 OMP blocks (in 33 files) had to be commented out. See below for the specific files.
C) Reduce the number of variables in the "evp_copyin" routine for v1+v2. Memory usage can be reduced and a speedup can be expected. Easy to do.
D) "Logistics" so that CICE can run on several nodes, with a single node used for the EVP kernel only.
E) Kernel v3 using real*4 internally - reduces memory and should increase performance...
F) "basal_stress" has not been vectorized yet (the call between stress and stepu in ice_dyn_evp.F90). Straightforward to do.
ad1b: OMP threads issues:
1 : xx OMP : ./cicecore/cicedynB/analysis/ice_diagnostics.F90 (Comment out last two OMP blocks)
2 : xx OMP : ./cicecore/cicedynB/analysis/ice_history.F90 (Comment out first OMP block , NOTE: OMP PRIVATE: ,qn,ns, --> ,qn,sn,)
4 : xx OMP : ./cicecore/cicedynB/dynamics/ice_dyn_eap.F90 (Comment out OMP blocks for stress+stepu)
5 : xx OMP : ./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 (Comment out OMP blocks for stress+stepu)
7 : xx OMP : ./cicecore/cicedynB/dynamics/ice_transport_driver.F90 (Comment out OMP block #1,4)
8 : xx OMP : ./cicecore/cicedynB/dynamics/ice_transport_remap.F90 (Comment out second OMP block)
11 : xx OMP : ./cicecore/cicedynB/general/ice_init.F90 (Comment out first OMP block)
23 : xx OMP : ./cicecore/drivers/cice/CICE_RunMod.F90 (Comment out last OMP block)
I have inserted the following text in front of the affected OMP directives:
NOTE: I have not checked this carefully, so perhaps some of the OMP blocks are OK? Tested with the Cray, Intel and GNU compilers and they crash. Sometimes a re-submit made them run OK - maybe race conditions.
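To illustrate the kind of bug behind these crashes, the ice_history fix noted above (,qn,ns, --> ,qn,sn,) points to a variable missing from an OMP PRIVATE list. A minimal, hypothetical reproduction of that pattern (the variable names are illustrative, this is not CICE code):

  ! Hypothetical example: the per-iteration temporary 'sn' is missing from the
  ! PRIVATE list ('ns' was listed by mistake), so all threads share and
  ! overwrite the same 'sn' - a race condition that fails intermittently.
  program omp_private_bug
    implicit none
    integer, parameter :: nblocks = 8
    integer :: iblk, sn, ns
    real*8  :: qn, total(nblocks)

    ns = 4                                   ! set outside, not used in the loop
    total = 0.0d0
    !$OMP PARALLEL DO PRIVATE(iblk,qn,ns)    ! BUG: should be PRIVATE(iblk,qn,sn)
    do iblk = 1, nblocks
       sn = mod(iblk - 1, 4) + 1             ! shared 'sn' => race condition
       qn = dble(sn)
       total(iblk) = total(iblk) + qn
    end do
    !$OMP END PARALLEL DO
    print *, total
  end program omp_private_bug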
ad 2) Revised EVP parameters (Till)
... according to Kimmritz, Danilov and Losch, 2015 (#ifdef DMI_REVP)
CPPDEF:
#ifdef DMI_REVP
affected files:
./cicecore/cicedynB/dynamics/ice_dyn_evp.F90
./cicecore/cicedynB/dynamics/ice_dyn_shared.F90 (for namelist)
./cicecore/cicedynB/general/ice_init.F90 (for namelist)
Namelist:
&dynamics_nml
arlx1i = 1._dbl_kind/300.0_dbl_kind
brlx = 300.0_dbl_kind
(the above are the default values)
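For orientation only (my reading; the PR does not state this explicitly): arlx1i appears to be stored as the inverse of the alpha relaxation parameter, so the defaults above correspond to alpha = beta = 300 in the notation of Kimmritz, Danilov and Losch (2015):

  ! Assumed reading of the defaults (not stated in the PR):
  !   arlx1i = 1/alpha = 1/300 ~ 3.3e-3
  !   brlx   = beta    = 300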
ad 3) Splitting of namelist
CPPDEF
#ifdef DMI_nml
Affected files:
./cicecore/cicedynB/analysis/ice_history_pond.F90
./cicecore/cicedynB/analysis/ice_history_bgc.F90
./cicecore/cicedynB/analysis/ice_history_drag.F90
./cicecore/cicedynB/analysis/ice_history_mechred.F90
./cicecore/cicedynB/analysis/ice_history.F90
./cicecore/cicedynB/infrastructure/ice_domain.F90
./cicecore/shared/ice_fileunits.F90
./cicecore/shared/ice_init_column.F90
ad 4) DMI specific
Nice to have but not needed by others
CPPDEFS
#ifdef DMI
ad 5) Allocatable version
Allocatable version of the horizontal parameters.
The same changes as in this pull request, in a cleaner version, can be found here:
#157
CPPDEFS:
None - I have changed the code directly, so there are no CPP flags, but there are quite a few "subroutine alloc_xxx" routines.
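A hedged sketch of what such an alloc_xxx routine looks like; the module, array and dimension names below are illustrative, not the actual CICE code:

  ! Illustrative only: module arrays become allocatable and an alloc_* routine
  ! sizes them at run time from the namelist-derived dimensions.
  module ice_state_sketch
    implicit none
    real*8, dimension(:,:,:), allocatable :: uvel, vvel   ! (nx_block,ny_block,max_blocks)
  contains
    subroutine alloc_state(nx_block, ny_block, max_blocks)
      integer, intent(in) :: nx_block, ny_block, max_blocks
      integer :: ierr
      allocate(uvel(nx_block,ny_block,max_blocks), &
               vvel(nx_block,ny_block,max_blocks), stat=ierr)
      if (ierr /= 0) stop 'alloc_state: allocation failed'
      uvel = 0.0d0
      vvel = 0.0d0
    end subroutine alloc_state
  end module ice_state_sketch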
Namelist:
&domain_nml
nprocs = -1
max_blocks = -1 ! max number of blocks per processor
block_size_x = -1 ! size of block in first horiz dimension
block_size_y = -1 ! size of block in second horiz dimension
nx_global = -1 ! NXGLOB, i-axis size
ny_global = -1 ! NYGLOB
CICE stops with an ERROR if these namelist settings are < 1, except for max_blocks. If max_blocks is < 1, it is automatically calculated/estimated as:
max_blocks=( ((nx_global-1)/block_size_x + 1) * &
((ny_global-1)/block_size_y + 1) ) / nprocs
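A worked example with illustrative numbers (the 1213x1228 grid mentioned earlier; block sizes and nprocs are arbitrary choices, not defaults):

  !   nx_global=1213, ny_global=1228, block_size_x=block_size_y=100, nprocs=36
  !   max_blocks = ( ((1213-1)/100 + 1) * ((1228-1)/100 + 1) ) / 36
  !              = ( 13 * 13 ) / 36
  !              = 169 / 36 = 4          (integer division)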
affected files:
cicecore/shared/ice_domain_size.F90
cicecore/shared/ice_arrays_column.F90
cicecore/drivers/cice/CICE_InitMod.F90
cicecore/drivers/hadgem3/CICE_InitMod.F90
cicecore/drivers/cesm/CICE_InitMod.F90
cicecore/cicedynB/infrastructure/ice_grid.F90
cicecore/cicedynB/infrastructure/ice_restoring.F90
cicecore/cicedynB/general/ice_flux.F90
cicecore/cicedynB/general/ice_state.F90
cicecore/cicedynB/general/ice_forcing.F90
cicecore/cicedynB/general/ice_forcing_bgc.F90
cicecore/cicedynB/general/ice_flux_bgc.F90
Developer(s):
Mads Hvid Ribergaard, DMI
Jacob Weismann Poulsen, DMI
Till Rasmussen, DMI
Please suggest code Pull Request reviewers in the column at right.
Are the code changes bit for bit, different at roundoff level, or more substantial?
Is the documentation being updated with this PR? (Y/N)
If not, does the documentation need to be updated separately at a later time? (Y/N)
Note: "Documentation" includes information on the wiki and .rst files in doc/source/,
which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.
Other Relevant Details:
Email/WebEx us if you have any questions...