This is the accompanying code for the paper submission 'SPEL: Software Tool for Porting ELM with OpenACC in a Function Unit Test Framework'. This software tool builds on previous work by Dali Wang and Yao Cindy to provide a robust method for porting the E3SM Land Model (ELM) to GPUs.
SPEL contains the folders:
./SourceFiles/: contains the ELM Fortran source files and the GPU-ready ELM test modules
./scripts/: contains the SPEL Python scripts and a few Fortran modules used to generate ELM test modules
./modified-files/: holds optimized versions of source files; created as needed
./scripts/script-output: created at first run to hold temporary files
./unit-tests/: created at first run; contains one directory per case
Currently, these SPEL Python scripts are used to:
- Extract and prepare ELM files so that they compile and run without MPI and NetCDF.
- Modify ELM routines to remove modules that cannot, or should not, run on the GPU.
- Perform automatic OpenACC acceleration using either the routine directive or parallel loop directives.
- Help the user understand the code by generating a simple call tree and a dependency graph of the modules.
- Write the Fortran routines that generate the input/output needed to initialize variables and verify the results, along with the required Makefile.
- Note: The intended workflow is to generate a unit test with opt = False and then re-run with opt = True as desired. Currently, opt = True bypasses many of the other processing scripts, because it is meant to be run repeatedly as the user identifies and resolves issues with the code.
Setup: In the scripts directory, edit mod_config.py to match your file layout as needed, and edit UnitTestforELM.py to provide a list of subroutines to parse (only the parent subroutine needs to be listed) and a name for the case. Running python3 UnitTestforELM.py creates a directory at ./unit-tests/{casename} that contains the Function Unit Test program.
A Makefile will automatically be generated for the chosen subroutines to test.
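As a rough illustration of the setup step, the settings edited in UnitTestforELM.py might look like the sketch below. The names sub_name_list and opt appear in this README; casename and unit_test_root are hypothetical names chosen for illustration, and the actual variables in the script may differ.

```python
import os

# sub_name_list and opt are named in this README.
# casename and unit_test_root are ASSUMED names for illustration only.
sub_name_list = ["LakeTemperature"]   # only the parent subroutine is listed
casename = "LakeTemperature"          # name of the case directory
opt = False                           # first pass: generate the unit test

# Path of the case directory, relative to the scripts directory.
unit_test_root = os.path.join("..", "unit-tests")
case_dir = os.path.join(unit_test_root, casename)
print(case_dir)
```

A second pass with opt = True would then follow once the generated unit test builds and runs.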
elm_initializeMod.F90 and main.F90 will be modified by the scripts to
use and allocate only the variables that are needed.
readMod.F90, writeMod.F90, and verificationMod.F90 are generated by the scripts to create the appropriate I/O and validation functions. duplicationMod.F90 is generated to duplicate the same variables as many times as desired at run-time.
Get Reference Data: Compile ELM with writeMod.F90 and place a call to write_vars() before the call to the subroutine used for the Unit Test.
In addition to the scripts, the main.F90 file was created to effectively replace
the lnd_cpl_mct and elm_drv routines; it is where all testing and configuration should be done.
Compiling the unit test requires only the NVIDIA Fortran compiler, CUDA 10+, and potentially LAPACK.
The make command creates elmtest.exe, which is run with ./elmtest.exe [numSetsOfSites] [clump-pproc], where numSetsOfSites controls the number of unique sites used to compute the reference output and clump-pproc (optional, default = 1) is the number of clumps per MPI task.
The Makefile defaults to a CPU-only build in debug mode. To switch to OpenACC, edit the Makefile to compile with FC_FLAGS_ACC.
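The Makefile edit can also be scripted. The following is a minimal sketch, not part of SPEL, that assumes the Makefile contains a line of the form FC_FLAGS = $(FC_FLAGS_DEBUG) $(MODEL_FLAGS); the function name enable_openacc is hypothetical.

```python
import re

# Matches the FC_FLAGS assignment (not FC_FLAGS_DEBUG = ...) that selects debug flags.
FLAG_LINE = re.compile(r"^(\s*FC_FLAGS\s*=\s*)\$\(FC_FLAGS_DEBUG\)", re.MULTILINE)

def enable_openacc(makefile_text):
    """Return the Makefile text with FC_FLAGS switched from debug to OpenACC flags."""
    # A function is used as the replacement so that "$(" is inserted literally.
    return FLAG_LINE.sub(lambda m: m.group(1) + "$(FC_FLAGS_ACC)", makefile_text)

example = "FC_FLAGS = $(FC_FLAGS_DEBUG) $(MODEL_FLAGS)\n"
print(enable_openacc(example), end="")
```

Run make clean before rebuilding so that no CPU-only object files are reused.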
Unit Test Example:
mpirun -n 1 ./elmtest.exe 2 ->> Perform a Unit Test for 2 sets of the 42 AmeriFlux sites on one MPI task.
- Example reference simulation data for LakeTemperature ships with SPEL in the files E3SM_constants.txt and output_LakeTemperature_vars.txt. These must be in the same directory as the executable.
- elm_initializationMod.F90 and main.F90 are hard-coded with SPEL output to avoid having to make changes for this example (SPEL will be updated to handle this automatically in the future).
- Because the optimizations are only semi-automatic and require some familiarity to implement fully, an optimized version of LakeTemperature is provided in the ./scripts directory as LakeTemperature.OPT.F90.
- An original version of LakeTemperatureMod.F90 is in the main directory as CPU-LakeTemperatureMod.F90.
edit_file.py :
Contains functions that are intended to be used on entire .F90 files rather than on specific subroutines. These functions were created to prepare ELM files to work with the unit test. The user must provide a list of the modules and subroutines/functions that need to be removed, and the Python functions then comment them out (entire subroutines for some modules) with a '!#py ' comment. If a module is encountered that is not present in the SourceFiles/ directory, the user is asked whether the module is necessary: if it is not, it is added to the omit list; if it is, the script exits.
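The commenting-out step can be pictured with a minimal sketch. This is not the code in edit_file.py; the function name comment_out_modules is hypothetical, and only single-line use statements are handled (continuation lines and whole-subroutine removal are left out).

```python
import re

PY_COMMENT = "!#py "  # comment marker named in the text above

# Matches "use <module>" at the start of a line (case-insensitive).
USE_STMT = re.compile(r"^\s*use\s+(\w+)", re.IGNORECASE)

def comment_out_modules(lines, omit_modules):
    """Return a copy of lines with `use` statements for omitted modules commented out."""
    omit = {name.lower() for name in omit_modules}
    result = []
    for line in lines:
        match = USE_STMT.match(line)
        if match and match.group(1).lower() in omit:
            result.append(PY_COMMENT + line)   # keep the original text after the marker
        else:
            result.append(line)
    return result

source = ["  use netcdf", "  use shr_kind_mod, only : r8 => shr_kind_r8"]
for line in comment_out_modules(source, ["netcdf"]):
    print(line)
```

Because the original text is kept after the marker, a search for '!#py ' finds every change made by the scripts.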
The file keeps track of any modules used in the file that have not yet been processed and recursively processes them. Currently, the user must manually keep track of what has been processed in a separate file.
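The recursive processing of used modules can be sketched as a depth-first walk over use statements. This is an illustrative sketch only: collect_modules and its arguments are hypothetical, and the sources dictionary (lower-case module name mapped to source text) stands in for reading files from SourceFiles/.

```python
import re

USE_RE = re.compile(r"^\s*use\s+(\w+)", re.IGNORECASE | re.MULTILINE)

def collect_modules(start, sources, omit=()):
    """Walk `use` statements depth-first from `start`.

    Returns (processed, missing): modules found in `sources` in visit order,
    and modules that are neither in `sources` nor in the omit list.
    """
    omit = {name.lower() for name in omit}
    processed, missing = [], []

    def visit(name):
        key = name.lower()
        if key in omit or key in processed or key in missing:
            return
        if key not in sources:
            missing.append(key)       # the real scripts prompt the user here
            return
        processed.append(key)         # mark before recursing so cycles terminate
        for dependency in USE_RE.findall(sources[key]):
            visit(dependency)

    visit(start)
    return processed, missing

demo = {"a": "use b\nuse c\n", "b": "use c\nuse netcdf\n", "c": ""}
print(collect_modules("a", demo, omit=["netcdf"]))
```

The returned processed list could be written to the separate tracking file mentioned above.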
There are special comments for BeTR and FATES additions to ELM to allow for easy search and replace to enable them.
analyze_subroutines.py :
Contains a class Subroutine designed to hold all the relevant info and functions needed to analyze subroutines for the unit test and OpenACC, such as the derived types and components that are read or written and any other subroutines called (and their variable info).
The functions will add !$acc routine info to each subroutine (if not present) and make the edits to the subroutines that are required for GPU compilation. Mostly, this means changing subroutine calls containing array bounds. The Python functions operate recursively on all subroutines called by the main one.
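A minimal sketch of inserting the directive is shown below. It is not the code in analyze_subroutines.py: add_acc_routine is a hypothetical name, the seq clause is an assumption, and prefixed declarations such as "pure subroutine" or continuation lines with trailing comments are not handled.

```python
import re

SUB_START = re.compile(r"^\s*subroutine\s+\w+", re.IGNORECASE)   # not "end subroutine"
ACC_ROUTINE = re.compile(r"^\s*!\$acc\s+routine", re.IGNORECASE)

def add_acc_routine(lines):
    """Insert '!$acc routine seq' after each subroutine statement that lacks one."""
    out = []
    i, n = 0, len(lines)
    while i < n:
        out.append(lines[i])
        if SUB_START.match(lines[i]):
            # Copy any '&' continuation lines of the subroutine statement first.
            while out[-1].rstrip().endswith("&") and i + 1 < n:
                i += 1
                out.append(lines[i])
            following = lines[i + 1] if i + 1 < n else ""
            if not ACC_ROUTINE.match(following):
                out.append("    !$acc routine seq")   # ASSUMED clause
        i += 1
    return out

demo = ["subroutine foo(a, &", "    b)", "  implicit none", "end subroutine foo"]
print("\n".join(add_acc_routine(demo)))
```

Running the function a second time leaves the file unchanged, which matches the "if not present" behavior described above.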
The examineLoops function is used for further optimization beyond the naive implementation. There is also functionality to automatically replace bounds allocations with the compact filter allocation if needed. Currently, if examineLoops is called with add_acc enabled, it only accelerates loops in which no race condition (reduction operation) is detected. Loops in which one is detected are listed in yellow for the user to examine afterwards.
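One simple way to spot a candidate reduction is to look for assignments of the form x = x <op> ... where the left-hand side is not indexed by the loop variable. The sketch below illustrates that heuristic only; it is not the check used by examineLoops, find_reductions is a hypothetical name, and derived-type components (a%b) are not handled.

```python
import re

# name, optional "(index list)", "=", right-hand side
ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)\s*(\(([^)]*)\))?\s*=\s*(.+)$")

def find_reductions(body, loop_var):
    """Return names updated from their own value without being indexed by loop_var."""
    loop_re = re.compile(rf"\b{re.escape(loop_var)}\b", re.IGNORECASE)
    flagged = []
    for line in body:
        match = ASSIGN.match(line)
        if not match:
            continue
        name, _, index, rhs = match.groups()
        indexed_by_loop = index is not None and loop_re.search(index) is not None
        reads_itself = re.search(rf"\b{re.escape(name)}\b", rhs, re.IGNORECASE) is not None
        if reads_itself and not indexed_by_loop and name not in flagged:
            flagged.append(name)
    return flagged

loop_body = [
    "  total = total + flux(c)",            # scalar accumulation: flagged
    "  temp(c) = temp(c) + dt",             # indexed by the loop variable: safe
    "  sumflux(j) = sumflux(j) + flux(c)",  # indexed by another variable: flagged
]
print(find_reductions(loop_body, "c"))
```

Flagged loops would still need to be reviewed by hand, for example to decide whether a reduction clause or an atomic update is appropriate.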
UnitTestforELM.py :
Contains the main Python function that calls the others. This is where the user sets, via the sub_name_list list variable, which subroutines to create a Unit Test for.
If opt = True, SPEL attempts to parse and accelerate subroutines at the loop-by-loop level, which allows some features that are incompatible with the "routine" directive. Then a Makefile, verification routines, I/O routines, and the other files necessary for a Unit Test are created.
DerivedType.py :
Contains DerivedType Class used for processing the ELM data types
LoopConstructs.py :
Contains Loop Class used for processing and modifying loops in ELM functions
errorAnalysis.py :
Functions used for analyzing output from verificationMod.F90
mod_config.py :
Configure location of source files and essential files.
process_associate.py :
Holds function to obtain global variables in associate list.
variable_analysis.py :
Functions to find global variables that aren't derived types.
write_routines.py :
Functions that write needed .F90 files (e.g., duplicateMod.F90, Makefile)
interfaces.py :
Functions that help to resolve which subroutine in an interface is actually being called.
utilityFunctions.py :
Holds functions that may be used in many of the other modules.
SPEL has been developed mainly on the Summit computer at the Oak Ridge National Laboratory. Summit has 4,608 computing nodes, most of which contain two 22-core IBM POWER9 CPUs, six 16-GB NVIDIA Volta GPUs, and 512 GB of shared memory. The software environment includes NVIDIA HPC 21.3 and several libraries: spectrum-mpi (10.4), NetCDF (4.8), pnetcdf (1.12), HDF (1.10), and CUDA (11.1).
SPEL uses CUDA Fortran (NVIDIA HPC package) to manage memory.
====== Create a new folder and install the code
git clone https://github.com/peterdschwartz/SPEL_OpenACC.git
====== Create LakeTemperature Test module
cd SPEL_OpenACC/scripts
python3 UnitTestforELM.py # It creates a LakeTemperature module
# at SPEL_OpenACC/unit-tests/LakeTemperature
cd ../unit-tests/LakeTemperature
====== compilation on Summit
module load nvhpc # Load the appropriate module file; currently nvhpc/21.3
make clean; make # Make CPU-version test module
Make GPU-version test module:
Change the compiler flag in the Makefile.
From: FC_FLAGS = $(FC_FLAGS_DEBUG) $(MODEL_FLAGS)
To: FC_FLAGS = $(FC_FLAGS_ACC) $(MODEL_FLAGS)
make clean; make
====== run on Summit
bsub -Is -P cli144 -nnodes 1 -W 2:00 $SHELL # Launch interactive session
Go to the test module directory and copy (reference) data for the test module (LakeTemperature)
cd SPEL_OpenACC/unit-tests/LakeTemperature
cp ../../*.txt . # copy reference/input data (E3SM_constants.txt
# and output_LakeTemperature_vars.txt)
Run the test module (both CPU Version and GPU version)
./elmtest.exe 2 # run LakeTemperature Module using 2 sets of 42
# AmeriFlux datasets, total 84 sites
(For a larger dataset (> 4 sets), the CUDA heap size needs to be increased accordingly via cudaDeviceSetLimit.)
PS: Verification is done by inserting “call update_vars_LakeTemperature(flag, tag)” (from verificationMod.F90) into main.F90 after the call to the function under test (such as “LakeTemperature”) to output a file that holds all variables modified by LakeTemperature. We run the same verification procedure for both the CPU and GPU versions of the code to produce two output files, then call “errorAnalysis.py” to analyze the differences between the two outputs.
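The comparison of the CPU and GPU outputs can be sketched as a per-variable maximum relative error. This is not the code in errorAnalysis.py: the function names, the tolerance, and the variable names t_lake and h2osoi are assumptions, and the parsing of the output files is omitted because their format is not described here.

```python
def max_relative_error(ref, test, tiny=1e-30):
    """Largest |ref - test| / |ref| over paired values (tiny guards against zero)."""
    worst = 0.0
    for r, t in zip(ref, test):
        worst = max(worst, abs(r - t) / max(abs(r), tiny))
    return worst

def compare_outputs(cpu, gpu, tol=1e-12):
    """Return {variable: error} for variables whose error exceeds tol.

    cpu and gpu map variable names to lists of floats; a missing variable or
    a length mismatch is reported as an infinite error.
    """
    report = {}
    for name, ref in cpu.items():
        if name not in gpu or len(gpu[name]) != len(ref):
            report[name] = float("inf")
            continue
        err = max_relative_error(ref, gpu[name])
        if err > tol:
            report[name] = err
    return report

# Hypothetical values for illustration only.
cpu_vars = {"t_lake": [280.0, 281.0], "h2osoi": [0.5]}
gpu_vars = {"t_lake": [280.0, 281.0], "h2osoi": [0.5000001]}
print(compare_outputs(cpu_vars, gpu_vars))
```

A relative measure is used here so that variables with very different magnitudes can share one tolerance; an absolute tolerance may suit variables that are expected to be near zero.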
The final optimized LakeTemperature.F90 code is saved at SPEL_OpenACC/scripts/LakeTemperatureMod.OPT.F90. It can be used to replace the LakeTemperatureMod.F90 in the test module directory to create a new elmtest.exe.