Read and cache lightning freq data in ELM to improve I/O performance #6742

jayeshkrishna · 2024-11-11T19:24:39Z

While looking at the I/O performance of an E3SM v3 high-resolution (ne120) case, we found that the land model reads the lightning data many times (~3K reads, one every 3 hrs), which slows down the simulation (total read time = 1538.39s, close to 8% of the runtime, for a 1 yr run). Instead of reading the lightning data one timestep at a time, it might be useful to read the entire variable (or multiple timesteps) in a single read and cache it.
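
As a rough sanity check on the figures above, the short Python sketch below recomputes the read count and the average cost per read. The 365-day (no-leap) year is an assumption; the other constants are taken from this description. It gives 2,920 reads per simulated year, which matches the "~3K reads" seen in the profile, and about 0.53 s per read.

```python
# Back-of-the-envelope check of the numbers quoted in the description.
HOURS_PER_DAY = 24
READ_INTERVAL_HOURS = 3       # from the issue: one read every 3 hrs
DAYS_PER_YEAR = 365           # assumption: no-leap calendar
TOTAL_READ_TIME_S = 1538.39   # from the issue
RUNTIME_FRACTION = 0.08       # from the issue: "close to 8%" of the runtime

reads_per_year = (HOURS_PER_DAY // READ_INTERVAL_HOURS) * DAYS_PER_YEAR
seconds_per_read = TOTAL_READ_TIME_S / reads_per_year
implied_runtime_s = TOTAL_READ_TIME_S / RUNTIME_FRACTION

print(f"reads per simulated year: {reads_per_year}")
print(f"average time per read:    {seconds_per_read:.3f} s")
print(f"implied total runtime:    {implied_runtime_s:.0f} s")
```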

The performance results for the case above from PACE are given below for reference (please look at the "Read file I/O statistics" section to see the input file with the slow read times):

https://pace.ornl.gov/scorpio/198912/ELM-1601179

The file discussed above is /global/cfs/cdirs/e3sm/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2012_climo1995-2011.T62.lnfm_Total_c140423.nc .

$ du -sh  /global/cfs/cdirs/e3sm/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2012_climo1995-2011.T62.lnfm_Total_c140423.nc
202M	/global/cfs/cdirs/e3sm/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2012_climo1995-2011.T62.lnfm_Total_c140423.nc

The lightning data, variable lnfm, in the file is small (70K per timestep, 201MB for the entire variable), assuming that the variable is distributed across processes. Multiple timesteps can be cached by the model to improve read performance (by reducing the number of reads).
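
The caching idea can be sketched in a few lines of Python. This is only an illustration of the technique, not ELM code: `TimeSliceCache`, `fake_read_block`, and the block size of 8 are hypothetical names and values chosen for the example, and the reader function stands in for whatever call actually fetches `count` consecutive time slices of lnfm from the file.

```python
from typing import Callable, List, Optional, Sequence


class TimeSliceCache:
    """Serve single time slices from a cached block of consecutive slices."""

    def __init__(self, read_block: Callable[[int, int], Sequence],
                 ntimes: int, block_size: int) -> None:
        if block_size < 1:
            raise ValueError("block_size must be >= 1")
        self._read_block = read_block   # hypothetical reader: (start, count) -> slices
        self._ntimes = ntimes
        self._block_size = block_size
        self._start: Optional[int] = None
        self._block: List = []
        self.num_reads = 0              # number of calls made to the reader

    def get(self, t: int):
        """Return time slice t (0-based), reading a new block only on a miss."""
        if not 0 <= t < self._ntimes:
            raise IndexError(f"time index {t} out of range")
        hit = (self._start is not None
               and self._start <= t < self._start + len(self._block))
        if not hit:
            # Align the block start so each block is read at most once
            # during a sequential pass through the file.
            self._start = (t // self._block_size) * self._block_size
            count = min(self._block_size, self._ntimes - self._start)
            self._block = list(self._read_block(self._start, count))
            self.num_reads += 1
        return self._block[t - self._start]


def fake_read_block(start: int, count: int) -> List[List[float]]:
    # Stand-in for the real file read; each "slice" is a one-element list.
    return [[float(t)] for t in range(start, start + count)]


# 2920 three-hourly slices (assuming a 365-day year), cached 8 at a time.
cache = TimeSliceCache(fake_read_block, ntimes=2920, block_size=8)
values = [cache.get(t)[0] for t in range(2920)]
print("reads issued:", cache.num_reads)
```

With a block of 8 slices (one day of 3-hourly data), a sequential pass over the year issues 365 reads instead of 2,920, and at 70K per timestep the cached block stays well under 1MB. A larger block trades memory for fewer reads.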

Since lnfm is a time-dependent variable and time is an UNLIMITED dimension, if you encounter issues (including performance issues) reading the entire variable in a single read, you might want to convert the time dimension to a fixed-size dimension first.

ncks --fix_rec_dmn time clmforc.Li_2012_climo1995-2011.T62.lnfm_Total_c140423.nc -o fixed_timedim_clmforc.Li_2012_climo1995-2011.T62.lnfm_Total_c140423.nc

After converting the UNLIMITED dimension (time) to a fixed-size dimension, you would need to modify the decomposition map passed to pio_read_darray() to read multiple (or all) time slices of the lightning data (lnfm variable).
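
The index arithmetic behind such a decomposition map change can be sketched as follows. This is Python for illustration only and does not call PIO. It assumes the map holds 1-based offsets into the global array, that an entry of 0 marks a hole with no data, and that time is the slowest-varying dimension; `extend_dof_over_time` is a hypothetical helper name.

```python
from typing import List


def extend_dof_over_time(dof_2d: List[int], nglobal_2d: int,
                         ntimes: int) -> List[int]:
    """Replicate a single-slice decomposition map across ntimes time slices.

    dof_2d     : 1-based global offsets owned by this process for one slice
                 (0 is assumed to mean "no data" and is passed through)
    nglobal_2d : number of global points in one time slice
    ntimes     : number of time slices to read in a single call
    """
    extended: List[int] = []
    for t in range(ntimes):
        offset = t * nglobal_2d          # start of slice t in the global array
        for dof in dof_2d:
            extended.append(dof + offset if dof > 0 else 0)
    return extended


# A process owning points 1, 2 and 5 of a 6-point slice, reading 2 slices.
print(extend_dof_over_time([1, 2, 5], nglobal_2d=6, ntimes=2))
```

The local buffer passed to the read would then need room for `ntimes` times as many elements as before, laid out slice by slice in the same order as the extended map.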


rljacob · 2024-11-11T19:42:41Z

The compset is 2010_EAM%CMIP6_ELM%CNPRDCTCBCTOP_MPASSI%PRES_DOCN%DOM_MOSART_SGLC_SWAV_SIAC_SESP

jayeshkrishna · 2024-11-11T19:43:10Z

FYI: @sarats , @dqwu

rljacob · 2024-11-11T20:00:38Z

FYI: @thorntonpe . Not sure who would be best to fix this.

glemieux · 2024-11-12T23:15:42Z

I just wanted to note quickly that I introduced some code back with #5369 that replicated the lightning read in for elm-fates usage. As such, I'll make sure to update that code with the recommended fix or we could refactor the elm fire code to integrate with the fates methods that were added with #5369. That would be a larger amount of work that is beyond scope here.

ekluzek · 2024-11-13T16:12:12Z

@jayeshkrishna I don't see any reason to change the time dimension away from unlimited. You certainly can read multiple time samples from a file that has the unlimited dimension. The advantage of the unlimited dimension is that it's easy to add new time slices on the end of the file.

Is there something else going on for your reasoning to change the file dimension?

jayeshkrishna · 2024-11-13T16:41:12Z

@ekluzek : You are right, it should be possible to read the entire variable without converting the record dimension. (Since these variables are typically read one timestep at a time, you could encounter bugs that don't show up in typical runs. For files with multiple variables with unlimited dimensions, the conversion would improve read performance.) I will update the description to present the conversion as a suggestion.

jayeshkrishna added Land Performance v3.0 Issues affecting v3.0 elm land model labels Nov 11, 2024

jayeshkrishna assigned glemieux and peterdschwartz and unassigned glemieux Nov 11, 2024

glemieux added this to FATES issue board Nov 12, 2024

github-project-automation bot moved this to ❕Todo in FATES issue board Nov 12, 2024

glemieux mentioned this issue Nov 12, 2024

Replicate fix to lightning I/O performance in elm-fates fire code NGEET/fates#1285

Open
