Slow, potentially serial reading of netcdf files by E3SM #6666

Open
ndkeen opened this issue Oct 4, 2024 · 4 comments
Assignees
Labels
Performance PIO SCORPIO The E3SM I/O library (derived from PIO)

Comments

Contributor

ndkeen commented Oct 4, 2024

For a v2 ne30 F case, I noticed that the 15th day of each month was much slower than the other days. Digging further, I discovered it was because of a file read, of this particular file:

/global/cfs/cdirs/e3sm/inputdata/atm/cam/volc/CMIP_DOE-ACME_radiation_average_1850-2014_v3_c20171204.nc

Looking at the spiostats for a ne30 case that ran for 1 year, the time spent reading this file was reported as 216 seconds.
I suspect it's being read on the 15th day of each month, which works out to about 18 s per month. I see the 15th day of each month as about 6x more expensive than other days. The file format is classic and it's not a very large file, so I suspect it's being read in serial?
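
The per-month figure follows directly from the reported total. A minimal sketch of the arithmetic, assuming (as suspected above, not confirmed) that the file is read exactly once per simulated month:

```python
# Numbers quoted in the comment above.
total_read_seconds = 216.0  # reported time spent reading the volc file
simulated_months = 12       # the case ran for 1 simulated year

# Assumption: one read per simulated month, on the 15th.
seconds_per_read = total_read_seconds / simulated_months

print(f"{seconds_per_read:.1f} s per monthly read")
```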

/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/20240812.v2.LR.F2010-FAMIP-hybrid-03-12-45year
rljacob changed the title from "Slow, potentially serial reading of netcdf files by PIO" to "Slow, potentially serial reading of netcdf files by E3SM" on Oct 7, 2024
Member

rljacob commented Oct 8, 2024

Related: #6670

@ndkeen ndkeen added PIO SCORPIO The E3SM I/O library (derived from PIO) Performance labels Oct 9, 2024
Contributor Author

ndkeen commented Oct 9, 2024

Updates here.

First, I wanted to try this same launch script with current master. It does run (though I needed to remove one specific output), but for whatever reason it does not read the problematic volc file noted above. There is surely a good reason for that. But as I'm just trying to reproduce what I saw before, I checked out maint-2.0 again and reran. Sure enough, same issue.

After 1 year, it spent 627 seconds reading from
/global/cfs/cdirs/e3sm/inputdata/atm/cam/volc/CMIP_DOE-ACME_radiation_average_1850-2014_v3_c20171204.nc

Then I changed the format of this file from classic to cdf5 and tried again. This time it spent 386 seconds reading. So while about 1.6x faster, it's still clearly doing something not quite right, as the file itself is small. I'm just going to leave that file in cdf5 format on NERSC unless I hear otherwise.
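
One way to confirm which format a copy of the file is actually in is to look at its first bytes: classic-family netCDF files start with `CDF` followed by a version byte (1 for classic, 2 for 64-bit offset, 5 for CDF-5), and netCDF-4 files normally start with the HDF5 signature. The sketch below is illustrative only; `netcdf_kind` is a hypothetical helper name, not part of E3SM or SCORPIO, and it only checks for the HDF5 signature at offset 0.

```python
def netcdf_kind(path):
    """Return a short label for the on-disk format, based on the leading bytes."""
    with open(path, "rb") as f:
        magic = f.read(8)

    # Classic-family files: b"CDF" followed by a one-byte version number.
    if magic[:3] == b"CDF":
        version = magic[3] if len(magic) > 3 else None
        kinds = {1: "classic", 2: "64-bit offset", 5: "cdf5"}
        return kinds.get(version, "unknown CDF variant")

    # netCDF-4 files are HDF5 files; this checks the signature at offset 0 only.
    if magic == b"\x89HDF\r\n\x1a\n":
        return "netCDF-4/HDF5"

    return "not netCDF"
```

Calling `netcdf_kind` on the original and the converted file should give different labels if the conversion took effect. The `ncdump -k` utility from the netCDF tools reports the same kind of information.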

/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/test-20240812.v2.LR.F2010-FAMIP-hybrid.1yr.base
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/test-20240812.v2.LR.F2010-FAMIP-hybrid.1yr.volc-cdf5

Overall throughput went from 30.2 to 32.9 sypd, or 8.9% faster.
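
As a rough consistency check on these numbers, the throughput gain can be converted to wall-clock seconds per simulated year and compared with the drop in read time. This sketch assumes that sypd means simulated years per wall-clock day and that the reported read times are wall-clock seconds accumulated over the 1-year runs; both are assumptions, not something stated in the spiostats output quoted here.

```python
SECONDS_PER_DAY = 86400.0

# Numbers quoted in the comment above.
base_sypd, cdf5_sypd = 30.2, 32.9        # throughput before / after the format change
base_read_s, cdf5_read_s = 627.0, 386.0  # read time for the volc file, 1-year runs

# Relative throughput gain in percent.
speedup_pct = (cdf5_sypd / base_sypd - 1.0) * 100.0

# Wall-clock seconds saved per simulated year, implied by the sypd change.
wall_saved_s = SECONDS_PER_DAY / base_sypd - SECONDS_PER_DAY / cdf5_sypd

# Reduction in time spent reading the file, absolute and as a ratio.
read_saved_s = base_read_s - cdf5_read_s
read_ratio = base_read_s / cdf5_read_s

print(f"throughput gain: {speedup_pct:.1f}%")
print(f"wall time saved per simulated year: {wall_saved_s:.0f} s")
print(f"read time saved: {read_saved_s:.0f} s (ratio {read_ratio:.2f}x)")
```

Under those assumptions, the throughput gain corresponds to about 235 s of wall time per simulated year, close to the 241 s reduction in read time, which suggests the format change accounts for essentially all of the speedup.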

Contributor Author

ndkeen commented Oct 17, 2024

Rob, I don't think this issue is related to the one you pointed to.

Member

rljacob commented Oct 17, 2024

Just "related" in the sense of another instance of bad i/o.
