writing native grid atmf history files is too slow in FV3ATM #2439
Comments
@DavidHuber-NOAA provided a HR4 gdasfcst test case on dogwood at: /lfs/h2/emc/global/noscrub/David.Huber/keep/gdasfcst_w_native_rundir |
I noticed that in model_configure in the above run directory (/lfs/h2/emc/global/noscrub/David.Huber/keep/gdasfcst_w_native_rundir), the lossy compression (quantization) parameters are set as:
|
@aerorahul the |
@junwang-noaa and @aerorahul had a conversation on what these should be. |
@DusanJovic-NOAA The quantize_nsd and quantize_bitround settings correspond to the previous nbits=14 setting in our customized lossy compression code. The physics group evaluated results for nbits settings from 12 to 32 and decided to use nbits=14 in GFSv16. quantize_nsd=5 corresponds to nbits=14. |
@DavidHuber-NOAA Can you sync the input data for the test case you provided on Cactus? I see these errors:
|
@DusanJovic-NOAA This test case was run on Dogwood and I do not have access to it now that it is in production. However, I just created a fresh clone into |
Thanks. It works, but I had to change the directory names in input.nml. With the original names I get "ls: cannot access '/lfs/h2/emc/global/noscrub/David.Huber/GW/develop/fix/am/global_slmask.t1534.3072.1536.grb': No such file or directory", but the path with 'david.huber' does exist. |
I found that writing native history files is noticeably faster if I change the chunk sizes, specifically:
diff --git a/io/module_write_netcdf.F90 b/io/module_write_netcdf.F90
index b016415..03a9d57 100644
--- a/io/module_write_netcdf.F90
+++ b/io/module_write_netcdf.F90
@@ -398,14 +398,14 @@ contains
par_access = NF90_COLLECTIVE
if (rank == 2 .and. ichunk2d(grid_id) > 0 .and. jchunk2d(grid_id) > 0) then
if (is_cubed_sphere) then
- chunksizes = [im, jm, tileCount, 1]
+ chunksizes = [im, jm, 1, 1]
else
chunksizes = [ichunk2d(grid_id), jchunk2d(grid_id), 1]
end if
ncerr = nf90_def_var_chunking(ncid, varids(i), NF90_CHUNKED, chunksizes) ; NC_ERR_STOP(ncerr)
else if (rank == 3 .and. ichunk3d(grid_id) > 0 .and. jchunk3d(grid_id) > 0 .and. kchunk3d(grid_id) > 0) then
if (is_cubed_sphere) then
- chunksizes = [im, jm, lm, tileCount, 1]
+ chunksizes = [im, jm, 1, 1, 1]
else
chunksizes = [ichunk3d(grid_id), jchunk3d(grid_id), min(kchunk3d(grid_id),fldlev(i)), 1]
end if
Can you apply this change in the code, recompile, and rerun your test? |
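To get a feel for what the chunk-size change means, the short Python sketch below computes the uncompressed size of one chunk before and after the patch. The dimensions are assumptions for illustration, not values quoted in the thread: a C768 tile of 768 x 768 points, 127 vertical levels, 6 cubed-sphere tiles, and 4-byte floats. With the original sizes, each 3D variable is written as one chunk covering all levels and all tiles per time step; why that is slow is not explained in the thread, so this is only a size comparison.

```python
from math import prod

BYTES_PER_VALUE = 4  # assumption: fields are stored as 32-bit floats


def chunk_bytes(chunksizes, bytes_per_value=BYTES_PER_VALUE):
    """Return the uncompressed size in bytes of one chunk of the given shape."""
    return prod(chunksizes) * bytes_per_value


# Assumed dimensions (hypothetical, for illustration only):
# C768 tile = 768 x 768 points, 127 levels, 6 cubed-sphere tiles.
im, jm, lm, tile_count = 768, 768, 127, 6

# 3D variables: [im, jm, lm, tileCount, 1] before, [im, jm, 1, 1, 1] after.
old_3d = chunk_bytes([im, jm, lm, tile_count, 1])
new_3d = chunk_bytes([im, jm, 1, 1, 1])

# 2D variables: [im, jm, tileCount, 1] before, [im, jm, 1, 1] after.
old_2d = chunk_bytes([im, jm, tile_count, 1])
new_2d = chunk_bytes([im, jm, 1, 1])

MIB = 2**20
print(f"3D chunk: {old_3d / MIB:.1f} MiB before, {new_3d / MIB:.2f} MiB after")
print(f"2D chunk: {old_2d / MIB:.1f} MiB before, {new_2d / MIB:.2f} MiB after")
print(f"3D chunks per variable and time step after the change: {old_3d // new_3d}")
```

Under these assumptions a 3D chunk shrinks from roughly 1.7 GiB to 2.25 MiB, and each 3D variable is split into one chunk per level per tile instead of a single chunk.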
@DusanJovic-NOAA Thanks for the quick attention on this. I gave your code changes a try and ran a fresh forecast with native grid writes enabled at C768. This significantly reduced the runtime from ~60 minutes to ~23 minutes. I copied the run directory into |
@DavidHuber-NOAA Thank you for checking. @junwang-noaa should we update the code in develop with these changes? |
@DusanJovic-NOAA Thanks for debugging the issue. The timing looks good now. Please update the develop branch. |
Description
The G-W gdas fcst job slows down significantly when the option to write native grid history files is turned on. Besides the resource issue on the write grid component, the model also writes native grid atmf history files significantly slower than it writes Gaussian grid atmf history files or native grid restart files. The timing from Dave's test is shown below:
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./atmf003.nc write time is 18.91891 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./cubed_sphere_grid_atmf003.nc write time is 184.79446 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./cubed_sphere_grid_sfcf003.nc write time is 36.00565 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./sfcf003.nc write time is 36.36828 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: RESTART/20211220.210000.fv_core.res.nc write time is 5.30265 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: RESTART/20211220.210000.fv_srf_wnd.res.nc write time is 0.01886 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: RESTART/20211220.210000.fv_tracer.res.nc write time is 7.70513 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: RESTART/20211220.210000.phy_data.nc write time is 7.28120 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: RESTART/20211220.210000.sfc_data.nc write time is 3.23882 at fcst 03:00
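The slowdown in the log above can be quantified with a small script. The following is a minimal sketch that parses the "write time is" lines for the four history files and compares the native grid and Gaussian grid write times. The regular expression and the function name `parse_write_times` are illustrative choices, not part of the model or workflow code.

```python
import re

# The four history-file lines copied from the log above.
LOG = """\
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./atmf003.nc write time is 18.91891 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./cubed_sphere_grid_atmf003.nc write time is 184.79446 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./cubed_sphere_grid_sfcf003.nc write time is 36.00565 at fcst 03:00
nid002370.dogwood.wcoss2.ncep.noaa.gov 2544: ./sfcf003.nc write time is 36.36828 at fcst 03:00
"""

# Captures the file name and the write time in seconds.
PATTERN = re.compile(r":\s+(\S+) write time is\s+([0-9.]+) at fcst")


def parse_write_times(text):
    """Map each file name in the log text to its write time in seconds."""
    times = {}
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            times[match.group(1)] = float(match.group(2))
    return times


times = parse_write_times(LOG)
atm_ratio = times["./cubed_sphere_grid_atmf003.nc"] / times["./atmf003.nc"]
sfc_ratio = times["./cubed_sphere_grid_sfcf003.nc"] / times["./sfcf003.nc"]
print(f"native/Gaussian atmf write time ratio: {atm_ratio:.2f}")
print(f"native/Gaussian sfcf write time ratio: {sfc_ratio:.2f}")
```

For this forecast hour the native grid atmf file takes nearly ten times as long to write as the Gaussian grid atmf file, while the two sfcf files take about the same time.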