Restart failed with error "longwave down is 0 or negative" #935
@XiulinGao could you link that post on the CESM forum you mentioned? Thanks
How are you specifying the longwave (LW) down from your forcing data? Is LW down one of the provided fields, or are you using other fields to calculate LW down? What's the list of fields in the forcing data?
@rgknox Ryan: here is the link to the post https://bb.cgd.ucar.edu/cesm/threads/longwave-down-is-zero-or-negative-error-replaced-default-gswp3-forcing-with-wrf-forcing-at-hourly-scale.7784/ @ekluzek Erik: LW is specified in the forcing by the variable FLDS. I processed the forcing data according to the requirements for the CLM1PT forcing format, so the data is not the same as GSWP3. I forgot to mention that I'm running this under the GSWP3v1 datm mode but replaced all the default forcing with the WRF data. I have suspected that the datm streaming settings need to be changed to reflect the higher time resolution of the WRF forcing, which is at a one-hour interval.
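For anyone doing the same check on their own forcing, here is a minimal sketch of how one could confirm that FLDS is present and strictly positive; the file name is hypothetical and NCO is assumed to be available (e.g. via `module load nco` on Cheyenne):

```bash
# Minimal sketch (not from this case): sanity-check the longwave field in a
# CLM1PT-style forcing file.  The file name is a placeholder; NCO is assumed
# to be available (e.g. `module load nco` on Cheyenne).
ncdump -h clm1pt_wrf_1981-01.nc | grep -i flds            # confirm FLDS exists and check its units attribute
ncwa -O -y min -v FLDS clm1pt_wrf_1981-01.nc flds_min.nc  # reduce FLDS to its minimum value
ncdump flds_min.nc                                        # the reported minimum should be well above 0 W/m^2
```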
Can you try running this with a different set of years to see if you get the same failure?
@jkshuman I re-configured my PES layout to finish a single run within 2 hours, so I can get past this restart error for now. Just to let you know: there's no problem finishing the run, but the results are totally wrong. I checked the output forcing variables and saw that both rain and longwave radiation lost their spatial variation during the run (see attached figures). I'm not sure if this is the root of the issue, but it is totally wrong no matter what. I also attached a precipitation plot for 1981-01 from the forcing I'm using, which definitely shows a strong spatial pattern.
Maybe you can try doing some spatial simulations using the original NCEP or CRUNCEP data, but let CLM downscale it to 9 km by prescribing it in the user_nl_datm.
To do that, I should set mapalgo to bilinear, add a mesh file (my domain of interest?), and there is no need to change the default forcing file? I'm not sure how to do that.
Yes, set mapalgo to bilinear. For the domain of interest, we normally obtain it when we construct the surface file. And keep the default forcing file as it is.
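For readers who want the concrete namelist form of this suggestion, a sketch of what it could look like in user_nl_datm_streams; the stream names assume the stock GSWP3v1 datm mode and the mesh path is a placeholder:

```bash
# Hedged sketch of user_nl_datm_streams entries for a CDEPS-based DATM case.
# Stream names assume the stock GSWP3v1 datm mode; the mesh path is a
# placeholder for the user's own ESMF mesh file.
cat >> user_nl_datm_streams << 'EOF'
CLMGSWP3v1.Solar:mapalgo = bilinear
CLMGSWP3v1.Precip:mapalgo = bilinear
CLMGSWP3v1.TPQW:mapalgo = bilinear
CLMGSWP3v1.Solar:meshfile = /path/to/my_region_ESMFmesh.nc
CLMGSWP3v1.Precip:meshfile = /path/to/my_region_ESMFmesh.nc
CLMGSWP3v1.TPQW:meshfile = /path/to/my_region_ESMFmesh.nc
EOF
```

Running ./preview_namelists afterwards shows the resolved streams, so the settings can be checked before submitting.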
Update: I followed Ashehad's suggestion to run the simulation (still with WRF forcing ;) ), but instead of setting mapalgo to 'none' and meshfile to 'none', I set them to 'bilinear' and added my mesh file. Just from checking the first few output files, the RAIN variable is now back to normal, with values varying across cells. I'll follow up on this after I run with restart and see if that fails again. Also, would nearest neighbor make more sense? Since I'm using 9 km forcing for a 9 km simulation, nearest-neighbor spatial interpolation seems more appropriate.
The longwave radiation error still persists after I changed the forcing settings, so that's not the issue. I'll try using the default GSWP3 forcing to see if the restart also fails.
In case this helps... In my case, the error is triggered at startup (not restart) when I remove this line
Thanks, Sam. I tried taking this line of code out and it still gives the same error. I did a run with debug mode 1; here is something I see in the cesm log file, any thoughts?
Sorry for the confusion. I didn't mean to suggest removing that line of code. In my case, removing that line of code triggered the error at startup, which means (in my case) that [...]. Beyond this speculation, I would have to try to reproduce your error and debug it. Please let me know, @XiulinGao, if you would like me to do that, and I will follow up with you.
@XiulinGao could you provide the full logs? In particular, I'm curious about the cesm, lnd and datm logs.
Yes please! Here is the case directory: /glade/work/xiugao/Regional-WRF/Simulations/oak-grass_restart_CLM_FATES. If you look at the simulation directory /glade/scratch/xiugao/Regional-WRF/Simulations/oak-grass_restart_CLM_FATES/run and check the datm restart file, you can see that the first nt (nstreams, nfields) variable has a lot of zeros. This is different from an interval restart file saved during the run (see /glade/scratch/xiugao/Regional-WRF/Simulations/regional-avba-N08T27_CLM_FATES/run/regional-avba-N08T27_CLM_FATES.datm.r.2021-01-01-00000.nc for example), which makes me suspect it might be the cause of the error message.
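For anyone reproducing this comparison, a sketch of how the two datm restart files could be inspected; the second file name is a placeholder for the failing case's restart:

```bash
# Hedged sketch: compare a datm restart saved mid-run (the file quoted above)
# with the one written at the failed restart point (placeholder name below).
ncdump -h /glade/scratch/xiugao/Regional-WRF/Simulations/regional-avba-N08T27_CLM_FATES/run/regional-avba-N08T27_CLM_FATES.datm.r.2021-01-01-00000.nc
ncdump -h oak-grass_restart_CLM_FATES.datm.r.YYYY-MM-DD-00000.nc   # placeholder: restart from the failing case
```

Dumping the first stream bookkeeping variable from each file with ncdump -v would then show whether the block of zeros described above is present.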
I did a simulation with the GSWP3 forcing and the restart was successful, confirming that this restart issue might be specific to the WRF forcing and domain we are using.
Ryan, I'll get back to you later once I have the detailed logs with debug turned on; the logs were overwritten when I switched back to non-debug. When running under non-debug, the error message is only 'long wave radiation sent from atm model is negative or zero' without details.
Here are the atm, cesm and lnd logs: https://drive.google.com/drive/folders/1Zhc4SkJeYaMwHlKsi4WmS_Ec29F61rvs?usp=sharing
From @XiulinGao's copy of the CTSM I replicated her restart failure as follows:
Meanwhile though, I believe an initial run can give the same answers as a restart as long as one updates the start dates and finidat correctly. Xiulin, I will confirm, and then we can discuss how you can do this, because I see it as your only immediate option while restarts do not work for curvilinear grids. @ekluzek do you know of others using curvilinear grids that we could ask for guidance, or is Xiulin the first?
@XiulinGao I am almost there, and I hope @ekluzek may have feedback: @ekluzek I think a branch simulation will get past this problem. I also vaguely recall that there's a way to start a branch without pointing to the datm restart file, right? I will look into it. @XiulinGao if you wanted to try using initial runs (manually) for now, you could see how I did it in this case (otherwise you could wait until next week when I hope to figure out the "branch" solution):
For the initial run that repeated year 2, I did:
So each time that you need to restart, you need to update user_nl_clm and env_run.xml similarly to how I did for my "continue" simulation.
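Since the exact edits are not preserved above, here is a generic sketch of the kind of changes such a manual "continue" involves; the dates, segment length, and finidat path are hypothetical placeholders:

```bash
# Hedged sketch of continuing as a new initial (startup) run instead of a
# restart.  Dates, segment length, and the finidat path are placeholders.
./xmlchange RUN_TYPE=startup,CONTINUE_RUN=FALSE
./xmlchange RUN_STARTDATE=1982-01-01            # where the previous segment stopped
./xmlchange STOP_OPTION=nyears,STOP_N=1         # length of the next segment

cat >> user_nl_clm << 'EOF'
finidat = '/path/to/run/mycase.clm2.r.1982-01-01-00000.nc'
EOF
```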
...I may have been wrong about branch runs:
Update:
@mvertens suggested a short run to check whether the same is true in a cpl.h file generated after restart. The answer is yes:
(Editing my handle to @slevisconsulting because there is some person, unknown to me, out there who has responded in the past to the handle that you used for me :-))

@slevisconsulting - in looking at your $CASEROOT I see the following:

$ ./xmlquery -p NX

CTSM and DATM treat these variables differently. In CTSM:

```fortran
! Set scalars in export state
call State_SetScalar(dble(ldomain%ni), flds_scalar_index_nx, exportState, &
     flds_scalar_name, flds_scalar_num, rc)
call State_SetScalar(dble(ldomain%nj), flds_scalar_index_ny, exportState, &
     flds_scalar_name, flds_scalar_num, rc)
```

So even though LND_NX and LND_NY are 1, those values are ignored and the domain values are sent instead. DATM, on the other hand, uses the ATM_NX and ATM_NY values that are in nuopc.runconfig (obtained from the xml variables ATM_NX and ATM_NY), which are both 1. The corresponding nx and ny values are used by the mediator to write out 2d history and restart output and determine the coordinate axes that would otherwise be a 1d unstructured list. My suggestion would be to set the xml variables and see if this resolves the restart problem. We need to understand why the xml variables are 1 and then find a way to set them correctly.
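A sketch of what setting those xml variables by hand could look like; 147 x 151 are the dimensions of the California WRF grid mentioned later in this thread, so substitute your own mesh dimensions:

```bash
# Hedged sketch: set the grid-dimension xml variables by hand so the mediator
# writes 2d history/restart output.  147 x 151 are the dimensions of the
# California WRF grid discussed below; substitute your own mesh dimensions.
./xmlchange ATM_NX=147,ATM_NY=151
./xmlchange LND_NX=147,LND_NY=151
./xmlquery ATM_NX,ATM_NY,LND_NX,LND_NY           # confirm the values are no longer 1
```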
Good news, @mvertens @XiulinGao! @mvertens, please let me know if you have a recommendation for a way to correct this problem in the scripts. Should we open a GitHub issue and, if so, under CTSM or elsewhere?
@slevis - great news that it worked!
@mvertens
@XiulinGao to try the above suggestion, edit the file
...then create a new case with
Awesome! Thank you both for the insightful comments and solutions! I tried to define a grid resolution (WRF-SPARSE) in component_grids_nuopc.xml, but it failed with an error message saying "Compset specification file is /glade/u/home/xiugao/CTSM/cime_config/config_compsets.xml ...". I wonder if defining a new grid resolution involves not only modification of component_grids_nuopc.xml but also changes in other places.
@XiulinGao I'm glad that you restarted successfully by changing env_build.xml. Regarding the suggestion to modify component_grids_nuopc.xml, you may be right that additional scripts need modification. Let's put that on hold and, instead, I will follow up here with this question for @jedwards4b: Jim,
Thanks!
@slevisconsulting Yes, I am the correct contact. Can you open an issue in CDEPS and provide a test along with the desired outcome?
@XiulinGao the information you have provided here is incomplete. It would help me if you could show the modification that you made to component_grids_nuopc.xml and the command that generated the error.
@jedwards4b I posted a new issue in CDEPS so that you may replicate the error.
I think I actually figured out how to do it without manually editing ATM_NX, ATM_NY, LND_NX, and LND_NY: when building the model, specify res=147x151_california. There's no need to xmlchange the mesh file for lnd and atm, but you still have to point to the mask file for a sparse-grid run. That said, I agree with Sam's suggestion that automatically reading the dimensions from the mesh file would make things easier.
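For reference, a sketch of this case-creation route; the case name and compset are placeholders, and the 147x151_california alias is assumed to already exist in the grid definition XML:

```bash
# Hedged sketch of creating the case with a predefined grid alias instead of
# CLM_USRDAT.  Case name and compset are placeholders; 147x151_california is
# the alias quoted above and must already be defined in the grid XML.
./create_newcase --case oak-grass_regional_CLM_FATES \
                 --res 147x151_california \
                 --compset <your_FATES_compset> \
                 --run-unsupported
```

As noted above, the sparse-grid mask file still has to be pointed to separately; that step is not shown here.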
I guess I don't see how you are proposing to read the dimensions from the mesh file?
You can specify the mesh file when building the case with ./xmlchange ATM_DOMAIN_MESH & LND_DOMAIN_MESH, as sketched below.
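A sketch of that command form; the mesh path is a placeholder for the user's own file:

```bash
# Hedged sketch: point both the atmosphere and land components at a custom
# ESMF mesh file.  The path is a placeholder for the user's own mesh.
./xmlchange ATM_DOMAIN_MESH=/path/to/my_region_ESMFmesh.nc
./xmlchange LND_DOMAIN_MESH=/path/to/my_region_ESMFmesh.nc
./xmlquery ATM_DOMAIN_MESH,LND_DOMAIN_MESH       # confirm both point at the same mesh
```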
I understand now, thanks.
So the method used here is what you should do. I think that CLM_USRDAT should not be used in this case.
Thank you! I'm curious how that would be different from running CLM using CLM_USRDAT but manually changing nx, ny, and the mesh file. To me, those predefined grid resolutions serve as a shortcut for setting up domain and mapping files? That seems unnecessary for nuopc, since it does mapping on the fly, and we can easily assign the mesh file and define nx and ny?
Sure, you can do that - but having to make a bunch of changes after you define a case makes it hard to test that case in any automated testing.
Right, that makes sense. Thanks for all the insightful discussion here; I learned a lot.
@XiulinGao I added step G7 to the list of instructions in discussion #1919, so as to include the above solutions to the restart issue. Feel free to make corrections or additions if you find it necessary. Also, if this issue (#935) is resolved, you may wish to close it.
Sounds good. Closing the issue now.
Hi FATES team, I'm running a regional simulation at 9 km spatial resolution with masked land units (grasslands only) on Cheyenne using the NUOPC driver. It's a 40-year-long simulation with resubmit, but the model failed right away after restart with an error pointing to negative longwave radiation. I did the following steps to identify the problem:
Some background info for this regional case can be found here: ESCOMP/CTSM#1773
I have posted this on the CESM forum, but I thought I might also post it here in case anyone has run into a similar issue before and has already solved it.
Thanks!