Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLM-side changes to allow FATES snow occlusion of LAI #1324

Merged
merged 7 commits into from
Apr 21, 2021

Conversation

ckoven
Copy link
Contributor

@ckoven ckoven commented Apr 6, 2021

FATES hasn't thusfar been letting snow occlude LAI. With PR NGEET/fates#734, it will. This PR accompanies that FATES PR. The reason that CTSM-side changes are needed is that the FATES code needs to handle restarts slightly differently than normal timestepping so as to have bit-for-bit restarts, and so the calling routine needs to tell FATES whether it is calling the snow occlusion code during restarts versus during timestepping.

Description of changes

Just lets FATES know whether it is calling leaf_area_profile during restarting versus during timestepping.

Specific notes

Accompanies NGEET/fates#734.

Contributors other than yourself, if any:
Discussion with @rgknox @rosiealice @glemieux

CTSM Issues Fixed (include github issue #):
fixes NGEET/fates#359
fixes #1293

Are answers expected to change (and if so in what way)?
When FATES is active, should change answers, particularly in that ELAI is no longer always equal to TLAI. Should not change anything if FATES not active.

Any User Interface Changes (namelist or namelist defaults changes)?
no

Testing performed, if any: standard testing on cheyenne and izumi by @glemieux

@ckoven ckoven mentioned this pull request Apr 6, 2021
5 tasks
@rgknox rgknox self-assigned this Apr 6, 2021
@rgknox
Copy link
Collaborator

rgknox commented Apr 7, 2021

FATES developer regression tests ran, expected tests PASS:
/glade/scratch/rgknox/clmed-tests/snow-burial-fix-ctsm5.1.dev029-C86f56515-F0b83fe23.fates.cheyenne

This PR, coordinated with NGEET/fates#734 should not be b4b when use_fates=true tests, and should be b4b with all others.

clm_aux has not been run yet, and the external tag pointer in Externals_CLM will need to be updated once we update the fates tag:

https://github.com/ESCOMP/CTSM/blob/master/Externals_CLM.cfg#L5

@ekluzek @billsacks @glemieux

@ekluzek
Copy link
Collaborator

ekluzek commented Apr 7, 2021

Why is this implemented with an integer rather than a logical?

@ekluzek
Copy link
Collaborator

ekluzek commented Apr 7, 2021

It's not completely clear to me what is_called_at_restart means. From the code I think it means if you are restarting and reading the restart file you'd set that to true. But, every other time it would be false. But, the description " ! ifalse = daily timestep, itrue = restart" seems to sound like it refers to when a restart is being written? But, what if restarts aren't written daily? So I think this needs more clarity.

Copy link
Collaborator

@ekluzek ekluzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked a couple questions in the conversation. The thing that I think needs to happen before bringing this in is a little more clarity on the documentation of what is_called_at_restart is and when it's used.

@ckoven
Copy link
Contributor Author

ckoven commented Apr 7, 2021

Thanks @ekluzek.

Why is this implemented with an integer rather than a logical?

Good point. I guess when I first wrote it I was thinking there might need to be more than two cases. But it turned out to only need two cases. So I am happy to switch it to logical.

I asked a couple questions in the conversation. The thing that I think needs to happen before bringing this in is a little more clarity on the documentation of what is_called_at_restart is and when it's used.

Sure, will do. It is only when a run is being re-initialized from a restart file that this matters. So maybe I will change the variable name to is_initing_from_restart and describe that in the comments.

@ckoven
Copy link
Contributor Author

ckoven commented Apr 7, 2021

Hi @ekluzek I think these changes should address your requested changes? thanks!

@glemieux
Copy link
Collaborator

glemieux commented Apr 8, 2021

Tested aux_clm on izumi against ctsm5.1.dev022 baseline, which this PR is based on:
/scratch/cluster/glemieux/tests_0408-120325iz

All non-fates testmods PASS b4b. All fates testmods for this suite result in DIFFs which is not unexpected (note: fates suite regression tests passed b4b).

@glemieux
Copy link
Collaborator

glemieux commented Apr 9, 2021

I merged in master (for both fates and ctsm) and re-ran aux_clm comparison against ctsm5.1.dev030 test on izumi. All non-fates testmods pass b4b now. Tests are found here: /scratch/cluster/glemieux/tests_0409-114712iz

@glemieux
Copy link
Collaborator

glemieux commented Apr 12, 2021

All expected aux_clm tests comparing against ctsm5.1.dev030 pass b4b on cheyenne except SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.cheyenne_intel.clm-USUMB which fails during the run. It looks like it can't find the inputdata:

265     Cannot download file since it lives outside of the input_data_root '/glade/p/cesm/cseg/inputdata'
266 ERROR: Could not find all inputdata on any server

Note that all the fates testmods result in DIFFs, which is as expected. There were two expected non-fates testmod failures.
Test location is here: /glade/u/home/glemieux/scratch/tests_0409-145623ch

@ekluzek ekluzek closed this Apr 12, 2021
@ekluzek ekluzek reopened this Apr 12, 2021
@ekluzek
Copy link
Collaborator

ekluzek commented Apr 12, 2021

All expected aux_clm tests comparing against ctsm5.1.dev030 pass b4b on cheyenne except SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.cheyenne_intel.clm-USUMB which fails during the run. It looks like it can't find the inputdata:

265     Cannot download file since it lives outside of the input_data_root '/glade/p/cesm/cseg/inputdata'
266 ERROR: Could not find all inputdata on any server

Note that all the fates testmods result in DIFFs, which is as expected. There were two expected non-fates testmod failures.
Test location is here: /glade/u/home/glemieux/scratch/tests_0409-145623ch

This is #3905 so is an expected fail.

@billsacks
Copy link
Member

SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.cheyenne_intel.clm-USUMB

I added this to the expected fails list in a more recent tag, so once you update to the latest master, you should see it noted as "EXPECTED FAILURE".

@ekluzek ekluzek self-assigned this Apr 14, 2021
@ekluzek ekluzek added FATES API update Changes to the FATES version that also REQUIRE an API change in CTSM priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations labels Apr 14, 2021
@ekluzek
Copy link
Collaborator

ekluzek commented Apr 14, 2021

We are seeing floating point exception in running longish (6-month) FATES tests now.

SMS_D_Lm6.f45_f45_mg37.I2000Clm45Fates.cheyenne_intel.clm-FatesColdDef
SMS_D_Lm6.f45_f45_mg37.I2000Clm50Fates.cheyenne_intel.clm-FatesColdDef
SMS_D_Lm6_P144x1.f45_f45_mg37.I2000Clm50Fates.cheyenne_intel.clm-FatesColdDef

I look at the first one and it's a floating point exception at line 1403 in

fates/biogeophys/FatesPlantRespPhotosynthMod.F90

which is

    ans = exp( ha / (rgas*1.e-3_r8*(tfrz+25._r8)) * (1._r8 - (tfrz+25._r8)/tl) )

    return
  end function ft1_f

So tl must be out of reasonable range. I'm thinking the others are likely for the same reason.

@glemieux is going to dig into this further.

@ekluzek
Copy link
Collaborator

ekluzek commented Apr 14, 2021

There are some ER tests that are 9 day tests that are failing on izumi. I think they should work if they are changed to 10 day tests. I thought 9 day should work, but looks like it doesn't. So these two tests...

ERI_D_Ld9_P48x1.T31_g37.I2000Clm50Sp.izumi_nag.clm-reduceOutput
ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-reduceOutput

@billsacks
Copy link
Member

There are some ER tests that are 9 day tests that are failing on izumi. I think they should work if they are changed to 10 day tests. I thought 9 day should work, but looks like it doesn't. So these two tests...

ERI_D_Ld9_P48x1.T31_g37.I2000Clm50Sp.izumi_nag.clm-reduceOutput
ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-reduceOutput

I'm confused: haven't these been 9-day tests for a really long time?

@ekluzek
Copy link
Collaborator

ekluzek commented Apr 14, 2021

@billsacks I'm confused as well as I know we have tests that use STOP_N==9. There are 9-step tests for example. The error in the TestStatus.log file seems to indicate the length isn't right though...

 ---------------------------------------------------
2021-04-14 11:12:09: ERROR: Run length too short
 ---------------------------------------------------

@billsacks
Copy link
Member

At a glance, it looks to me like these tests were rerun, maybe automatically via the new use of --retry, and something failed in trying to rerun the ERI tests. The ERI tests have always been complex in terms of rerunnability, so I'm not totally surprised that there may be an issue there. That said, I've typically been able to rerun them by hand when they fail due to a system problem, and it would be a little troubling if they are rerunnable by hand by not via the --retry option.

@glemieux can you tell us the history of these tests: in particular, have you already tried rerunning them by hand or not?

@billsacks
Copy link
Member

I'll also add: for the sake of making this tag – if this is the last thing holding you up – you could go ahead and create new versions of these failed ERI tests, then we can come back to figuring out what's wrong here.

@glemieux
Copy link
Collaborator

Apologies @billsacks and @ekluzek, it looks like I accidentally ran the test suite against the current ctsm master branch instead of the pr branch with master merged in. I just ran one of the 'failing' tests on izumi with the correct PR branch and it passes now. I've got the test suite re-running now.

@glemieux
Copy link
Collaborator

All expected tests in the aux_clm suite on izumi pass b4b against ctsm5.1.dev033. Baselines generated. Correct test output file is here: /scratch/cluster/glemieux/tests_0414-171324iz

@ckoven
Copy link
Contributor Author

ckoven commented Apr 15, 2021

@ekluzek do you know when the FATES tests started failing? I.e. is it a change in FATES version or a change in CTSM version that leads to the test failure?

@ekluzek
Copy link
Collaborator

ekluzek commented Apr 15, 2021

From what @glemieux told me it works in ctsm5.1.dev030, but now fails in ctsm5.1.dev033. So it looks like it's a CTSM side thing, and Greg thought it due to one of the external updates that came in. I would think a compiler update (if that happened) could be a reason. I wouldn't think that the changes that went into those updates should cause this type of problem, but yet they are, the only answer changing thing had to do with CO2 for historical transient (and really only for coupled to CAM). So it shouldn't be that.

I just looked at the cime changes and they seem innocent enough (so no major compiler update for example).

Misc bfb enhancements and fixes

(1) If CISM is running over Antarctica, use virtual glacier columns over
    Antarctica

(2) Remove "mec" from some glacier/ice variable names (it is misleading
    to have "mec" in variable names when the ice landunit can actually
    have multiple columns *or* a single column) (ESCOMP#1294)

(3) Add history file metadata on each variable's l2g_scale_type (adds a
    landunit_mask attribute) (ESCOMP#1343)

(4) Use python3 in more shebang lines - needed to run python unit tests
    on cheyenne

(5) New compset naming for IG compsets (ESCOMP#1289)

(6) Remove calculation of fun_cost_fix that is overwritten
    (ESCOMP#1115)

(7) Bypass grid-level water mass check when fates hydro is active
    (ESCOMP#1334)

(8) Remove some dead code (ESCOMP#1333)
@glemieux
Copy link
Collaborator

Re-ran aux_clm tests on cheyenne. All expected PASS b4b: /glade/u/home/glemieux/scratch/tests_0420-124148ch
Tests on izumi are forthcoming.

@glemieux
Copy link
Collaborator

Re-ran aux_clm tests on izumi. All expected PASS b4b: /scratch/cluster/glemieux/tests_0420-163639iz

@rgknox rgknox self-requested a review April 21, 2021 14:19
@ekluzek ekluzek merged commit ff1ae2c into ESCOMP:master Apr 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FATES API update Changes to the FATES version that also REQUIRE an API change in CTSM priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update FATES on CTSM main-dev ground snow is not occluding lai profile
5 participants