ERS tests at least one year long failing across multiple test mods #897

Closed
glemieux opened this issue Aug 30, 2022 · 7 comments · Fixed by #1098

glemieux commented Aug 30, 2022

In the course of debugging an issue that led to the creation of #894, it was discovered that there is a subtler, secondary issue with exact restarts for test cases that are a year or longer (i.e. most ERS tests fail on COMPARE_base_rest). This currently appears somewhat different from the issue noted in ESCOMP/CTSM#667 (comment). The problem did not at first appear confined to any one variable or any particular testmod, although the following has been discovered so far:

  • All variations on FatesColdDefReducedComplexSatPhen testmod are b4b
  • The 1x1_brazil grid resolution is b4b
  • no comp + fixed biogeo is not b4b
  • FatesColdDef is not b4b

Through testing a subset of the run modes I've found that FatesColdDefReducedComplexNoComp will run b4b if I comment out the call to trim_canopy, turn fire off, set nclmax = 1, and set test_zero_mortality = .true. Trying the same setup with FatesColdDef results in a failure on COMPARE_base_rest.

The current thread that I'm following is assessing the DIFFs for the former test setup above, but with trim_canopy on. I've found that both bc_in%h2o_liqvol_sl and tveg24 vary on the final pass through the call to phenology. This suggests to me that there might be a timing issue on the last model day of the year. This, plus a number of diagnostic outputs for the restart variables, lends some confidence that the issue isn't necessarily in the restart initialization.

Also note that these tests were run with #685.

glemieux commented Sep 1, 2022

Following the thread of the restart differences around tveg24 points to the problem being located inside the filter loop (within the leaf temperature iterative loop) on the last day of the year in this section of the CanopyFluxes code:

https://github.com/ESCOMP/CTSM/blob/56878b6a77e167c1c875aa9cabdf6ea2e482d737/src/biogeophys/CanopyFluxesMod.F90#L1233-L1257

I've confirmed through diagnostic outputs that the sums of the t_veg_patch values at the start and at the end of the loop differ between the base and restart runs. Next I'm going to try to isolate the patches to see if there is a specific subset that is problematic (I'm currently narrowing the output to only patches at a known problematic fates site). In this way I hope to better identify which of the multiple variables going into the dt_veg calculation may be causing the difference.
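For illustration only, the patch-isolation step could be sketched as a bit-for-bit comparison of two per-patch arrays. This is a standalone Python toy, not the CTSM/FATES diagnostics: the helper names `float_bits` and `bitwise_diff` and the temperature values are made up, and the arrays stand in for t_veg_patch values dumped from the base and restart runs.

```python
import math
import struct


def float_bits(x):
    """Return the raw 64-bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bitwise_diff(base, rest):
    """Return the indices at which two equal-length sequences differ bit-for-bit."""
    if len(base) != len(rest):
        raise ValueError("sequences must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(base, rest))
        if float_bits(a) != float_bits(b)
    ]


# Hypothetical per-patch leaf temperatures (K) from the base and restart runs.
t_veg_base = [280.0, 281.5, 279.25]
# The restart copy differs from the base by one unit in the last place at patch 1.
t_veg_rest = [280.0, math.nextafter(281.5, math.inf), 279.25]

differing = bitwise_diff(t_veg_base, t_veg_rest)
print("patches that are not b4b:", differing)
```

Comparing bit patterns rather than using a tolerance matches what a b4b check requires: a one-ulp difference, or 0.0 versus -0.0, is reported as a mismatch. Comparing per patch instead of comparing sums also avoids differences that cancel in the total.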

glemieux commented Sep 12, 2022

Tracing the issue led me through the host land model and back to fates in the trim_canopy check here:

    if (currentCohort%year_net_uptake(z) < currentCohort%leaf_cost) then
       ! Make sure the cohort trim fraction is great than the pft trim limit
       if (currentCohort%canopy_trim > EDPftvarcon_inst%trim_limit(ipft)) then
          ! keep trimming until none of the canopy is in negative carbon balance.
          if (currentCohort%hite > EDPftvarcon_inst%hgt_min(ipft)) then
             currentCohort%canopy_trim = currentCohort%canopy_trim - &
                  EDPftvarcon_inst%trim_inc(ipft)
             if (prt_params%evergreen(ipft) /= 1) then
                currentCohort%leafmemory = currentCohort%leafmemory * &
                     (1.0_r8 - EDPftvarcon_inst%trim_inc(ipft))
             endif
             trimmed = .true.
          endif ! hite check
       endif ! trim limit check
    endif ! net uptake check

At least part of the ERS issue is that year_net_uptake is not being carried over in the restart. Thus if the restart is kicked off mid-year, the yearly net uptake will be less than the base and some of the cohorts will avoid being trimmed in this check. Talking to @rgknox, ideally we would roll this fix into #769, but we agreed that focusing on the fix is a priority. I will test this fix by adding the yearly uptake to the restart with the full complement of every leaf layer for the time being.
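To make the mechanism concrete, here is a minimal Python toy, not FATES code, of an accumulator that is missing from the restart. Everything in it is an assumption for illustration: the `Cohort` class, the 12-step "year", the constants, the uptake values, and the single scalar accumulator (the real year_net_uptake is indexed by leaf layer). Which way the trim decision flips depends entirely on the made-up numbers; the point is only that the two runs diverge.

```python
from dataclasses import asdict, dataclass

LEAF_COST = 10.0    # hypothetical annual leaf cost
TRIM_INC = 0.03125  # hypothetical trim increment (exactly representable)
STEPS_PER_YEAR = 12  # toy "year" of 12 steps


@dataclass
class Cohort:
    canopy_trim: float = 1.0
    year_net_uptake: float = 0.0


def step(cohort, t, uptake):
    """Accumulate uptake; on the last step of the year, apply the trim check."""
    cohort.year_net_uptake += uptake
    if t % STEPS_PER_YEAR == STEPS_PER_YEAR - 1:
        if cohort.year_net_uptake < LEAF_COST:
            cohort.canopy_trim -= TRIM_INC
        cohort.year_net_uptake = 0.0


def run(uptakes, restart_at=None, saved_fields=()):
    """Run the toy model, optionally restarting from a subset of saved fields."""
    cohort = Cohort()
    for t, uptake in enumerate(uptakes):
        if t == restart_at:
            # "Write" only the saved fields, then rebuild the cohort from them.
            # Any field not saved falls back to its initial value.
            state = {k: v for k, v in asdict(cohort).items() if k in saved_fields}
            cohort = Cohort(**state)
        step(cohort, t, uptake)
    return cohort.canopy_trim


# Made-up uptake series: negative for the first half of the year, positive after.
uptakes = [-1.0] * 6 + [2.0] * 6

base = run(uptakes)
rest_missing = run(uptakes, restart_at=6, saved_fields=("canopy_trim",))
rest_full = run(uptakes, restart_at=6,
                saved_fields=("canopy_trim", "year_net_uptake"))

print("base:", base)
print("restart without year_net_uptake:", rest_missing)
print("restart with year_net_uptake:", rest_full)
```

In this toy the base run trims the cohort at year end, the restart that drops the accumulator does not, and the restart that carries the accumulator reproduces the base result exactly.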

glemieux commented Sep 13, 2022

Adding year_net_uptake to the restart interface results in b4b restart runs for the few test mod and grid combinations that I have exercised so far, but only for tests that start on December 1 and run through the end of the year. Extending the total run time to 2 months (i.e. starting in November) or longer (e.g. starting on Jan 1 for a one-year run) results in a COMPARE_base_rest failure.

This suggests that year_net_uptake is certainly part of the issue, but that there are likely multiple problems to contend with.

@glemieux

I realized I made an error in my initial fix to add year_net_uptake to the restart file. Fixing the very simple, but memory-intensive, implementation results in b4b restarts. However, attempting a more complex restart using the RegisterCohortVector subroutine did not result in b4b restarts. I'm currently investigating whether it's my implementation or something else.

glemieux commented Sep 20, 2022

The issue is not with the restart implementation method. I simply missed that I had taken out the nclmax change that I had been using during my investigations. So the current standing is that with nclmax = 1, a 13-month f10 nocomp test with restart returns b4b results. However, fully dynamic fates does not restart with b4b results. If I reset nclmax to the default (2), the nocomp tests go back to failing the restart.

@glemieux

For future reference, since this has moved down the priority list, the branch that adds year_net_uptake to the restart is https://github.com/glemieux/fates/commits/restart-nlevleaf

@glemieux

This appears to have been fixed by #1098.

@glemieux glemieux linked a pull request Nov 16, 2023 that will close this issue