LWISO_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.derecho_gnu.clm-coldStart test failure in ctsm5.2 branch #2366

slevis-lmwg · 2024-02-15T00:14:57Z

Brief summary of bug

The aforementioned test fails in the ctsm5.2 branch when pointing the aux_clm test-suite to the newly generated fsurdat files:

ERROR in CompareBulkToTracer: tracer does not agree with bulk water
Called from: after downscale_forcings
Variable: ice1_grc
First difference at index: 2
Bulk  :  -0.37548989083934737-321
Tracer:  -0.37598395648518862-320
ratio:   0.10000000000000000E+02
Bulk*ratio:  -0.37548989083934737-320

At first glance, this seems like a repeat of #2041. In particular, it's the ctsm5.2 branch pointing to a new fsurdat file and it's the same error message but for variable ice1_grc this time. Further investigation has revealed a different problem this time. Details below.

General bug information

CTSM version you are using:
alpha-ctsm5.2.mksrf.19_ctsm5.1.dev163-55-g1f4611fb7

Does this bug cause significantly incorrect results in the model's science?
I hope not "significantly" though see more info in the details below.

Configurations affected:
I think all configurations.

Details of bug

The test passes with a one-line change. Note that the change follows liqcan as a template by including the max function in snocan instead of omitting it:

@@ -1605,7 +1605,8 @@ bioms:   do f = 1, fn
          if (t_veg(p) > tfrz ) then ! above freezing, update accumulation in liqcan
             if ((qflx_evap_veg(p)-qflx_tran_veg(p))*dtime > liqcan(p)) then ! all liq evap
                ! In this case, all liqcan will evap. Take remainder from snocan
-               snocan(p)=snocan(p)+liqcan(p)+(qflx_tran_veg(p)-qflx_evap_veg(p))*dtime  
+               snocan(p) = max(0._r8, &
+                  snocan(p) + liqcan(p) + (qflx_tran_veg(p) - qflx_evap_veg(p)) * dtime)
             end if
             liqcan(p) = max(0._r8,liqcan(p)+(qflx_tran_veg(p)-qflx_evap_veg(p))*dtime)

I converged on this line of code as the culprit by realizing that the original error message did not just indicate tiny values as in #2041, but tiny negative values. So I pursued all sources of negative ice1_grc. I found that even after the above change, the code continues to generate negative ice1_grc in subroutine ComputeLiqIceMassNonLake after subtracting dynbal_baseline_ice(c). By the time the LWISO test completes, I see ice1_grc values < -800 kg/m2.

So this issue has two parts:

Get the test passing with the above fix

Discussion below suggests that other negative ice1_grc (as described above) may be ok, so this issue will close when #2371 gets merged.

The text was updated successfully, but these errors were encountered:

slevis-lmwg · 2024-02-15T00:23:48Z

I added this to the 5.2 project board , and I see the following options:

As with issue LWISO_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_gnu.clm-coldStart FAIL in ctsm5.2 #2041 and corresponding PR Change small snocan to zero #2053, I resolve this and merge it with ctsm5.1 before making ctsm5.2, because the correction is answer-changing, while 5.2 only makes dataset-related changes.
We leave the test as EXPECTED FAILURE and resolve in ctsm5.3.

Thoughts/feedback?

ekluzek · 2024-02-15T00:35:40Z

Ahh, good question @slevis-lmwg. Either way is doable. By the way I made the two options numbered so we can easily reference them by number. From reading the above issue I see that you have a simple one line fix that would be easy to bring in anytime. But, you also suggest that more work might need to be done to fix negative numbers here, and would be something we'd likely want to do in ctsm5.3 as it's a potential longer term thing that requires painful debugging and impossible to give a timeline on.

However, does the simple one line fix get the case working for CTSM5.2? Don't check the change in, just see if it allows the model to work. If so I'd say do number 1. If it doesn't then we should plan on number 2 and it being an expected fail and work on it later.

ekluzek · 2024-02-15T16:52:29Z

In meeting we decided that @slevis-lmwg will do a simple answer changing tag to master that gets this working. Then update the ctsm5.2 branch to that, and show that this test works, so ctsm5.2 can come in without this being an expected fail.

slevis-lmwg · 2024-02-15T17:10:45Z

In the meeting we also decided to consider moving the liqcan(p) = line up to prevent code duplication.

Also I will bring #2355 into main with this answer-changing tag.

billsacks · 2024-02-15T17:51:54Z

Thanks a lot for your work on this @slevis-lmwg and for tracking down the issue!

I think your solution – applying max(0, ...) to the snocan update – seems reasonable for now. I'm nervous that both this and the pre-existing liqcan update could just as easily generate tiny positive values that should actually be 0, which will lead to similar numerical issues. So a more robust solution would be to change both the liqcan and snocan updates to use truncate_small_values. But I'm okay kicking that can down the road if it isn't causing problems for now.

Regarding the negative ice1 values after subtracting dynbal_baseline_ice: I would need to spend some time getting my head back into this to say for certain, but my initial reaction is that this might be okay. There are some arbitrary state variables in the model, such as the amount of ice in glacier columns and the amount of water in lakes, and these baselines adjust for these to help avoid unnecessarily large adjustment fluxes when we have dynamic landunits. (This is partly explained in the ChangeLog entry for ctsm1.0.dev031.) I think that a negative ice1 value then just means that there is less ice now than there was initially, which would be okay. That said, the negative value you give does seem awfully large, so this may be a true issue.... But even so, I'd say that there's a reasonable chance that you'll find that this is working as intended, so it probably isn't worth the time to dig into right now.

Do not allow negative snocan in a part of the code that triggers the error reported in issue #2366.

Remove a case of negative snocan in CanopyFluxesMod to resolve #2366 One-line change that now includes "max(0._r8," to prevent negative snocan in that part of the code.

Remove a case of negative snocan in CanopyFluxesMod to resolve ESCOMP#2366 One-line change that now includes "max(0._r8," to prevent negative snocan in that part of the code.

slevis-lmwg self-assigned this Feb 15, 2024

slevis-lmwg added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Feb 15, 2024

slevis-lmwg added the bug something is working incorrectly label Feb 15, 2024

slevis-lmwg mentioned this issue Feb 15, 2024

Workaround for transient Smallville tests #1673 + testing all new datasets #2318

Merged

ekluzek removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Feb 15, 2024

slevis-lmwg added a commit to slevis-lmwg/ctsm that referenced this issue Feb 15, 2024

Resolve issue ESCOMP#2366 and PR ESCOMP#2355

7401dd9

slevis-lmwg mentioned this issue Feb 15, 2024

Merge-tag to resolve issue #2366 #2371

Merged

slevis-lmwg closed this as completed in #2371 Feb 16, 2024

slevis-lmwg added a commit that referenced this issue Feb 16, 2024

Merge pull request #2371 from slevis-lmwg/merge_tag_20240215

e3e2e80

Do not allow negative snocan in a part of the code that triggers the error reported in issue #2366.

slevis-lmwg mentioned this issue Apr 4, 2024

Failing water isotope test on the ctsm5.2 branch #2444

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LWISO_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.derecho_gnu.clm-coldStart test failure in ctsm5.2 branch #2366

LWISO_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.derecho_gnu.clm-coldStart test failure in ctsm5.2 branch #2366

slevis-lmwg commented Feb 15, 2024 •

edited

Loading

slevis-lmwg commented Feb 15, 2024 •

edited by ekluzek

Loading

ekluzek commented Feb 15, 2024 •

edited

Loading

ekluzek commented Feb 15, 2024

slevis-lmwg commented Feb 15, 2024

billsacks commented Feb 15, 2024

LWISO_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.derecho_gnu.clm-coldStart test failure in ctsm5.2 branch #2366

LWISO_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.derecho_gnu.clm-coldStart test failure in ctsm5.2 branch #2366

Comments

slevis-lmwg commented Feb 15, 2024 • edited Loading

Brief summary of bug

General bug information

Details of bug

slevis-lmwg commented Feb 15, 2024 • edited by ekluzek Loading

ekluzek commented Feb 15, 2024 • edited Loading

ekluzek commented Feb 15, 2024

slevis-lmwg commented Feb 15, 2024

billsacks commented Feb 15, 2024

slevis-lmwg commented Feb 15, 2024 •

edited

Loading

slevis-lmwg commented Feb 15, 2024 •

edited by ekluzek

Loading

ekluzek commented Feb 15, 2024 •

edited

Loading