C13_HR, C13_NBP, FPI values result in numeric conversion not representable error #741
Keith O and I have a hunch that this is likely caused by a very small inorganic N pool size (likely SMIN_NH4) that results in the negative FPI value you're receiving. There's a limiter already in the code to make sure we're not dividing by a negative or zero value, but nothing for the numerator. Try adding the underlined line to the end of SoilBiogeochemCompetitionMod.F90 [line 935], and the appropriate end if. This should hopefully get you past the error, and also allow some additional history files to be written out that we can look at.
|
Will (and others),
Thanks for taking a look. I applied the patch below to the clone of my case with hist_ndens=1 and reran it.
--- /gpfs/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/cime/../components/clm/src/soilbiogeochem/SoilBiogeochemCompetitionMod.F90 2019-05-15 13:16:00.935004097 -0600
+++ SourceMods/src.clm/SoilBiogeochemCompetitionMod.F90 2019-05-20 15:06:26.059764128 -0600
@@ -933,7 +933,11 @@ subroutine SoilBiogeochemCompetition (bo
end if
if (potential_immob(c) > 0.0_r8) then
- fpi(c) = actual_immob(c) / potential_immob(c)
+ if (actual_immob(c) > 0.0_r8) then
+ fpi(c) = actual_immob(c) / potential_immob(c)
+ else
+ fpi(c) = 0._r8
+ end if
else
fpi(c) = 1._r8
end if
The good news is that the patch did get rid of the large FPI value, and all other fields were unchanged. The bad news is that the large magnitude values in the variables C13_HR, C13_NBP, C14_HR, and C14_NBP are still present.
CASE=b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
I extracted all clm2.h0 values from the problem child gridpoint into
It kinda looks like the decomp cascade is somehow producing small magnitude negative HR at depth. It looks like the computation of isotopic HR occurs in the block at lines 480-495 of CNCIsoFluxMod.F90. Perhaps the (isostate / gross state) ratio is large, despite being a ratio of small numbers.
What are your thoughts on adding (soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) > 0._r8)
Or does it make more sense to try to eliminate the negative HR value? |
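To make the effect of the FPI patch above concrete, here is a minimal Python sketch of the guarded ratio. It is an illustration only, not the CTSM Fortran: the function names and the input magnitudes are hypothetical, chosen so that a tiny negative numerator over an even tinier positive denominator lands near the reported FPI magnitude.

```python
def fpi_unguarded(actual_immob: float, potential_immob: float) -> float:
    """Original logic: only the denominator is checked."""
    if potential_immob > 0.0:
        return actual_immob / potential_immob
    return 1.0


def fpi_guarded(actual_immob: float, potential_immob: float) -> float:
    """Patched logic: a non-positive numerator yields 0 instead of a ratio."""
    if potential_immob > 0.0:
        if actual_immob > 0.0:
            return actual_immob / potential_immob
        return 0.0
    return 1.0


# Hypothetical values: both are negligible, but their ratio is enormous.
actual, potential = -1.0e-30, 1.0e-130
print(fpi_unguarded(actual, potential))  # huge negative value, far outside r4 range
print(fpi_guarded(actual, potential))    # 0.0
```

The guard does not explain why the numerator goes negative; it only keeps the ratio bounded so that history output can be written.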
I sent an email just to Will and Keith O at the same moment that Keith L sent his. I will repeat it here for the benefit of the whole list, though it repeats some of what Keith L just said: The possibility that we saw in our software meeting was that decomp_cpools_vr_col could be very close to 0 in this block of code in CNCIsoFluxMod:
Possibly what's happening is: one of the cpools is being updated to be close to 0 (when it should probably be exactly 0), and truncation of this near-zero value doesn't happen until later. One possible place where I see an update of decomp_cpools_vr_col without an associated truncation is in CStateUpdateDynPatch. Bill |
Hi,
Before testing the mod to CNCIsoFluxMod.F90, I set 'hist_nhtfrq(1) = 1' to get output every timestep. I put in the following patch, and the model no longer generates large magnitude values in the first timestep for
Despite changing these quantities in the first timestep of the year, there were no changes to values in subsequent timesteps.
--- /gpfs/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/cime/../components/clm/src/biogeochem/CNCIsoFluxMod.F90 2019-05-15 13:16:00.034407733 -0600
+++ SourceMods/src.clm/CNCIsoFluxMod.F90 2019-05-20 19:42:44.939207535 -0600
@@ -482,7 +482,8 @@ subroutine CIsoFlux1(num_soilc, filter_s
do j = 1, nlevdecomp
do l = 1, ndecomp_cascade_transitions
cdp = cascade_donor_pool(l)
- if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8) then
+ if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8 .and. &
+ soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) > 0._r8) then
iso_soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) = &
soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) * &
(iso_soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) &
@@ -499,7 +500,8 @@ subroutine CIsoFlux1(num_soilc, filter_s
do j = 1, nlevdecomp
do l = 1, ndecomp_cascade_transitions
cdp = cascade_donor_pool(l)
- if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8) then
+ if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8 .and. &
+ soilbiogeochem_cf%decomp_cascade_ctransfer_vr_col(cc,j,l) > 0._r8) then
iso_soilbiogeochem_cf%decomp_cascade_ctransfer_vr_col(cc,j,l) = &
soilbiogeochem_cf%decomp_cascade_ctransfer_vr_col(cc,j,l) * &
(iso_soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) &
I tried removing the patch to SoilBiogeochemCompetitionMod.F90, but the large FPI returned.
Keith |
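The isotope flux in the patched block is the bulk flux scaled by the (isotope pool / bulk pool) ratio. A minimal Python sketch of why that can blow up, and what the extra condition changes, follows. It is not the CTSM code: the function names are hypothetical, the zero fallback when the condition fails is an assumption, and the input values are invented to mimic a near-zero donor pool.

```python
def iso_flux_original(flux: float, iso_pool: float, pool: float) -> float:
    """Scale the bulk flux by the isotope ratio whenever the pool is nonzero."""
    if pool != 0.0:
        return flux * (iso_pool / pool)
    return 0.0  # assumed fallback


def iso_flux_patched(flux: float, iso_pool: float, pool: float) -> float:
    """Same as above, but also require a strictly positive bulk flux."""
    if pool != 0.0 and flux > 0.0:
        return flux * (iso_pool / pool)
    return 0.0  # assumed fallback


# Hypothetical state: the bulk pool is almost, but not exactly, zero, so the
# ratio iso_pool / pool is about 1e87 and even a flux of -1.8e-41 explodes.
flux, iso_pool, pool = -1.8e-41, 1.0e-12, 1.0e-99
print(iso_flux_original(flux, iso_pool, pool))  # about -1.8e46
print(iso_flux_patched(flux, iso_pool, pool))   # 0.0
```

Note that the patched condition suppresses the symptom for negative or zero fluxes only; a positive flux drawn from a near-zero pool would still be scaled by the same large ratio.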
FYI, I reran my test without changing line 502 of CNCIsoFluxMod.F90, and the large values are still gone from the history file. |
I've cloned Keith's case to do some investigation. cs_soil%decomp_cpools_vr_col(c,j,i_met_lit) These variables are zero initially and then become small negative because of small negative values of the fluxes: cf_veg%dwt_frootc_to_litr_met_c_col(c,j) The state variables don't seem to be truncated until much later on in the calling sequence.
!KO
This seemed to get rid of large values of C13_HR, C13_NBP, C14_HR, C14_NBP, FPI, small negative HR_vr at depth, and small negative values of HR and NBP. And I was able to run with hist_ndens=2 (r4). But, at this point I don't understand why the fluxes are small negative and whether that is expected. |
From discussion today:
! fine root litter carbon fluxes
cnveg_carbonflux_inst%dwt_frootc_to_litr_met_c_col(c,j) = &
cnveg_carbonflux_inst%dwt_frootc_to_litr_met_c_col(c,j) + &
(dwt_frootc_to_litter(p)*pftcon%fr_flab(patch%itype(p)))/dt &
* soilbiogeochem_state_inst%froot_prof_patch(p,j) Is it
|
Keith Lindsay reports that the exact change he ran with that works is as follows:
--- /gpfs/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/components/clm/src/soilbiogeochem/SoilBiogeochemCompetitionMod.F90 2019-05-15 13:16:00.935004097 -0600
+++ ./SoilBiogeochemCompetitionMod.F90 2019-05-21 17:23:16.623107000 -0600
@@ -933,7 +933,11 @@ subroutine SoilBiogeochemCompetition (bo
end if
if (potential_immob(c) > 0.0_r8) then
- fpi(c) = actual_immob(c) / potential_immob(c)
+ if (actual_immob(c) > 0.0_r8) then
+ fpi(c) = actual_immob(c) / potential_immob(c)
+ else
+ fpi(c) = 0._r8
+ end if
else
fpi(c) = 1._r8
end if |
After spending several hours looking at the wrong variety of carbon, I found that dwt_frootc_to_litter is the culprit. So next I'll start by looking at frootc_patch. |
frootc_patch is negative going into update_patch_state. Would it make sense that it is an inactive patch (it does seem to be only one pft on the suspect column) and it becomes active 2046-01-01? Alternative hypothesis anyone? |
Thanks for digging, @olyson ! I can't tell from your last comment: are you saying that it is an inactive patch, or is that just a hypothesis? If the latter, you should be able to check It doesn't really make sense to me that this would be an inactive point going in to the land use change, because I think this dwt term is generated from a shrinking patch – so, presumably, one that had non-zero area to begin with (though it's possible that a patch could be shrinking on its column while the column itself has 0 area... I'd need to think more about that). If you do find that the patch was inactive on the restart file, I can help give this some more thought. |
Just a hypothesis. I'll check the restart file. |
Per Bill's suggestion I found the correct pft number in the restart file (I couldn't get GetGlobalIndex to work because of error "calling within a threaded region", but I narrowed it down using lat/lon, pft type, etc) and found that it did have that exact negative value for frootc_c13 (and it's active). I had been using the local pft number. Then, looking at the code, I realized that we don't do precision control for frootc: The other thing I investigated was I went back to the fpi problem that Will and I suggested a fix for. I found that there was also a large negative value for fpi_vr for the original problem column noted by Keith L. that comes from a large negative value for fpi_nh4_vr. I applied a similar fix to the calculation for fpi_nh4_vr:
Along with the other fix to fpi (and not using the fix to CNCIsoFluxMod.F90), that allowed the model to run to the end of the month and put out a history file in single precision. Since these approaches are all some form of precision control anyway, I tried my original approach of calling SoilBiogeochemPrecisionControl (with no other code modifications), and this worked for Simone's case (worked previously for Keith L.'s case as well). |
Thanks a lot, @olyson ! So, for the dwt flux issue, is this summary correct? (I have less understanding of the fpi issue): frootc can sometimes be negative; this is intentional. Negative frootc causes negative dwt_frootc_to_litter if the patch in question is shrinking. The resulting negative fluxes cause problems in the ciso calculation. This can be worked around by inserting an extra precision control call between the calculation of the dwt fluxes and the ciso fluxes, so that small negative dwt fluxes are set to 0. I'm okay with fixing this by putting in this extra call to precision control, as long as there's a comment explaining why this is needed. I'm thinking: For the sake of code maintenance, it could be helpful to know which precision control calls are needed for broad purposes, and which are just needed for very select reasons. Eventually we could consider removing the latter, either fixing the root cause of the problem or just doing something like precision control on the small number of variables that need it at that point. |
That sums it up nicely. I'm not sure of the cause of the fpi issue, if it's related to the negative fluxes, or why the precision control call fixes it. Probably worth further investigation. |
FYI, I've now got GetGlobalIndex working and confirmed that I'm looking at the right /pft/column/etc. |
After talking with Dave and Bill, we agreed to go with the added call to SoilBiogeochemPrecisionControl in CNVegetationFacade.F90 within subroutine DynamicAreaConservation to solve this issue. The SoilBiogeochemPrecisionControl performs precision control on the C12, C13, and C14 decomp_cpools_vr_col. I have not done any testing on this other than verifying it works on Keith L.'s case and Simone's case as described above. |
diff --git a/glade/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/components/clm/src/biogeochem/CNVegetationFacade.F90 b/CNVegetationFacade.F90
index fed7889..c3b4220 100644
--- a/glade/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/components/clm/src/biogeochem/CNVegetationFacade.F90
+++ b/CNVegetationFacade.F90
@@ -721,6 +721,13 @@ contains
soilbiogeochem_nitrogenstate_inst)
call t_stopf('CNUpdateDynPatch')
+ ! This call fixes issue #741 by performing precision control on decomp_cpools_vr_col
+ call t_startf('SoilBiogeochemPrecisionControl')
+ call SoilBiogeochemPrecisionControl(num_soilc_with_inactive, filter_soilc_with_inactive, &
+ soilbiogeochem_carbonstate_inst, c13_soilbiogeochem_carbonstate_inst, &
+ c14_soilbiogeochem_carbonstate_inst,soilbiogeochem_nitrogenstate_inst)
+ call t_stopf('SoilBiogeochemPrecisionControl')
+
call t_startf('dyn_cnbal_col')
call dyn_cnbal_col(bounds, clump_index, column_state_updater, &
soilbiogeochem_carbonstate_inst, c13_soilbiogeochem_carbonstate_inst, & |
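For readers unfamiliar with the term, "precision control" here means truncating near-zero state values to exactly zero. The internals of SoilBiogeochemPrecisionControl are not shown in this thread, so the Python sketch below is only an assumption about the general technique: the function name, the threshold value, and the bookkeeping of the truncated amount are all hypothetical.

```python
CCRIT = 1.0e-8  # hypothetical truncation threshold, not taken from the thread


def precision_control(pools, threshold=CCRIT):
    """Zero out pool values whose magnitude is below the threshold.

    Returns the cleaned list and the total amount removed, so the caller
    could account for it in a mass balance.
    """
    cleaned = []
    truncated = 0.0
    for value in pools:
        if abs(value) < threshold:
            truncated += value
            cleaned.append(0.0)
        else:
            cleaned.append(value)
    return cleaned, truncated


# A small negative pool (such as one left behind by a negative dwt flux)
# becomes exactly zero, so a later "pool /= 0" test no longer passes.
cleaned, lost = precision_control([12.5, -3.0e-20, 0.0, 2.0e-3])
print(cleaned)  # [12.5, 0.0, 0.0, 0.002]
print(lost)     # -3e-20
```

Applying this kind of truncation right after the dwt fluxes are added is what prevents near-zero pools from reaching the isotope ratio calculation.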
When running a transient case with C isotopes, people occasionally ran into a problem whereby C13_HR, C13_NBP, FPI values result in numeric conversion not representable error. At least part of the problem can be explained as: frootc can sometimes be negative; this is intentional. Negative frootc causes negative dwt_frootc_to_litter if the patch in question is shrinking. The resulting negative fluxes cause problems in the ciso calculation. This can be worked around by inserting an extra precision control call between the calculation of the dwt fluxes and the ciso fluxes, so that small negative dwt fluxes are set to 0. For more details, see ESCOMP#741 Resolves ESCOMP#741
@olyson - I have run the test suite off of the release branch. All tests are passing with this change, but we're getting answer changes. As is often the case, it's hard to tell definitively from the test suite whether these answer changes are scientifically meaningful or are essentially roundoff level. Transient cases show answer changes, as expected. In addition:
|
@olyson I should add: I'm not too concerned about the diffs in non-transient cases: with this extra precision control call, it's not too surprising that we're getting answer changes in a variety of situations. So I think that, if you verify that behavior looks okay (around roundoff-level diffs) in a transient case, like we discussed on Friday, then that's probably sufficient to sign off on these changes. |
I've completed a full transient simulation with the proposed fix. Diagnostics compared to a control are here: I don't see any scientifically meaningful differences between these simulations, but I'm happy to discuss with others who want to take a look. |
Add CN prec. control call to fix problems related to small neg. values
Small negative values (roughly roundoff-level different from zero) in frootc (and possibly other quantities) were occasionally creating problems with carbon isotope fluxes and FPI in the first time step of the year, at the time of transient landcover change. This tag fixes the problem by introducing an extra call to SoilBiogeochemPrecisionControl in between computing the patch-level transient landcover fluxes and moving these to column-level. In particular, this truncates small negative values of decomp_cpools_vr_col to zero, which prevents the previous blow-ups.
For most of the problematic fields, the explanation seems to be: frootc can sometimes be negative; this is intentional. Negative frootc causes negative dwt_frootc_to_litter if the patch in question is shrinking. The resulting negative fluxes cause problems in the ciso calculation. This can be worked around by inserting an extra precision control call between the calculation of the dwt fluxes and the ciso fluxes, so that small negative dwt fluxes are set to 0.
This does not necessarily fully explain the issue with FPI, but the insertion of the extra precision control call fixes that issue, too.
For more details, see the discussion in #741
Resolves #741
Fixed on the release branch in release-clm5.0.26. This still needs to come to master; this will be done in the big batch of tags that @ekluzek is bringing from the release branch to master soon. |
Brief summary of bug
C13_HR, C13_NBP, FPI values result in numeric conversion not representable error
General bug information
CTSM version you are using: relcisofix.n01_release-clm5.0.20
Does this bug cause significantly incorrect results in the model's science? Yes
Configurations affected: --compset BSSP585_BPRPcmip6 --res f09_g17
Details of bug
Original message from Keith Lindsay:
I'm running the esm-ssp585 CESM2 CMIP6 experiment.
The model is aborting when writing the monthly clm history files for 2046-01 with the error message
NetCDF: Numeric conversion not representable
Suspecting that the problem is that CLM is trying to write an r8 value that is outside of the range of r4 into a netCDF float(=r4),
I've rerun the last segment with hist_ndens=1, to get r8 output.
Sure enough, the run went past where the original run aborted, and there are 3 variables with values outside the range of r4 values.
All of the out of range values are at the same location:
(i,j)=(40,168) (lon,lat)=(50E,68.3N)
The variables and excessive values are:
C13_HR: -6.8144E+46
C13_NBP: 6.8144E+46
FPI: -1.4824E+100
The largest r4 value is ~3.4028E+38
The following vars are also unusually large, but not outside r4 limits
C14_HR:-2.9915E+36
C14_NBP: 2.9915E+36
Values for HR and NBP at the same location are -1.837E-41 and -9.452E-18 respectively.
pretty small
Any suggestions on what I should do to further diagnose this and/or work around it?
The original case is
CASE=b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
CASEROOT=/glade/work/cmip6/cases/C4MIP/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
RUNDIR=/glade/scratch/cmip6/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001/run
My clone, where I changed hist_ndens and reran, is
CASE=b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
CASEROOT=/glade/scratch/klindsay/temp_caseroots/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
RUNDIR=/glade/scratch/klindsay/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001/run
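The diagnosis in the message above (an r8 value too large for a netCDF float) can be checked with a few lines of Python. This is a standalone sketch using only the standard library and the values quoted in the message; the helper name fits_in_r4 is hypothetical, and the check is slightly conservative because values just above the r4 maximum would still round down to it.

```python
import math
import struct

# Largest finite IEEE 754 single-precision (r4) value, about 3.4028E+38,
# built from its bit pattern.
R4_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def fits_in_r4(value: float) -> bool:
    """Return True if a double (r8) is within the finite range of a single (r4)."""
    if math.isnan(value) or math.isinf(value):
        return True  # non-finite values have their own r4 representations
    return abs(value) <= R4_MAX


# Values reported at (lon,lat)=(50E,68.3N).
reported = {
    "C13_HR": -6.8144e46,
    "C13_NBP": 6.8144e46,
    "FPI": -1.4824e100,
    "C14_HR": -2.9915e36,
    "C14_NBP": 2.9915e36,
}
for name, value in reported.items():
    status = "ok" if fits_in_r4(value) else "out of r4 range"
    print(f"{name}: {value:.4e} {status}")
```

Only the first three variables fall outside the r4 range, which matches the three variables named in the message.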