ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs run FAIL (intel) #322

Closed
rgknox opened this issue Mar 14, 2018 · 13 comments
Labels: bfb (bit-for-bit), bug (something is working incorrectly)

@rgknox
Collaborator

rgknox commented Mar 14, 2018

I generated this FAIL as part of the FATES developer test suite. It seems to finish initializing MOSART, and then fails while initializing luna?

To reproduce:
Use the intel compiler (I'm using Lawrencium, lr3 partition, intel/2016.4.072).
./create_test ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.${MACH}.clm-default

Error:

MOSART decomp info proc =        28 begr =    226801 endr =    234900 numr =      8100
MOSART decomp info proc =        29 begr =    234901 endr =    243000 numr =      8100
MOSART decomp info proc =        30 begr =    243001 endr =    251100 numr =      8100
MOSART decomp info proc =        31 begr =    251101 endr =    259200 numr =      8100
forrtl: error (182): floating invalid - possible uninitialized real/complex variable.
Image              PC                Routine            Line        Source             
cesm.exe           0000000003732E8D  Unknown               Unknown  Unknown
cesm.exe           0000000003730D27  Unknown               Unknown  Unknown
cesm.exe           00000000036D3664  Unknown               Unknown  Unknown
cesm.exe           00000000036D3476  Unknown               Unknown  Unknown
cesm.exe           00000000036565E6  Unknown               Unknown  Unknown
cesm.exe           00000000036613B7  Unknown               Unknown  Unknown
Unknown            00002ACB9E66F5E0  Unknown               Unknown  Unknown
cesm.exe           00000000016AA809  lunamod_mp_nitrog         851  LunaMod.F90
cesm.exe           0000000001691514  lunamod_mp_update         368  LunaMod.F90
cesm.exe           00000000010B0402  canopyfluxesmod_m        1321  CanopyFluxesMod.F90
cesm.exe           00000000008623A3  clm_driver_mp_clm         543  clm_driver.F90
cesm.exe           0000000000828ED8  lnd_comp_mct_mp_l         451  lnd_comp_mct.F90
cesm.exe           000000000046A61E  component_mod_mp_         728  component_mod.F90
cesm.exe           000000000043A167  cime_comp_mod_mp_        2650  cime_comp_mod.F90
cesm.exe           0000000000452755  MAIN__                    103  cime_driver.F90
cesm.exe           000000000041211E  Unknown               Unknown  Unknown
libc.so.6          00002ACB9E89DC05  Unknown               Unknown  Unknown
cesm.exe           0000000000412029  Unknown               Unknown  Unknown

Tested Hash: fc7d5c2 (based off of f167e9c with only test generation modifications).

Code:

Looks like the offending line is 851 of biogeophys/LunaMod.F90
https://github.com/ESCOMP/ctsm/blob/master/src/biogeophys/LunaMod.F90#L851
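
As a rough illustration of how the offending line can be read off the traceback above, here is a minimal Python sketch that scans the frames from the top and returns the first one with a resolved line number. The function name `first_resolved_frame` and the regular expression are assumptions made for this example; they are not part of CIME or of the Intel runtime, and the sample text is an excerpt of the traceback quoted above.

```python
import re

# Excerpt of the traceback quoted in this issue (frames near the failure).
TRACEBACK = """\
cesm.exe           00000000036613B7  Unknown               Unknown  Unknown
Unknown            00002ACB9E66F5E0  Unknown               Unknown  Unknown
cesm.exe           00000000016AA809  lunamod_mp_nitrog         851  LunaMod.F90
cesm.exe           0000000001691514  lunamod_mp_update         368  LunaMod.F90
cesm.exe           00000000010B0402  canopyfluxesmod_m        1321  CanopyFluxesMod.F90
"""

# Columns: image, program counter, routine, line, source file.
FRAME = re.compile(
    r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\d+|Unknown)\s+(\S+)\s*$"
)


def first_resolved_frame(text):
    """Return (routine, line, source) for the innermost frame with a known line."""
    for raw in text.splitlines():
        match = FRAME.match(raw)
        if match and match.group(4) != "Unknown":
            _image, _pc, routine, line_no, source = match.groups()
            return routine, int(line_no), source
    return None


print(first_resolved_frame(TRACEBACK))
```

The frames above the first resolved one are runtime and signal-handler frames without debug information, which is why the innermost frame that names a source file is the one to inspect.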

@rgknox
Collaborator Author

rgknox commented Mar 14, 2018

I'm noticing that the other "old" terms are initialized/zeroed prior to the call to NitrogenAllocation() on line 368, except for PNstoreold, which may be the offending uninitialized term.

edit: PNStoreold is not initialized as far as I can tell.

pinging @xuchongang

@billsacks billsacks added the bug something is working incorrectly label Mar 14, 2018
@ekluzek
Collaborator

ekluzek commented Apr 13, 2018

I tried this on cheyenne and it worked there. @rgknox what version of intel is lawrencium running? I'll try it on hobart also...

PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default CREATE_NEWCASE
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default XML
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default SETUP
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default SHAREDLIB_BUILD time=173
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default MODEL_BUILD time=115
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default SUBMIT
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default RUN time=315
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default COMPARE_base_rest
PASS ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-default MEMLEAK insuffiencient data for memleak test

@billsacks
Member

@rgknox is this still an issue?

@rgknox
Collaborator Author

rgknox commented Oct 3, 2018

As far as I can tell, PNStoreold is still uninitialized. This is still generating uninitialized-variable errors on lawrencium. My test is on the fates_next_api branch, which lags behind master, but I checked the code in master and, by all accounts, the variable simply isn't given a value anywhere. (Can anyone confirm?)

The offending test is: ERS_D_Ld5.f19_g16.I2000Clm50BgcCruGs

I checked the machine file on lawrencium-lr3; we don't specify any zany compile options during debug.

Module info: intel/2016.4.072

@billsacks
Member

I agree that that looks like a bug. However, it looks like the offending line, and thus the offending PNStoreold argument, could simply be removed. The offending line is:

https://github.com/ESCOMP/ctsm/blob/a34404419aaa81f5021602963222c50daa10c0f6/src/biogeophys/LunaMod.F90#L851

but that appears to be overridden here before Nstore is referenced:

https://github.com/ESCOMP/ctsm/blob/a34404419aaa81f5021602963222c50daa10c0f6/src/biogeophys/LunaMod.F90#L885

I checked the logic of the while loop, and it looks like the loop will always be entered at least once. So I think that we could just remove the offending line and the PNStoreold argument to that routine.

@ekluzek do you think we should run this by someone else for confirmation?
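
The argument above can be sketched in Python as follows. This is only an illustration of the control flow being described, not the LunaMod.F90 code: the function names, the arguments, and the arithmetic are all hypothetical placeholders. `None` stands in for an uninitialized value that a debug build traps on, and `nan` stands in for one that slips through without trapping.

```python
import math


def allocate_with_initial_store(pn_store_old, leaf_n, max_iter=5):
    """Hypothetical pre-fix shape: n_store is set once up front, then in the loop."""
    # Analogue of the line-851 assignment: it reads the caller's "old" value.
    n_store = pn_store_old * leaf_n
    converged = False
    iterations = 0
    # `converged` starts False, so the loop body always runs at least once.
    while not converged and iterations < max_iter:
        iterations += 1
        # Analogue of the line-885 assignment: n_store is overwritten
        # before anything has read the initial value.
        n_store = 0.1 * leaf_n / iterations
        converged = n_store < 0.05 * leaf_n
    return n_store


def allocate_without_initial_store(leaf_n, max_iter=5):
    """Hypothetical post-fix shape: the initial assignment and argument are gone."""
    converged = False
    iterations = 0
    while not converged and iterations < max_iter:
        iterations += 1
        n_store = 0.1 * leaf_n / iterations
        converged = n_store < 0.05 * leaf_n
    return n_store


# The initial value never reaches the result, whatever it is.
reference = allocate_without_initial_store(2.0)
print(allocate_with_initial_store(0.0, 2.0) == reference)
print(allocate_with_initial_store(math.nan, 2.0) == reference)

# A value that traps on use fails on the very first assignment instead.
try:
    allocate_with_initial_store(None, 2.0)
except TypeError as err:
    print("trapped:", err)
```

Under these assumptions the initial assignment is a dead store: removing it, or giving the old value any defined starting value, leaves the result unchanged, which is consistent with expecting the fix to be bit-for-bit.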

@billsacks
Member

If nobody objects to my fix, I'll fold it into an upcoming tag.

@billsacks billsacks self-assigned this Oct 5, 2018
@ekluzek
Collaborator

ekluzek commented Oct 5, 2018

@billsacks that seems correct to me as well. @wwieder could you take a look at this too? If possible, we could also contact Bardan and/or Chonggang. @wwieder should we bother contacting them? This came in with the original version in clm4_5_1_r120. If it doesn't change answers, it seems like we should go forward with this. Since Nstore is local, I don't see how it could change answers, but sometimes there's some strange interaction that you don't catch simply by looking at it.

@rgknox
Collaborator Author

rgknox commented Oct 5, 2018

I see it the same way you do @billsacks. While I don't know that code well, logically speaking, Nstore is overwritten later in the call-sequence before it is used anyway.

@wwieder
Contributor

wwieder commented Oct 8, 2018 via email

@ekluzek
Collaborator

ekluzek commented Oct 8, 2018

Hi @wwieder, I'm wondering if you can take a look at LunaMod.F90, specifically the calculation of Nstore in subroutine NitrogenAllocation. It gets set near the top and then changed inside the "do while" loop. I suppose if the "do while" isn't executed you would need that initial setting, so maybe that's why it's there. But why is it different from the other calculation? And is it possible for the "do while" NOT to be executed? Another question is whether we need to bring someone else into the conversation.

@wwieder
Contributor

wwieder commented Oct 8, 2018 via email

@xuchongang

This does seem to be a bug. I see two solutions. The first is to delete the line, as suggested by @billsacks; the second is to add an initialization that sets PNstoreold=0 at https://github.com/ESCOMP/ctsm/blob/b0439495b404c37ca8ef7c3e5252485c2d35e034/src/biogeophys/LunaMod.F90#L363.

@billsacks
Member

Thanks for the reply. I'll go ahead and delete the line since, from my and @rgknox's analysis, it isn't needed.

billsacks added a commit that referenced this issue Oct 26, 2018
Miscellaneous minor, bit-for-bit bug fixes

Four miscellaneous minor, bit-for-bit bug fixes:

(1) Py3 pylint check and address cime issue ESMCI/cime#2822 (from Jim
    Edwards: #526)

(2) Change uppercase DEBUG variables to lowercase debug (requested by
    Jim Edwards to avoid conflicting with the DEBUG CPP token)
    (Fixes #534)

(3) Remove unnecessary line of code in LunaMod.F90 that was causing
    problems with some compilers due to an uninitialized variable
    (Fixes #322)

(4) Add r8 to 0 constant to fix build issue with XLF compiler (from Jim
    Edwards: #531)
billsacks added a commit to billsacks/ctsm that referenced this issue Oct 28, 2018
This initial setting of Nstore was problematic because it referenced the
uninitialized PNstoreold variable. From some analysis, it looks like
Nstore is always overwritten before it's referenced, so it's safe to
just remove this line, along with the now-unnecessary PNstoreold
subroutine argument.

Fixes ESCOMP#322
billsacks added a commit that referenced this issue Oct 29, 2018
From dev014 & dev015: CMIP6 compset modifiers, output usermods & fixes

Bring in all changes from ctsm1.0.dev014 and ctsm1.0.dev015:

From ctsm1.0.dev015:

(1) Support %BGC-CROP-CMIP6DECK and %BGC-CROP-CMIP6WACCMDECK compset
    modifiers, so that we can turn on the necessary options
    (output-related and others) via new CMIP6-specific compsets.

(2) Turn on carbon isotopes in CMIP6 runs (from Erik Kluzek)

(3) Remove setting of CCSM_BGC=CO2A in the cmip6 usermods

(4) Add usermods directories for getting typical extra output that's
    wanted in many cases: output_crop, output_crop_highfreq, output_bgc,
    output_bgc_highfreq, output_sp, and output_sp_highfreq. These can be
    enabled by adding something like '--user-mods-dir output_crop' on
    the create_newcase line (that short-hand works for an I compset; for
    F or B compsets, you need to provide the full path to the usermod
    directory).

(4) Allow holes in the number of history tapes. Holes are cases where,
    for example, we have h0, h1 and h3 tapes, but no h2 tape (because
    there are no fields on the h2 tape). (This is needed for (3).)

(5) Fix reading and writing of 1-d logical global arrays. This fixes
    #24 for real (rather than just preventing an attempt to
    read/write 1-d logical arrays, as was done in the previous 'fix').

(6) Add C13_NBP and C14_NBP diagnostic fields (from Keith Oleson)

(7) Make a bunch of carbon isotope diagnostic fields inactive by default

(8) Don't allow interpolation (use_init_interp) from a case without
    carbon isotopes to a case with carbon isotopes: Due to
    #67, interpolation from a case
    without carbon isotopes to a case with carbon isotopes yields
    incorrect initialization values for the carbon isotopes. Now that
    we're turning carbon isotopes on via some semi-out-of-the-box
    usermods (for cmip6), it is becoming more important to check to make
    sure someone doesn't shoot themselves in the foot this way.

(9) Add tests of the new output usermods as well as of the CMIP6 compset
    modifiers

From ctsm1.0.dev014: Four miscellaneous minor, bit-for-bit bug fixes:

(1) Py3 pylint check and address cime issue ESMCI/cime#2822 (from Jim
    Edwards: #526)

(2) Change uppercase DEBUG variables to lowercase debug (requested by
    Jim Edwards to avoid conflicting with the DEBUG CPP token)
    (Fixes #534)

(3) Remove unnecessary line of code in LunaMod.F90 that was causing
    problems with some compilers due to an uninitialized variable
    (Fixes #322)

(4) Add r8 to 0 constant to fix build issue with XLF compiler (from Jim
    Edwards: #531)
billsacks added a commit to billsacks/ctsm that referenced this issue Feb 22, 2019
billsacks added a commit to billsacks/ctsm that referenced this issue Feb 22, 2019
Miscellaneous minor, bit-for-bit bug fixes

AGonzalezNicolas pushed a commit to HPSCTerrSys/clm5_0 that referenced this issue Jun 27, 2024
AGonzalezNicolas pushed a commit to HPSCTerrSys/clm5_0 that referenced this issue Jun 27, 2024
From dev014 & dev015: CMIP6 compset modifiers, output usermods & fixes

AGonzalezNicolas pushed a commit to HPSCTerrSys/clm5_0 that referenced this issue Jul 5, 2024
AGonzalezNicolas pushed a commit to HPSCTerrSys/clm5_0 that referenced this issue Jul 5, 2024
From dev014 & dev015: CMIP6 compset modifiers, output usermods & fixes

@samsrabin samsrabin added simple bfb bit-for-bit labels Aug 8, 2024