cam6_3_079: Fix problems with irreproducible results using variable resolution grids #666

Merged: 8 commits, Oct 14, 2022

Conversation

cacraigucar
Collaborator

@cacraigucar cacraigucar commented Sep 29, 2022

Closes #631

@cacraigucar cacraigucar added the bug-fix label ("This PR was created to fix a specific bug.") Sep 29, 2022
@cacraigucar cacraigucar added this to the CAM6.5 milestone Sep 29, 2022
@cacraigucar cacraigucar self-assigned this Sep 29, 2022
@cacraigucar cacraigucar marked this pull request as draft September 29, 2022 16:40
@cacraigucar cacraigucar added the answer changing label ("answer changing tag") Oct 5, 2022
@nusbaume
Collaborator

nusbaume commented Oct 5, 2022

Should this PR be taken out of draft mode? Should it also close issue #665?

@cacraigucar cacraigucar marked this pull request as ready for review October 5, 2022 15:15
Externals.cfg Outdated
@@ -1,5 +1,5 @@
 [ccs_config]
-tag = ccs_config_cesm0.0.28
+tag = ccs_config_cesm0.0.44

This should use tag ccs_config_cesm0.0.45. The ccs_config_cesm0.0.44 tag includes the -mismatch nag compiler flag for mpi-serial builds, which we don't want. That change was reverted in tag ccs_config_cesm0.0.45.

Collaborator Author


done

@cacraigucar
Collaborator Author

Actually, I am not bringing in the updated grids; those will come in a later tag, so I removed the reference to them.

@cacraigucar cacraigucar changed the title from "Fix problems with irreproducible results using variable resolution grids" to "cam6_3_079: Fix problems with irreproducible results using variable resolution grids" Oct 5, 2022
@cacraigucar cacraigucar requested a review from fvitt October 5, 2022 19:49
@cacraigucar
Collaborator Author

I have made repeated attempts to get the test ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3 to run successfully using the testlist_cam.xml setup.

Here is a summary of what I've tried:

  • When I run using test_driver.sh (and the corresponding create_test), it has always failed. That covers 2 or 3 runs using aux_cam and 3 or 4 runs using a "test_cac" which I set up for just this one test. Sometimes it breaks in run, and sometimes it makes it to case2run and breaks there. An example run is: /glade/scratch/cacraig/test_cac_20221013095258/ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3.GC.test_cac_20221013095258/run
  • When I run using ./create_test ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3, it works fine. I've done this at least twice.
  • In order to get a baseline I can store, I ran the following command, which ran successfully (it is basically the command that test_driver.sh generates, but it uses the test name instead of getting it via aux_cam):
    qcmd -- ./create_test --xml-machine cheyenne --retry 2 --xml-compiler intel --queue regular --test-root /glade/scratch/cacraig/test_cac_try01 --output-root /glade/scratch/cacraig/test_cac_try01 --project P93300606 --test-id test_cac_try01 --compare /glade/p/cesm/amwg/cesm_baselines/cam6_3_078 --generate /glade/scratch/cacraig/test_cac_try01/baselines ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3
  • The full test suite is: /glade/scratch/cacraig/aux_cam_20221011212558. This has the failing test, but everything else is passing.

@fischer-ncar
Collaborator

@cacraigucar, the test being run in /glade/scratch/cacraig/aux_cam_20221011212558 is exceeding the wallclock limit of 10 minutes. The test you're running in /glade/scratch/cacraig/test_cac_try01 is using a wallclock limit of 12 hours. You can look at the test.ERP* files in your run directory to see the "PBS: job killed: walltime 689 exceeded limit 600" message.

@cacraigucar
Collaborator Author

@fischer-ncar - Thanks for the information. I always look at the TestStatus.log file, and that information is not there. I'll make sure to look at the test.ERP* files as well. I've never had a job run out of time on me (at least that I am aware of).
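
For anyone hitting the same thing, the check on the test.ERP* files could be scripted roughly as follows. This is only a sketch, not part of CAM or CIME: the function names `find_walltime_kill` and `scan_run_dir` are made up for illustration, the message format is assumed to match the single PBS line quoted above, and the two numbers are assumed to be seconds (600 matching the 10 minute limit).

```python
import re
from pathlib import Path

# Assumed message format, taken from the one PBS line quoted in this thread.
WALLTIME_RE = re.compile(r"PBS: job killed: walltime (\d+) exceeded limit (\d+)")


def find_walltime_kill(text):
    """Return (used, limit) if text contains the PBS walltime message, else None."""
    match = WALLTIME_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def scan_run_dir(run_dir, pattern="test.ERP*"):
    """Check each file matching pattern in run_dir (hypothetical helper).

    Returns a dict mapping file name to (used, limit) for every file
    that contains the walltime message.
    """
    hits = {}
    for path in sorted(Path(run_dir).glob(pattern)):
        if not path.is_file():
            continue
        result = find_walltime_kill(path.read_text(errors="replace"))
        if result is not None:
            hits[path.name] = result
    return hits
```

Calling `scan_run_dir` on a test's run directory returns an empty dict when no job was killed for walltime, so it can be run alongside the usual look at TestStatus.log.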

@cacraigucar cacraigucar merged commit 96a314f into ESCOMP:cam_development Oct 14, 2022
@cacraigucar cacraigucar deleted the cam_fix_irrep_results branch October 14, 2022 15:06
@cacraigucar cacraigucar restored the cam_fix_irrep_results branch July 24, 2024 17:05
@cacraigucar cacraigucar deleted the cam_fix_irrep_results branch July 24, 2024 17:07
Labels
answer changing: answer changing tag
bug-fix: This PR was created to fix a specific bug.
Projects
Status: Tag

5 participants