Add more unit tests of system_tests_compare_two, fix multisubmit bug #1858

billsacks · 2017-08-31T22:17:25Z

The multisubmit bug was related to the indentation of a block: the block was mistakenly being run for both phases of a multi-submit, whereas it should only have been run after the second phase. #1830 had this correct, but the indentation was incorrect in #1837, which is what came to master.
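To illustrate the kind of indentation error described above, here is a minimal sketch. It is not the actual code in system_tests_compare_two.py: the function names, the `log` list, and the "post-run block" label are all hypothetical and exist only to show how one indentation level changes which phases execute a block.

```python
# Hypothetical sketch only: none of these names come from CIME.

def run_phase_buggy(phase, log):
    """Post-run block sits at function level, so it runs in every phase."""
    if phase == 1:
        log.append("run case 1")
    else:
        log.append("run case 2")
    # Bug: one indentation level too shallow, so this runs after both phases.
    log.append("post-run block")


def run_phase_fixed(phase, log):
    """Post-run block is nested under the second-phase branch."""
    if phase == 1:
        log.append("run case 1")
    else:
        log.append("run case 2")
        # Runs only once the second phase has completed.
        log.append("post-run block")


def run_multisubmit(run_phase):
    """Drive both phases of a (hypothetical) multi-submit test in order."""
    log = []
    for phase in (1, 2):
        run_phase(phase, log)
    return log
```

A unit test that counts how often the post-run block executes across both phases would distinguish the two versions, which is the sort of check the new tests in this PR are meant to provide.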

Test suite: scripts_regression_tests on cheyenne
Test baseline: n/a
Test namelist changes: none
Test status: bit for bit

Fixes #1856

User interface changes?: none

Update gh-pages html (Y/N)?: N

Code review:

billsacks · 2017-08-31T22:19:30Z

@jedwards4b and @jgfouca - I realize that the change to system_tests_compare_two.py conflicts with @jedwards4b's change in #1857. I had already done this fix before I saw #1857. As I'm about to mention in the review for #1857, the indentation differs here compared with what @jedwards4b did there. Regardless of which one is accepted, the unique part of this PR is the addition of unit tests, including unit tests that caught the bug that this fixes.

jgfouca

Looks great. Glad to have good unit testing of this critical class.

jgfouca · 2017-09-01T16:40:07Z

@billsacks I approve. Since you're working on a fork, I'll let you resolve the conflict.

billsacks · 2017-09-01T19:52:23Z

Updated to latest master to resolve merge conflict. The changes were identical now on master and in my branch. So now the only diffs in this PR are in the addition of unit tests.

I ran scripts_regression_tests A_RunUnitTests and B_CheckCode on the latest version.

I'll merge this to master now.

…#1868)

Centralize coll. of perf. data at NERSC and update NERSC syslog scripts

a) Change SAVE_TIMING_DIR default at NERSC to a central location

Currently the default location for SAVE_TIMING_DIR on Edison, Cori-Haswell, and Cori-KNL is /project/projectdirs/$PROJECT . There are a number of ACME-project allocations at NERSC, and it is advantageous for the performance data for all of these to be archived in a single location. Here this default is set to /project/projectdirs/acme . If the ACME model is run by someone not in the acme group and if this default is not changed in env_run.xml, then performance data archiving will be disabled.

b) Change mach_syslog for Cori to start checkpointing earlier

Currently the scripts for Cori-Haswell and Cori-KNL that monitor model progress do not start until the number of lines in acme.log exceeds the number of cores in the allocated nodes. This design was introduced when the process-to-core mapping was output to acme.log. This mapping output has since been disabled for these systems, and the script often waits excessively long for jobs with large node counts. This commit changes these scripts to start after an empirically determined number of lines, attempting to start after the model output starts, thus after the list of MPICH environment variables is output. As this is empirically determined, it may need to be adjusted again in the future.

c) Change mach_syslog for Edison to start checkpointing earlier

Currently the script for Edison that monitors model progress does not start until the number of lines in acme.log exceeds the number of cores in the allocated nodes. This design was introduced when the process-to-core mapping was output to acme.log. As the number of cores can be larger (and potentially much larger) than the number of MPI processes when using OpenMP threading, the script often waits excessively long for jobs with large node counts when OpenMP threading is used.

This commit changes this script to start after the length of acme.log exceeds the number of nodes. While not guaranteed to capture all of the process-to-core mapping, this change does guarantee that something is captured before the job ends. Note that this change is needed now because of the successful cleanup of acme.log, significantly shortening its length compared to that generated by earlier versions of the model.

Fixes #1858

[BFB]

P2-117

billsacks added 2 commits August 31, 2017 12:48

Fix pylint error

d68af28

billsacks added tp: system tests ty: Bug Fix labels Aug 31, 2017

billsacks assigned jgfouca Aug 31, 2017

billsacks requested review from jedwards4b and jgfouca August 31, 2017 22:17

billsacks mentioned this pull request Aug 31, 2017

Erp to restart tests #1857

Merged

jedwards4b approved these changes Aug 31, 2017

jgfouca approved these changes Sep 1, 2017

Merge branch 'master' into comparetwo_multisubmit_fix

0042730

billsacks merged commit 230f7dd into ESMCI:master Sep 1, 2017

billsacks deleted the comparetwo_multisubmit_fix branch September 1, 2017 19:54
