Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macros fixes and diag_type fix #396

Merged
merged 6 commits into from
Jan 17, 2020
Merged

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Jan 17, 2020

PR checklist

  • Short (1 sentence) summary of your PR:
    Fix Macros files, threading was not active on a few platforms including cheyenne.
    Update run status output based on diag_type setting
  • Developer(s):
    apcraig
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.
    Full test suite on cheyenne, will post results when done
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
      For threaded tests, answers change on some platforms, namely cheyenne
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:

Surprised we didn't find this error sooner!
Changes answers for some threaded cases. Machines affected are

  • cheyenne
  • cesium
  • fram
  • millikan
  • phase2
  • phase3

Other Macros changes don't affect answers.
Partly address CICE #344

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

I will post test results when complete. I have already run the suite a few times on cheyenne, just doing a final test. I expect everything to pass but answers to change for threaded cases. Most machine unaffected.

I will also merge this on Friday afternoon. This is a significant bug that we need to fix asap, especially with the user meeting coming up and needing to work on cheyenne. I want this in the weekend testing.

Copy link
Contributor

@duvivier duvivier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add the documentation for ICE_RUNLENGTH to user_guide/ug_case_settings.rst

The default is -1, which isn't shown in the table, but is 15 minutes. The value of zero should be 29 minutes. (see cice.batch.csh)

I think all the ICE_IOTYPE stuff is accurate in the documentation now with these fixes.

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

Test results are in
https://github.com/CICE-Consortium/Test-Results/wiki/5bbe517b77.cheyenne.intel.20-01-17.012453
Turning on threading sure does destroy performance on cheyenne. We may need to rethink the cheyenne test suite and just run the nothread_suite. Pretty surprising. That's something we can think about after this PR.

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

I just updated the documentation. I think it's as suggested.

Copy link
Contributor

@duvivier duvivier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Member

@phil-blain phil-blain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes look good to me, but I think we could take the opportunity to clean up the unneeded logic in the Macros files I mentioned (that's my error: the 4 machines are our ECCC machines and when I created the Macros I didn't understand the build process as well as I do now).

Regarding #344 , I did some tests and the following change to cice.run.setup.csh seem to correctly grep the diag_file :

diff --git i/configuration/scripts/cice.run.setup.csh w/configuration/scripts/cice.run.setup.csh
index 8030324..80863c7 100755
--- i/configuration/scripts/cice.run.setup.csh
+++ w/configuration/scripts/cice.run.setup.csh
@@ -47,6 +47,11 @@ setenv OMP_NUM_THREADS ${nthrds}
 
 cp -f \${ICE_CASEDIR}/ice_in \${ICE_RUNDIR}
 set diagtype = \`grep -i diag_type \${ICE_CASEDIR}/ice_in | grep -i stdout | wc -l\`
+set diagfile = \`grep -i diag_file \${ICE_CASEDIR}/ice_in | sed -e "s/.* = '\(.*\)'/\1/"\`
+
+# Remove any existing diag_file from a previous run
+if ( \${diagtype} == 0) then
+  rm -f \${diagfile}
+endif
 
 echo " "
 echo "CICE rundir is \${ICE_RUNDIR}"
@@ -76,17 +81,16 @@ if !(-d \${ICE_LOGDIR}) mkdir -p \${ICE_LOGDIR}
 cp -p \${ICE_RUNLOG_FILE} \${ICE_LOGDIR}
 
 if ( \${diagtype} > 0) then
-  grep ' CICE COMPLETED SUCCESSFULLY' \${ICE_RUNLOG_FILE}
-  if ( \$status != 0 ) then
-    echo "CICE run did not complete - see \${ICE_LOGDIR}/\${ICE_RUNLOG_FILE}"
-    echo "\`date\` \${0}: \${ICE_CASENAME} run did NOT complete \${ICE_RUNLOG_FILE}"  >> \${ICE_CASEDIR}/README.case
-    exit -1
-  endif
+  set success_file = \${ICE_RUNLOG_FILE}
 else
-  echo "\`date\` \${0}: \${ICE_CASENAME} run completed "
-  echo "CICE NOT run with diag_type='stdout', run status unknown"
-  echo "\`date\` \${0}: \${ICE_CASENAME} run completed "  >> \${ICE_CASEDIR}/README.case
-  echo "\`date\` \${0}: \${ICE_CASENAME} NOT run with diag_type='stdout', run status unknown"  >> \${ICE_CASEDIR}/README.case
+  set success_file = \${diagfile}
+endif
+
+grep ' CICE COMPLETED SUCCESSFULLY' \${success_file}
+
+if ( \$status != 0 ) then
+  echo "CICE run did not complete - see \${ICE_LOGDIR}/\${ICE_RUNLOG_FILE}"
+  echo "\`date\` \${0}: \${ICE_CASENAME} run did NOT complete \${ICE_RUNLOG_FILE}"  >> \${ICE_CASEDIR}/README.case
   exit -1
 endif

but I don't feel strongly about this, as long as the path to the log file is easily copy-paste-able.

Comment on lines 76 to 78
ifeq ($(ICE_IOTYPE), netcdf)
CPPDEFS := $(CPPDEFS) -Dncdf
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this conditional can simply be deleted as -Dncdf is set in cice.build !

Comment on lines 72 to 73

CPPDEFS := $(CPPDEFS) -Dncdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized these lines can simply be deleted as -Dncdf is set in cice.build !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (${ICE_IOTYPE} == 'netcdf') then
set IODIR = io_netcdf
setenv ICE_CPPDEFS "${ICE_CPPDEFS} -Dncdf"
else if (${ICE_IOTYPE} == 'pio') then
set IODIR = io_pio
setenv ICE_CPPDEFS "${ICE_CPPDEFS} -Dncdf"
else
set IODIR = io_binary
endif

Comment on lines 70 to 72
ifeq ($(ICE_IOTYPE), netcdf)
CPPDEFS := $(CPPDEFS) -Dncdf
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, these lines can be deleted

Comment on lines 66 to 67

CPPDEFS := $(CPPDEFS) -Dncdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, these lines can be deleted

Comment on lines 66 to 67

CPPDEFS := $(CPPDEFS) -Dncdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, these lines can be deleted

Comment on lines 70 to 72
ifeq ($(ICE_IOTYPE), netcdf)
CPPDEFS := $(CPPDEFS) -Dncdf
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, these lines can be deleted

Comment on lines 66 to 67

CPPDEFS := $(CPPDEFS) -Dncdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, these lines can be deleted

Comment on lines 70 to 72
ifeq ($(ICE_IOTYPE), netcdf)
CPPDEFS := $(CPPDEFS) -Dncdf
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing, these lines can be deleted

endif
else
echo "\`date\` \${0}: \${ICE_CASENAME} run completed "
echo "CICE NOT run with diag_type='stdout', run status unknown"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo "CICE NOT run with diag_type='stdout', run status unknown"
echo "CICE NOT run with diag_type='stdout', run status unknown - see \${ICE_LOGDIR}/\${ICE_RUNLOG_FILE}"

I think in that case it would be nice to also output the path to the log file for easy (<middle click path> + vi <middle click>) access

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

Thanks @phil-blain, several good ideas there. I'll try to clean up some of that.

One thing is that if the diag_type is not set to stdout, it really needs to return a negative exit code. We could grab the filename from ice_in and check whether the run successfully completed, but all the rest of the stuff that is part of the test script cannot work unless diag_type is stdout. That could be fixed too, but that would be a bunch more work. Plus I don't see the need. All testing can be done with "stdout". I would also argue that using diag_type='file' is not a great idea in general. As you point out, it will continually overwrite (or append) the prior output file. The current implementation, leveraging a date stamp in the ice log file set by the scripts ensures each output file is a unique name which I think is a nice feature.

Finally, I think I figured out the performance issue with threading on cheyenne. We need omplace on job launch. I'm testing that too. I should have some updated commits soon.

@phil-blain
Copy link
Member

Yes I agree with all these considerations. That's why I don't feel too strongly about the diag_file one; I only use it for cases and not test/suites.
I don't see the need either to adapt the testing architecture.

So I'm glad with the changes you're proposing in this PR, plus outputting the path to the log file in the diag_file case.

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

Just pushed an update. cheyenne timing is much better now. https://github.com/CICE-Consortium/Test-Results/wiki/af3d93e40c.cheyenne.intel.20-01-17.183453. All tests pass and most are faster. Many regression comparisons fail (as expected).

Also cleaned up the scripts a bit and update the run checking as guided above.

Would like to merge the PR today if possible.

Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this does not affect how a user interacts with the scripts. Have you tested that the success/fail changes all work as expected? If you are all happy with these changes, then I'm good with them too. Thanks for fixing this!

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

@eclare108213, usage doesn't change. I have tested the updated scripts with diag_type = stdout and diag_type = file and it seems to work as designed.

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

Not sure why readthedocs is failing right now. Have tried to run it twice. It was fine before and my recent changes did not affect documentation. I think there is a problem with readthedocs, will have to see whether it's temporary or something that we'll need to fix. Here is the version from #2078b884 which was after my doc change, https://readthedocs.org/projects/cice-consortium-cice/builds/10275392/

@apcraig
Copy link
Contributor Author

apcraig commented Jan 17, 2020

Read the docs is working again. I am going to merge this PR now.

@apcraig apcraig merged commit 5d6b26c into CICE-Consortium:master Jan 17, 2020
phil-blain added a commit to phil-blain/Icepack that referenced this pull request Feb 6, 2020
These two files used outdated environment variables for
threading ("compile_threaded") and PIO ("IO_TYPE").

Replace these with the correct variables, ICE_THREADED and
ICE_IOTYPE.

Mirrors CICE-Consortium/CICE#396
phil-blain added a commit to phil-blain/Icepack that referenced this pull request Feb 6, 2020
These two files used outdated environment variables for
threading ("compile_threaded") and PIO ("IO_TYPE").

Replace these with the correct variables, ICE_THREADED and
ICE_IOTYPE.

Mirrors CICE-Consortium/CICE#396
phil-blain added a commit to phil-blain/Icepack that referenced this pull request Feb 7, 2020
These two files used outdated environment variables for
threading ("compile_threaded") and PIO ("IO_TYPE").

Replace these with the correct variables, ICE_THREADED and
ICE_IOTYPE.

Mirrors CICE-Consortium/CICE#396
apcraig pushed a commit to CICE-Consortium/Icepack that referenced this pull request Feb 14, 2020
#296)

* machines: add conda environment for macOS and Linux

Closes #294

* Makefile: remove uneeded (and faulty) logic around CFLAGS_HOST

The CFLAG_HOST variable was added in #257, and
some Makefile logic was added to define the variable to be blank if
the included Macros file does not define it.

However, the Make syntax is wrong, as 'ifndef' takes a variable name and
not
a reference to a variable [1], so

    ifndef $(CFLAGS_HOST)

should have been written

    ifndef CFLAGS_HOST

The effect of this error is to invert the logic of the check (!) since
if CFLAGS_HOST is defined, Make will check if a variable with name equal
to whatever CFLAGS_HOST is defined to be exists, which will most probably
be false, and the conditional will then evaluate to true and CFLAG_HOSTS
will be redefined to be blank, defeating the whole purpose of the flag.

Since there's no harm in Make referencing and empty variable, fix this
by just removing the conditional check.

[1] https://www.gnu.org/software/make/manual/html_node/Conditional-Syntax.html

* doc: add documentation for conda environment

* machines: fix Macros file for cheyenne and testmachine

These two files used outdated environment variables for
threading ("compile_threaded") and PIO ("IO_TYPE").

Replace these with the correct variables, ICE_THREADED and
ICE_IOTYPE.

Mirrors CICE-Consortium/CICE#396

* scripts: implement '-nomodules' logic for env files

Some env files wrap the calls for loading the build and run environment
in a conditional based on the argument `-nomodules`, so that
scripts that source the env file just to get the variables defined in it
do not run slower because they have to load the environment.

This logic was copied from CICE but was not implemented in the Icepack
scripts icepack.setup and setup_run_dirs.csh.

Add the argument '-nomodules' so that the logic becomes effective.

* doc: conda: add note about csh/tcsh

On Ubuntu and its derivatives, the csh shell at
/bin/csh, which is used in the shebangs of the Icepack scripts,
is not compatible with conda.

Mention that in the documentation and explain how to install tcsh as an
alternative, and configure the system such that /bin/csh points to
/bin/tcsh.
@apcraig apcraig deleted the threadfix branch August 17, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants