FUNITCTSM test fails when run through run_sys_tests in upcoming ctsm5.1.dev120 #1972

Closed
billsacks opened this issue Mar 25, 2023 · 1 comment
Labels
testing additions or changes to tests

Comments

@billsacks
Member

Brief summary of bug

Starting with the tag I'm about to make (ctsm5.1.dev120), the FUNITCTSM test will fail when run through run_sys_tests. I suspect a system issue. I will give some workarounds below.

General bug information

CTSM version you are using: ctsm5.1.dev120 (upcoming)

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: Just FUNITCTSM test when run through run_sys_tests

Details of bug

When running the aux_clm test suite through run_sys_tests, the test FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel fails with this error message:

ERROR: Command: 'cmake -DCONVERT_TO_MAKE=ON  -DOS=LINUX -DMACH=cheyenne -DCOMPILER=intel -DDEBUG=TRUE -DMPILIB=mpi-serial -Dcompile_threaded=FALSE -DCASEROOT=/glade/scratch/sacks/tests_0323-174135ch/FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel.GC.0323-174135ch_int/bld .' failed with error '-- The C compiler identification is Intel 19.1.0.20200306
-- The Fortran compiler identification is Intel 19.1.0.20200306
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.1.1/icc
-- Check for working C compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.1.1/icc - broken
CMake Error at /glade/u/apps/ch/opt/cmake/3.18.2/share/cmake-3.18/Modules/CMakeTestCCompiler.cmake:66 (message):
  The C compiler

    "/glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.1.1/icc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /glade/scratch/sacks/tests_0323-174135ch/FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel.GC.0323-174135ch_int/bld/cmaketmp/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake cmTC_837a8/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_837a8.dir/build.make CMakeFiles/cmTC_837a8.dir/build
    gmake[1]: Entering directory '/glade/scratch/sacks/tests_0323-174135ch/FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel.GC.0323-174135ch_int/bld/cmaketmp/CMakeFiles/CMakeTmp'
    Building C object CMakeFiles/cmTC_837a8.dir/testCCompiler.c.o
    /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.1.1/icc    -o CMakeFiles/cmTC_837a8.dir/testCCompiler.c.o -c /glade/scratch/sacks/tests_0323-174135ch/FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel.GC.0323-174135ch_int/bld/cmaketmp/CMakeFiles/CMakeTmp/testCCompiler.c
    Linking C executable cmTC_837a8
    /glade/u/apps/ch/opt/cmake/3.18.2/bin/cmake -E cmake_link_script CMakeFiles/cmTC_837a8.dir/link.txt --verbose=1
    /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.1.1/icc CMakeFiles/cmTC_837a8.dir/testCCompiler.c.o -o cmTC_837a8
    /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: cannot find -lxml2
    CMakeFiles/cmTC_837a8.dir/build.make:105: recipe for target 'cmTC_837a8' failed
    gmake[1]: *** [cmTC_837a8] Error 1
    gmake[1]: Leaving directory '/glade/scratch/sacks/tests_0323-174135ch/FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel.GC.0323-174135ch_int/bld/cmaketmp/CMakeFiles/CMakeTmp'
    Makefile:140: recipe for target 'cmTC_837a8/fast' failed
    gmake: *** [cmTC_837a8/fast] Error 2





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:3 (project)


-- Configuring incomplete, errors occurred!

I have tracked the error down to the interaction of two things:

  • An update in the netcdf module from 4.7.4 to 4.9.0
  • Building the unit tests through the system test framework on cheyenne's regular queue
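The decisive line in the CMake output above is the linker message `ld: cannot find -lxml2`; everything else is CMake reporting that its test program did not link. As a rough sketch for pulling that kind of line out of a long build log (the function name `missing_libraries` and the sample string are illustrative, not part of CIME or CTSM):

```python
import re

# Matches GNU ld messages such as "ld: cannot find -lxml2".
# The character class stops before any trailing ":" some ld versions append.
_MISSING_LIB = re.compile(r"\bld: cannot find -l([A-Za-z0-9_.+-]+)")


def missing_libraries(log_text):
    """Return library names ld could not find, in first-seen order."""
    seen = []
    for match in _MISSING_LIB.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# Sample lines copied from the error output in this issue.
sample = (
    "Linking C executable cmTC_837a8\n"
    "/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: "
    "cannot find -lxml2\n"
    "gmake[1]: *** [cmTC_837a8] Error 1\n"
)
print(missing_libraries(sample))
```

Run against the full log, this would report only `xml2`, which points at a missing library on the build node rather than at a problem with the compiler itself.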

In particular, this issue can be reproduced as follows:

  1. Check out ctsm5.1.dev119
  2. Run manage_externals/checkout_externals
  3. Update the intel mpi-serial netcdf module from 4.7.4 to 4.9.0 with this diff in the ccs_config directory (this change is part of the diffs that come in with ctsm5.1.dev120):
diff --git a/machines/config_machines.xml b/machines/config_machines.xml
index 06d5d52..b1550a0 100644
--- a/machines/config_machines.xml
+++ b/machines/config_machines.xml
@@ -898,7 +898,7 @@ This allows using a different mpirun command to launch unit tests
         <command name="load">netcdf/4.7.4</command>
       </modules>
       <modules compiler="intel" mpilib="mpi-serial">
-        <command name="load">netcdf/4.7.4</command>
+        <command name="load">netcdf/4.9.0</command>
       </modules>
       <modules compiler="pgi" mpilib="mpi-serial">
         <command name="load">netcdf/4.7.4</command>
  4. From the cime/scripts directory, run ./create_test FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.cheyenne_intel --queue regular
  5. In the created test directory, notice the failure
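For anyone who prefers to script the module change in step (3) rather than edit the file by hand, here is a minimal sketch of the same edit using Python's standard `xml.etree.ElementTree`. The XML below is a trimmed stand-in for `config_machines.xml`, not the real file; the wrapper element and the helper name `set_netcdf_version` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the relevant part of config_machines.xml.
# Only the <modules> blocks mirror the diff above; the wrapper is assumed.
FRAGMENT = """
<module_system>
  <modules compiler="intel" mpilib="mpi-serial">
    <command name="load">netcdf/4.7.4</command>
  </modules>
  <modules compiler="pgi" mpilib="mpi-serial">
    <command name="load">netcdf/4.7.4</command>
  </modules>
</module_system>
"""


def set_netcdf_version(root, compiler, mpilib, version):
    """Point matching <modules> blocks at netcdf/<version>; return edit count."""
    changed = 0
    for modules in root.iter("modules"):
        if modules.get("compiler") != compiler or modules.get("mpilib") != mpilib:
            continue
        for command in modules.findall("command"):
            text = command.text or ""
            if command.get("name") == "load" and text.startswith("netcdf/"):
                command.text = "netcdf/" + version
                changed += 1
    return changed


root = ET.fromstring(FRAGMENT.strip())
count = set_netcdf_version(root, "intel", "mpi-serial", "4.9.0")
loads = {m.get("compiler"): m.find("command").text for m in root.iter("modules")}
print(count, loads)
```

Only the intel/mpi-serial block changes; the pgi block stays on 4.7.4, matching the diff. Note that writing a real file back through ElementTree drops comments by default, so for the actual `config_machines.xml` applying the diff directly is likely the safer route.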

There are two possible minimal changes to the above setup to avoid the problem:

  1. Remove the change to ccs_config (reverting to netcdf 4.7.4 instead of 4.9.0)
  2. Keep the ccs_config change in place but build the unit tests on the share queue instead of the regular queue (by specifying --queue share in the create_test command in step (4) above).

But the most straightforward way to work around this issue for now is to run the unit tests directly rather than through the FUNITCTSM system test: from the src directory, run ../cime/scripts/fortran_unit_testing/run_tests.py --build-dir unit_tests.temp.

@billsacks billsacks added the testing additions or changes to tests label Mar 25, 2023
billsacks added a commit to billsacks/ctsm that referenced this issue Mar 25, 2023
@billsacks
Member Author

This has been resolved: Brian Vanderwende in CISL made a system change that should get the FUNITCTSM tests working again. He said:

Jim is right that 4.9.0 added a libxml2 dependency that has been problematic on Cheyenne (as there is no "libxml2.so" on the compute image - though there is "libxml2.so.2"). We account for this after 4.9.0 and so using 4.9.1 should work fine as Jim mentions.

That said, I've put a hack in that should hopefully fix the issue for 4.9.0 as well. Let me know how it goes for you.
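For context on why the missing "libxml2.so" matters even though "libxml2.so.2" is present: when given `-lxml2`, the linker searches for a file named `libxml2.so` (or `libxml2.a`), and a versioned name such as `libxml2.so.2` is not matched. The sketch below models just that naming rule; the helper name and the second directory listing are hypothetical, and real linkers also consult search paths and linker scripts that are not modeled here.

```python
def resolvable_by_dash_l(name, filenames):
    """Approximate whether `-l<name>` could be satisfied by these file names.

    The linker looks for lib<name>.so or lib<name>.a; versioned names such as
    lib<name>.so.2 are used at run time but are not matched at link time.
    """
    shared = "lib" + name + ".so"
    static = "lib" + name + ".a"
    return shared in filenames or static in filenames


# What the quoted comment describes for the Cheyenne compute image.
compute_image = ["libxml2.so.2"]
# Hypothetical listing for a node that also has the unversioned name.
with_unversioned_name = ["libxml2.so", "libxml2.so.2"]

print(resolvable_by_dash_l("xml2", compute_image))
print(resolvable_by_dash_l("xml2", with_unversioned_name))
```

This is consistent with the observation above that the build succeeds on the share queue but fails on the regular queue, assuming the two run on nodes with different sets of installed libraries.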
