Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major build system enhancement: deprecate config_compilers.xml! #4537

Merged
merged 17 commits into from
Oct 7, 2021

Conversation

jgfouca
Copy link
Member

@jgfouca jgfouca commented Sep 14, 2021

Overview

This is the most significant change to the case-control system since the cmake-ification (#3043) effort 2 years ago. This PR is a natural follow-on to that effort that further cmake-ifies our build system and reduces technical debt / complexity in the build system by stripping away layers of CIME magic that were not adding much value for us. Specifically, the config_compilers.xml file, which CIME uses to generate the Macros.make and Macros.cmake casedir files (specifies compilers, flags, link flags etc), will be replaced with hierarchical CMake cache files that are persistently stored in our repo.

Background

CIME was designed to be build-system neutral so it could incorporate components with a variety of build systems. With this goal in mind, a build-system agnostic way of specifying build-system configuration (compilers, flags, etc) was conceived and thus config_compilers.xml was born. The case-control system reads this file and, using the current MACH, COMPILER, etc case env XML settings, produces Macro.$lang files in the desired build languages, e.g. Macro.cmake for cmake. These Macro files are then imported/included early-on during the build process for components in order to set up flags etc.

While this system could, in theory, support any number of build systems, we are limited by what macro generators have been implemented in CIME/BuildTools, which only has support for generating CMake and Makefile macros.

Implementation

cmake_macros

Plot legend for above image:

Green = python
Red = Make
Purple = CMake
Pink = XML

  1. There's quite a bit of data in our config_compilers.xml file, so a script is needed to automatically generate Macro files from this XML file. I used cime_config/machines/scripts/converter to do this and added the new Macro files to cime_config/machines/cmake_macros. The full set of new macros was generated by:
% cd $e3smrepo/cime_config/machines/cmake_macros
% ../scripts/converter ../config_compilers.xml
  1. A generic macro that includes all the relevant macros based on mach/compiler settings is needed. I created this by hand and added it here: cime_config/machines/cmake_macros/Macros.cmake. case.setup now copies this file verbatim to the casedir instead of being generated from config_compilers.
  2. case_setup also copies the rest of the cmake_macros recursively to the casedir.
  3. When needed, the cmake macros can be used to generate a Makefile macro (Macros.make). The Macros.cmake file has optional logic to record all new variables being set by the included macros and dump their values in Makefile syntax. The Makefile output does not support any conditionals, so the conditionals must all be resolved by CMake which should always be possible if all the relevant case settings and COMP_NAME are known. The sharedlibs that still use Makefile build systems are the ones that need this Makefile backwards compatibility.
  4. Once the new system is in place, we need a tool to confirm that flags are the same as they were before. I used cime_config/machines/scripts/compare-flags to do this. Example:
% cd $e3smrepo/cime/scripts
% ./create_test SMS_P12x2.ne4_oQU240.A_WCYCL1850.mappy_gnu.allactive-mach_mods --no-run -t with_cmake
...
% CIME_NO_CMAKE_MACRO=1 ./create_test SMS_P12x2.ne4_oQU240.A_WCYCL1850.mappy_gnu.allactive-mach_mods --no-run -t no_cmake
...
% $e3smrepo/cime_config/machines/scripts/compare-flags $case_without_new_macros $case_with_new_macros -v -u

Motivation/Benefits

This rework continues the effort to reduce layers of CIME "magic" in our build system. Specifically, the config_compilers.xml, XML/compilers.py, and BuildTools/*.py are no longer needed by E3SM. The interplay between these particular pieces of CIME was complex and not easy to understand, even for seasoned CIME developers, and was a frequent source of GitHub issues filed for CIME.

In addition to complexity, the expressiveness of the config_compilers.xml system was pretty limited. Only a small set of case settings could be used as attribute selectors and the only operations supported were setting and appending values. E3SM users/developers are now free to use any env case setting in their Macros while also being able to use a Turing-complete language (CMake) to define their flags.

Full support of Makefile builds, required for some of our sharedlib builds, is retained.

User Impacts

Instead of modifying a compiler/mach block in config_compilers.xml, you will modify a file $e3smrepo/cime_config/machines/cmake_macros/$specification.cmake ($specification will typically be $compiler_$mach). The same pseudo-inheritance structure that existed in the XML still exists, macros are loaded in this order:

  • universal.cmake : applies changes globally
  • $compiler.cmake : applies changes for a compiler type (on all machines)
  • $os.cmake : applies change for an OS
  • $mach.cmake : applies changes for a machine (for all compilers)
  • $compiler_$os.cmake : applies changes for a compiler/os combo (for all machines)
  • $compiler_$mach.cmake : applies changes to a compiler/mach comb

So, the more specific the specification, the higher precedence it has.

The syntax changes from XML (config_compilers.xml):

<compiler MACH="mappy" COMPILER="gnu">
  <ALBANY_PATH>/projects/install/rhel7-x86_64/ACME/AlbanyTrilinos/Albany/build/install</ALBANY_PATH>
  <CPPDEFS>
    <append COMP_NAME="gptl"> -DHAVE_SLASHPROC </append>
  </CPPDEFS>
  <CFLAGS>
    <append DEBUG="FALSE"> -O2  </append>
  </CFLAGS>
  <CXX_LIBS>
    <base>-lstdc++</base>
  </CXX_LIBS>
...

to CMake (gnu_mappy.cmake) :

set(ALBANY_PATH "/projects/install/rhel7-x86_64/ACME/AlbanyTrilinos/Albany/build/install")
if (COMP_NAME STREQUAL gptl)
  string(APPEND CPPDEFS " -DHAVE_SLASHPROC")
endif()
if (NOT DEBUG)
  string(APPEND CFLAGS " -O2")
endif()
set(CXX_LIBS "-lstdc++")

While the old config_compilers.xml system should be considered deprecated, it should continue to work for now. You can go back to it by setting CIME_NO_CMAKE_MACRO in your environment. If you want to continue working with config_compilers.xml in the near time (we will delete it at some point) you can feel free to do, just make sure to keep the new macros consistent with whatever config_compilers.xml changes you are making. You can use the converter mentioned earlier to do this automatically.

Future Work

Depends

In the semi-near term, I'd like to replace the Depends files with something better. In the longterm, I'd like to see E3SM fully buildable through cmake, include sharedlibs.

Changing flags based on compiler version

This refactor unlocks the potential for some nice new capabilities such as flexibility for changing flags based on compiler version. In the current system, the only way to achieve this is to defined a new compiler block, e.g:

<compiler MACH="foo" COMPILER="intel17">
...
<compiler MACH="foo" COMPILER="intel18">
...

This approach leads to lots of duplication.

Instead of this, we can now have a single intel_foo.cmake file:

if (CMAKE_Fortran_COMPILER_VERSION VERSION_GREATER_EQUAL 10)
  # Special flags for >=10
else()
  # Special flags for < 10
endif()
# Stuff that is common to both versions.

Using cmake-approve names for things

e.g. CMAKE_C_COMPILER instead of SCC

Remove levels of inheritance

We only need universal, compiler, and mach+compiler levels, maybe mach too.

Documentation

I will update our confluence build system documentation (https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/1011942171/E3SM+Build+System+and+CMake) once this PR gets merged.

[BFB]

…uca/replace_xml_with_cmake_macros

* origin/jgfouca/fix_cimeroots: (21 commits)
  Fix CIMEROOTs for buildnml scripts
  Update CIME submodule
  Run ne4-grid cases on single-node
  Replace run_e3sm template
  Run F-MMFOMP OpenMP-offload case without multithreading
  Fixing path for ocn grid domain file
  Replace or remove settings for deprecated MPAS grids
  Add initial cmake machinefiles for perlmutter and cori-gpu. No impact on other machines. [bfb]
  Run ibmgpu builds with 42 tasks/node in SMT1 mode
  Fixing stray present clause on OMP directive
  Cleanup pgigpu on Ascent
  Adding documentation for source of arrays in diagnostics
  Cleanup of OpenACC in diagnostics to note which arrays need to be brought in for each subroutine
  Expanding OpenACC region in diagnostics to cover the entire routine with the exception of one difficult loop. Data required to be on device is moved there at top of routine and data that needs to be updated on host is moved back at the end of the routine
  Adding OpenACC to Wright EOS and finishing OpenACC in EOS routines
  Adding OpenACC to Linear EOS and fixing incorrect results
  Extending OpenACC code in JM EOS Making ocnEqStatePRef device only in JM EOS
  Update builds on Ascent
  Remove deprecated MPAS grids from the scripts
  Remove LANDM_COSLAT from topo tools
  ...
…_with_cmake_macros

* origin/master: (38 commits)
  Fix the CircleCI script that installs Singularity
  Undo cime subm update
  Fix some stale CMake paths
  clean up crm_history
  revert changes to crm interface rad variables
  another bug fix
  bug fix
  Add check for icol bounds
  be more careful about ncol vs pcol
  change icrm to icol in crm_history
  Add trim to be safe
  Cosmetic update from PR review
  Minor bug fixes and clean up
  switch outfld calls from pcols to ncol
  more bug fixes and clean up
  Refactor crm_history
  bug fix
  rework handling of ECPP output
  bug fix
  update SAM/SAMOMP interface
  ...
Do NOT use MODEL as a selector in config_compilers.xml, use COMP_NAME
instead.
@jgfouca jgfouca added BFB PR leaves answers BFB CIME labels Sep 14, 2021
@jgfouca jgfouca self-assigned this Sep 14, 2021
@rljacob
Copy link
Member

rljacob commented Sep 14, 2021

Looks good. We'll need a shorter summary for the commit message.

Is the inheritance in exactly the order you listed? I would put os after universal and before compiler in the order.

What about the case where we want two or more versions of, say, the intel compiler on a machines and different flags? In config_compilers, we would call the default "intel" and the experimental one "intel21" for example.

@rljacob
Copy link
Member

rljacob commented Sep 14, 2021

You can delete the three IBM_BG*.cmake files.

if (NOT DEBUG)
string(APPEND CFLAGS " -O2 -debug minimal")
endif()
if (DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this if (DEBUG) block be combined with the one at line 15? I realize this is probably the raw output from your conversion tool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is from the conversion tool. I could try to change it to group items with the same selectors, but that would change flag order which could change behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think the grouping might be fine as is as sections for different compilers (C, C++, Fortran). It would help to have some comment strings to clarify that.

string(APPEND CPPDEFS " -DHAVE_SLASHPROC")
endif()
string(APPEND CFLAGS " -static-intel")
string(APPEND CFLAGS " -march=core-avx2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can lines 4 and 5 be combined in to one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it should be safe to group sequential appends with the same selectors (or lack of selectors in this case).

@rgknox
Copy link
Contributor

rgknox commented Sep 14, 2021

I appreciate your efforts to update the specification files for all machines in the new system, thanks @jgfouca

Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much to share. Besides, it seems that most of the work must have been in the cime submodule, leaving "just the mach files" for here.

As Rob noted, some if statements can be lumped, like all the if (NOT DEBUG) for C and F flags. Also, I noticed stuff like

if (MPILIB STREQUAL mpi-serial)
...
endif()
if (MPILIB STREQUAL !mpi-serial)
...
endif()

Can't that be turned into a if/else? Are there other values for $MPILIB ?

@sarats sarats self-requested a review September 15, 2021 15:19
@sarats
Copy link
Member

sarats commented Sep 15, 2021

I would like to think through the implications of this on porting for a new LCF machine. Some of the precedence options don't seem to be relevant there.

For all its warts, config_compilers is a single place to understand/configure compiler options for a machine. I would prefer the replacement to be streamlined as described above but not too scattered across files and a deeper hierarchy. Just my initial thoughts while we (@grnydawn and I) look at porting Frontier testbed configuration.

@bartgol
Copy link
Contributor

bartgol commented Sep 15, 2021

I actually like more smaller files, rather than a single ginormous one. I think this granularity makes adding and maintaining new $mach-$compiler combos a bit easier.

Imho, what could actually help is structuring the cmake_macros folder some more, to avoid listing a folder with O(50) files. I'm thinking something like this

cime_config/machines/cmake_macros/
  |
  + -- generic
  |     |
  |     + -- universal.cmake
  |     + -- userdefined.cmake
  |     ...
  |
  + -- compilers
  |     |
  |     + -- Intel.cmake
  |     + -- GNU.cmake
  |     + -- nag.cmake
  |     ..
  + -- os
  |     |
  |     + -- darwin.cmake
  |     + -- linux-generic.cmake
  |     ...
  + -- machine
  |     |
  |     + -- cori-knl.cmake
  |     + -- mappy.cmake
  |     + -- summit-gpu.cmake
  |     + -- chrysalis.cmake
  |     ...
  + -- mach_compiler
        |
        + -- anvil_gnu.cmake
        + -- anvil_intel.cmake
        + -- theta_gnu.cmake
        + -- theta_intel.cmake
        ...

but maybe that's just my personal taste in folder organization. :)

@rljacob
Copy link
Member

rljacob commented Sep 15, 2021

I'd like to suggest removing the "os", "mach" and "compiler_os" levels. In the current config_compilers, most people spend their time in either "compiler" or "mach_compiler" settings. It will be too hard to think about side effects when modifying "os" or "compiler_os" which would affect multiple machines at once.

Removing those would reduce the number of files and we wouldn't need subdirectories.

@jgfouca
Copy link
Member Author

jgfouca commented Sep 15, 2021

@bartgol , it that case, it could. All these macro files were auto-generated; it could be pretty tricky to figure out where if/else can be used instead of a sequence of ifs.

@jgfouca
Copy link
Member Author

jgfouca commented Sep 15, 2021

@sarats , I agree the deep hierarchy makes it hard to reason about the system. For what it's worth, the hierarchy implemented in this PR is no different than the hierarchy that was being expressed in config_compilers.xml. I agree with @rljacob that we should flatten this out to just universal, machines, and machines+compiler. I don't know if I want to take this on in this PR because then this PR is no longer backwards compatible.

@sarats
Copy link
Member

sarats commented Sep 15, 2021

I agree with @rljacob, a two-level, compiler and machine_compiler seems logical and provides good separation and minimizes surprises/side-effects. 'os' doesn't make sense to me. I'm not sure if machine in isolation makes sense either as any depends or similar items there are associated with a compiler combination.

To minimize re-implementation effort in the future, it's probably better to discuss these style of changes widely before actually implementing.

On a minor note, how would this impact existing PR #4529?

@xylar
Copy link
Contributor

xylar commented Sep 15, 2021

I just want to note here that other projects (E3SM-Unified, compass and mache) all piggyback on this work. I will need some help figuring out how to extract the information that was previously in the XML file for those projects to use. Otherwise, I will end up needing the inverse of the cime_config/machines/scripts/converter to recreate the XML file from the cmake files.

@sarats
Copy link
Member

sarats commented Sep 15, 2021

@grnydawn: wondering if this impacts how you extract build configuration for ekgen kernel extraction?

@jgfouca
Copy link
Member Author

jgfouca commented Sep 16, 2021

@sarats , any changes to config_compilers.xml can be captured with a subsequent run of the converter script.

@xylar , I can help with those projects if need be. If they are using the CIME case control system and/or CMake, then almost no effort should be needed.

I should have mentioned these changes were proposed in CIME V6 outline: ESMCI/cime#3886 and implemented in CIME PR: ESMCI/cime#4088 . Notice the 4 fixed issues in the CIME PR and I'm sure there were more issues that had to be fixed in the existing system that were already closed.

…_with_cmake_macros

* origin/master: (117 commits)
  Add a new machine named GCP (google cloud platform) to help with running on these virtual clusters.
  Hommexx/SL: Make compose_ut use just one proc.
  Hommexx: Fix various warnings.
  Hommexx: Fix limiters_ut in non-BFB standalone test.
  Hommexx: Fix some SL and GllFvRemap issues.
  Hommexx/SL: Fix an implicit capture of 'this'.
  Split up CIME
  Fix a few items in LICENSE
  Fix a few more words on README
  Few more README fixes
  Update README for v2
  A few updates to CONTRIBUTING
  Update LICENSE
  white space cleanup
  Update CIME to 80220601778
  Add timers to driver-mct to provide context to SCORPIO timers
  white space cleanup
  Set E3SM default config_AM_conservationCheck_enable false
  Update to tabular output, add E3SM earth radius
  Add landIce, iceberg, temperature variables
  ...
@jgfouca
Copy link
Member Author

jgfouca commented Oct 5, 2021

Reviewers, I'm going to begin integrating this unless there is an objection. I'm electing to keep the converter script as dumb and simple as possible in order to reduce the possibility of mistakes in the conversion. The result is cmake macro files that aren't elegant, with statements that could be combined, if/else instead of if/if. I'm thinking we can clean up these macros as they are maintained as needed.

jgfouca added a commit that referenced this pull request Oct 6, 2021
)

Major build system enhancement: deprecate config_compilers.xml!

This is the most significant change to the case-control system since
the cmake-ification (#3043) effort 2 years ago. This PR is a natural
follow-on to that effort that further cmake-ifies our build system and
reduces technical debt / complexity in the build system by stripping
away layers of CIME magic that were not adding much value for
us. Specifically, the config_compilers.xml file, which CIME uses to
generate the Macros.make and Macros.cmake casedir files (specifies
compilers, flags, link flags etc), will be replaced with hierarchical
CMake cache files that are persistently stored in our repo.

[BFB]

* jgfouca/replace_xml_with_cmake_macros:
  Fix handling of '\!'
  Add CIME hotfix
  Use COMP_NAME, not MODEL
  Sync with latest config_compilers.xml
  Add support for compiler_os.cmake macros
  Improve dox for compare-flags
  compare-flags: make tool more flexible and better output
  Refresh cache files after upstream merge
  Add some helper scripts for this effort
  Update CIME with latest
  Add support for universal macro
  Update CIME to bring in new macro capability
  Remove obsolete comment
  Add new cmake-based macros
@jgfouca jgfouca merged commit 4c32474 into master Oct 7, 2021
@jgfouca jgfouca deleted the jgfouca/replace_xml_with_cmake_macros branch October 7, 2021 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFB PR leaves answers BFB CIME
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants