Meeting Notes 2018 Software

Erik Kluzek edited this page Oct 26, 2018 · 107 revisions

Nov 18, 2018

Agenda

  • Martyn: Discuss workflow for MizuRoute grid aggregation. (Bill wants to think about how this would fit into the CESM/CIME workflow, with respect to when you define grids and set up mapping files: How much of this aggregation is done ahead of time (allowing you to prestage the necessary mapping files) vs. at runtime?)

Nov 5, 2018

  • Erik: Current milestones are: clm5, cmip6, cesm2.1.0, and future. clm5 is already done. future probably isn't useful. What is useful is to put in the requirement wishlist for future releases; those we need to manage. So I'm making cesm2.1.0 the ones we HAVE to get in. But some things should be done fairly quickly, though after the cesm2.1.0 timeframe, so I need a label for those. cmip6 is now too broad a milestone, because some things are needed at some point along the cmip6 process, but not for cesm2.1.0.
  • Erik: Future scenarios. I got it set up for mksurfdata for SSP5-8.5, but could extend it to the others. What is the priority on this?
  • Erik: Some things that should get done: conus_30_x8 grid, mosart 8th degree, high-res PFT. Priority for these?

Oct 22, 2018

Agenda

  • Bill: Organization of usermods: note how I'm doing nhtfrq, etc.
    • Show user_nl_clm for a case with output_sp_highfreq
    • Show generated lnd_in
  • Bill: Fixing non-prognostic-crop transient bug in release
    • Erik: are you able to do a careful review of the latest change and shepherd this tag through?
    • Is this needed on the cesm2.0 release branch, or just the clm5 release branch?
  • Bill: How to notify people of this bug?
  • Bill: How can we prevent bugs like https://github.com/escomp/ctsm/issues/538 in the future?
    • This doesn't seem like something that could have easily been prevented through more care in coding (either in the initial code related to this or in the changes that were made)
    • One idea is: For risky answer-changing tags, doing some group brainstorming on things that should be manually checked? (e.g., in this case, manually checking crop areas in a transient run). This could be done ahead of time (coming up with a checklist) or as a review of what the primary SE checked (e.g., if I was the primary SE, I would present what I have manually checked, and at least one other person would review that and say if they felt it's sufficient).
    • (erik) Along with the above, we should figure out the ramifications of a change and what options it will affect. Right now it's obvious to me why this doesn't work like we expected, but we didn't think of it at the time. Had we thought through the process more, we should've realized it then.
    • Smaller tags would help here (helping to focus a reviewer on one thing), especially when there are answer changes
    • More tests like https://github.com/ESCOMP/ctsm/issues/542
    • (erik) do we know if the right behavior happens when create_crop_landunit=F?
  • Erik - Jim's changes to rtm and mosart surprisingly change answers for five tests. So I'm going to put those updates in the answer-changing tag. How much time should I spend on figuring out why answers change?
  • Erik - ne30/conus grid F case issues? What should we do?

Format of nhtfrq, etc. in usermods

Dave & Erik are happy with the format that lists individual items separately.

Issue 538 (non-prognostic-crop transient bug)

We want to fix this on master and for the CESM2.1 release (so on the clm5 release branch). We'll think later about whether to fix this for a CESM2.0.2.

Notifying people: Dave's inclination is to notify LMWG people along with the 2.1 release.

How to help prevent bugs like this moving forward?

Bill: Not a silver bullet, but: For risky answer-changing tags, doing some group brainstorming on things that should be manually checked? (e.g., in this case, manually checking crop areas in a transient run). This could be done ahead of time (coming up with a checklist) or as a review of what the primary SE checked (e.g., if I was the primary SE, I would present what I have manually checked, and at least one other person would review that and say if they felt it's sufficient).

Others agree.

Also, longer-term: Bill has some ideas for system tests we can do to catch this exact problem.

  • Erik points out that we probably need a corresponding cime change

Answer changes for rtm and mosart

Mariana feels it's important to get these changes in, and to understand the reason for the answer changes.

ne30 grid F case issues

NaNs getting sent from land to cpl after some number of years, starting in a single point. It's happening in cold regions.

Dave's suspicion is: there is some unusual situation coming from CAM, which causes CLM to get unhappy. e.g., it may be due to super-cold temperatures.

Mariana: things coming down the pike for NUOPC

The NUOPC coupler framework assumes that all components send meshes (which include connectivity), to allow online regridding. So CLM and other components will have another input file that describes the mesh; for simple grids, there's a tool that creates this.

So we'll have 3 files that need to be consistent: surface dataset, domain file and mesh file. May want to think about how we want to combine these - or at least ensure they stay consistent.

Oct 4, 2018

MizuRoute

One motivation for this network-based routing scheme: allows inclusion of reservoirs.

Mariana suggests, as a first step, putting in place an mct-based cap.

Stepping back: we are proposing bringing in a 3rd river model in CESM alongside RTM / MOSART.

Big pieces here:

  1. Infrastructure code that can / should be shared with MOSART / RTM / MizuRoute

  2. MCT cap

  3. Remapping (using ESMF)

Note that MizuRoute is currently not parallelized, but they're working on that.

Remapping

Currently, MizuRoute is doing its own remapping (in serial). But we want the coupler to do this.

As a side-note: We would typically run CTSM at comparable resolution to the river - quite possibly even on a grid defined by basins / catchments.

MCT cap

Translates between MizuRoute data structures and MCT data structures.

At initialization, cap tells coupler what grid cells are, what areas are, and domain decomposition.

Note that you don't send the corners in MCT, so it's fine to have complex polygons. (You still need to define the corners offline for the purpose of ESMF map generation.)

Time stepping

Want to maintain the same flexibility as we currently have, that allows coupling to happen less frequently than the CTSM time step.
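A minimal Python sketch of one way to keep this flexibility: the land model accumulates its runoff every time step and only hands a time-mean value to the river model every few steps. The class name `RunoffAccumulator` and its interface are hypothetical and are not taken from CTSM, MizuRoute or the coupler.

```python
class RunoffAccumulator:
    """Accumulates a land-model flux between river coupling steps (hypothetical helper)."""

    def __init__(self, steps_per_coupling):
        if steps_per_coupling < 1:
            raise ValueError("steps_per_coupling must be >= 1")
        self.steps_per_coupling = steps_per_coupling
        self._sum = 0.0
        self._count = 0

    def add(self, runoff):
        """Record one land time step.

        Returns the time-mean flux when it is time to couple, otherwise None.
        """
        self._sum += runoff
        self._count += 1
        if self._count < self.steps_per_coupling:
            return None
        mean = self._sum / self._count
        # Reset for the next coupling interval.
        self._sum = 0.0
        self._count = 0
        return mean
```

With `steps_per_coupling=1` this reduces to coupling every land time step.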

Decomposition

It's important to get parallelization in place: for the high-res CONUS grid, it takes 3 hours per model year.

Martyn asks if we can have nested decomposition. In his figure: grey needs to be done before the orange, which needs to be done before the red, which needs to be done before the blue.
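One possible reading of this ordering, sketched in Python: group river reaches into levels so that every reach is handled only after all reaches upstream of it, with the reaches inside one level being independent of each other. The function name and the `downstream` mapping are hypothetical, and the mapping to the colours in Martyn's figure is an assumption.

```python
def processing_levels(downstream):
    """Group reaches into levels so each reach comes after everything upstream of it.

    downstream maps reach id -> id of the reach it drains into (None at an outlet).
    Every target id is assumed to also appear as a key.
    """
    # Count how many reaches drain directly into each reach.
    upstream_count = {reach: 0 for reach in downstream}
    for target in downstream.values():
        if target is not None:
            upstream_count[target] += 1

    # Start with headwater reaches, which have nothing upstream.
    level = [reach for reach, n in upstream_count.items() if n == 0]
    levels = []
    while level:
        levels.append(sorted(level))
        next_level = []
        for reach in level:
            target = downstream[reach]
            if target is None:
                continue
            upstream_count[target] -= 1
            # A reach becomes ready once all of its upstream reaches are done.
            if upstream_count[target] == 0:
                next_level.append(target)
        level = next_level
    return levels
```

Reaches within one level could in principle be distributed over processors or threads, with a synchronization point between levels.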

MCT doesn't care about the details of how the decomposition is done: it just cares about how grid cells / elements are distributed amongst processors.

Coming up with a general decomposition strategy within MizuRoute could be challenging.

Joe points out: We could consider using MPI just at a very coarse level (just for main rivers), and using OpenMP threading for everything else. People like this idea.

  • In CESM, if you run a component on the same processors as a different component (sequentially), then all of those components need to use the same number of threads. However, it seems like we may want to run the river model concurrently, on its own set of processors, in which case it has full control over the number of threads per processor.

We might be able to share the decomposition code that's used for RTM / MOSART.

Note that the coupling will happen with polygons that fill the domain. These polygons are defined as the catchments of the smallest-level streams.

Next steps

Mariana suggests trying to bring in MizuRoute first as a single-processor model, to get all of the coupling infrastructure sorted out. Or, as Martyn suggests, could put in place OpenMP parallelization at first.

Bill asks: to what extent do we want to have this tied in with cime initially, for the build and setting up namelists? Mariana thinks we should at least hook in the build, but initially we could have a dead-simple build-namelist script that just puts in place a pre-staged namelist file.

Oct 1, 2018 CLM Software

Agenda

  • Follow up on discussion about friction velocity, with Mariana
  • Bill - For Dave: Confirming it's okay that the CESM2.1 code base will need use_init_interp to continue from existing historical simulations
  • Bill - Dave: Want any help putting together slides for tomorrow's co-chairs meeting re: things still wanted for cesm2.1?

CESM2.1 release planning

We have 2 weeks. That should be plenty of time for the needed three tags.

If we did anything else, it could be a data reduction tag - turning a bunch of things to inactive by default. One thing to look at is how carbon isotope fields are treated by default.

Usermods to get correct output settings

Mariana: We're not going to have cylc for the community, so we should use usermods to get the correct output.

We'll add hist_fexcl1 for isotope in the carbon isotope user mods. (Note that, currently the excluded monthly vars are also included as annual vars. That may not be needed for all the C isotopes.)

We should split the cmip6_output directory into two: one that just has the low-frequency output, and one that adds high-frequency. Then we can have a top-level cmip6 as well as cmip6_high_frequency. And do the same thing at the CESM level. The main one will exclude the daily and 3-hourly output.

Actually: We should have different output directories:

  • output_sp
  • output_bgc (includes output_sp)
  • output_crop (includes output_bgc)
  • output_sp_highfreq (builds on output_sp)
  • output_bgc_highfreq
  • output_crop_highfreq

We'll also have cmip6 (which includes output_crop) and cmip6_highfreq (which includes output_crop_highfreq, which should be identical to what is now cmip6_output).
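The layering above can be sketched in Python as a small include graph. This is only an illustration of the intended ordering, not how cime resolves user mods. The includes for `output_bgc_highfreq` and `output_crop_highfreq` are assumed by analogy with `output_sp_highfreq`, and the `resolve` helper is hypothetical.

```python
# Which directory each usermods directory builds on, following the list above.
# The bgc/crop highfreq entries are assumptions (by analogy with output_sp_highfreq).
INCLUDES = {
    "output_sp": [],
    "output_bgc": ["output_sp"],
    "output_crop": ["output_bgc"],
    "output_sp_highfreq": ["output_sp"],
    "output_bgc_highfreq": ["output_bgc"],
    "output_crop_highfreq": ["output_crop"],
    "cmip6": ["output_crop"],
    "cmip6_highfreq": ["output_crop_highfreq"],
}


def resolve(name, includes=INCLUDES):
    """Return the directories to apply, included ones first, each listed once."""
    order = []

    def visit(directory):
        for parent in includes[directory]:
            visit(parent)
        if directory not in order:
            order.append(directory)

    visit(name)
    return order
```

For example, `resolve("cmip6")` lists `output_sp`, then `output_bgc`, then `output_crop`, then `cmip6`, so more specific settings are applied after the ones they build on.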

More discussion about FrictionVelocity

Mariana isn't sure if there is actually duplication between what's in FrictionVelocity and what's done in other components. Would need to talk to Bill Large, Dave Bailey and Marika Holland about this.

Dave L: we could ask Thomas to spear-head this and look into the similarities / differences between different components.

Confirming it's okay that CESM2.1 code base will need use_init_interp

Confirming it's okay that the CESM2.1 code base will need use_init_interp to continue from existing historical simulations: Dave is okay with this.

Ideally we'd set use_init_interp by default when it's needed... but that's hard to determine.

Sept 24, 2018 CLM Software

Agenda

  • Bill - Issue #509: irrigate is true for non-crop 1850 runs
    • Confirm how we want to fix this
    • Should this be fixed for CESM2.1?
    • Who should fix this? (Erik understands the subtleties of this better than I do, but if his plate is overloaded, I can take a stab at it)
    • Erik: Note, I'm pretty sure this will require code changes; see the issue (the branch Sam is working on fixes it)
  • Erik - CESM2.1 coordination. Talked about this at CSEG meeting. CAM needs 3 more tags. POP needs 1. We need a cime update and cesm2.1 release tag. cesm2.1 will use a separate cime branch from cesm2.0. For CLM we need a tag that updates cime and ndep. The CLM tag needed for cesm2.1 will go on the clm5.0 branch. We normally put the changes on master first, but could put them on the branch first and then migrate to master.
  • Erik - See cesm2.1 testdb: https://csegweb.cgd.ucar.edu/testdb/cgi-bin/viewPlannedTag.cgi?tag_id=466
  • Erik - Should we separate the cime update for presaero from the ndep update?
  • Erik - Should prescribed aerosols and ndep be same as CMIP5 1850? Currently not, so 1850 will be different than historical
  • Erik - Talked to Peter about creating rawpft datafiles that are identical to the old ones, but have GULAC fields on. One way to do it is with nco. I told him to keep that in mind while he is working on his code: if doing it with nco would be easier, we should do it that way. We agreed to aim to get this done by Thursday.
  • Erik - FYI. Peter was frustrated with git. Wanted to know how to see that his push to his fork worked.
  • Erik - Sheri brought up reducing output. The proposal is to reduce the default output for cesm2.1 to only what the diagnostic package needs. She will talk about this at co-chairs tomorrow.
  • Erik - My tag. Made changes over weekend, need to rerun testing and tag tomorrow.
  • Bill - timing of performance changes
  • Dave - From Thomas Toniazzo: Consistent treatment of all material and energy fluxes between different components in the coupled systems so that both matter and energy are conserved. (As you know, at present matter is not considered to be carrying any energy in CESM.) In order to achieve this, one bit of progress that I would think important would be to centralise the computations which are carried out in different components for the properties of the atmospheric flux layer. For example, in components/clm/src/FrictionVelocityMod.F90 there is one such calculation, using in particular specific stability functions that are defined locally. So I wonder whether it would be possible to move some of these calculations to the cime code repository, e.g. to cime/src/share/utils/, where the ocean-atmosphere flux calculations are also done. I am thinking specifically of the scientific models that pertain to the atmospheric surface layer and not to other specific models, so in particular the stability functions, and also some of the in-lined computations following e.g. line 472 in FrictionVelocityMod.F90. I see some comments in that routine (line 503) that point in this direction, so perhaps it may be possible to coordinate this with my planned work on harmonising the model's energy formulation.

Issue #509 (irrigation discrepancy between runs with vs. without use_crop)

We updated this issue.

We'll come back to this after the cesm2.1 crunch and after the changes Sam Levis is working on.

Diagnostic output volume

Dave and Keith have looked through this a few times, and don't feel that they can get the output volume much lower.

A bigger concern for Dave is: A lot of the savings come from excludes in the cmip6 output user mods. We might want to make those inactive by default.

How can we make it clear to users that they should include these user mods?

Bill: A kind of crazy idea is: We could make the default user_nl_clm that's copied to your case dir actually have some of this output stuff. But then we'd need to remove some error-checking code at runtime, so that it lets you add fields that aren't actually present in this run.

For now, some things we want to do are:

  • Split the user mods directory into one that just has monthly fields, and one that builds on that, adding the high-frequency output
  • Mention this user mods dir in the user's guide

Shared code for friction velocity?

We could consider moving this to a function in cime. The simplest thing would be to put a single-point routine in cime, but that might have performance issues. A better solution could be to have a cime routine that operates on arrays as input and output. We'd then have some pre code that packs data into arrays (using only points within the filter), and then some post code that unpacks the outputs. (This packing and unpacking would be a handy little general-purpose subroutine to have.)

A first step in the right direction would be to remove some apparently duplicated code that depends on whether landunit_index is present. The approach could be the same as above: having pre code that packs data into arrays and post code that unpacks the data.
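A minimal Python sketch of the pack / compute / unpack pattern described above. The helper names are hypothetical, and `neutral_ustar` (a neutral log-law friction velocity) is only a stand-in for the shared array routine; it is not the calculation in FrictionVelocityMod.F90.

```python
import math

VON_KARMAN = 0.4


def pack(values, filter_indices):
    """Gather the filtered points of a full-size array into a dense array."""
    return [values[i] for i in filter_indices]


def unpack(packed, filter_indices, target):
    """Scatter dense results back to their original positions in target."""
    for i, value in zip(filter_indices, packed):
        target[i] = value


def neutral_ustar(wind, height, z0):
    """Stand-in for the shared array routine: neutral log-law friction velocity."""
    return [VON_KARMAN * u / math.log(height / z) for u, z in zip(wind, z0)]


def ustar_on_filter(wind, z0, filter_indices, ustar, height=10.0):
    """Pre code packs, the array routine computes, post code unpacks."""
    packed = neutral_ustar(pack(wind, filter_indices), height, pack(z0, filter_indices))
    unpack(packed, filter_indices, ustar)
```

Points outside the filter are never touched, so the caller keeps whatever values they already had.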

Sept 20, 2018

Agenda

  • Bill - Discuss issue #511: How should we ensure that the source state for each water tracer is set correctly?

Issue #511: How should we ensure that the source state for each water tracer is set correctly?

Discussion of options

Bill laid out four options:

  1. Do nothing about this

  2. Put in place a system test that catches problems, if we can think of one that can do this

  3. Organize the code so that the tracer updates happen alongside their respective state updates

  4. Set pointers that point from a flux variable to its source state, which can be used both in the state update code and in this tracer code.

People agreed that this could truly be an issue.

Dave thinks that the best option could be coming up with a system test for this, if we can think of one.

Dave also thought that organizing the code so that tracer updates happen alongside their respective state updates might be a good option, though also agrees that it could muddy the code for someone trying to understand just the state updates and not caring about tracer stuff.

We thought of a 5th option:

  5. Renaming fluxes to be explicit about their source (and maybe destination), as is done for the biogeochem code.

This wouldn't guarantee that things are done right, but it should make it more obvious when things are done wrong, and it should prevent someone from changing the state/flux structure, but reusing a flux variable to have a different source state. People felt that this might actually be the best solution. Nobody could think of a case where it fundamentally would NOT work to have a single source for a given flux - though there may be cases like two-way fluxes (e.g., for Richards equation) where we need to split what's currently one two-way flux into two one-way fluxes (see below). And Dave thought this could help clarify the code, and so might be good for other reasons anyway.
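A minimal Python sketch of why an explicit source matters for tracers: if each flux records its source state, the tracer flux can be taken as the bulk flux times the tracer-to-bulk ratio of that source. The `Flux` class, the pool names and the ratio-based tracer rule are illustrative assumptions, not the CTSM implementation.

```python
from dataclasses import dataclass


@dataclass
class Flux:
    name: str      # hypothetical name that spells out source and destination
    source: str    # key of the source state
    dest: str      # key of the destination state
    amount: float  # bulk water moved this step (>= 0)


def apply_flux(flux, bulk, tracer):
    """Move bulk water and the matching tracer amount from source to destination."""
    # The tracer ratio must come from the source state, before it is updated.
    source_bulk = bulk[flux.source]
    ratio = tracer[flux.source] / source_bulk if source_bulk > 0.0 else 0.0
    tracer_amount = flux.amount * ratio
    for pool, amount in ((bulk, flux.amount), (tracer, tracer_amount)):
        pool[flux.source] -= amount
        pool[flux.dest] += amount
```

If a flux variable were later reused with a different source state, the name and the `source` field would both have to change, which makes the mistake easier to spot in review.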

Martyn: Long-term, we might want something like a data structure like:

    eqn%var(ixLookState1)%state
    eqn%var(ixLookState1)%flux(:)

Then you can loop over this to do the state updates.
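A minimal Python sketch of the looping idea, assuming each variable carries its state and a list of signed fluxes (positive into the state). The names `Var` and `update_states` are hypothetical and only mirror the `eqn%var(...)%state` / `%flux(:)` layout above.

```python
from dataclasses import dataclass


@dataclass
class Var:
    state: float
    flux: list  # signed fluxes per unit time, positive into the state


def update_states(variables, dt):
    """Advance every state by the sum of its own fluxes over one time step."""
    for var in variables.values():
        var.state += dt * sum(var.flux)
```

Because each state is updated only from the fluxes attached to it, the same loop could drive the bulk and tracer updates.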

Bill: will consider moving incrementally toward this; for now this would involve something like option (4) above - having some extra pointers. The upside is that it would force code to remain self-consistent; the downside is that it could make the code harder to understand.

Bidirectional fluxes

What to do about a flux that could go in either direction, e.g., for Richards equation? We're not sure how this would work with the vision for tracer updates. We could split a two-way flux (positive or negative) into two positive one-way fluxes (one of which will always be 0). Martyn points out that this could be problematic for computing derivatives, but it's possible that this could be done after the fact (after solving for all of the fluxes). We'll check if David Noone has ideas for how this could work.
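The split itself is simple if it is done after the fact, once the signed flux has been solved for. A minimal Python sketch, with a hypothetical function name:

```python
def split_flux(q):
    """Split a signed flux into (forward, backward) parts.

    Both parts are >= 0 and at most one is nonzero, so q == forward - backward.
    """
    return max(q, 0.0), max(-q, 0.0)
```

Each one-way flux then has a single, unambiguous source state for the tracer update.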

Sept 6, 2018

Agenda

  • Erik/Sean -- How important are grids on -180 to 180? I've added an immediate fix for local noon. But if they are important, we should add tests and show that we get the same answers with negative longitude. Other cesm grids are sometimes on this convention (notably cism), and it does help if you want to do regions along 0 degrees. As far as I can tell, cime works with these grids.
  • Isotope project
  • Isotope work update
  • Bill's upcoming priorities

Grids on -180 to 180

Mike: WRF and the National Water Model work with -180 to 180 by default, but this can be changed in preprocessing. If it's going to take a lot of time to fix this, then it may not be worth doing.

Erik has fixed some things, but there are apparently still some other problems. e.g., Sean has found some problems with reflected solar.
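The notes do not say what the local-noon fix was. As an illustration only, here is a Python sketch of converting between the two longitude conventions and of a local mean solar time calculation that gives the same answer for either one (it ignores the equation of time). All function names are hypothetical.

```python
def to_0_360(lon):
    """Map a longitude in degrees onto [0, 360)."""
    return lon % 360.0


def to_180(lon):
    """Map a longitude in degrees onto [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def local_solar_seconds(utc_seconds, lon):
    """Local mean solar time in seconds of day, for longitude in either convention."""
    return (utc_seconds + lon / 360.0 * 86400.0) % 86400.0
```

A test along the lines Erik suggests could run the same calculation with a longitude expressed both ways and require identical results.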

Canopy Hydrology cleanup

Probably separate this into at least three routines:

  • Snow cover fraction
  • Snow initialization
  • Canopy Hydrology (though it does more than that)

For now, we'll keep state updates where they are. Later, we might merge some state updates (which would change answers).

Refactoring snow cover fraction calculations in CanopyHydrology

We'll keep in place the original fsca formulation (n&y 07) - probably giving it a better name than just oldfflag. Note that this is the form used in Noah-MP.

Note that, the way things are currently structured, it does the new frac_sno calculation, then possibly overwrites it with the old.

And note that there are order differences for the new vs. old: New needs to compute frac_sno, then update snow_depth based on that; old needs to first update snow_depth, then compute frac_sno based on that. So we should have a routine that updates both frac_sno and snow_depth, and let individual parameterizations do that in whatever order they want.

Then this will be lumped with the FSCA block above it:

         ! for subgrid fluxes
         if (subgridflag ==1 .and. .not. lun%urbpoi(l)) then
            if (frac_sno(c) > 0._r8)then
               snow_depth(c)=snow_depth(c) + newsnow(c)/(bifall(c) * frac_sno(c))
            else
               snow_depth(c)=0._r8
            end if

whereas this will be lumped with the "original fsca formulation" below it:

         else
            ! for uniform snow cover
            snow_depth(c)=snow_depth(c)+newsnow(c)/bifall(c)
         endif

i.e., it is currently assumed that subgridflag==1 is combined with oldfflag==0, and vice versa.
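A minimal Python sketch of the proposed structure, where each parameterization updates both frac_sno and snow_depth in its own order. The snow depth expressions follow the Fortran excerpts above; the frac_sno formulas are passed in as callables because they are not given in these notes, and the urban check is omitted. Function names are hypothetical.

```python
def subgrid_update(snow_depth, newsnow, bifall, compute_frac_sno):
    """Newer ordering: compute frac_sno first, then update snow_depth from it.

    compute_frac_sno takes no arguments here, since it does not depend on the
    updated depth.
    """
    frac_sno = compute_frac_sno()
    if frac_sno > 0.0:
        snow_depth = snow_depth + newsnow / (bifall * frac_sno)
    else:
        snow_depth = 0.0
    return frac_sno, snow_depth


def uniform_update(snow_depth, newsnow, bifall, compute_frac_sno):
    """Older ordering: update snow_depth first, then compute frac_sno from it.

    compute_frac_sno takes the updated snow depth.
    """
    snow_depth = snow_depth + newsnow / bifall
    frac_sno = compute_frac_sno(snow_depth)
    return frac_sno, snow_depth
```

The caller would pick one of the two routines based on the chosen parameterization, instead of computing the new frac_sno and then possibly overwriting it with the old one.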

Frequency of these meetings

We'll aim for biweekly meetings. But we'll keep it on the calendar for every Thursday and decide week-by-week if we want to keep it.

August 20, 2018

Agenda

  • Erik -- PGI bug #442 what should we do? Worked with Jim last week to have PGI compiler guys work on it. Had to prove an issue. I finally showed that it worked for PGI16.5, but failed for all other compiler versions we have available.
  • Erik -- Set options bug #431? Dataset issue Keith found #478?
  • Erik -- FYI: added "-fast" option to mkmapdata.sh and it was able to run on normal 1-proc queues
  • Erik -- r8th grid for mosart?
  • Erik -- Data from Peter L.?
  • Erik -- Discuss location of FATES/ctsm monthly meetings
  • Erik -- CESM2.1 updates that are needed: CO2 (waiting on Doug K.), ndep/presaero for transient, FATES, transient speedup?, WACCM-X nstep startup issue and testing, improve user friendliness of build-namelist, snowmip, littfall
  • Erik -- Ethical use for publication addition to Code of Conduct.
  • Erik -- Branch needed for Steve G. that changes pause-resume behavior.

PGI bug #442

As far as Erik knows, the PGI guys are working on it now. So we'll wait for them.

Priority of #431 (History fields incorrect when set_xxx=0 but the xxx landunit is also set in initCold)

Feeling is: let's fix the known issues, but not high priority to look through carefully for other issues.

mkmapdata with -fast option

With the new -fast option, it can run on a single processor, though it still takes a few hours.

r8th grid for mosart

Sean did some work on this and got it to work.

Should this become an option in cime? Dave feels probably yes, with the main hesitation being that we'll eventually have a new network-based routing scheme, which could replace mosart for high-resolution. But in the near-term, 1/8 degree mosart could be nice. Maybe aim for this for CESM2.2.

CO2 for CMIP6

After a lot of back and forth, the decision has been to just use globally-averaged CO2 rather than latitudinally varying CO2 for coupled runs.

The dataset is pretty much ready, but they need to resolve Gregorian vs. noleap.

For datm, Dave feels it would be good to have the option to go back and forth: in a lot of cases, we'd like latitudinally varying, but for comparison with cmip6, might want to use globally-averaged. Erik might do that by having a separate field in the dataset that is spatially and temporally averaged, so you could point to that other field.
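A minimal Python sketch of deriving a globally averaged value from a latitudinally varying one, assuming values on latitude band centres and cos(latitude) as an approximate area weight. The function name and the weighting choice are assumptions; the notes do not say how the averaged field would be built, and the temporal averaging is not shown.

```python
import math


def global_mean(values, lats_deg):
    """Approximate area-weighted mean of zonal values using cos(latitude) weights."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

The result could be stored as a second field next to the latitudinally varying one, so that a case can point at either.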

Dave: if possible, we should try to use the same file that CAM is using, rather than creating our own file. This may not be possible, though, because we need a streams-format file, whereas we think CAM is reading a slightly different format.

Priorities for CESM2.1

Priorities in order:

  1. Needed: co2, ndep/presaero for transient

  2. The transient speedup would be really good to have

  3. The nstep fix would be also really good to have

August 9, 2018

Agenda

  • Dave -- Status of CLM5/WACCM-X bugs, removal of CLM4
  • Erik -- Bring ndep update to master?
  • Erik -- Tool chain for NWP (option to skip 1km_hydro maps and use constant topo_std=371?) (The memory requirements for mkmapdata.sh are enormous; this would bypass that)
  • Bill -- Default options for NWP (https://github.com/ESCOMP/ctsm/issues/456)
  • Bill -- Water isotope update

Some notes about running in WRF

Lakes and urban

When WRF runs with Noah-MP, lakes and urban run outside of Noah-MP, as separate land surface models called by WRF.

What do we want to do with CTSM?

  • For lakes, it probably makes sense to have CTSM handle these.
  • For urban: For single-layer, it could make sense to have CTSM handle it. Multi-layer urban (sticking into the atmosphere) is trickier....

Getting CTSM to work with the WRF workflow

Getting CTSM to work with the WRF workflow will take some work.

Mike has created scripts that take the WRF geogrid file and create a CTSM domain. But we might need to do stuff later like remove lakes from the geogrid file (or have a WRF namelist flag to tell it not to do lakes itself).

Initialization procedure: typically, initial conditions in Noah-MP come from some other model and/or HRLDAS.

  • We may need the capability to blend a CTSM restart file with initial conditions from some other source.
  • Mike notes that, in some cases, they take something like skin temperature, and initialize a bunch of other temperatures from that
  • Mike notes that there is currently no way to initialize the crop model in the middle of a season

How to deal with very high-resolution (1km or higher) raw input datasets

For the Noah-MP raw input data, they store individual tiles. We might want to think about doing that, or doing an initial step of subsetting the high-res raw data.

Mike notes the big difference that, for regional applications, everyone has their own domain/grid, so the need to create something like mapping files from raw data to the surface datasets becomes something that all users need to do.

The highest priority thing right now is to bypass the current 1-km input file. We either want to have a lower-resolution version of this file, or have the option to bypass this, using a different subgrid snow cover fraction parameterization.

We want out-of-the-box NLDAS grid

We'd like the capability to have NLDAS out-of-the-box. This is a 12-km grid. This is a better one to use than the 30-km grid.

We'd like to have NLDAS datm forcing in addition to the NLDAS CTSM grid.

Erik thinks it makes sense to start with having this done with CLM USRDAT. A following step would involve having it as a supported grid in cime. But actually, after some more discussion, we decided to just go straight into cime. Among other things that would let us choose a PE layout for that grid. (We do have precedent for having single-point and regional grids in CIME; there's something that detects regional grids and turns off ROF; look at how that's done.)

Status of CLM5/WACCM-X bugs, removal of CLM4

Erik has made some progress on the ESMF library bug, but hasn't figured it out. But this just affects crop, which probably isn't a standard configuration for WACCM-X.

For the balance check issue: For now we're going to change the hard-coded 2 time steps to 1 hour. (Later we could make this namelist-settable if needed.)
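A minimal Python sketch of turning a fixed skip duration into a number of time steps, so that a short time step skips more steps. The function name, the rounding up and the floor of one step are assumptions, not the change that went into CTSM.

```python
import math


def balance_check_skip_steps(dtime_seconds, skip_seconds=3600.0):
    """Number of initial time steps to skip so that at least skip_seconds elapse."""
    if dtime_seconds <= 0:
        raise ValueError("dtime_seconds must be positive")
    return max(1, math.ceil(skip_seconds / dtime_seconds))
```

With a 1800 s time step this gives 2 steps, matching the previous hard-coded value, while a 300 s time step gives 12.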

Dave will send an email to get final confirmation, but at this point we can probably go ahead with CLM4 removal.

CTSM vs. Noah-MP comparison

If possible, we'd like to compare:

  • Noah-MP
  • CTSM in NWP configuration
  • CTSM in its standard configuration, with the same forcing data as the other runs

We could also show at the meeting what it takes to get the NWP configuration working within CTSM now (hopefully it will be pretty out-of-the-box by then).

For initial conditions: use_init_interp from our existing 2000 initial conditions file.

July 26, 2018

Agenda

  • Bill - Confirming that we want to store users guide and tech note source in the source tree: I think this will make it easier to keep these in sync with code changes, but for the tech note in particular, we'll need to be vigilant about preventing large images from entering the repository. So I'm wondering if we should at least store the tech note source in a different repository so we don't need to worry about file size....
  • Bill - how do we want to manage defaults for different versions moving forward?
  • Mike/Erik -- Creating regional grids for NWP for CTSM. Currently have a tool chain for this. Requires SCRIP grid file. Removing ocean points requires some extra work (you can leave them in and run as wetland). Standard resolution PFT rawdata is quarter degree, Peter is creating a 0.05 version for just present day. Other raw datasets are various resolutions.
  • Erik -- What compilers do we need to support? Currently having trouble with PGI.
  • Report
  • Upcoming plans

Do we want the users guide and tech note in the source tree?

Erik: Maybe more reason to have the user's guide in the source tree?

Mike: It could be helpful to have the tech note in the source tree if we want people to update it frequently. Would it make sense to have images in a separate repository? Erik: We could do this with manage_externals.

Bill: like that idea. Will check with Keith, then tentatively plan to do this, as long as it isn't too hard in terms of the build.

Are we planning to maintain a bunch of options for each parameterization long-term?

Partly related to the tech note, Sean raises the question of whether we envision maintaining a bunch of options for each parameterization long-term (which would then all need to be documented in the tech note), or whether we instead envision pulling out less-well-performing parameterizations after initial development.

Martyn: There's some value in keeping in a parameterization just because it was used in some important paper.

Mike: Also, with regional modeling, there are cases where different parameterizations work better for different regions.

Dave: Some balance here: We don't want every possible parameterization in the model, but want to keep fundamentally different ones, when they're useful. Need to think about how to organize the tech note; might just document the default configurations, possibly including alternatives in an appendix.

How to manage defaults for different versions moving forward?

For example, we currently have clm4_5 and clm5_0. Do we want to just keep incrementing that number (going to clm5_5 or clm5_1), or move to something completely different for the climate configuration?

One convenient note is that clm could be thought of as short for climate. Then we could keep going with our clm numbering, thinking of it as the numbering of the climate version within ctsm. We would also have physics suites like nwp and hyd (or wat). Though Mike points out that the nwp configuration will need different names for the different systems in which it's used.

Dave wants to bring this to SSG for approval. Then it can be the responsibility of the different groups (climate, nwp, etc.) for how they do their sub-naming convention.

Mike points out that we want some nwp configuration that basically mimics the latest version of Noah-MP.

Creating regional datasets for NWP for CTSM

Mike: Coming up with the first cut at the NWP configuration is fairly easy in terms of the code. (Side-note: It would be great long-term to have a more flexible way to specify soil layers by specifying a vector of nodes, but for now he has hard-coded a layout for 4 layers.) But the bigger issue is how to create the surface datasets.

Mike points out that, for NWP, each user basically has their own grid, so surface dataset creation needs to be easy and relatively fast. This makes it more realistic to have options in mksurfdata_map rather than at runtime.

Some specific issues are:

  1. A lot of time is spent creating fields that may not be needed (hydro1k, glacier)

  2. How to specify use of only (say) the dominant PFT

We feel that (1) may be fairly easy - we could skip setting some fields (or put some hard-coded constant fields) for some options.

For (2), we could do this at runtime - having a namelist option to only take the dominant N PFTs. This could fit into the work Sam is doing to collapse crops down to non-crop runs. Alternatively, this could be done at mksurfdata_map time... though the benefits of that would be greatest if we changed the format of the file to only list the present pfts, and we're not sure if we want to go there.
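A rough sketch of the "dominant N PFTs" idea for a single grid cell. The function name `keep_dominant_pfts`, the tie-breaking rule, and the choice to renormalize the surviving weights are assumptions; the notes do not settle how the collapsed weights would be handled.

```python
def keep_dominant_pfts(weights, n_dominant):
    """Zero all but the n_dominant largest PFT weights and renormalize.

    weights: fractional area of each PFT in one grid cell.
    Ties are broken in favor of the lower PFT index (an assumption).
    """
    if n_dominant < 1:
        raise ValueError("n_dominant must be at least 1")
    # Rank PFT indices by decreasing weight, then by increasing index.
    ranked = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    keep = set(ranked[:n_dominant])
    kept_total = sum(weights[i] for i in keep)
    if kept_total <= 0.0:
        raise ValueError("no PFT with positive weight")
    # Rescale so the surviving PFTs still cover the whole vegetated area.
    return [weights[i] / kept_total if i in keep else 0.0
            for i in range(len(weights))]


# Hypothetical weights for a cell with four PFTs.
collapsed = keep_dominant_pfts([0.1, 0.6, 0.0, 0.3], n_dominant=1)
```

The same function could be applied either at runtime (driven by a namelist option) or at mksurfdata_map time; only the place it is called from would differ.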

Default NWP configuration

We want to set up a default NWP configuration soon. Erik thinks that doing this correctly in build-namelist could take a fair amount of work, but we might be able to come up with something quick and dirty initially, with a plan to make it more robust when we convert to python.

PGI

Is PGI important for NWP work? Mike will ask WRF and MPAS folks.

July 23, 2018

Agenda

  • Bill - In the upcoming tags project, can we just archive cards that are done rather than keeping them forever in a "Done" column?
  • Bill - Allowing namelist defaults for new dev code that differ from CLM5 (clm5.5???)
  • Erik -- Need PFT rawdata from Peter for CESM2.1. Also there is PFT data for TRENDY that I need from Peter. And would like high res PFT data.
  • Erik -- What's the latest on getting CMIP6 presaero files in cime/datm?
  • Erik -- Science workflow, cime bug so MODEL_VERSION isn't being updated in build. Need to educate people about it. One workflow would be to add an optional setting that requires a specific MODEL_VERSION for cases to run (i.e. REQUIRED_MODEL_VERSION=ctsm1.0.dev004-4-blah, it would die on the build if MODEL_VERSION is something else)
  • Erik -- FYI on Glade changes. Enabled softlink that points to p_old, will update to current standard location under /glade/cgd/tss/ just before p_old data goes away. By the way, I couldn't get old cases working; this might bite us for people that have existing simulations that they want to extend. The best way is probably to do a branch from the existing one; you could set the flag to use the existing name.
  • Erik -- Go through upcoming tags. Plans for when release tags happen. Think about Fates updates. What are the priorities?
  • Bill - Things to sort out for tag that fixes neg ice runoff SH issue: (1) Is it okay to do this in a way that changes answers even for the clm5 configuration?; (2) Need to wait until CAM is ready to update refcases for cesm2.2
  • Erik -- There is a separate discussion on the SMB work, but I need to be putting significant time into it. If I make it optional, I could also bring that work to the trunk. Perhaps Bill and I should meet regarding it as well.
  • Erik -- Is Sam waiting on me, or does he have work to do? There are some svn branches where I was going to do the svn part, and he would do the git part.

Archiving cards in upcoming tags project?

Erik and Dave find some value in having the "completed tags" column, so we'll keep it.

Allowing namelist defaults for new dev code that differ from CLM5

We'll talk about this at broader CTSM software and CLM science meetings.

We may want to think about syntax for having a namelist default that has one value prior to version X, and a different default with version X and later. e.g., having a ">" or "gt" in the quotes for the phys attribute in the xml file.
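One way the matching logic for such a syntax might look, as a sketch only. The attribute format (`"gt clm5_0"`), the set of operators, and the function names `parse_phys` and `phys_matches` are assumptions; nothing like this exists in build-namelist as far as these notes indicate.

```python
import re

# Comparison operators that could appear in front of a version (assumed set).
_OPS = {
    "lt": lambda a, b: a < b,
    "le": lambda a, b: a <= b,
    "eq": lambda a, b: a == b,
    "ge": lambda a, b: a >= b,
    "gt": lambda a, b: a > b,
}


def parse_phys(name):
    """Turn a physics version such as 'clm4_5' into a comparable tuple (4, 5)."""
    match = re.fullmatch(r"clm(\d+)_(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized physics version: {name!r}")
    return int(match.group(1)), int(match.group(2))


def phys_matches(attribute, phys):
    """Return True if the case's physics version satisfies the attribute.

    attribute is either a bare version ('clm5_0', meaning an exact match)
    or an operator followed by a version ('gt clm5_0').
    """
    parts = attribute.split()
    if len(parts) == 1:
        op, version = "eq", parts[0]
    elif len(parts) == 2 and parts[0] in _OPS:
        op, version = parts
    else:
        raise ValueError(f"cannot interpret phys attribute: {attribute!r}")
    return _OPS[op](parse_phys(phys), parse_phys(version))
```

For example, `phys_matches("gt clm5_0", "clm5_1")` is true while `phys_matches("gt clm5_0", "clm5_0")` is false, so a single xml entry could cover every version after clm5_0.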

Science workflow

Will W has a branch with 3 commits and wanted to be able to run cases out of the same sandbox with these different commits - and possibly even a single case that he rebuilds / reruns as he makes more commits.

There seems to be a bug with saving updated provenance information when you rebuild... Erik will look into this.

Erik's point is that we should document / let people know about this reasonable workflow.

Erik points out that a gotcha with this workflow is that you might accidentally have the wrong version checked out. To catch this, he suggests an optional REQUIRED_MODEL_VERSION that is checked, probably at build and run time.
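The check itself could be very small. The sketch below assumes the two values have already been read from the case; `REQUIRED_MODEL_VERSION` is only the name proposed in these notes, and `check_model_version` is a hypothetical helper, not an existing cime function.

```python
def check_model_version(model_version, required_model_version=None):
    """Abort if a required version is set and differs from the actual one.

    model_version: the version recorded for the current checkout.
    required_model_version: optional setting; None or "" disables the check.
    """
    if not required_model_version:
        return  # The setting is optional, so an unset value passes.
    if model_version != required_model_version:
        raise RuntimeError(
            f"MODEL_VERSION is {model_version!r} but REQUIRED_MODEL_VERSION "
            f"is {required_model_version!r}; check out the intended version "
            "or update the requirement"
        )


# Passes silently: the versions agree.
check_model_version("ctsm1.0.dev004-4-blah", "ctsm1.0.dev004-4-blah")
```

Calling the same helper at both build time and run time would catch a sandbox that was switched to another commit between the two steps.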

cheyenne glade directory renames

Erik: It works to run a new case from an old ctsm code base, but it doesn't work to extend an existing case.

Dave feels we don't need to worry about this too much.

June 25, 2018

Agenda

  • Bill - my open PRs
  • Bill - closing wontfix issues?
  • Bill - should we do a scientific test of the ctsm dev001 tag to ensure same climate, or wait until other changes are included here, too?
  • User's guide
  • Tasks for 2.1

Closing wontfix issues

Is there a reason to keep wontfix issues open? No, Erik is closing them.

Scientific test of ctsm dev001 tag?

Dave suggests having Keith run the automatic validation test.

User's guide

"Special cases": Rename to something like "Notes on specific cases" or "use cases". This can include crops, glaciers, isotopes.

Suggestion of having some links to the auto-generated namelist definition / defaults / etc. in a very obvious place.

June 14, 2018

Agenda

  • Bill - Go over contributing guidelines, PR template and issue template

  • Matt/Bill - Water isotope update

  • Martyn - what's in the CTSM presentation next week

Reviewing contributing guidelines and templates

Pull request template

Rosie suggests having a link to information on how to run testing. Also, to avoid scaring people off, maybe say, "Testing performed, if any", to make it clear that it's okay to submit a PR without doing any testing.

Sean: Should we distinguish between bug fixes and science enhancements, because the review process might be different for those? Feeling is that we might need to evolve towards that if we start getting a lot of PRs. If we start to have a lot more open PRs, we might need a system, like using labels and/or having a keyword at the beginning (with options like: bug fix, science enhancement, documentation only, etc.).

Mike: could be worth having a space to add names of all people involved. Dave agrees. We could remove that line if it doesn't apply.

Issue template

Rosie: It would be useful to explicitly mention that you can use issues for science discussions. It might be an empty template, but we should call out "Science discussion" as an option for issues.

Support requests

Erik suggests that we point people to the forums for this, partly because that's how CESM does it.

Some of us feel that it's nice to have things all in one place, and we don't really like the UI of the forums. But if it becomes unmanageable, we can revisit this.

Dave: support request template should point people to the manual.

Should we distinguish between users and developers? Maybe if needed, but not for now.

Add to support request: Have you made any code modifications?

Make it clear that this is support needed for model use. "Support needed for model use."

Water isotope update

Matt is getting close to completing the separation of WaterStateType, following the design shown here: https://github.com/ESCOMP/ctsm/pull/395

Next step will be separating WaterFluxType into pieces depending on whether a given flux is needed for isotopes.

  • Bill asks if fluxes need to be broken down into finer-grained categories. Martyn says that, eventually, we may want to distinguish summary fluxes from fluxes that are actually used to update state variables. But we can do that later.

June 11, 2018

Present: Erik Kluzek, Dave Lawrence, Bill Sacks

Agenda

  • Bill - Issue template
  • Erik - ESMF not working on DAV cluster, and Use of cesm mapping tools. Use CIME configure for tools makefiles?
  • Erik - Do we want smaller tools working on hobart (mksurfdata)? mkmapdata can only fit on large memory machine like cheyenne (> 450GB of memory).
  • Erik - Fix CTSM buildnml for python 3?
  • Erik - Not ready for CLM spinup yet, but should start making it.
  • Erik - Updated README files. Since we need to update README files and UG, I'd like UG to utilize README files for some of it. Doing this would mean adding manage_externals to UG and including the needed files from a CTSM checkout.
  • Erik - CLM is all over in the code and miscellaneous files.
  • Erik - I need to start doing work on the SMB project
  • Erik - next tag will be done today.
  • Erik - when/how often do we do these meetings?
  • Erik - Code of conduct needs to go around CESM/CGD. Need to talk with Mariana about this as well.

Mapping tools on other platforms

Feeling is that support for mapping on DAV cluster is low priority, since Erik has it working on cheyenne.

Tools working on hobart?: Low priority

Using cime scripts for mapping: Good to have, though not super-high priority.

Release branch management

After dev013, we'll let master and the release branch diverge.

For now, we'll just have master and this one release branch, which is for cesm2.0 / cesm2.1, and so can't change answers. We don't see a need for a second release branch at this point.

Side-note on "Upcoming tags" project

Bill: Added an "awaiting triage" column, where issues / PRs appear automatically if you add them to this project (instead of being added to the "to do" column). The motivation was: When they were added to the "to do" column, they appeared at the top, messing with our carefully laid out ordering. This way it's obvious what still needs prioritization within the "to do" column.

May 24, 2018

Agenda

  • (Want Mariana's input) Pulling more parameters out of the code
    • Bill will give a brief overview:
      • How we deal with parameters now, via namelist: show example: soilwater_movement_method
        • SoilWaterMovementMod.F90
        • namelist_definition_clm4_5.xml
        • namelist_defaults_clm4_5.xml
        • CLMBuildNamelist.pm (add_default call)
        • Demo case:
          • Original lnd_in
          • Change via user_nl_clm
          • Rerun preview_namelists
        • Also: pft params in netcdf file (current solution isn't scalable
          • e.g., merging changes from branches)
      • What we last discussed doing
    • We could simply follow status quo... but that requires writing more ugly and error-prone namelist-reading code, which we might eventually just ditch
      • One way to make this a little less ugly and error-prone would be Ben's suggestion from a few years ago: Read the file on the master task into a buffer, broadcast that to everyone, and then let all procs read the namelist values from that buffer. So the interface to namelist reads would include a string buffer containing the namelist file contents, rather than a name of the namelist file. (This avoids the need for each module to have namelist-reading code, avoids the problem of needing to remember to broadcast each individual value, and also is better suited for unit testing.) I feel this would be worth doing even if we have just a small number of parameters left on the namelist file.
    • Or should we do some infrastructure work so that things don't need to be redone later?
      • Translating build-namelist to python - so that we don't write more perl that then needs to be translated to python
      • In the past, we talked about reading everything from a netcdf file rather than from namelist files. Should we get that infrastructure set up now, so that we don't do extra work that just needs to be redone later?
        • Introduces cime dependency on at least numpy (we could snapshot in scipy.io.netcdf). Is that okay?
        • Ideally, we would translate the perl to python before doing this, but we might be able to come up with a way to call the necessary python from perl short-term, to allow moving forward with the netcdf-based solution before translating build-namelist to python.
  • (Want Mariana's input) Removal of CLM4: can we do this on master now?
    • One reason this could be good: allows cleaning up build-namelist
  • Discuss separation of WaterStateType for water isotopes.
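To illustrate Ben's buffer suggestion from the agenda above: the sketch below parses one namelist group from a string rather than from a file name. The real code would be Fortran with an MPI broadcast between reading and parsing; this Python version only shows the shape of the interface. The group name `example_inparm`, the value shown, and the simplified parser are assumptions.

```python
import re


def parse_namelist_group(buffer, group):
    """Return {name: raw value string} for one '&group ... /' block.

    Simplified on purpose: one 'name = value' per line, no multi-line arrays.
    """
    pattern = re.compile(
        r"&" + re.escape(group) + r"\s*\n(.*?)\n\s*/",
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(buffer)
    if match is None:
        raise KeyError(f"namelist group not found: {group}")
    values = {}
    for line in match.group(1).splitlines():
        line = line.strip().rstrip(",")
        if not line or line.startswith("!"):
            continue  # Skip blank lines and comments.
        name, _, value = line.partition("=")
        values[name.strip().lower()] = value.strip()
    return values


# In the model, the master task would read the file into this string once
# and broadcast it; every module then parses its own group from the buffer.
buffer = """&example_inparm
 soilwater_movement_method = 1
/
&other_inparm
 some_flag = .true.
/
"""
soil_settings = parse_namelist_group(buffer, "example_inparm")
```

Because the parser takes a string, a unit test can pass namelist text directly without creating any files, which is the testing advantage noted in the agenda.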

Pulling more parameters out of the code

Summary of discussions from 2016/2017

Early 2016: Ben Sanderson's request is that there be one way of doing things... which could mean going back to how things were, in that pft-specific parameters were in a netcdf file, and scalars in the namelist. What he does NOT want is having some scalars in namelist and some in netcdf.

FATES plan (developed late 2016) was to use xml backend, netcdf frontend. You could modify parameters via the user_nl mechanism, or you could point to your own netcdf file with all parameters. See also https://docs.google.com/document/d/105p3L6981KJxcddVBN4J330S-6ECZODrha7uQW0OpfU/edit and https://docs.google.com/document/d/1SvwrrSo9ZKymY6nhasi0hqMkidmQyl3yFHrMZpTwhhs/edit#heading=h.110phbn1yf25 and https://docs.google.com/document/d/1332XErcAB3-2TwrwGEhFTUxFKZdhVZlH3FS_F4VpxNw/edit#heading=h.i5i5dlq3axx9

Bill's questions

  • We talked about having real-valued parameters on a netcdf file and logical / integer / string-based options on the namelist file. Is that still what we want?

    • This will mean that each science module potentially needs to read both from the namelist file and the netcdf file; this doesn't seem ideal.
    • We probably can't be 100% consistent in applying this rule: for example, there may be some discrete pft-specific options on the netcdf file.
    • Would it be better to just have netcdf all the way (with just a single namelist parameter pointing to the netcdf file - which would allow pointing to your own if you want)? Ben mentioned FATES was planning to go this way March 6, 2017.
  • I don't like working with namelists, because of the need to hard-code the variable names in the namelist statement. If we stick with text file-based input for some things, would it be worth considering a different format like .cfg? (Note that CISM has a cfg file reader.)
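For comparison, a cfg-style file can be read without listing the variable names ahead of time. The sketch uses Python's standard `configparser` purely to illustrate the format; the Fortran side would need its own reader (such as the one in CISM). The section name `soil_water` and the option `use_example_option` are hypothetical.

```python
import configparser

# Hypothetical cfg contents; only soilwater_movement_method comes from the notes.
CFG_TEXT = """
[soil_water]
soilwater_movement_method = 1
use_example_option = true
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)

section = parser["soil_water"]
# Typed accessors convert the text on request.
method = section.getint("soilwater_movement_method")
flag = section.getboolean("use_example_option")
# The option names are discovered from the file, not hard-coded in a statement.
option_names = list(section)
```

The trade-off is that type information lives in the reading code rather than in the file, so a misspelled option name has to be caught by an explicit check.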

Discussion

Rosie: What happens in FATES: Everything other than logicals is on a netcdf file. They store a cdl file in the repo, which is then turned into a netcdf file. They don't have a mechanism for having different defaults for different configurations (but could do different cdl files). They don't yet have things like different options for a parameterization.

Bill: Question on the table: Do we want the Fortran code to read parameters from netcdf, namelist, some other format like .cfg, or some mix of them?

Rosie: In FATES, there was an overwhelming desire to have all parameters on netcdf.

Sean: It's easier to read a text file than a netcdf file. Dave points out that we can put something like a ncdump of the file in your lnd_in.

  • But the ncdump still doesn't look as nice as just the lnd_in file. One issue is that it's not separated by category. Might want to have some attribute or just use a naming convention so that variables are grouped together.

Katie asks what the user interface would be for setting pft-specific parameters. We talked about a few ideas; we'll come back to this.

Acceptability of numpy? People generally feel that's okay. One issue could be NOAA operational centers - but it's probably okay for them to use an already-built netcdf file.

Summary of where we want to end up long-term

  • People agree that we should use netcdf for numerical parameters
  • And, if we're going to have numerical parameters there, then feeling is that we should have all parameters there, including integer / logical switches.
  • We want to do the rewrite of build-namelist into python first - maybe in the next 3 months-ish?

Removal of CLM4?

Mariana doesn't see any problem, but suggests contacting Dan Marsh.

Separation of WaterStateType for water isotopes

Initial comments from Martyn

It would be nice to cleanly separate state variables, diagnostic variables, fluxes, parameters etc. Based on an initial scan the separation that you have is not that clean.

What is the path forward: (1) Do you intend to cleanly separate state variables, diagnostic variables, fluxes, parameters etc. into different types? (2) Do you intend to have an isotope type? If you are this specific, do you plan to define isotopes as a separate type or by extending more general types?

Bill's initial response (pre-meeting)

I agree with Martyn to some extent. Ideally, I'd like a clean separation of the fundamental state variables. Long-term: For other variables, I see a lot of value in having fluxes and other auxiliary variables live in the module responsible for computing those variables, to the extent possible. In the case of summary variables (e.g., sum over the column, or variables just in the top layer), I could see introducing a new type like WaterStateSummaryType.

I also agree that this initial split doesn't feel as clean as it could be. I started with the hope that a clean separation would align with the delineation between variables for which we need an isotope version and variables for which we do not. I figured that, pragmatically, this is what we need right now, and we could take additional steps later. But if Martyn or others would like, I'd be happy to take some time to sit down with you and try to separate variables out along the two axes of (a) what needs to be replicated for isotopes and other tracers vs. what does not, and (b) states vs. diagnostics, etc. We could use that to try to come up with a set of types that feels right.

I'm not sure I understand (2), so will get more clarification on that question.

Discussion

Mariana suggests that we have a high-level WaterType, which just contains some other types. Point is: pull related types together to make the code more understandable. Bill agrees.

Martyn suggests delineating states vs. diagnostic variables. Bill agrees. So there will be states that apply to all tracers, diagnostics that apply to all tracers, states that don't and diagnostics that don't.
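The resulting four-way split, wrapped in a high-level container, could be organized roughly as follows. This is a Python sketch of the structure only (the real code is Fortran derived types); every class and field name here is hypothetical except the general idea of a WaterType that contains other types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WaterState:
    """States needed for bulk water and for every tracer."""
    liquid: List[float] = field(default_factory=list)
    ice: List[float] = field(default_factory=list)


@dataclass
class WaterDiagnostic:
    """Diagnostics needed for bulk water and for every tracer."""
    total_column_water: float = 0.0


@dataclass
class BulkOnlyState:
    """States that are not replicated for tracers."""
    snow_depth: float = 0.0


@dataclass
class BulkOnlyDiagnostic:
    """Diagnostics that are not replicated for tracers."""
    snow_covered_fraction: float = 0.0


@dataclass
class TracerWater:
    """Everything that exists once per tracer (and once for bulk water)."""
    state: WaterState = field(default_factory=WaterState)
    diagnostic: WaterDiagnostic = field(default_factory=WaterDiagnostic)


@dataclass
class BulkWater(TracerWater):
    """Bulk water: the shared pieces plus the bulk-only pieces."""
    bulk_state: BulkOnlyState = field(default_factory=BulkOnlyState)
    bulk_diagnostic: BulkOnlyDiagnostic = field(default_factory=BulkOnlyDiagnostic)


@dataclass
class WaterType:
    """High-level container that pulls the related types together."""
    bulk: BulkWater = field(default_factory=BulkWater)
    tracers: Dict[str, TracerWater] = field(default_factory=dict)

    def add_tracer(self, name: str) -> TracerWater:
        self.tracers[name] = TracerWater()
        return self.tracers[name]


water = WaterType()
water.add_tracer("example_isotope")
```

Science code that only needs bulk water would take `water.bulk`, while tracer-aware code would loop over `water.tracers` and see exactly the subset of variables that is replicated.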

May 17, 2018

Agenda

  • Bill - We may want to have a release branch for cesm2.0, since we'll need to support that code long-term. -- Erik -- I'd like to start with just release-clm5.0; if there's a need to diverge we can do that.
  • Bill - With Matt R, have started on the work needed for water isotopes
  • Bill - Plans for master diverging from release branch?
    • Soon I'd like to bring unified_land_model to master.
    • At the point when they diverge, will we change the tag naming to ctsm tags? What should the version number start with?
  • Bill - meeting times moving forward?
  • Erik -- ChangeLog for release branch? Part of master ChangeLog? New file?
  • Erik -- need a cleanup tag that updates to externals from beta10 and cleans up any small issues.
  • Erik -- time-stamps on files as per Dave in CLM science meeting?
  • Erik -- Can I have an "I told you so moment"? ;-) The issue with orbit is really that we haven't been testing the variable year option. I brought this up as an issue, but we couldn't spend time on it. Fulfilling the maxim "if you don't test it -- it's broken" we ran into it as an issue (See #19, #260, and cime issues https://github.com/ESMCI/cime/issues/2044, https://github.com/ESMCI/cime/issues/2082)
  • Erik -- VIC memory issue? VIC isn't running at f09, I think it needs more memory (#384).
  • Erik -- #371? #316? #312? #276? #268? #262? #162? #13?

Tag plans

Erik will do the one cleanup tag, and then Bill will bring unified_land_model to master. At that point we'll start using 'ctsm' tags: ctsm1.0.dev001.

Want to add something in the ChangeLog at that point like:

Does this significantly change the science of:
- CLM4.5? No
- CLM5.0? No

Note that the isotope-enabled version needs to have similar science to the cesm2.0 version. Feeling is: For now, we'll bring the isotope-necessary stuff to master. If we find that we want to make a big science change on master, we can consider what to do at that point: either have diverging branches or maintain the old science via an option.

Removal of clm4?

Erik points out that this would be helpful, so that phys options become runtime rather than build-time. Let's talk to Mariana about this.

Meetings moving forward

For the next few months, let's keep separate meetings for the basic logistics (Monday PM meetings) vs. bigger-picture software meetings (Thurs AM meetings). We could reduce the Monday meetings to something like biweekly.

April 30, 2018

Agenda

  • Bill - Planning to move to isotopes, even though soil hydrology is only partially completed.
  • Dave - Repository for planning documents and meeting presentations?
  • Bill - New initial conditions files for release? (https://github.com/ESCOMP/ctsm/issues/312)

Repository for planning documents and meeting presentations

Can we use git large file storage (LFS)? Martyn tried it about a year ago and found it wasn't as good as it could be.

Feeling is: Let's put these on google drive with links from the wiki.

We should separate software vs. science meetings. In the science meetings, put links to presentations that are available.

We'll also have wiki pages for reports.

cheyenne mpt issues

The previous version of mpt had an issue where it would sometimes change a bit.

However, the new version (16) can abort before the run starts. Jim just put in a workaround that restarts the run if it detects that that's what happened.

April 23, 2018 -- Software meeting

  • EBK -- Some FATES tests are failing for the dev007 update. I think we could use updated finidat files for the FATES science update.
  • EBK -- Cheyenne strangeness. Is there anything we can do about it?
  • EBK -- PE layouts: do we want to optimize for SP vs BGC-Crop+Ciso?
  • EBK -- What configurations (compset+res) will be scientifically supported for CESM2.0? Meaning will have simulations that go along with it? We did simulations before -- do they need to be updated again?
  • EBK -- Plan to update manage_externals.
  • EBK -- Need to have a telecon with FATES developers about use of manage_externals
  • Bill - tag status / ordering

PE layouts

Dave: cost is probably more important now than speed.

Would be useful to see plots of speed vs. cost for different PE layouts.

Bill suggests:

  • Hypothesis: Having LND run be slightly less than ATM run is "optimal". (He doesn't necessarily believe that hypothesis, but that's the starting point.)
  • Test that hypothesis. e.g., one possibility is that optimality is actually achieved with LND run being half the time of ATM run, due to the uneven nature of atm run time.

Idea: Get an optimal ratio of "LND run" to "ATM run", figured out for one resolution and configuration. Hope is that we can then do a quick cut at other resolutions and configurations by trying to get about the same ratio.
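A sketch of the quantities that would go into a speed-vs-cost comparison and the LND/ATM ratio. The function name `layout_metrics`, its argument names, and the 365-day year are assumptions; real values would come from the timing files of actual runs.

```python
def layout_metrics(cores, wall_s_per_sim_day, lnd_run_s, atm_run_s):
    """Summarize one PE layout.

    cores: total cores used by the layout.
    wall_s_per_sim_day: wallclock seconds needed per simulated day.
    lnd_run_s, atm_run_s: "LND run" and "ATM run" timer values (seconds).
    """
    days_per_year = 365.0  # Assumes a no-leap calendar.
    wall_s_per_sim_year = wall_s_per_sim_day * days_per_year
    return {
        # Speed: simulated years per wallclock day.
        "sim_years_per_day": 86400.0 / wall_s_per_sim_year,
        # Cost: core-hours per simulated year.
        "core_hours_per_sim_year": cores * wall_s_per_sim_year / 3600.0,
        # Ratio to carry over to other resolutions and configurations.
        "lnd_to_atm": lnd_run_s / atm_run_s,
    }


# Hypothetical numbers, for illustration only.
metrics = layout_metrics(cores=360, wall_s_per_sim_day=10.0,
                         lnd_run_s=4.0, atm_run_s=8.0)
```

Computing these for each candidate layout gives the points for a speed-vs-cost plot, and the `lnd_to_atm` value of the best layout is the ratio one would try to reproduce elsewhere.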

Dave: Should really try to get in the changes to speed up transient runs, since that will be a big bang for the buck.

Scientific support for CESM2.0

Dave feels we don't need to redo the simulations; we can call the same things as before scientifically supported.

Want to add science_support flag for the other configurations in Keith's matrix. Also want to say that transient cases are scientifically supported.

Dave feels this science_support flag isn't as relevant for land-only cases, though - it really is more important for fully-coupled cases where things could be out of balance, etc.

What does scientifically supported mean? It means that you've run a simulation and it's giving the science you'd expect - i.e., confirmed that it's not doing something crazy.

Feeling is that we should list support for f09 and f19, but not coarser resolution. Since the land model is resolution independent, we expect other resolutions to be fine, too, but we'll just list f09 and f19 to be safe because that's what we've looked at. Erik notes that we did the simulations at g16, but to avoid confusion we'll list g17.

April 16, 2018 -- Software meeting

Agenda

  • Bill - https://github.com/ESCOMP/ctsm/issues/340
    • Any objections to my making this change?
    • Confirm logical to use to determine first time step, but not trigger on a branch or continue run
  • Bill - Meaning of "known bugs introduced in this tag" in ChangeLog. I interpret this to mean, "bugs that were newly-introduced in this tag", whereas Erik seems to use this to list any bugs that have been opened recently.
  • EBK - cime and cism update change answers for clm tags (cime changes because of orbit)
  • EBK - Order of upcoming tags?
  • EBK - Failing test for radtemp branch, floating point exception. I think it's due to something being set to NaN that shouldn't be.
  • EBK - NOTE: cesm requirement of model_doi_url being added to history files, creating ctsm specific env_archive.xml in our cime_config.
  • EBK - Should we make a release tag based on current point of master? Or wait? Should the release branch be what we give cesm2.0? Or separate branch for it? I'd prefer it to be the same.
  • EBK - FYI. meeting after this to go over Wednesday git tutorial
  • EBK - cesm recognizes that we will have an update for initial conditions. I think that might be done as a change to cesm/cime_config as a ref case? If so CLM's default would be out of sync with it.

"Known bugs introduced in this tag"

Erik will edit the ChangeLog template to separate this out into two things:

  1. Bugs that were truly just introduced in this tag

  2. Important bugs that were newly discovered since the last tag - even if the bugs pre-date the given tag. (To find this: look for issues that have been opened since last tag.)

Release planning

Erik's plan: Once we have beta10, we'll move the release branch pointer to point to the version in beta10 and make a new release branch tag.

Question: when and who do we notify of release updates like this?

Bill: This discussion makes me think that we should direct people to checkout a specific tag rather than the head of the release branch. This way we won't continuously run into the problem of, "we've changed the release branch; who do we need to notify?". Instead, people will have to be deliberate about what tag to check out; this requires more work on their part, but ensures that they'll get the same version of the code with each clone they make, if that's what they want. (If we direct them to checkout the release branch, then the risk is that they'll get different versions each time they clone, when what they really want is the same version each time.)

  • Dave and Erik tentatively agree, but feel we should ruminate on this a bit.

Tutorial

  • No recording?

  • Lots of remote participants

    • Use ReadyTalk chat?

April 5th, 2018 -- CLM Software meeting

Agenda

  • EBK -- Talk about FATES workflow...
  • EBK -- Dave Hart and the new storage paradigm. Plan for talking with CLM folks, CESM, CGD?
  • EBK -- I just got a status update on svn tags...
  • EBK -- I did the manage externals update for CTSM master. Ben, any other tricks we need to know about manage_externals?
  • EBK -- the testing for dev004 is done, a few things to finish up with it.
  • Bill - okay to bring my init_interp fixes to master now, given that they'll change answers (slightly) in cases that use init_interp?
  • Bill - Plans for https://github.com/ESCOMP/ctsm/pull/331
    • Need new landuse_timeseries files
    • Need CLM code
    • Should I do this? At what priority?

Updating version of manage_externals

Key point: Pull in a branch with the actual commit itself that you want - do NOT pull in the annotated tag.

PR 331 (Peter's)

Erik will take the lead on this. Bill will do the necessary updates to CTSM in a follow-on tag.

March 23, 2018 - Software meeting

Agenda

  • EBK -- FATES process
  • EBK -- Status of cheyenne and our tags?
  • EBK -- Anything from yesterday's meeting to discuss here?
  • EBK -- Was able to add ucar addresses to ctsm-dev, I have trouble with non-ucar addresses. Contacted CISL, they can add, suggested I use a different browser and/or the direct add option.
  • Bill - status
  • Ben - how to incorporate ctsm into cam
  • Bill - manage_externals

How to incorporate CTSM into CAM

For people who want to do coupled runs, which do we recommend?

  1. Adding CAM and CICE to the externals file in a CTSM checkout

    • Advantage: easier for development

    • Disadvantage: may run into external dependency problems (difficulty finding a working set)

    • Disadvantage: need to get your paths just right

  2. Start with a CESM or CAM checkout, replacing clm/ctsm with your branch

We lean towards (2) - probably easier on balance

manage_externals

Ben has some patches. Up to components to pull them in when you want them.

Erik points out that we can pull this into CTSM master at any point via a PR. Ben will issue a PR once 1.0.2 is ready.

FATES process

Tentative plan: Have a long-lived branch called fates-next-api. Changes to the api can go into there, and that branch would periodically be merged into master.

The point of having a long-lived branch is that there are multiple developers, and having a stably-named branch can help people know where to look.

Ben points out that there would be some advantages to their having this in their own fork, so they'd have more control (over who has write access, etc.). But he's okay with the tentative plan if others want to go that way.

Bill doesn't want a proliferation of long-lived branches in the main repo, but is okay with the tentative plan since FATES is a large enough project.

Robustness of setting up single-point cases

Rosie raised the question of how we can ensure that setting up single-point simulations remains working and robust.

We have PTCLM, but some problems with it are:

  • Feels overly complex
  • Periodically breaks

Many people have their own scripts to do this. We want to get some people together to determine the use cases and requirements, and then evaluate how to proceed. Some possibilities are:

  • Improve PTCLM and make sure it really keeps working moving forward
  • Ditch PTCLM. Start with our favorite of someone else's scripts and make sure that's tested and maintained moving forward.

Yesterday, Ned raised a somewhat related point: desire for more robust SCAM capabilities. e.g., can't do restarts... probably not a big fundamental problem, but need to prioritize getting it working.

March 22, 2018 - weekly meeting

Agenda

  • Bill - More debriefing of LIS discussion (Mariana wants to be present for this)

    • Mariana's questions:
      • What exactly is LIS?
      • Would getting CTSM working in WRF address LIS's requirements for atmosphere-CTSM coupling?
      • NOTE: We did not address Mariana's questions at today's meeting
  • Martyn - Reporting requirements

  • Bill - Protocol for reviewing PRs

Protocol for reviewing PRs

Can add one or more reviewers to the PR.

If you are a reviewer, you can:

  • Add a review. In the end, you should either "approve" or "request changes" on the PR. Then you can see if a PR is fully approved by ensuring that there's a green check by all reviewers' names.
  • Remove yourself as a reviewer, if you're happy to defer to the other reviewers.

Review board?

Mike asks if there is a review board that vets things that come in. Dave answers that CLM hasn't had something formal, though we've done this informally. Authors need to be invested enough to stick with us through the process of getting their developments working globally, meeting our software standards, and getting all tests passing.

March 15, 2018

Agenda

  • wjs - Brief update (stuck on some baseline failures)

Sean's timing studies

Canopy iterations

Sean changed the number of canopy iterations from 40 to 2. This gave close to a factor of 2 difference in the timing of canflux / can_iter. Note that most of the time spent there is in photosynthesis.

There was some unexplained variability here. These runs were on hobart. One hypothesis is that the runs executed on different nodes which might have different hardware. Or there could be day-to-day variability. Or Martyn suggests that a science change in one place could make other code take longer. One idea to deal with this is to run a few ensemble members for each experiment. Another idea is to ensure that the two runs use the same hobart nodes.
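The ensemble idea can be made concrete with a crude noise check: compare the difference in mean timing against the spread within each experiment. The function names, the two-sigma threshold, and the example timings are assumptions for illustration, not measured values.

```python
import statistics


def summarize(timings):
    """Return (mean, sample standard deviation) of repeated timings."""
    return statistics.mean(timings), statistics.stdev(timings)


def difference_exceeds_noise(timings_a, timings_b, n_sigma=2.0):
    """Crude test of whether two experiments differ beyond run-to-run noise.

    Compares the difference in means with n_sigma times the combined
    standard error of the two means. Needs at least two runs per experiment.
    """
    mean_a, sd_a = summarize(timings_a)
    mean_b, sd_b = summarize(timings_b)
    standard_error = (sd_a ** 2 / len(timings_a)
                      + sd_b ** 2 / len(timings_b)) ** 0.5
    return abs(mean_a - mean_b) > n_sigma * standard_error


# Hypothetical timings (seconds) from three repeats of each experiment.
clearly_different = difference_exceeds_noise([10.0, 10.2, 9.8], [5.0, 5.1, 4.9])
within_noise = difference_exceeds_noise([10.0, 12.0, 8.0], [10.5, 11.5, 9.0])
```

With only a few ensemble members this is a rough screen rather than a formal significance test, but it would separate a genuine factor-of-2 change from node-to-node or day-to-day variability.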

It might be worth trying the timing without PHS.

Sean is doing some work on heat capacity that should help with this.

It could be worth looking at maps of how many iterations it takes in various places, at various times of day.

Nested calls in setting up matrices

He also tried undoing the deeply nested calls in setting up some matrices for SoilTemperature(?). This improved the timing of that particular part of the code by about 1/3.

Maybe we can revert this back to something intermediate between what was there before and the current code - to maintain some modularity, but not the over-modularity of the current code.

Loops over all soil layers

A lot of the pieces of code that loop over all soil layers have similarly large costs. This might be worth investigating further to determine if we can identify some common things that can be done to improve performance of these loops over soil layers.

Integrating a network-based routing model

One big decision for bringing this into CESM: Do we want to (1) improve the modularization of the current MOSART code so that we can pull the new routing model into this code base as an alternative (reusing existing code for decomposition and other infrastructure), or (2) bring in the new model separately - which may involve copying some of the infrastructure code from MOSART/RTM (which currently have a lot of copied code between the two of them).

At a high-level, three steps for integration with CESM:

  1. Get ESMF tools working with the polygon-based schemes (Joe Hamman is working on this)

  2. Better understand similarities and differences between MOSART/RTM and the network-based models

  3. Determine what can/should be done by the MOSART code, and what should be done by the new code. May involve some refactoring of the MOSART code to allow slotting in an alternative model.

In parallel, there will be work to generalize the current US-applicable models to global.

Once those are done, we can work on the lake / reservoir piece, and how the coupling happens between the river model and the rest of the system in that respect.

March 12, 2018 - Software Meeting

Agenda

  • wjs - Inputdata currently requires authentication (so requires registering for CESM release access). Are we okay with this, or would we like to request that this be changed to allow anonymous read access? (Mark confirmed he'd be okay with the latter if we want it.)
  • ebk -- what do we need to do about cmip6? We had some discussion of this at this morning's CSEG meeting. Mariana was going to explain something to us.
  • wjs - Erik: status of testing / baseline generation for 003 tag? If not complete, what should I do when the machine comes back up?
  • wjs - Tutorial:
    • We're tentatively structuring it as git basics - not assuming much prior knowledge. If we want to cover testing, too, that won't let us get much beyond git basics. Is this a problem, given that many people may have already figured out the basics by then? Alternatives would be to cut testing, or assume more background knowledge so that we could skip or breeze through the basics, covering some slightly more advanced git things.
    • Brief agenda for tutorial
    • What we'll ask people to do ahead of time
      • Have a github account
      • Do basic git setup on their machine (username & email, at the very least) - see recommended git setup document
  • wjs - testing on my reduce-allocation branch is mostly good, except I'm getting some failures due to running init_interp. An LII test that I think should be passing is failing.
  • ebk -- Next week's meeting? I'll be in Arizona, but probably could call in.
  • ebk -- clm-dev email list. New world for spam, and openness. Still email clm tags to it?
  • potential bug in quadratic

CMIP6

Dave feels that we're going to want to keep most / all of the current default variable list.

Sheri will send out what her script is producing for historical and preindustrial simulations - i.e., which output variables are needed, and the implications for data volume.

Keith has put together a user_nl with what we think we need for all daily, hourly, etc.

Re-testing 003

Bill will redo testing for 003 once cheyenne is back up (hobart testing is done). Expect tests to pass, baseline comparisons to fail relative to 002.

Tutorial

Dave: Don't dwell on the basics. But Erik thinks there's still a fair amount of confusion - especially around the distributed nature. Maybe going through some of this at a conceptual level (pictures?) would help.

Going through an exercise that involves managing remotes would be helpful. Also, an example where two people need to collaborate on a branch. This is something a lot of people need and don't know how to do.

  • Include how to do this in github (including adding collaborators)

Comparison / contrast with subversion could actually be useful

Let's just let ourselves stretch to 3 hours.

Also include recommended workflow for doing science with this - moving away from using SourceMods. Ben, Erik and Bill should meet with Brian to discuss this.

Include conflicts - at least "don't panic" (and maybe how to get yourself out of this).

Still try to keep some time on testing workflow - e.g., clm_short.

Inputdata

Mariana feels we can make this totally open for read.

It would help for CTSM to have inputdata totally open.

Bill will clear this with Gokhan.

clm-dev email list

Will make it so that only a few people can post, to avoid spam.

Dave: it would be nice if emails to LMWG also went to clm-dev. Is there a way to set this up? We wonder if it's possible to add clm-dev as a member of the LMWG mailing list. Otherwise, may need to add this as a recipient whenever emails go out to LMWG.

We should rename clm-dev to ctsm-dev.

March 5, 2018 - Software Meeting

Agenda

  • DML - user_mods CMIP6 behavior by default.
  • bja - discussion of Erik's email regarding remotes, naming, manage_externals, etc.
  • EBK - User's Guide. Working on some global changes. Some sections can be removed now since the cime documentation is better. For the root of the CLM directory, I plan to use an env variable $CLM_ROOT_DIR to illustrate paths, since there are two allowed directories now (. and components/clm). So I'll have a section at the top talking about the two locations for $CLM_ROOT_DIR. We still need the tools and single-point chapters. And special cases (such as spinup) and examples are important. I think talking about the namelist xml files is important as well. For a user, the most important sections are the quick-start, what's new since the last version, and the examples. For a new user, talking about creating cases is important, but I think this section can be trimmed because the cime documentation is better. But it's still good to have specific CLM examples. I think I can remove most or all of the troubleshooting section and point to cime's chapter. Since CO2 is now a namelist item, its example can be removed. In terms of things people do that need documentation to point to, setting up single-point cases with tower data and how to spin up are the two they most need help with. And if we don't document it, we have to email them what to do rather than just point to a web link.
  • EBK -- cheyenne shepherd problem is still killing jobs, even in the middle of simulations. What should we do about it? I do have a ticket in with cisl for the problem I've been seeing with the FATES spinup. Apparently, Cecile says using mpt2.15 (rather than mpt2.16) fixes it for her. CLM is using mpt2.15f right now; I haven't tried vanilla mpt2.15.
  • wjs - Considering interpolating all out-of-the-box initial conditions files. Won't do this if there are plans to redo all initial conditions soon anyway.
  • wjs - Should my performance changes come onto the clm5.0 release branch? Note that this will force everyone to interpolate their initial conditions. If so: Is the process right now to bring it to master, then at some point we'll bring things from the master branch to the release branch? (I forget: is the intention that the clm5.0 release branch will be used for cesm2, or that we'll be at a new clm5.1 release branch by then?)
  • wjs - cmip6 usermods discussion at cseg meeting

CMIP6 output

From discussion at cseg meeting: We'll move ahead with usermods directories. Not clear whether we'll (1) define a few usermods that are used for all cmip6 runs, or (2) have user_nl auto-generated separately for every experiment. CSEG needs to work this out more.

Dave: it seems like we may need different sets of output for different periods of runs - e.g., need high-freq output for the last few decades of the historical run.

Initial conditions associated with decreased memory use

Probably go ahead and regenerate initial conditions files. Should be able to use the file name to determine the configuration to run to interpolate each file. (But see email from Keith.)

What to do about FATES tests that use initial conditions? For now, maybe just set run_zero_weight_urban in the testmods directories? Actually: check with Ryan to see what's needed to regenerate initial conditions that could be used for testing.

Should decreased memory use be in clm5.0 or clm5.1?

Some other things we have: bug fixes for energy balance, N bug fix, new N dep... need to determine which of those will come to the release branch.

For now, I'll bring my change to master.

git documentation

What should we recommend remotes be named? Bill and Ben both prefer naming remotes explicitly, like "escomp", "billsacks", etc. - rather than generic "origin", "upstream".
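As a small illustration of the explicit-naming preference, a remote name can be derived from the GitHub owner in the clone URL. The helper below is a hypothetical sketch, not part of any CTSM or manage_externals tooling.

```python
import re


def remote_name_for(url):
    """Return the lowercase GitHub owner from an https or ssh clone URL."""
    match = re.search(r"github\.com[:/]([^/]+)/", url)
    if match is None:
        raise ValueError(f"not a recognized GitHub URL: {url}")
    return match.group(1).lower()
```

For the ESCOMP URL, remote_name_for("https://github.com/ESCOMP/ctsm.git") gives "escomp", which could then be used in `git remote add escomp https://github.com/ESCOMP/ctsm.git`.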

Erik: part of the motivation here is consistency with the FATES workflow.

Rosie: How should we include FATES workflow pieces? Bill suggests having a chapter 3 in https://github.com/ESCOMP/ctsm/wiki/Getting-started-with-CTSM-in-git that is "Working with FATES in CTSM". Erik points out that the same workflow applies to any external (MOSART, etc.), but FATES has the largest user community.

March 1, 2018

Present: Martyn Clark, Mike Barlage, Sean Swenson, Bill Sacks

Mike's redone timing runs

30-km CONUS

72 proc (might try scaling back to 36, because Noah-MP might saturate beyond 36).

What number should we focus on for total? Probably clm_run. This includes history output, which eventually we could probably reduce substantially - so we'll keep that in mind when looking at the results.

Note that the control run has special landunits, whereas others do not. Sean suggests an intermediate run that uses all pfts but no special landunits.

Just grass + bare compared with control: 55% reduction in time.

Also memory reduction: further 60% reduction.

Also going to 4 soil layers: further 22% reduction. About a 70% reduction in hydro_without_drainage and soiltemperature.

Combining all these, relative to control: 86% reduction compared with control. So now we're maybe at about 2x Noah-MP cost - but Mike wants to redo the Noah-MP timings.
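The 86% figure can be checked with a few lines of arithmetic, assuming each "further" reduction is measured relative to the previous step rather than relative to the control. The function name is illustrative only.

```python
def combined_reduction(step_reductions):
    """Combine successive fractional reductions, each relative to the previous step."""
    remaining = 1.0
    for reduction in step_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


# Grass + bare (55%), then memory reduction (60%), then 4 soil layers (22%).
total = combined_reduction([0.55, 0.60, 0.22])
print(f"{total:.0%}")  # prints 86%
```

The remaining cost is 0.45 x 0.40 x 0.78, or about 14% of the control, which matches the 86% reduction quoted above.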

Changing number of CanopyFlux iterations from 40 to 20 (consistent with Noah-MP), and also changing another number of iterations from 40 to 3 (consistent with Noah-MP): little effect - though maybe would get more effect if you had trees rather than just grasses.

With all of these combined, other than the iteration changes: big culprit is canflux (nearly half of the time), second most bgflux (bare ground). Together those two account for nearly 75% of the time.

Note: To get more detailed timing information, probably need to change TIMER_DETAIL and maybe TIMER_LEVEL env_run.xml variables.

February 28, 2018 - Software meeting

Agenda

  • EBK - CPLHIST cases. I added a test case, but it can only run on cheyenne. Do we want to be able to run it elsewhere?
  • EBK -- When moving over from svn do we rebase or merge?
  • EBK - Process for mosart and rtm tags? I just created two for each. Did a PR, ran testing. Didn't ask for review since changes small enough. I think we should in general ask for review. Also, my first few tags have been lightweight tags, should I redo as annotated? Is it possible to make annotated the default in git config?
  • EBK -- Surfdata for new conus and physgrid options for CAM? Will meet with Colin Z. in a few weeks.
  • wjs - git tutorial?
  • bja - Keith's documentation question for mosart

cplhist cases

Currently, the path to the data is hard-coded in the test xmlchange command, so we can only run this test on cheyenne.

Currently this is just a problem for testing. But Erik wonders if this is a problem for users.

If we want to support this out-of-the-box, want to put it in inputdata, at which point we can use $DIN_LOC_ROOT. For now, we won't worry about this.

Migrating branches from svn

Should we do a rebase or merge when bringing things over from svn and then bringing things up to date with latest master?

Ben: If you're just bringing over the latest (r272), doesn't matter. If you're pulling over the whole history, do a merge.

Process for mosart and rtm

Should we always get a PR review? Feeling is: get one if you feel it's warranted, but not required.

Tagging: Use annotated tags moving forward, but don't go back and change previous tags

MOSART and RTM documentation

Let's delete the MOSART and RTM repos in NCAR, and for MOSART just put in links pointing to the appropriate chapter in the CTSM documentation.

For RTM, could point to the CLM45 tech note.

git tutorial

Let's include some workflow and best practice-related things like:

  • Whether to do cleanup before submitting PR
  • Create branches rather than SourceMods

We do need some basics (like how to create a branch).

Would like this to be hands-on: actually create a branch, commit it back, etc.

We might want to split this as:

  • Brian gives a hands-on general git tutorial (but targeted to CTSM-specific workflows)
  • We give some CTSM-specific workflow things

Remote participants? We should probably allow it, but we won't be able to provide support very easily.

Probably reserve a 3-hour block, but maybe aim for 2-ish hours.

Could also be helpful to include a bit on the testing workflow, too: e.g., create an SMS_D test and verify that it passes. Point is: this is really part of the process.

  • Creating a failing test could be nice, too.

We can probably assume that people know basics of CESM, though.

February 21, 2018 - Software meeting

Agenda

  • bja - push converted branches to escomp/ctsm or personal repo? If ctsm, are we deleting when pulled by devs?
  • DML - Inevitable question at LMWG about CLM-WRF. Discuss timeline for LILAC? - bja - discussed with DML and MV 'end of year'.

Where should migrated branches go?

Ben will put them in his fork, then have people move them to their own forks.

February 15, 2018

Agenda

Mike's "fair performance comparison"

In order to do a fairer comparison between Noah-MP and CTSM, Mike set up CTSM runs with just one PFT plus bare ground. Initial results were a 10x difference. By only allocating memory in CTSM where needed, we can get down to 4x. It looks like we have some more low-hanging fruit that could bring CTSM more in line with Noah-MP.

What should we do about memory allocation

The easy first thing would be to make it so that we only allocate memory for the necessary PFTs in non-transient runs.
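The selection logic behind this idea can be sketched as follows. CTSM itself is Fortran, so this Python is only a schematic of the rule; the function name and the per-gridcell weight list are assumptions for illustration.

```python
def pfts_to_allocate(pft_weights, transient):
    """Return indices of PFTs that need memory on one gridcell.

    pft_weights: fractional area of each PFT on the gridcell.
    transient: True if PFT areas can change during the run, in which case
    every PFT is kept because a zero-weight PFT may become active later.
    """
    if transient:
        return list(range(len(pft_weights)))
    return [i for i, weight in enumerate(pft_weights) if weight > 0.0]
```

In a non-transient run, a gridcell with weights [0.3, 0.0, 0.7, 0.0] would allocate only two of its four PFTs under this rule.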

Developments on the Noah-MP side

The National Water Model has two SEs tasked with modularizing it. They're looking at it top-down.

Mike and Bill talked about what approach we could take that might make it easier to transition from Noah-MP to CTSM in the future.

February 12, 2018 - Software Meeting

Agenda

  • EBK -- KnownBugs and KnownLimitation files, what should be done with them? How do people figure out which important issues affect a given tag they are using?
  • EBK -- I think we should do a tag with an update to beta08/beta09 with a cime branch.
  • EBK -- for doing my tag I pushed changes for the ChangeLog directly to ESCOMP/master. I'm not sure that's a good thing.
  • EBK -- Current priorities for tags, README, UG, SSP timeseries etc.?
  • EBK -- Issue #262 hirespft option?, #258 code of conduct, #249 data assim
  • EBK -- Branches to move over to git?
  • wjs - Do we still need to maintain a list of changed files in the ChangeLog?
    • With git, it's easier to get a definitive version of this than it was with svn (using git diff and git log --first-parent filename).
    • I'm fine keeping it if people really want it, but I wonder whether it will be useful enough moving forward to be worth the effort it takes to maintain it.
    • ebk -- there were two parts to this that I found useful. The first part is that documenting the changes per file caused me to do my own code review, and that step has proven to be vital. git's PR mechanism does a better job of this and makes it public to everyone. The second part, though, is looking in the ChangeLog for when a specific file was changed. Having it in the ChangeLog allowed you to do it without network access. Since git is distributed, we may not need that mechanism anymore.
    • ebk -- in my latest tag I replaced that section with a listing of the PRs.
  • EBK -- exact planning for CLM release sequence? Do we give tagnames to release versions -- even if identical to non-release? Since tags are different in git than in svn, I'm not sure we really want to make a release tag for each tag on master.
  • wjs - Erik, I see you started making all tags into github "pre-releases". What's the advantage of doing this? (I'm not saying we shouldn't do it; I just want to understand that it's serving some worthwhile purpose before we add that to our tag-making process.)
    • ebk -- I wanted to document at the top the versions that actually work in git, and figured the "pre-release" tag was a good indicator that this is a tag not to use. I also added notes about the last tag that doesn't work in git, and called it a prerelease. I think a good reason for these notes is to document problems that you find out after a tag is made (a DO-NOT-USE kind of note). That's one thing I tried to put in the notes about the tag that required CIME_MODEL to be set, for example. In terms of viewing things on github, these notes are useful in that they are what you see on the website. The main reason I did it now is, as I said, to document the versions that people can't use, or the pre-release versions before the version that has everything we want in place. In the long run, we may want to ONLY do this for the tags that go on the release branch.

KnownBugs and KnownLimitations

Erik has tried to keep these up-to-date in releases.

Maintaining a document is problematic because it's a static document that's hard to keep up to date.

However, there isn't a good way from the issues page to see what issues apply to a given release version.

Dave feels it isn't worth maintaining this list. We can point people to the issues page and let them search as needed.

Pushing directly to master

Feeling is it's okay to push things like ChangeLog updates directly to master.

Tracking upcoming tags

How to track upcoming tags with github projects? In particular, how to include tags that include various issues?

(See also notes from a couple of weeks ago.)

Ben's idea: We can open a place-holder PR linking to the various issues that will be addressed.

Documenting specific changes in ChangeLog

Feeling is we don't need to list specific files changed. But let's include a list of PRs in each tag.
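One way to build the PR list for a tag is to scan merge-commit subjects, which by GitHub's default read "Merge pull request #N from owner/branch". The sketch below assumes the subjects come from something like `git log --first-parent --format=%s <previous-tag>..HEAD`; the function name is hypothetical.

```python
import re

# GitHub's default merge-commit subject: "Merge pull request #<N> from <owner>/<branch>"
_MERGE_SUBJECT = re.compile(r"^Merge pull request #(\d+) from (\S+)")


def prs_from_subjects(subjects):
    """Return (PR number, source branch) pairs found in merge-commit subjects."""
    found = []
    for subject in subjects:
        match = _MERGE_SUBJECT.match(subject)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```

Squash or rebase merges do not produce this subject line, so a list built this way is only complete if PRs are merged with merge commits.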

Make sure to have good commit messages.

Should we be tagging everything on the release branch?

The only thing that should go onto the release branch is merges

Every merge commit on the release branch should be tagged (often this may include a few changes together).

Feeling is that we should create a release tag tagging what's there right now: release-clm5.0.00.

We might have multiple tags on master before making the next release tag. e.g., we could get up to clm5.0.003 and then that would be equivalent to release-clm5.0.01.

Do we want some other way to distinguish between release and dev tags, to avoid confusion - since the numbers won't align?

We'll plan to have releases tagged like release-clm5.0.01, and dev tags like clm5.0.dev001.
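The two naming patterns are distinct enough to tell apart mechanically. A minimal sketch, assuming the patterns are exactly those of the two examples above (the regular expressions and function name are illustrative):

```python
import re

RELEASE_TAG = re.compile(r"^release-clm\d+\.\d+\.\d+$")
DEV_TAG = re.compile(r"^clm\d+\.\d+\.dev\d+$")


def classify_tag(tag):
    """Return "release", "dev", or "other" for a tag name."""
    if RELEASE_TAG.match(tag):
        return "release"
    if DEV_TAG.match(tag):
        return "dev"
    return "other"
```

Older tags such as clm5.0.003 fall into "other" under these patterns, which reflects the confusion the new convention is meant to avoid.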

Labeling tags on github: Plan is to label all tags on release branches as a release. We won't do anything for tags on master: leave them as just plain tags without release notes. (So we won't use pre-release for them moving forward.)

January 29, 2018 - Software Meeting

Agenda

  • EBK - can reseed_dead_plants, spinup_state, reset_snow, reset_snow_glc, reset_snow_glc_ela, be set or changed on a branch? Should these only be allowed for startup or hybrid?
  • EBK - Need to make a clm branch for cesm2_0_beta08. The problem is that config_compsets.xml is specifying a hybrid startup for B cases, but changes in CLM mean these files need to be interpolated rather than used without interpolation. So either config_compsets.xml needs to change, or CLM has to figure out that these files need to be interpolated.
  • EBK -- upcoming priority after clm branch tag? UG, new-IC files, bugs-list?
  • EBK -- changes coming down from CLM teamwork page? What do we do? Move into ctsm github issues?
  • EBK -- FATES IC files (for both clm45 and clm50?) Do you want IC files for users or should users spinup?
  • DML -- Can we use the "LND_TUNING_MODE" feature in env_run.xml to handle the nitrogen deposition for coupled runs -- ebk, we didn't want to move to 1850 CMIP6 ndep until we have both 1850, and hist ndep versus offline runs

github projects

This could be a useful thing for some of our needs.

How to do tag planning? One idea is to have a project like "upcoming tags". Then could have notes for each upcoming tag.

How to plan which issues go in a tag? We could do that with projects, too: Have a project for each tag, with the issues that will be included in that tag. Then the above notes could point to the appropriate project as a link.

reseed_dead_plants on a branch run?

It causes issues to set reseed_dead_plants on a branch run - since it violates the rule that answers shouldn't change in a branch run if your namelist hasn't changed. Would someone ever want to do reseed_dead_plants in a branch run?

  • Dave feels that this isn't really needed
  • Erik will make it so this doesn't operate on a branch run, maybe throwing an error in that case.
    • Mariana: You're not allowed to throw an error in this case: you need to be able to kick off a branch run with the same namelist as the initial run. So write out a warning that says we're ignoring this.
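The warn-and-ignore behavior Mariana describes can be sketched as below. This is a Python schematic of the rule only; the real check would live in CLM's Fortran or build-namelist code, and the function name is hypothetical.

```python
import warnings


def effective_reseed_dead_plants(run_type, reseed_dead_plants):
    """Return the reseed_dead_plants value the model should act on.

    On a branch run the flag is ignored with a warning instead of an error,
    so a branch can start with the same namelist as the initial run.
    """
    if run_type == "branch" and reseed_dead_plants:
        warnings.warn("reseed_dead_plants is ignored on a branch run")
        return False
    return reseed_dead_plants
```

Returning False rather than raising keeps the branch run bit-for-bit with the original run while still telling the user the setting had no effect.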

The reset_snow options already don't do anything on a branch.

What about spinup_state? Erik will talk to Keith about that. Dave is less inclined to stop allowing a branch run in that case.

Branch needed for cesm2_0_beta08

Right now, the hybrid case pointed to for B1850 has incompatible initial conditions. The logic in CLM says to have use_init_interp false for hybrid runs.

Let's make a refcase with a new initial conditions file. Can take an initial conditions file from one of Cecile's recent cases (e.g. finidat used for /glade/p/cesmdata/cseg/runs/cesm2_0/b.e20.B1850.f09_g17.pi_control.all.266)

Fix for branch cases with reseed_dead_plants less critical.

January 25, 2018

Present: Dave Lawrence, Martyn Clark, Sean Swenson, Mike Barlage, Bill Sacks

Agenda

  • Martyn - Update on paper

  • wjs - Project board for hillslope hydrology: https://github.com/ESCOMP/ctsm/projects/3

    • I don't particularly like github projects for this purpose:
      • Hard to see an overview, since all of the note is shown at once, rather than just the title
      • Limit on number of characters in notes
      • Seems better for tracking and organizing existing issues
    • If we want projects organized around issues, there are various value-added services like ZenHub (free), maybe Zube (if pricing is low or free... it's possible this allows more than just github issues), and others
    • Maybe just use individual issues for this purpose?: https://github.com/ESCOMP/ctsm/issues/222
  • wjs - Update on hillslope integration

  • Mike - timing tests

Update on paper

Martyn talked with Paul Dirmeyer about the possibility of having a commentary in JAMES. He thought that could be great.

Balance between not wanting to put something out too early vs. getting something out that people can reference.

This could be part of the CESM special issue

Should we include details on what will be included - e.g., which parameterizations, and from where? This could be a lot of details... maybe include in appendix? General feeling is to keep this higher-level.

github project boards

Feeling is that, for now, let's try using individual issues for science enhancements. If that gets cumbersome, we can explore breaking things into separate issues (with projects) or some other solution

hillslope hydrology integration

Martyn asks if we can reduce all the argument passing. e.g., inline some things, or introduce overall structures holding a bunch of the common arguments.

Bill sees pros & cons, but is open to this. If we want to package various things together, we could introduce a new locally-defined type for this, or just package them into the lateral_outflow ("this") object.

  • Martyn asked if this can be done with 'associate'. Bill doesn't think so.

  • Could copy data around, though there's performance overhead with that, and risk that you'd forget to do the copy. This risk is especially great for output arguments that you need to copy out (but we don't have many or any of those right now).

  • Could have pointers to the appropriate data, though pointers can have performance issues

Bill likes Martyn's idea of having some packages that contain commonly-grouped data - e.g., combining bounds, col, grc, etc. into one higher-level structure; maybe making a higher-level hydrology structure; etc.

January 24, 2018 - Software Meeting

Present: Dave Lawrence, Erik Kluzek, Ben Andre, Bill Sacks

Agenda

  • WJS - final decision on branch names
  • DML - We will need aerosol deposition fields from coupler history output from fully coupled transient runs (historical and future)
  • EBK - We have the CLM team on NCAR/CLM, should we move them over to a team on CTSM? How should we handle teams and collaborators for CTSM?
    • WJS: it looks like people at least need to be members of the ESCOMP organization to be assigned to issues, so I think that anyone who is a potential issue assignee should be on a team with at least read permission. Based on https://help.github.com/articles/repository-permission-levels-for-an-organization/ it looks like people need write permission to have access to some conveniences around issues and PRs - though that's more a convenience than a necessity. There's a trade-off between that convenience and the risk that someone will accidentally mess something up (e.g., delete or overwrite a tag). The key conveniences that come with write access (in addition to being able to write to the repo) are:
      • Request PR review
      • Close, reopen and assign issues
      • Apply labels and milestones
      • Create project boards
    • WJS: I think we want at least two groups:
      • Admin group
      • Larger group of core CTSM developers, with either read or write permission
  • WJS: if we expand write permissions (and maybe even if we don't), I'm thinking we should ask Mark to expand the backup strategy to keep a few years of backups - in case (for example) some old tag gets overwritten and we don't notice it for a while
    • I also was wondering if we should have some optional mechanism in manage_externals that allows you to list the SHA-1 along with the tag. (This would be used only when doing a checkout: after checking out the tag, it would check the checked-out SHA-1; if they don't match, it would abort with an error.) But I'm leaning towards thinking that this is overly paranoid of me....
  • EBK - what are the requirements for us to say "the release is done"? Testing, list of machines tested on, bugs fixed, cleanup tasks to do, UG, Tech note, list of things to do, who does each of these? What is the name going to be for the release? -- from last week there are only a few requirements as well as being before the LMWG meeting.
  • EBK -- Are we still putting planning on trello? Should we archive meeting notes?
  • bja - also need to update copyright date in license file for rtm, mosart, ptclm
  • EBK -- SSP3-7 datasets for clm50? Peter made, priority for general ability to do this?
  • bja - git branch conversion upon request....
  • bja - still not clear to me what is desired for #215
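The optional SHA-1 check raised in the agenda for manage_externals could look roughly like this. It is a hypothetical sketch, not existing manage_externals behavior; the checked-out value is assumed to come from something like `git rev-parse HEAD` after the tag is checked out.

```python
import re

_SHA1 = re.compile(r"^[0-9a-f]{40}$")


def verify_tag_sha(tag, expected_sha, checked_out_sha):
    """Raise RuntimeError if the checked-out commit differs from the recorded SHA-1."""
    expected = expected_sha.strip().lower()
    actual = checked_out_sha.strip().lower()
    for name, value in (("expected", expected), ("checked-out", actual)):
        if not _SHA1.match(value):
            raise ValueError(f"{name} value is not a full 40-character SHA-1: {value!r}")
    if expected != actual:
        raise RuntimeError(
            f"tag {tag!r} points at {actual}, but {expected} was recorded; aborting"
        )
```

A check like this would catch a tag that was deleted and recreated on a different commit, which is the scenario the longer backup retention is also meant to cover.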

Final decision on branch names

Main argument for "master": it follows the standard git convention, so appears in git tutorials, documentation, etc.

Main argument for "develop": it makes it more explicit that this is for development.

Dave points out there are other reasons for "master": it's consistent with FATES; also, it avoids naming confusion with individuals' development branches.

Final decision: "master".

What about a long-lived production branch? If we have one, we'll call it "production". Update: we changed our mind on this: see below.

It's not clear if there's strong benefit to having a long-lived production branch (that outweighs the complexity). We'll come back to this if/when needed.

For now we can create a release branch: release-clm5.0

Then we can instruct people to do:

git clone -b release-clm5.0 https://github.com/ESCOMP/ctsm.git
cd ctsm
./manage_externals/checkout_externals

Dave points out: It could be confusing to have "production" and "release". Feeling is: let's use "release" for everything, not have anything labeled "production".

Summary: Development will happen on "master". We may or may not have a single, long-lived "release" branch, from which all releases are made (either directly on the "release" branch or from branches off of the "release" branch). If we don't have a single "release" branch, then we'd still have individual release branches for each release; they would just come off of "master" in this case.

Where will we direct people for help?

Should we direct people to the bulletin board or the issues tracker?

Maybe for now (until CTSM is a thing independent of CESM) continue to use the bulletin board.

User's Guide

We should sit down for a couple of hours to determine what we can cut from the User's Guide.

Suggestion that Erik starts with a quick take at this, then we can go through it as a group.

PDF version of tech note

Ben thinks it's possible to have a link to the pdf from the html documentation... it may be a two-step process.

Mariana: not sure if we want to commit the pdf to the repo. May want to host the pdf from somewhere else for now.

Teams on github

Feeling is: let's have 3 teams:

  • CTSM-Admin

  • CTSM-Write

  • CTSM-Read

The admin team will be very small. Others will start on the Read team, and then can graduate to Write.

Automated tag emails

We'll think about an automated solution. For now send out email manually to clm-dev.

Tag names

Bill: Would be good to prefix tags to make clear the distinction between dev tags and release tags. e.g., maybe prefix dev tags with "dev"??

Branch: release-clm5.0

Tags: release-clm5.0.0, release-clm5.0.1, etc.

Critical bugs?

Feeling is: let's do what we can get done by next Friday, even if it means that we'll need a release update soon afterwards.

January 19, 2018 - Software Meeting

Present: Dave Lawrence, Ben Andre, Erik Kluzek, Bill Sacks, Keith Oleson, Rosie Fisher

Agenda

  • WJS - go through critical and high-priority bugs; determine what needs to be fixed for release
  • WJS - develop vs. master plans from Jan 18 meeting
    • Basic plan is to use gitflow: http://nvie.com/posts/a-successful-git-branching-model/
    • However, in contrast to some descriptions where 'develop' is potentially unstable, we'll treat 'develop' like svn trunk: Branches can only come to develop after the full test suite passes. When we do releases on 'master' is more subjective and based on scientific readiness.
    • What should the default branch be? Tension between what we'd want users to get ('master') and the default we'd want for PRs ('develop'). At yesterday's meeting, we said we'd use 'master' as the default. The risk there is that we might periodically mistakenly merge PRs into master, which would need to be reverted.
    • EBK -- Do we do the part of the process for hotfix branches? And reserve those branch names?
    • (Further reading, with discussions of when gitflow is too complex: http://scottchacon.com/2011/08/31/github-flow.html)
    • EBK -- Do we both make tags and do PRs? Right now we mostly do not do code reviews, which is bad IMHO. I hope we do more reviews in git. When do we ask for reviews? We should also allow PRs that don't touch code, so they can be combined with other changes. What is the process for a develop tag to move over to release? How often do we do release tags on master? We still need to decide on a naming convention.
  • WJS - Worth having a simple_bfb tag for issues? Point is: could help us find issues that could be combined into a bfb tag (ebk likes this, but it will need someone responsible to make it happen)
  • WJS - Ben: should we do anything more on manage_externals for v1, or make v1 with what we have? When we're ready, can you pull it into the andre-standalone branch (or show me how)?

Bugs we absolutely want to fix for the release (and release plans)

Dave: Feels we should be a little more lenient about what we mean by release, and how much needs to be fixed for that. There's a lot of positive benefit to releasing before the meeting. We can always put out a release update via a new tag on master.

Bill & Ben: Feel that there are still some big issues that should be fixed before release.

Only machines that need to be supported in the release are cheyenne and hobart. We should explicitly mention the machines/compilers known to work.

Erik: For tools, we just test on cheyenne; feels we can expand that for the cesm2 release.

We'll probably have a release update that comes with the CESM2 release

Github workflow / branching model

Notes from Bill's email

At yesterday's ctsm meeting, we discussed what kind of github workflow / branching model we want in order to support development and releases. I'd like to discuss this further at today's clm-cmt meeting to make sure everyone is on board with this.

The key aspect of yesterday's proposal is that we'd have two main, long-lived branches:

  • develop: This would take the place of our current trunk. Most branches branch off of develop and get merged back into develop. A key difference for us (compared with what's suggested in the blog post) would be that branches only get merged to develop after they are run through the full test suite – so from a pure software engineering standpoint, all commits on develop are "production-ready".

  • master: This would be the branch along which releases are tagged. The only commits that go on master are merges from release branches, when we're ready to make a release.

This is the "gitflow" workflow, which is presented here:

http://nvie.com/posts/a-successful-git-branching-model/

The core elements are illustrated nicely in the figure at the top of that page. If you aren't familiar with gitflow, it would be helpful if you could at least look at that figure to familiarize yourself with it. The discussion in the rest of that post is also worthwhile to read at some point – I think we would adopt most of the workflow as described there.

I was initially hesitant to adopt this relatively complex branching model. (e.g., http://scottchacon.com/2011/08/31/github-flow.html argues for a simpler workflow in many cases.) However, the requirement to distinguish "blessed" versions (which may be released, say, every few months) makes me feel like gitflow is probably the right choice for us. The downsides are that this more complex workflow will involve a bit more overhead and carries a larger risk of mistakes.

Discussion

People are generally happy with the gitflow workflow.

However, people don't like the names develop vs. master.

Suggestion: use "master" for what used to be trunk (i.e., in place of "develop"), and something like "release", "production", or "stable" for the branch along which releases are tagged.

General consensus is for "production".

Feeling is that we want to merge latest master into a dev branch before doing final testing (similarly to our svn workflow), in order to catch integration issues. However, need to make sure to use '--no-ff' when doing the merge.
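
As a rough illustration of the integration step, the sketch below builds the git commands for merging the latest master into a development branch before final testing. The helper name and the example branch name are hypothetical; the only git features used are checkout and merge with the --no-ff flag, which records a merge commit even when git could fast-forward.

```python
def integration_commands(dev_branch, trunk="master"):
    """Return git commands (as argv lists) for a pre-test integration merge.

    dev_branch is whatever branch is about to be tested; trunk defaults to
    "master", the name agreed on above for the former svn trunk.
    """
    return [
        ["git", "checkout", dev_branch],
        # --no-ff keeps an explicit merge commit in the history.
        ["git", "merge", "--no-ff", trunk],
    ]


for argv in integration_commands("my-dev-branch"):
    print(" ".join(argv))
```

The full test suite would then be run on the merged result, as in the svn workflow.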

Feeling is that we shouldn't recommend rebasing to scientists. One reason is that, if you've done any science from your branch, the provenance is lost if you rebase.

Documentation for release

General feeling is that we don't want to duplicate information that's elsewhere, because a single copy is easier to maintain.

Rosie suggests hosting a webpage with the relevant information. The README.rst file in the repo will then just point to that. Ben suggests adding some other links in the README.rst, too - like pointing to cime user's guide.

Probably point to http://cesm-development.github.io/cime/doc/build/html/quickstart.html - though we aren't sure if it's going to stay there or move to esmci.github.io.

There's currently more documentation here http://esmci.github.io/cime/

Naming: CLM vs. CTSM

e.g., this applies to naming for cfg files for manage_externals.

Let's keep things named CLM for now, until post-release.

RTM / MOSART documentation

https://escomp.github.io/mosart/doc/build/html/tech_note/MOSART/CLM50_Tech_Note_MOSART.html is incorrect, according to Ben.

Dave/Keith think we should just point to the CLM tech note for this.

simple_bfb tag

Bill: We didn't discuss this, but based on Erik's feedback on the agenda, I have added a "simple bfb" tag for issues. The idea is: this tag can be used to find issues that can be relatively easily pulled in to a larger bfb tag, to combine multiple changes in one set of testing.

January 18, 2018

Present: Dave Lawrence, Mike Barlage, Sean Swenson, Martyn Clark, Bill Sacks

Agenda

  • Meetings: do we want to maintain separate meetings for CLM software and science vs. CTSM or combine those? Also, bigger CTSM meetings?

    • We are deferring this item
  • wjs: status: I haven't been able to spend much time on the lateral flow refactor. The extraction of the power law routine is done-ish (need to figure out cause for answer changes). Need to put in the hillslope version and the clm4.5 version, then finish cleaning up the routine in SoilHydrologyMod. Hillslope branch moved to git.

Model naming for participation in MIPs

Martyn asks: For branding, will we call it CLM or CTSM?

Do we want to do something like "CTSM-CLM5"? People think that could work.

Dave: For CESM, you're supposed to do something like "CESM2 (list of things that differ from default)" - e.g., "CESM2 (WACCM)"

Martyn feels it's important to include "CTSM". Dave thinks we need to keep the "CLM" brand, at least for now. So something like "CTSM-CLM" is probably best, at least for now.

For the short-term: Dave is comfortable with "CTSM-CLM5", but is a bit hesitant because we haven't published anything with CTSM yet. He's happy as long as it has CLM somewhere in the name.

Publication strategy

Feeling that we'd like to get something out there sooner rather than later.

Martyn: One option would be to start with a vision paper, then follow up with a more detailed technical paper. Or should we start with the more detailed technical paper?

Dave: JAMES probably wants to see a working model. Maybe something short like a GEWEX or EOS article.

Martyn: May be able to do a commentary in JAMES. He'll look into this.

Dave: This could be a way to get some things out there like naming convention, version control, etc. - kind of boring, but stuff we should get out there.

Questions about github workflow, and master/develop/release branches

Bill: For the most part, developments will happen in people's forks

Mike: Does it make sense to have some collaborative development via people's forks rather than the main fork?

Bill thinks that can definitely make sense, though prefers that things get integrated into the main repository more frequently so that things don't get too out-of-sync.

How do we want to handle releases - particularly, how to point people to the right version of the code they should be using?

Martyn asks if we want separate master vs. develop branches for that. Alternatively, Sean suggests having a "release" branch.

Feeling is: Let's have "master" and "develop".

  • develop will take the place of our current trunk. Things need to pass all tests before coming to develop.

  • master is where releases are made and tagged. This contains blessed versions. This will be the default that people get when they clone.

Bill: I think this is roughly equivalent to http://nvie.com/posts/a-successful-git-branching-model/

In the short-term, before the CLM5 release, we won't have a master branch. The first existence of the master branch (and the first tag along that) will be the CLM5 release.

January 8, 2018 - Software Meeting

Present: Dave Lawrence, Rosie Fisher, Ben Andre, Mariana Vertenstein, Bill Sacks, Erik Kluzek

Agenda

  • DML - Single point on Cheyenne

  • ebk -- RTM/mosart issue #1, missing history file with gnu compiler

  • ebk -- note: ran the yellowstone_pgi test list on both cheyenne_intel and cheyenne_gnu; it worked fine apart from the above gnu issue for rtm/mosart.

  • ebk -- moving to new testlist format (also allows wallclock time, and memleak fraction), and for mosart/rtm. Some of those tests weren't being run because they weren't changed from aux_clm45 to aux_clm. Some machines are gone now: janus, hopper. There was a testlist for "aux_clm_ys_pgi", I don't think that applies anymore.

  • dml - tag for diagnostic skin temperature

  • dml - moving forcing data to cisl's data archives

  • bja - branch migration for git

Replacement for trello for managing tags?

We can use github projects for this

Single point on cheyenne

Can you do a single-point run using slurm, going to geyser/caldera?

Currently, no way to do this from cheyenne: on cheyenne, you end up using a full node.

You could log in to caldera and set up a yellowstone case

But we can set things up to use the share queue. Bill will do that.

RTM / MOSART ERP decStart restart case fails

Just with gnu

See https://github.com/ESCOMP/mosart/issues/3

Feeling is: let's determine if this is decStart, but then let's not worry about this for right now. Let's just move this to intel.

Hosting forcing data with CISL

Dave suggested starting by moving cru/ncep forcing data over. If that seems to work well, we could move gswp3 over.

These are data that have been stored in inputdata.

This is not providing a way for others to download the data - it's just an alternative place to store the data locally so it's not filling up our quota.

Side-note: Mark Moore has been talking about setting up a gridFTP server to replace svn inputdata.

Need to make sym links from the old place to the new so old tags continue to work.
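
One possible way to script the sym links, as a sketch only: the function name and the paths passed to it are placeholders, not the actual inputdata locations.

```python
import os


def link_old_to_new(old_path, new_path):
    """Leave a symlink at old_path pointing to data that now lives at new_path."""
    if os.path.lexists(old_path):
        # Refuse to overwrite: the data must be moved away first.
        raise FileExistsError(f"{old_path} still exists; move it before linking")
    os.symlink(new_path, old_path)
```

Old tags that reference the original location would then resolve through the link to the new storage.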

Is it an issue that we won't have backups of these data? The GSWP3 and cru-ncep data are in the svn inputdata repo. The WATCH data won't be backed up, which is probably okay: we just want to make sure that the scripts are backed up.

Managing development post-clm5 release

CTSM development will be incremental development from the CLM5 release - won't break things like FATES integration.

How will we deal with big changes coming in - documenting / notifying people?

Feeling is that we'll use the tag/version numbers to denote these significant developments that users may be interested in.

Tag numbers will just refer to ctsm numbers - not clm version numbers.

We'll maintain the ability to revert to clm5.0 namelist options. We'll have ability to use clm5.0 or "current". As CLM evolves (clm5.1, clm5.2, etc.) you would not (e.g.) be able to go back to clm5.1 with a single setting, though you could go back to clm5.0.

How do we want to document bug fixes that affect the science of certain versions? No clear answer....

Presentation at LMWG meeting

Dave requests that we have a 20-min presentation at the LMWG meeting walking people through the new workflow, where to get the new version of ctsm, etc.

This could be a live demo as opposed to a presentation.

Ben points out: could be useful to have a quick start slide with key commands.

January 4, 2018

Agenda

  • Git migration follow-up: see email from Bill before the holidays

  • Agendas now stored in the notes

  • Ditching origflag

  • Status of lateral flow modularization

  • Interest in CTSM: What's our strategy to introduce people?

Git migration

https://github.com/escomp/ctsm

has migrated from old repo - in unified_land_model branch. This will become master after CLM5 release is done.

origflag

People are comfortable with removing it as long as it's in the history. It will be in the CLM5 release.

We may want to re-introduce the parameterizations that were covered with origflag eventually - but can be ditched for now.

Lateral flow modularization

Eventually, we may want to treat qflx_latflow_out and qflx_latflow_out_vol in a more unified / consistent way. Will do that in a second step.

Interest in CTSM / introducing people

Transitioning from CLM to CTSM, maintaining CLM brand and CLM science

We can do this quickly after the CLM5 release.

We should have some sort of document letting people know what's happening: CTSM branching off of CLM5, with CLM remaining an instantiation of CTSM.

We want to maintain the CLM "brand", probably via a suite of physics options.

We may want to think about how to verify that changes we make moving forward maintain the science of CLM5. Could do something like the verification test that's used for atmosphere and ocean?

  • Bill notes that you can't rely on roundoff-level changes remaining roundoff-level in most cases - things tend to blow up over some time, at least in some grid cells.

As we change things like the numerics, do we need to maintain the old numerics, too? Or can we allow changes that are greater than roundoff, which reproduce CLM5 science without necessarily reproducing CLM5 answers? Feeling is that we can be more flexible here: want to maintain CLM5 science capabilities, but not the exact answers.

But we still want a tool to tell us if a change leads to anything going off the rails - e.g., the roundoff-level hydrology reordering that led methane to go to 0. So instead of or in addition to a more formal verification tool, could have something that checks whether any variable went grossly off the rails.
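
A very rough sketch of what such a check could look like, assuming each run has been reduced to a mapping from variable name to a list of values. The function name, the 50% threshold, and the variable names in the example are all made up for illustration; a real tool would need per-variable thresholds and would read history files.

```python
def find_runaway_variables(baseline, test, rel_tol=0.5, floor=1e-12):
    """Return names of variables whose mean differs grossly from the baseline.

    baseline, test: dicts mapping variable name -> list of values.
    rel_tol: allowed relative change in the mean (an assumed threshold).
    floor: lower bound on the scale, to avoid dividing by zero.
    """
    flagged = []
    for name, base_values in baseline.items():
        base_mean = sum(base_values) / len(base_values)
        test_mean = sum(test[name]) / len(test[name])
        scale = max(abs(base_mean), floor)
        if abs(test_mean - base_mean) / scale > rel_tol:
            flagged.append(name)
    return flagged


# Hypothetical example: methane collapses to zero, soil temperature barely moves.
baseline = {"CH4": [1.0, 1.2, 0.8], "TSOI": [280.0, 281.0]}
test = {"CH4": [0.0, 0.0, 0.0], "TSOI": [280.0001, 281.0]}
print(find_runaway_variables(baseline, test))
```

This deliberately ignores roundoff-level differences and only flags gross changes, which matches the idea of maintaining the science without requiring identical answers.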

Tag numbering

Tag numbering: For CLM, we've come up with a scheme where we flag science changes. We'll probably drop this in CTSM (since it's multi-model), and just note this in the ChangeLog.

We could start with CTSM v. 0 being CLM5.

Currently we've been using tags like clm4_5_18_r270. clm4_5 gives release / interim release versions. 18 tells you when the science has changed in the last release version. r270 is bumped for every tag.
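
For reference, the existing scheme can be split into its parts with a small parser. This is a sketch; the function and field names are chosen here for illustration.

```python
import re

# Matches tags of the form clm<major>_<minor>_<science>_r<revision>.
OLD_TAG = re.compile(r"^clm(\d+)_(\d+)_(\d+)_r(\d+)$")


def parse_old_tag(tag):
    """Split a tag such as clm4_5_18_r270 into its three components."""
    match = OLD_TAG.match(tag)
    if match is None:
        raise ValueError(f"not an old-style CLM tag: {tag}")
    major, minor, science, revision = (int(g) for g in match.groups())
    return {
        "release": f"clm{major}_{minor}",  # release / interim release version
        "science": science,                # science-change counter
        "revision": revision,              # bumped for every tag
    }
```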

For CTSM, maybe we can get away with just two numbers: a major release version and a tag number.

Updated feeling: let's use 3 numbers:

  • 1st number: major release

  • 2nd number: big change (e.g., introducing FATES)

  • 3rd number: normal changes
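
The three-number scheme could be sketched as follows. Resetting the lower numbers to zero when a higher one is bumped is an assumption borrowed from common versioning practice; the notes do not specify it.

```python
def bump(version, change):
    """Return the next (major, big, normal) version tuple.

    change is one of "major", "big" (e.g., introducing FATES), or "normal".
    Assumption: lower-order numbers reset to 0 when a higher one is bumped.
    """
    major, big, normal = version
    if change == "major":
        return (major + 1, 0, 0)
    if change == "big":
        return (major, big + 1, 0)
    if change == "normal":
        return (major, big, normal + 1)
    raise ValueError(f"unknown change type: {change}")
```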

Documentation: Tech note and User's guide

Mariana suggests following the model of cime: The rst source for the tech note and user's guide lives in the main repo, so it can/should be updated with the source code. People like this idea.

High-level documentation of CTSM

How do we document what CTSM is? What documents do we point people to?

There's the original vision document, and other documents that we prepared in earlier meetings. Feeling is that we can make those documents available on the wiki.

Should we have a design document? We could have a high-level design document, based on Martyn's ideas (and maybe add something on LILAC). Other than that, the best thing right now could be to point people to some good code examples.

Do we want some mechanism for informing people?

Do we want an email list?

Sometime in spring / summer we want to more broadly introduce CTSM to people.

Timing? When we have the hydrology of Noah-MP working within CTSM could be the time to talk broadly about this.

Want to make the point that LILAC will facilitate adoption.

January 3, 2018 - Software Meeting

Present: Dave Lawrence, Rosie Fisher, Ben Andre, Mariana Vertenstein, Bill Sacks, Erik Kluzek.

git transition

We don't want to switch over to git until we're ready to totally drop the svn trunk

The git transition will be in beta09. There are still some changes needed in beta08, in the next week or so.

Still more work is needed to get tools working, probably. Erik will work on that.

Let's create a stable development branch in git, so we can bring together whatever changes in git we want. We can start with that at the head of the branch in pr #189.

People feel that the svn to git migration is good to go: we won't need to redo this. So it's safe to base other changes off of what's there.

tagging

What is the workflow for tagging once we've moved to git? Do we want to make a tag every time a PR is merged to master, or make less frequent tags?

Initial feeling is that we don't need to tag everything.

We could distinguish between documentation-only changes (or things like .gitignore) vs. substantive changes (code, updating version of mosart, etc.) - allowing documentation-only changes to come to master without full testing and without being tagged.
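
A sketch of how that distinction might be automated from the list of files a PR touches. The rules here (a doc/ directory, .rst and .md suffixes, .gitignore) are assumptions for illustration, not an agreed policy, and the example paths are hypothetical.

```python
import os

# Assumed markers of documentation-only files; adjust to the real repo layout.
DOC_DIRS = ("doc/",)
DOC_SUFFIXES = (".rst", ".md")
DOC_FILENAMES = {".gitignore"}


def is_documentation_only(changed_paths):
    """Return True if every changed path looks like documentation or similar."""

    def is_doc(path):
        name = os.path.basename(path)
        return (
            path.startswith(DOC_DIRS)
            or name in DOC_FILENAMES
            or name.endswith(DOC_SUFFIXES)
        )

    # An empty change list is not treated as documentation-only.
    return bool(changed_paths) and all(is_doc(p) for p in changed_paths)
```

A change set for which this returns False would go through full testing and tagging as usual.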

Dave points out that our whole tag numbering will need to change with CTSM, since it is a multi-model system - so the science version is kind of meaningless.

PDF of tech note?

You should be able to do make latexpdf. You need the right tools installed for that to work, though. One problem is that the tech note and User's Guide may be bundled into one pdf.
