TOML schema #58

Open
simonbyrne opened this issue Jan 10, 2022 · 15 comments


@simonbyrne
Member

I think there has been a general consensus that we should move to a TOML format. The specifics of how parameters should be specified in this format still need to be clarified.

@odunbar did you have some examples somewhere?

@simonbyrne
Member Author

simonbyrne commented Jan 10, 2022

What I had been thinking (and is roughly what is implemented in #57) is something like the following:

A parameter set file would have the following keys (all are optional unless marked as required):

  • name (required): the name of the parameter set; this would become the name of the struct in the code, so it should be a valid Julia symbolic name
  • inherits_from: the name of the parameter set to inherit values from
  • parameters: a table of parameters (see below)
  • parameters_include: an array of relative paths (from the current file) of other .toml files containing parameter tables
  • override_values: a table of (parameter key, numeric value) pairs describing inherited parameters which should be modified in this parameter set.

A parameter is an entry in a table. Each key in the table should be a descriptive but valid Julia symbolic name, e.g. MolarMassConstant, and each entry would contain the following keys:

  • description: a long-form description of the constant, with any necessary references (can be formatted using Julia Markdown).
  • symbol: the standard symbol name used to refer to the parameter
    • should this be required to be unique?
  • units: a string containing the units of the parameter; should be parsable by Unitful.jl.
  • value: the numeric value of the parameter in the current parameter set
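As a rough illustration of how the proposed keys could be checked mechanically, here is a minimal Python sketch (standard library only) that validates an already-parsed parameter-set table. The function name `validate_parameter_set` is hypothetical, and the identifier check is an ASCII-only approximation of Julia's naming rules, which also allow Unicode names.

```python
import re

# Keys proposed for a parameter-set file; only "name" is required.
SET_KEYS = {"name", "inherits_from", "parameters", "parameters_include", "override_values"}
# Keys proposed for each parameter entry (UQ keys such as priors could be added later).
PARAM_KEYS = {"description", "symbol", "units", "value"}
# ASCII-only approximation of a Julia identifier (assumption: real Julia also allows Unicode).
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_!]*$")


def validate_parameter_set(data: dict) -> list[str]:
    """Return a list of problems found in a parsed parameter-set table (empty if none)."""
    problems = []
    if "name" not in data:
        problems.append("missing required key 'name'")
    elif not IDENT.match(data["name"]):
        problems.append(f"name {data['name']!r} is not a valid identifier")
    for key in sorted(data.keys() - SET_KEYS):
        problems.append(f"unknown top-level key {key!r}")
    for pname, entry in data.get("parameters", {}).items():
        if not IDENT.match(pname):
            problems.append(f"parameter {pname!r} is not a valid identifier")
        for key in sorted(entry.keys() - PARAM_KEYS):
            problems.append(f"parameter {pname!r} has unknown key {key!r}")
        if "value" in entry and not isinstance(entry["value"], (int, float)):
            problems.append(f"parameter {pname!r} has a non-numeric value")
    for pname, value in data.get("override_values", {}).items():
        if not isinstance(value, (int, float)):
            problems.append(f"override for {pname!r} is not numeric")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every mistake in a large parameter file at once.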

Additional keys for UQ (e.g. prior distributions) can also be added. We might also be able to add some mechanism for derived parameters, to support things like
https://github.com/CliMA/CLIMAParameters.jl/blob/6f660320358bd2be611f382897aacf3084c66bf1/src/Planet/planet_parameters.jl#L5

Examples

```toml
# default.toml
name = "DefaultParameterSet"

parameters_include = [
  "parameters/universal.toml",
  "parameters/planet.toml",
]
```

```toml
# parameters/universal.toml
[MolarMassConstant]
description = "universal gas constant"
symbol = "R"
units = "J*K^-1*mol^-1"
value = 8.3144598
```

```toml
# custom.toml
name = "CustomParameterSet"
inherits_from = "CLIMAParameters.DefaultParameterSet"

# add new parameter
[parameters.Wobble]
description = "wobble rate"
units = "s^-1"
value = 10.2

[override_values]
MolarMassConstant = 8.5
```

@tapios

tapios commented Jan 10, 2022

This goes in the right direction. @odunbar has more details of what we need. @glwagner and @ilopezgp also have relevant ideas and experiences on the calibration/UQ algorithm end (where we need the same files as input).

@odunbar
Contributor

odunbar commented Jan 11, 2022

[Image: clima_interface_v2_new]
This shows the parameter file content. (As we know, whether we still use a "distribution of parameters" is a work in progress and depends on how the different model components will be written and will communicate in the future.)

The default file is in CLIMAParameters. There will be an override file for every experiment we run.

The format of the parameter file contains the features relevant to Clima (similar to Simon's construction above), plus some information for the calibration pipeline (e.g. the priors). PS: I had units as part of the description.

Some high level goals for the parameter file part:

  1. Software side: ideally, the ability to change only runtime parameter values (i.e. we do not need to change the number of overrides, etc., at runtime).
  2. User side: (a) parameter names should be unique and verbose in this file, as they may be used across the code; (b) they should be values; we will treat "derived parameters" as functions.

e.g.

```toml
[zeropointseven]
RunValue = 0.7
ValueType = "real"
Tags = ""
Prior = "fixed"
Transformation = "none"
Description = "This parameter is constant 0.7 [nondimensional]"
```

@simonbyrne
Member Author

A couple of questions I would like to clarify:

  • In CLIMAParameters, will all the parameters be in a single file (if so, this could get quite large and be difficult to navigate), or split over multiple files?
  • What are the potential values of Type?
  • Can you expand on the role of Tags? I still don't quite understand what they are for.
  • What would the override file look like?

On this last point, depending on how we choose to represent the parameters (#59), it may be helpful to distinguish between information we need to know at compile time (e.g. which parameters are "overrideable"), and information that is only needed at runtime (the actual override values).

Also, at one point there was some discussion about having a mechanism to output all the parameter values (both overrides and not) that were used in a given experiment. I'm not sure this is feasible, because there isn't an obvious way to track whether a parameter is "used" in a given experiment, but if you store both the Manifest.toml (which will contain the exact version of CLIMAParameters used), and the override file, you should have all the information to reproduce the exact experiment.

@odunbar
Contributor

odunbar commented Jan 11, 2022

  1. Let's keep the defaults in CLIMAParameters as one file for now. It may break up naturally in the end; this is easy to implement later without modifying any interfaces (we can also make tools for easily searching such a file).
  2. I was advised that a Type category would be useful. I'll leave it to others to decide exactly what it should cover; I was basically just thinking of "string" or "number".
  3. Tags are part of a more general design question. They should list all repositories that use this parameter; we can also have a catch-all category (e.g. "Planet") for basic parameters used everywhere. I believe tags will have many uses down the line in a multi-repo codebase, for example as ways of breaking up the parameter sets, or of easily searching the parameter file.
  4. The override file will look like a list of the parameters (as seen with [zeropointseven]); each entry will replace the default with the same name. For different runs in a calibration, the EKP writes 100 new files, each identical to the original override file but with the "Value = ..." line replaced with a new value.
  5. The parameter log is straightforward: for a run, you can just merge the default and override TOML files (if a parameter is in the override file, you use its values instead). I would advise we do this for every single run at runtime. I take your point regarding CLIMAParameters changing, but in theory CLIMAParameters will only change when a relevant repository also changes, so this is a more general question about how you want to store the exact information of a run with a distributed repository. I think a more frequent use case is people running simulations where certain runs crash or exhibit interesting behaviour due to parameter values, and who would like to inspect the recent logs.

@tapios

tapios commented Jan 11, 2022

To add to what @odunbar says:

  1. We will need multiple files in the end because we want to, for example, run the land model in isolation, and modify its parameters, without having to deal with atmospheric parameters. The multiple (component-model) files can be input into one master file though.
  2. For types, we need float, integer, logical, string. What type precisely the parameters will have (e.g., float32 or float64) should be determined by the type chosen for the model run (e.g., double or single precision).
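A minimal Python sketch of that idea: parsed values are checked against the four types listed, and floats are rounded to single precision only when the run uses it. The function names and the `"single"`/`"double"` strings are hypothetical; Python has no native 32-bit float, so the rounding is emulated with `struct`, whereas Julia code would simply convert to `Float32` or `Float64`.

```python
import struct

# Non-float types are checked exactly (so booleans are not accepted as integers).
EXACT_TYPES = {"integer": int, "logical": bool, "string": str}


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def convert(value, value_type: str, precision: str = "double"):
    """Convert a parsed TOML value to the type used by the model run.

    value_type is one of "float", "integer", "logical", "string";
    precision ("single" or "double") only affects floats.
    """
    if value_type == "float":
        if type(value) not in (int, float):
            raise TypeError(f"expected a number, got {value!r}")
        x = float(value)
        return to_float32(x) if precision == "single" else x
    expected = EXACT_TYPES.get(value_type)
    if expected is None:
        raise ValueError(f"unknown value type {value_type!r}")
    if type(value) is not expected:
        raise TypeError(f"expected {value_type}, got {value!r}")
    return value
```

Keeping precision out of the parameter file, and applying it only at conversion time, means the same file serves both single- and double-precision runs.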

Logging parameters and other model configuration for each run will be extremely important for being able to use the model effectively. This can just consist of a merge of all the parameter files. The way we have done this in the past is to write a log file with configuration information, into which parameter information is written as the parameter files are parsed.

@odunbar
Contributor

odunbar commented Jan 11, 2022

Still, regarding point 1, I would argue for just one file for now, as the partitioning may not be clear until we have everything together.

@simonbyrne
Member Author

Are there any examples of non-float parameters?

@tapios

tapios commented Jan 11, 2022

We have used integers in the past, e.g., for different structural choices in parameterization schemes, or to specify the number of updrafts in the EDMF scheme.

@tapios

tapios commented Jan 11, 2022

> Still, regarding point 1, I would argue for just one file for now, as the partitioning may not be clear until we have everything together.

We should have, at minimum, files for

  • Planet (shared by model components, including thermodynamic constants)
  • Atmosphere
  • Ocean
  • Land

If we make it too monolithic, we create barriers to use for people who do not want to deal with a plethora of, for them, irrelevant parameters.

@simonbyrne
Member Author

Okay, that's helpful to know.

For a point of reference, here is what @haakon-e, @costachris and @charleskawczynski had been using in TurbulenceConvection.jl:
https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl
This also includes some non-parameter options, like file names and timestepping options, but a few things to note about what had evolved:

@simonbyrne
Member Author

It would be useful to see more examples of how parameters have been used: if you have any examples, please post them here.

@tapios

tapios commented Jan 11, 2022

Flatter hierarchies may be better. And we do need vector-valued and array-valued parameters (e.g., to specify neural network weights).

@odunbar
Contributor

odunbar commented Jan 11, 2022

For NN weights, would it be easier to save these to a file and just provide the file name to clima as the parameter?

@simonbyrne
Member Author

> The override file will look like a list of the parameters (as seen with [zeropointseven]); each entry will replace the default with the same name. For different runs in a calibration, the EKP writes 100 new files, each identical to the original override file but with the "Value = ..." line replaced with a new value.

If the only information it needs to add is the value for each parameter, why would we need to output all the other information (since that is already available in CLIMAParameters)?


3 participants