
tools for writing out parameters to a Dolang configuration file #446

Closed
sbenthall opened this issue Dec 5, 2019 · 19 comments

@sbenthall
Contributor

Simulations built with HARK currently start with long sections of Python code setting parameters.

This custom code is written in a few different styles; it would be cleaner to have these parameters in a configuration file. This is what Dolang does, and it is a pattern HARK would do well to adopt.

This also would be a step in the direction of automating the creation of tables for showing the mapping between notation and Python variables.

@llorracc
Collaborator

llorracc commented Dec 8, 2019

As with other things, this is best handled by crafting a single "template" example and then adapting and improving that template by considering how well it generalizes to various different projects.

I think good candidates would be documentation examples; you might try ConsPortfolioModelDoc, for example.

@sbenthall
Contributor Author

Notes from the sprint meeting this morning:

  • the HARK library will contain good default parameters
  • there are so many parameters we don't want to expose all of them to DemARK users
  • Also, there are probably too many to include in Python documentation (i.e. readthedocs)

I'll work on a demo of ways to work with this after some other directory cleanup (i.e. #440).

@sbenthall
Contributor Author

I've been doing research into how other scientific simulation libraries handle this problem.
What I've found is that there is no standardized way of doing it yet, but there are some patterns that seem to hold across the libraries.

  • None of these libraries has anything as tightly integrated as REMARKs currently are for publications. The examples provided with the core libraries vary in how 'complete' they are as useful demos or exploratory tools, but I don't see any submodules or linking across repositories.
  • Parameters are almost never hard-coded into the library itself. Most of the time they are coded into the Python of an example notebook or script on a case-by-case basis. There are a couple of exceptions to this:
    • ActivitySim has many, many parameters for its simulations; it stores these in .yaml and .csv files
    • Mesa has substantive model classes that are initialized at the start of particular simulations or experiments. These have their default parameters loaded as default arguments to the class initializer, and sometimes stored in static variables of the class itself (see the sketch just below this list).
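
A minimal sketch of that Mesa-style pattern (the class and parameter names here are hypothetical, not Mesa's actual API):

# Sketch: defaults live on the model class itself, as keyword defaults
# in the initializer and/or class-level ("static") attributes.
class MoneyModel:
    default_width = 10   # class-level default (hypothetical parameter)
    default_height = 10

    def __init__(self, num_agents=50, width=None, height=None):
        self.num_agents = num_agents
        self.width = width if width is not None else self.default_width
        self.height = height if height is not None else self.default_height

# A particular experiment overrides only what it needs:
model = MoneyModel(num_agents=200)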

These are the libraries I included in my survey, with some notes about how each managed parameters in their examples.

PySB

Systems Biology Modeling
http://pysb.org/
--- Parameters coded into examples. No visible inheritance.

Mesa

Agent Based Modeling
https://pypi.org/project/Mesa/
https://forum.comses.net/t/mesa-an-agent-based-modeling-framework-in-python-3/7039

--- Individual examples have their own requirements.txt
--- models have default parameters in the __init__ method of the model class

ActivitySim

Metropolitan Travel Activity
https://activitysim.github.io/

--- Many, many configuration options
--- Everything provided in .csv or .yaml files, e.g.:
https://github.com/ActivitySim/activitysim/tree/master/example/configs

SimuPy

Dynamic systems
https://github.com/simupy/simupy
https://readthedocs.org/projects/simupy-personal/downloads/pdf/latest/
--- Parameters coded into each example file
--- No reuse -- library is immature

Nengo

Brain simulations
https://www.frontiersin.org/articles/10.3389/fninf.2013.00048/full
--- examples are all notebooks in the docs directory
https://github.com/nengo/nengo/tree/master/docs
--- parameters are all just entered as arguments to the (very lightweight) modeling interface, e.g.:
https://github.com/s72sue/std_neural_nets/blob/master/hopfield_network.ipynb

nilearn

Neuro imaging
http://nilearn.github.io/auto_examples/index.html#tutorial-examples
--- datasets are loaded by a data loading handler
--- many examples, with few parameters, which are hardcoded as method arguments
E.g.
https://github.com/nilearn/nilearn/blob/master/examples/04_manipulating_images/plot_roi_extraction.py

Special mention:

Yggdrasil

Plant simulations
https://academic.oup.com/insilicoplants/article/1/1/diz001/5479575
https://github.com/cropsinsilico/yggdrasil
Software for combining models across programming languages to accommodate different layers of abstraction.

@llorracc
Collaborator

llorracc commented Dec 13, 2019 via email

@sbenthall
Contributor Author

The main lessons from this work are:

  • If there are a large number of parameters, it makes sense to put them in a serialized configuration file, like a .yaml file
  • If there are substantive models, it makes sense for default parameter values to be loaded by the model's class when it initializes.

I will make a PR with a demonstration of how this could work with the HARK core and a template example.

One thing that occurred to me after I did the survey of simulation libraries, but which I think is important, is this:

  • The libraries I looked at are mainly about defining a model's content by giving it parameters, and delivering simulated output.
  • Some HARK parameters are actually more about how the model is executed, which is quite a bit different from model content. Maybe this should be treated differently. This would imply a comparison with a different set of Python libraries that emphasize model-fitting more, such as scikit-learn and PyMC3.

This may be a more complex issue, better dealt with in a separate task. But I wanted to flag for future work the possibility of:

  • Distinguishing, when defining parameters, between those that are for the model's substantive content (like CRRA and DiscFac), and parameters that guide how it works operationally, perhaps like CubicBool.

I've noticed that Dolo configuration files do separate meaningful categories of parameters from each other, which I think helps add clarity.

@albop
Collaborator

albop commented Dec 19, 2019 via email

@llorracc
Collaborator

llorracc commented Dec 19, 2019 via email

@llorracc
Collaborator

Pablo,

So, your suggestion would be that we adopt toml as our standard for defining input files of all kinds (esp. parameter files)?

For example, right now we have a giant ConsumerParameters.py file which we use with commands like

from ConsumerParameters import init_perfect_foresight as PerfForesightDict

PFexample = PerfForesightConsumerType(**PerfForesightDict)

(and the relevant part of ConsumerParameters.py is excerpted below).

We’ve had some discussions about alternative ways of doing this including:

  1. Having default parameter values defined directly in the specification of the class itself.
  2. Breaking things up so that rather than having a giant ConsumerParameters.py file for all of our consumption/saving classes, there would be a standalone file (maybe toml?) for each class, like PerfForesightCRRA.toml.
  3. A combination: define default parameter values when the class is defined, but write those parameter values out into a file (say, a toml file) so that they are easy to inspect (but make it clear to the user that the toml file is generated content).

One complexity is that we build our models by inheritance, so that for example the ConsIndShockType model inherits the characteristics of PerfForesightType, and we would want it to inherit default parameter values too, which would argue for options 1 or 3 above.

My inclination is for 3, but am curious about your thoughts.

 ConsumerParameters.py:

CRRA = 2.0                          # Coefficient of relative risk aversion
Rfree = 1.03                        # Interest factor on assets
DiscFac = 0.96                      # Intertemporal discount factor
LivPrb = [0.98]                     # Survival probability
PermGroFac = [1.01]                 # Permanent income growth factor
BoroCnstArt = None                  # Artificial borrowing constraint
MaxKinks = 400                      # Maximum number of grid points to allow in cFunc (should be large)
AgentCount = 10000                  # Number of agents of this type (only matters for simulation)
aNrmInitMean = 0.0                  # Mean of log initial assets (only matters for simulation)
aNrmInitStd  = 1.0                  # Standard deviation of log initial assets (only for simulation)
pLvlInitMean = 0.0                  # Mean of log initial permanent income (only matters for simulation)
pLvlInitStd  = 0.0                  # Standard deviation of log initial permanent income (only matters for simulation)
PermGroFacAgg = 1.0                 # Aggregate permanent income growth factor (only matters for simulation)
T_age = None                        # Age after which simulated agents are automatically killed
T_cycle = 1                         # Number of periods in the cycle for this agent type

# Make a dictionary to specify a perfect foresight consumer type
init_perfect_foresight = { 'CRRA': CRRA,
                           'Rfree': Rfree,
                           'DiscFac': DiscFac,
                           'LivPrb': LivPrb,
                           'PermGroFac': PermGroFac,
                           'BoroCnstArt': BoroCnstArt,
                           #'MaxKinks': MaxKinks,
                           'AgentCount': AgentCount,
                           'aNrmInitMean' : aNrmInitMean,
                           'aNrmInitStd' : aNrmInitStd,
                           'pLvlInitMean' : pLvlInitMean,
                           'pLvlInitStd' : pLvlInitStd,
                           'PermGroFacAgg' : PermGroFacAgg,
                           'T_age' : T_age,
                           'T_cycle' : T_cycle
                          }
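
For concreteness, a minimal sketch of option 3 (the method name is hypothetical, and YAML is used for simplicity since Python's standard library has no TOML writer; the same idea works with TOML): defaults live in the class, and a generated file is written out purely for inspection.

import yaml

class PerfForesightConsumerType:
    # Defaults defined with the class (values from the excerpt above)...
    default_params = {'CRRA': 2.0, 'Rfree': 1.03, 'DiscFac': 0.96}

    @classmethod
    def write_default_params(cls, path):
        # ...written out for easy inspection, with a warning that the
        # file is generated content, not a place to set values.
        with open(path, 'w') as f:
            f.write('# GENERATED from ' + cls.__name__ + ' -- do not edit.\n')
            yaml.safe_dump(cls.default_params, f)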

@sbenthall
Contributor Author

#462 is intended to demonstrate an incremental step in the right direction here.

In master, the library's default parameters are all hard-coded into ConsumerParameters.py.

In this PR, ConsumerParameters.py is still there, but when it is imported it loads all the parameters from a ConsumerParameters.yaml file.

With the exception of a few small changes, this PR could in principle be merged with no change to the API for downstream uses.
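
A minimal sketch of that import-time loading pattern (the PR's actual code may differ):

# ConsumerParameters.py (sketch): populate module-level names from YAML
# at import time, so existing imports keep working unchanged.
import os
import yaml

_here = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(_here, 'ConsumerParameters.yaml')) as f:
    _params = yaml.safe_load(f)

# Expose each parameter dictionary as a module attribute, so that
# 'from ConsumerParameters import init_perfect_foresight' still works.
globals().update(_params)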

@llorracc
Collaborator

@sbenthall, sounds like you've made a nice prototype (though I haven't had time to look at it yet).

I'd be interested in your thinking about the pros and cons of my idea from the prior discussion, of having default values embedded in the definition of the class, then written out to a yaml file. As I see it:

pro: There's one place to look both for how the parameter is used and what its default numerical value is
con: The values of the parameters are scattered through the class definition instead of concentrated in one place

  • counterpoint: If they are all written out as a yaml/toml file, then there is still a single place to look for the defaults
  • counter-counterpoint: but the user might think the yaml file is where they can SET the defaults, and that is not correct

PS. Did you look into why Pablo suggested toml instead of yaml?

@sbenthall
Contributor Author

@llorracc Ok, I'll be honest.

I don't like the idea of having the classes write the parameters out to a yaml file, with that yaml file stored in version control, for your "counter-counterpoint" reason. I think it's confusing.

I think it would accomplish the same thing, but be less confusing, if each class instance had a method that clearly reported to the user what its parameters are.

These parameters might even be displayed in the __repr__ of the class.
https://www.pythonforbeginners.com/basics/__str__-vs-__repr
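
A minimal sketch of that idea (the class and method names here are hypothetical):

class AgentType:
    def __init__(self, **params):
        self.parameters = params

    def describe_parameters(self):
        # Report the instance's current parameter values to the user.
        return '\n'.join(f'{k} = {v!r}' for k, v in sorted(self.parameters.items()))

    def __repr__(self):
        # Display the parameters directly in the object's repr.
        args = ', '.join(f'{k}={v!r}' for k, v in sorted(self.parameters.items()))
        return f'{self.__class__.__name__}({args})'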

I think these conversations are very tricky because they often depend on quite unscientific intuitions about what's "easier to use", which is a very noisy human variable. Most of the time when I have an opinion on this, it's based on my understanding of software engineering conventions. But there's always room to disagree.

I've now looked at TOML, as Pablo recommends. It looks quite similar to YAML. I think it's less widely used than YAML. My impression is that it would be idiosyncratic to adopt it. If it isn't as good as YAML for including equations, I think that's a dealbreaker for depending on it in the long run.

https://gist.github.com/oconnor663/9aeb4ed56394cb013a20

@llorracc
Collaborator

llorracc commented Dec 24, 2019 via email

@sbenthall
Contributor Author

As a small step: standardize the names of the parameter dictionaries in ConsumerParameters.py to match the names of the classes that use them. Then fix the downstream references.

@sbenthall
Contributor Author

Referring to that proposed "small step": a problem with naming the dictionaries in ConsumerParameters.py after the class that ingests them is that many of these dictionaries are currently reused in multiple places.

For example, here, init_idiosyncratic_shock is imported, updated, and then given at runtime as arguments to the initializer of RepAgentConsumerType:
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsRepAgentModel.py#L340-L350

This is a good example of why it would be better for classes to have default parameters that get inherited by their subclasses. Indeed, RepAgentConsumerType is a subclass of IndShockConsumerType, and if the variables were being passed through by inheritance, then only the changed values would need to be defined at runtime.

For this reason, I'm working on #466, which gives each class the parameters as overrideable defaults.
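
A minimal sketch of the inheritance pattern #466 is aiming at (the mechanism shown here, a default_params class attribute merged at initialization, is an illustration, not the PR's actual code):

class IndShockConsumerType:
    default_params = {'CRRA': 2.0, 'Rfree': 1.03, 'DiscFac': 0.96}

    def __init__(self, **kwds):
        params = dict(self.default_params)  # start from the class's defaults
        params.update(kwds)                 # runtime arguments override them
        self.parameters = params

class RepAgentConsumerType(IndShockConsumerType):
    # Inherit the parent's defaults; override or extend only what differs
    # (the extra parameter here is hypothetical).
    default_params = {**IndShockConsumerType.default_params, 'CapShare': 0.36}

# Only changed values need to be supplied at runtime:
agent = RepAgentConsumerType(CRRA=3.0)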

@sbenthall
Contributor Author

With #442 merged, now it's easier to see why the current way of handling parameters is problematic.

Because of the old way of handling parameters, there are downstream dependencies on a parameter file that shouldn't be in the HARK module:
#440 (comment)
https://github.com/econ-ark/HARK/blob/master/HARK/SolvingMicroDSOPs/Calibration/EstimationParameters.py

SolvingMicroDSOPs is now a REMARK, but it still depends on HARK for its Calibration file. This is not right.

Whatever solution we find for parameter management within HARK will, in the best case, also inform how REMARKs work.

@sbenthall
Contributor Author

sbenthall commented Jan 7, 2020

I think at yesterday's meeting we came to some conclusions about where to go with this.
It's actually several different features, which will allow for efficient and flexible configuration.

For (c) and (d) there are questions about how specifically the YAML will be formatted.
But I think it's fair to say that if a value is not specified in an (input) YAML file, it will be filled with the default value.
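
A sketch of that default-filling rule (the function name is hypothetical):

import yaml

def load_params(yaml_path, defaults):
    # Values from the (input) YAML file override the defaults;
    # anything not specified in the file keeps its default value.
    with open(yaml_path) as f:
        overrides = yaml.safe_load(f) or {}
    return {**defaults, **overrides}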

The idea behind (c) and (d) is to have model portability. This is quite a big lift.
I'd like to scope this ticket at (a), (b), and the preliminary version of (c).
The design decisions for (c) and (d) are going to require a lot more discussion.

MridulS pushed a commit that referenced this issue Jan 9, 2020
* loading init_perfect_foresight parameters by default. See #446

* loading init_idiosyncratic_shock into IndShochConsumerType on initialization #446

* using default ConsIndShock parameters when configuring RAexample

* loading kinked_R parameters into class by default

* Using default params in initializer for classes in ConsPrefShockModel

* removing act_T from default Cobb-Douglas parameters because it's already in __init__

* loading default params into initialization, last cases. see #446

* Fixing issue with multiple cycles arguments in IndShockExplicitPermIncConsumerType() and MedShockConsumerType()

Co-authored-by: Christopher Llorracc Carroll <[email protected]>
@sbenthall
Contributor Author

The next step in this issue is to allow the configuration of a HARK model from a YAML file.

The best thing to do would be to use an existing YAML format for model configuration: Dolang! So this is related to #763

@sbenthall
Contributor Author

Since #763 covers the case of having Dolang YAML input into HARK, I'm changing the scope of this ticket to be outputting HARK models to Dolang YAML.

This will depend on having an internal, functionalized version of the transition functions in HARK. So this depends on #761

@sbenthall sbenthall changed the title from "tools for loading parameters from a configuration file" to "tools for writing out parameters to a Dolang configuration file" Aug 4, 2020
@sbenthall
Contributor Author

This also depends on having an organized representation of the parameters of a model, or #660

@sbenthall sbenthall modified the milestones: 1.0.0, 1.x.y Jan 13, 2021
@sbenthall sbenthall modified the milestones: 1.x.y, 1.1.0 Jan 23, 2021
@sbenthall sbenthall removed their assignment Jun 3, 2024
@econ-ark econ-ark locked and limited conversation to collaborators Jul 3, 2024
@mnwhite mnwhite converted this issue into discussion #1466 Jul 3, 2024

This issue was moved to discussion #1466. You can continue the conversation there.
