Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration file #10

Closed
barneydobson opened this issue Jan 18, 2024 · 12 comments · Fixed by #83
Closed

Configuration file #10

barneydobson opened this issue Jan 18, 2024 · 12 comments · Fixed by #83
Assignees

Comments

@barneydobson
Copy link
Collaborator

barneydobson commented Jan 18, 2024

Intended use would be to have an initial setup config and one that intends to be executed many times. The main file (swmmanywhere) will have to create a file structure that works with this. I would envisage (reflected below) that there will be two folders - one for the preprocessing executaion (first config file), and one for the model creation (second config). Probably there will also be a national_downloads folder still

Run (once for a project) a slow to execute config (i.e., runnning downloaders and particularly slow graphfcns)

bbox:
    - minx
    - miny
    - maxx
    - maxy
project_name: my_project
run:
  base_directory_to_search: '' # empty because this is the first time to run for a project
  base_directory_to_save_to: 'base_project_dir'
  downloaders: 
  starting_graph: 
    - 'street.json'
    - 'river.json'
  list_of_functions:
      - graphfcn_A
      - graphfcn_B
      - graphfcn_C

Run (thousands of times in, e.g., for sensitivity analysis)

project_name: my_project
downloaders: False
run:
    -base_directory_to_search = 'folder in which output of pre_process is placed'
    -base_directory_to_save_to = 'another folder'
    -starting_graph: graphfcn_C.json
    -list_of_function:
        -graphfcnD
        -graphfcnE
        -graphfcnF
@barneydobson
Copy link
Collaborator Author

RE config files can we keep the discussion here please.

@cheginit
#19 (comment)

@dalonsoa
#19 (comment)

OK so I understand yaml, so would be keen to use that unless either of you have a strong suggestion to do something otherwise. The above setup is just one proposal, but I think in broad terms makes sense to have a flexible setup that can iterate over a various number of pre-processing steps before creating a graph that is a starting point for the config that will be running many many times?

@cheginit
Copy link
Collaborator

I usually create my own config reader module with custom data types (dataclasses) and methods for efficient and lazy iteration over config files when there are many of them. This can be achieved with custom __getitem__ and __iter__ methods for the reader module. I think, first, we need to figure out all the user-specified parameters and come up with a data schema for efficient storage and access. So, if there are many of these parameters, I recommend creating a table of all these parameters, their definitions, units, their “themes”, and other metadata that can help us categorize them.

@barneydobson
Copy link
Collaborator Author

barneydobson commented Jan 25, 2024

https://github.com/ImperialCollegeLondon/SWMManywhere_old/blob/main/swmmanywhere/defs/parameter_defaults.yml Here is what I had been using to do for user parameters - I've tried to be exhaustive but I'm sure in polishing I would come up with more parameters. And so in the config file a user could supply parameters not set to their default values.

Edit:
Oh you probably mean parameters as in the config setup parameters (e.g., bbox) rather than model parameters. I will have a think

@dalonsoa
Copy link
Collaborator

dalonsoa commented Jan 26, 2024

We might be mixing input parameters in the config file with model parameters, indeed. They are related, but not exactly the same.

Having said that, and following on some of @cheginit comments, the defaults you shared @barneydobson very much call for pydantic models. One example will be:

from pydantic import BaseModel, Field

class OutletDerivation(BaseModel):
    max_river_length: float = Field(default = 30.0, ge=5, le=100, description="Distance to split rivers into segments (m).")
    river_buffer_distance: float = Field(default = 150.0, ge=50, le=300, description="Buffer distance to link rivers to streets (m).")
    outlet_length: float = Field(default = 40.0, ge=10, le=100, description="Length to discourage street drainage into river buffers (m).")

And everything (validation, setting defaults and documentation) comes automatically whenever this object is initialised.

@barneydobson
Copy link
Collaborator Author

barneydobson commented Jan 26, 2024

OK well I understand what you've @dalonsoa written there - but I don't think I understand how this is actually going to work.

A few questions:

  • The module with these classes becomes the definitive location to store all parameters?
  • We think that model setup can also be handled like this? I guess that makes sense for something like the list of graphfcns. For example, I can have a InitialSetup class that has the standard instructions for that (e.g., run downloaders etc), and an IterativeSetup (or some better name) that has the latter parts? I guess it will seem possibly overkill for things like base_dir, but maybe it is fine.
  • I guess we add a set function if a user wants to provide a new value? Then maybe when a get function is called, if there is a user provided value we use that and otherwise use the default?

Edit:

  • And if a user wants to provide a file with a bunch of new values... what we still use a yaml or something?

@barneydobson
Copy link
Collaborator Author

Further - I guess we only need to pass the model parameters (and not the config parameters) to the graph functions. These were setup thinking that the parameters and addresses would be stored as dicts - but this seems to change that somehow?

From #8

@register_graphfcn
def generic_function(G, a, b, **kwargs):

    # create G_ from G, a, b
    return G_

for function in list_of_functions 
    G = function(G, **parameters, **addresses)

@dalonsoa
Copy link
Collaborator

It can still work like that. It could look something like this (most likely an oversimplification, I'm just using it as illustration):

def validate_parameters(config: dict) -> dict:
   """Validates parameter configuration and fill with defaults when not present.
    parameters = {}
    parameters["outlet_derivation"] = OutletDerivation(**config.get("outlet_derivation", {}))
    parameters["subcatchment_derivation"] = SubcatchmentDerivation(...)
    ...

# Then where relevant, after loading one or more configuration files into the dict
parameters = validate_parameters(config)

@register_graphfcn
def generic_function(G: graph, a: OutletDerivation, b: SubcatchmentDerivation, **kwargs) -> graph:

    # create G_ from G, a, b
    return G_

for function in list_of_functions 
    G = function(G, **parameters, **addresses)

This has the advantage that the parameters are now well defined objects with types, not a collection of entries in a dictionary, so it is easier for the type checkers to figure out if things are correct, for you while coding to receive suggestions, etc.

@barneydobson
Copy link
Collaborator Author

barneydobson commented Jan 26, 2024

OK that is very helpful. I don't understand how we're going to do addresses yet - presumably this is linked to #9 .

But a functional mockup based on what you've got there would be as follows:

# This would be in graph_utilities.py
graphfcns = {}

def register_graphfcn(func):
    # Add the function to the registry
    graphfcns[func.__name__] = func
    return func

@register_graphfcn
def double_directed(G, river_buffer_distance, **kwargs):
    print(river_buffer_distance)
    return G

@register_graphfcn
def split_long_edges(G, max_river_length, **kwargs):
    print(max_river_length)
    return G

# This would be in parameters.py
from pydantic import BaseModel, Field
class OutletDerivation(BaseModel):
    max_river_length: float = Field(default = 30.0, ge=5, le=100, description="Distance to split rivers into segments (m).")
    river_buffer_distance: float = Field(default = 150.0, ge=50, le=300, description="Buffer distance to link rivers to streets (m).")
    outlet_length: float = Field(default = 40.0, ge=10, le=100, description="Length to discourage street drainage into river buffers (m).")

class SubcatchmentDerivation(BaseModel):
    subcatchment_buffer_distance: float = Field(default = 150.0, ge=50, le=300, description="Buffer distance to link subcatchments to streets (m).")
    outlet_length: float = Field(default = 40.0, ge=10, le=100, description="Length to discourage street drainage into subcatchment buffers (m).")

def validate_parameters(config: dict) -> dict:
    """Validates parameter configuration and fill with defaults when not present."""
    parameters = {}
    parameters["outlet_derivation"] = OutletDerivation(**config.get("outlet_derivation", {}))
    parameters["subcatchment_derivation"] = SubcatchmentDerivation(**config.get("subcatchment_derivation", {}))
    parameters_only = {}
    for value in parameters.values():
        for key_, value_ in dict(value).items():
            parameters_only[key_] = value_
    return parameters_only


# This kind of behaviour would be in generate_network.py
custom_parameters = {'outlet_derivation' : {'max_river_length': 50.0}}
params = validate_parameters(custom_parameters)

import networkx as nx
G = nx.Graph()
function_list = ['double_directed', 'split_long_edges']
for function in function_list:
    assert function in graphfcns.keys(), f"Function {function} not registered in graphfcns."
    G = graphfcns[function](G, **params)

Note the 'unpacking' of parameters from their subcategory. Happy to discuss if this is a good idea or not - it is just that is how it works in the existing code.

@dalonsoa
Copy link
Collaborator

You can use directly the following to unpack the fields:

    for value in parameters.values():
        parameters_only.update(value.model_dump())

Having said that, I think it is a pity not using the fully formed objects directly, even if a particular graph function is only using one of the fields of that higher level object. It will give some reasurance thanks to the type checkers that you're doing the right thing. The functions would look something like this:

@register_graphfcn
def double_directed(G, outlet_derivation: OutletDerivation, **kwargs):
    print(outlet_derivation.river_buffer_distance)
    return G

@register_graphfcn
def split_long_edges(G, outlet_derivation: OutletDerivation, **kwargs):
    print(outlet_derivation.max_river_length)
    return G

This was referenced Jan 26, 2024
@barneydobson
Copy link
Collaborator Author

note from #31 - the 'default loadout' of graphfcn order could be in a default config file or a default base model class. Many pro's and cons discussed in the linked PR

@cheginit
Copy link
Collaborator

Since you're going with pydantic, that, I think, is a good decision, I also have a suggestion as an alternative to pydantic. There's a package called Param that is similar to pydantic but is more tailored toward validating model-like parameters. I think it's worth exploring since it requires less coding compared to pydantic, doesn't have any dependencies, and some neat built-in functionalities for numerical parameters.

@barneydobson barneydobson mentioned this issue Feb 5, 2024
4 tasks
@barneydobson
Copy link
Collaborator Author

Also linked with this are:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants