Enable customisation #54
Conversation
The extensions consist of two addresses: (1) 'extension_file' contains the address of a script for any decorations of existing nodes/arcs/model objects; (2) 'new_classes' contains the new node name and the associated script address. Note that only the folder address is provided rather than the exact script address.
OK, after a first pass, I think that the current approach is OK given the underlying architecture of WSIMOD, but the problem is that this architecture makes expanding with new functionality REALLY complicated.
The first thing we need to consider is what we want users to be able to customise. Letting them customise everything can only end badly, resulting in an inconsistent tool that cannot produce reproducible results. Based on the examples provided, I don't see any reason why we cannot constrain any custom functionality to providing custom Nodes/Arcs. For example, the node my_reservoir
could easily be a custom node class with the modified functionality.
To make this possible, all nodes/arcs need a common PUBLIC interface. If each of them has a custom interface, a lot of code must be modified whenever there is a new node to consider.
For example, if nodes looked like:

    class Node:
        ...
        def process(self):
            # Functionality customised in each Node subclass.
            ...

Then part of the model.run function could be written as:

    for node in self.nodes:
        node.process()

instead of having to call treat_water for FWTW nodes, create_demand for Demand nodes, etc. As a consequence, if we add a new node, we will not need to modify model.run for it to be considered, as ALL the nodes will be run. The same applies to arcs and any other high level object: they should keep the same interface so they can be treated collectively and facilitate customising the functionality.
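To make the idea concrete, here is a minimal, self-contained sketch of the common-interface pattern described above. The class names follow the discussion, but the method bodies are purely illustrative stand-ins, not WSIMOD code:

```python
# Sketch only: every node subclass implements the same public entry point,
# so the model can run them all uniformly without per-type dispatch.
class Node:
    def process(self):
        raise NotImplementedError


class FWTW(Node):
    def process(self):
        return "treat_water"  # stands in for the real treatment step


class Demand(Node):
    def process(self):
        return "create_demand"  # stands in for the real demand step


class Model:
    def __init__(self, nodes):
        self.nodes = nodes

    def run(self):
        # No need to know each node's type: all nodes share one interface.
        return [node.process() for node in self.nodes]


model = Model([FWTW(), Demand()])
print(model.run())  # ['treat_water', 'create_demand']
```

Adding a new node type then requires no change to Model.run at all; only the new subclass is written.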
I understand that this might be a major undertaking, but I see it as a necessary step before we can consider the possibility of enabling the users to add their own functionality.
@barneydobson , @liuly12 is this more or less clear? Happy to have a chat on this next week, ideally in person.
        return func()
    return reservoir_functions_wrapper

def wrapper_run(self):
This is very much NOT sustainable. It requires anyone willing to add their own functionality to copy the whole of the model.run function and add bits here and there to run the custom functionality. That is really complicated.

The underlying problem is that each node has its own interface, which means it is impossible to abstract their execution: you need to call each type of node separately and run the specific function for that node. So, if you add a new node, you need to add a new step in run to execute it.
@dalonsoa The run function is a really tight balance between enabling customisation of orchestration vs getting something tidy. In principle we would just like every node to have a run function that is called during the model's run timestep, which would be very easy/sustainable. The whole point of having this more arduous orchestration, though, is to facilitate customisation. For example, note how some nodes have different functions that are called multiple times throughout the timestep within the run function: triggering one node, then another, and then the first node again is needed to capture many systems.

I definitely understand it is not particularly satisfying, but I can't say I came up with a better solution. One software package I have used has 'before timestep', 'during timestep' and 'after timestep' functions (this inspired the architecture of the Surface object, with its three function lists of inflows, processes and outflows). We could take a similar approach within the model's run function, but a) it would require quite a lot of work implementing this behaviour for every node, and b) I suspect that many water systems would still require further customisation to capture more complicated interactions.
I guess a compromise solution could be that the user provides a list of <node_type> : <function_name> to the run function, and it can do the rest from there?
So to recreate the default run behaviour you would provide a list of dicts:
[{'FWTW' : 'treat_water'},
{'Demand' : 'create_demand'},
{'Land' : 'run'},
{'Groundwater' : 'infiltrate'},
{'Sewer' : 'make_discharge'},
{'Foul' : 'make_discharge'},
{'WWTW' : 'calculate_discharge'},
{'Groundwater' : 'distribute'},
{'River' : 'calculate_discharge'},
{'Reservoir' : 'make_abstractions'},
{'Land' : 'apply_irrigation'},
{'WWTW' : 'make_discharge'},
{'Catchment' : 'route'}]
And the run function would just iterate over the list and call functions accordingly
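A sketch of how such a run function could iterate over that list. The node class, its method body and the nodes_type layout here are hypothetical stand-ins for WSIMOD's real objects:

```python
# Hypothetical sketch: a run step driven by an orchestration list of
# {node_type: function_name} dicts, dispatching by name via getattr.
class FWTW:
    def __init__(self):
        self.calls = []

    def treat_water(self):
        self.calls.append("treat_water")


orchestration = [{"FWTW": "treat_water"}]
nodes_type = {"FWTW": {"my_fwtw": FWTW()}}  # {node_type: {name: node}}


def run_timestep(orchestration, nodes_type):
    for step in orchestration:
        for node_type, func_name in step.items():
            # Node types absent from this model are simply skipped.
            for node in nodes_type.get(node_type, {}).values():
                getattr(node, func_name)()


run_timestep(orchestration, nodes_type)
print(nodes_type["FWTW"]["my_fwtw"].calls)  # ['treat_water']
```

The orchestration list itself could then come from the input file, so the call sequence is data rather than hard-coded logic.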
@barneydobson , I see your point and fully understand that any significant changes in this regard will require significant work, but having to re-write the whole run sequence by hand is not sustainable.

Your intermediate solution is not a bad option, to be honest. I think it is better than hard coding the sequence, and it will simplify the process of adding functionality since it can be controlled simply from the input file and whatever custom nodes are provided. I understand that it will be necessary to add to the sequence only the nodes that are present in the input file, right? If there are no nodes of a given type, that step is simply skipped?

Having said that, users might wonder why the sequence is what it is. Is it OK to swap steps 3 and 4, for example? Why not? Why is the default run sequence what it is and not something else? How would a user know? Where should I put my custom node in the sequence?

It could be possible to include logical dependencies in the nodes such that the list you suggest can be created dynamically. That approach is followed in another project we work on (implemented in this function), but there it is a bit easier because all models keep the same interface and you just need to figure out in what order a specific interface of all the models needs to be run. I'm not entirely sure how we could apply that when each node has its own, different interface, but it might be worth giving it some thought.
Well, this sequence was simply selected as one adjudged to apply to a reasonably large number of water systems. There is a logic to the order (broadly following the principle that 'water flows downstream'), and in general swapping would often cause strange model behaviours. Following the same reasoning, if a user adds new behaviour, functions or nodes, then they will need to think about where water flows and whether the ordering needs to change to capture their effects of interest. It will likely be somewhat iterative, i.e., they will try something, find it behaves not quite how they hoped, and adjust the ordering accordingly. This is also how I arrived at the default order. Yes, it's correct that unused nodes wouldn't need to be called.

Our preprint explains quite a bit of the reasoning behind why it is helpful to give a user control of this: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1662/egusphere-2023-1662.pdf

Doing something automatic as you link sounds very attractive, but yes, also complicated.
OK, in that case, I'd suggest to:
Then, all customisations can be included via either custom nodes or this custom sequence. Does it make sense?
- Update model:
  - Add custom orchestration
  - Load/save the custom orchestration
- Update model_extensions.py:
  - Remove custom orchestration (this is now simplified)
- Update settings_saved.yaml:
  - Add custom orchestration
I've reviewed the changes made by @barneydobson and generated the same results as before, with no other errors reported.
I've added a few more comments.
I'm going to write in a separate PR a few suggestions to improve the way known nodes are currently handled, which I think might simplify this a bit.
    def extensions(model):
        # Apply customisations
        model.nodes['my_groundwater'].residence_time *= 2
Is updating residence_time (and other node properties) not a customisation that can be done via the input file?
Based on my experience, it can be done via the input file, but then the user needs to define a new type of node, which requires more effort in naming and tidying everything.
    # Add decorator to a reservoir
    model.nodes['my_reservoir'].net_evaporation = model.nodes['my_reservoir'].empty_vqip()
    model.nodes['my_reservoir'].mass_balance_out.append(lambda self=model.nodes['my_reservoir']: self.net_evaporation)
    model.nodes['my_reservoir'].net_precipitation = model.nodes['my_reservoir'].empty_vqip()
    model.nodes['my_reservoir'].mass_balance_in.append(lambda self=model.nodes['my_reservoir']: self.net_precipitation)

    model.nodes['my_reservoir'].make_abstractions = wrapper_reservoir(model.nodes['my_reservoir'], model.nodes['my_reservoir'].make_abstractions)
Is there any reason why this type of extension cannot be implemented via a custom node? Something like:

    # Inherit from the relevant Node subclass, of course
    class MyReservoir(Node):
        def __init__(self, ...):
            super().__init__(...)
            self.net_evaporation = self.empty_vqip()
            self.net_precipitation = self.empty_vqip()
            self.mass_balance_out.append(lambda: self.net_evaporation)
            self.mass_balance_in.append(lambda: self.net_precipitation)

        def make_abstractions(self):
            # do custom stuff to the node
            ...
            return super().make_abstractions()

And then use this new node in the input file.
While your approach works, it requires having two different ways of adding extra functionality, and while, as discussed, we do want that flexibility, it will be much better if those additions can be streamlined into a few specific approaches. Creating custom Nodes seems like the best approach, as they are used in very specific places and in very specific ways. In combination with the workflow customisation added by @barneydobson , that's a lot of flexibility.
To me, the reason why we have two different ways of customisation is:

(1) Sometimes we want the node to keep the default design and just change the values of some parameters. This is essential, as we plan to develop wrappers to enable sensitivity tests in the future, which need a portal that changes only the parameters.

(2) Sometimes we want a new node that has a new design (functions and attributes), so we need to write a new class.

Formulating a unified way to enable these two situations would definitely be great, but to me keeping them separate feels more logical, even if it may not be the most user-friendly way.
(1) Sometimes we want the node to keep the default design and just change the values of some parameters. This is essential, as we plan to develop wrappers to enable sensitivity tests in the future, which need a portal that changes only the parameters.
Can we not just define this override in the input file when requesting a node? Maybe in an override section? If it is just a value, I see no reason why you cannot do sensitivity analysis with this.
In my opinion, if some user interface is not user friendly, then the underlying architecture resulting in that interface needs improvement. As developers, we can modify the software as much as needed to make life easier for end users. In this particular case, if customisation can be done via the config file for simple cases and via custom nodes for the more complicated ones, making the process more robust and reproducible, then that's a step we should take. Enabling a free-for-all manipulation of the model can only end in disaster...
@barneydobson , any comment on this?
You are definitely correct @dalonsoa. For sensitivity tests, I've made new changes to the code to let the model receive a dict when loading. This new dict contains the information of changes in the yaml and will be used to replace the original yaml dict before adding nodes/arcs. It works as expected and I've pushed the changes to a new branch 'enable_receiving_parameters'.

However, the key challenge here is that most nodes have parameters that are initialised automatically and do not need external args input. These parameters cannot be changed via overriding the yaml dict; they can only be changed after the model has been built. One way of unifying this is to substantially rewrite all the node scripts to compulsorily initialise every parameter via args input - then all the parameters can be changed via overriding alone. I'll let @barneydobson comment on this.
OK, let's see if I got this right. The nodes are all initialised (unless created programmatically) in this line:

wsi/wsimod/orchestration/model.py, line 418 in e3b66fe:

    self.nodes_type[type_][name] = NODES_REGISTRY[node_type](**dict(data))

Where data is all the configuration associated with the node of interest in the config file, right?

The options to enable an easier customisation of the nodes by the user are:
- expand the constructor to accept more input parameters and include those in the config file, so the node is created with the right parameters at initialisation time, or
- leave the constructor as is, creating a fully working node with some default parameters, and then update selected parameters included in an override section of the config file via setters, such that any side effects of those changes are propagated to the right places.

Is this correct?
I would go for the second option, which also has the advantage of being backwards compatible. I think it makes for a cleaner interface to have a core configuration that will produce a working model and then "tweak" the models based on overrides. I would do something like the following after adding the nodes and the arcs:

    # Initialise nodes
    # Initialise arcs
    # And then...
    model.add_overrides(data.get("overrides", {}))

    # And in Model
    def add_overrides(self, config: Dict):
        for node in config.get("nodes", []):
            type_ = node.pop("type_")
            name = node.pop("name")
            self.nodes_type[type_][name].apply_overrides(node)

        for arc in config.get("arcs", []):
            name = arc.pop("name")
            self.arcs[name].apply_overrides(arc)

The node.apply_overrides and arc.apply_overrides methods will call the appropriate setters to ensure that any side effects of modifying those parameters are taken care of. apply_overrides should be a method (not an abstract one) of the base class for nodes and arcs that does nothing unless it is overridden. This way, only those specific nodes that can be overridden will need to have this method (and the relevant setters) implemented.
    class Node:
        def apply_overrides(self, config: Dict):
            pass

    class Arc:
        def apply_overrides(self, config: Dict):
            pass
In terms of how this can be done from a practical point of view, I would open issues for:

- Implementing the override logic above. It should be harmless and backwards compatible with any existing code and config files.
- Identifying which nodes/arcs have parameters that are worth overriding, and opening an issue for each of those nodes/arcs indicating which parameters need such an override. Watch out for recursive calls! You might need some hidden internal attributes that can be set directly, only using the setters in the overrides:
    class MyNode:
        def __init__(...):
            # Does not use the setter
            self._some_param = get_default_some_param(...)

        @property
        def some_param(self):
            return self._some_param

        @some_param.setter
        def some_param(self, value):
            if not value:
                return
            # Some complex logic
            self._some_param = ...
            # More logic here

        def apply_overrides(self, config: Dict):
            self.some_param = config.pop("some_param", None)
            ...
- Implement the above issues, one PR per issue, no more!
There might be some extra things to do, but I would say it is a safe approach, backwards compatible, and one that should give plenty of flexibility to do pretty much whatever you want via the configuration file. For the most complex cases, the user can naturally create their own custom nodes or arcs, which I think we have agreed is useful to have.
Does this make sense?
Yep! Thanks for the help - I think the only thing we'd add to that is some default behaviour to just overwrite a parameter directly, as for the majority of parameters this should be suitable.
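A minimal sketch of what that default could look like. The class names and structure are assumptions for illustration, not the actual WSIMOD implementation:

```python
# Sketch: a base-class default that simply overwrites attributes directly.
# Subclasses with side effects would override apply_overrides (or add
# setters) instead of relying on this plain assignment.
class Node:
    def apply_overrides(self, config: dict):
        # Default behaviour: set each override directly on the node.
        for key, value in config.items():
            setattr(self, key, value)


class Reservoir(Node):
    def __init__(self):
        self.capacity = 100.0  # hypothetical default parameter


node = Reservoir()
node.apply_overrides({"capacity": 250.0})
print(node.capacity)  # 250.0
```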
Thanks @dalonsoa! @barneydobson What would be the best way forward? To ensure the consistency, do you want to open an example issue and make the changes for one node and then I'll follow it to change the others?
Yep - I will do the high level changes in Node and Model, and then do an example in WWTW. After this, I think your focus should be on Land, since that will be by far the most complicated - we could even do an issue per Surface. Start with the base surface and we'll see how many changes there are on the PR.
    self.orchestration = [{'FWTW' : 'treat_water'},
                          {'Demand' : 'create_demand'},
                          {'Land' : 'run'},
                          {'Groundwater' : 'infiltrate'},
                          {'Sewer' : 'make_discharge'},
                          {'Foul' : 'make_discharge'},
                          {'WWTW' : 'calculate_discharge'},
                          {'Groundwater' : 'distribute'},
                          {'River' : 'calculate_discharge'},
                          {'Reservoir' : 'make_abstractions'},
                          {'Land' : 'apply_irrigation'},
                          {'WWTW' : 'make_discharge'},
                          {'Catchment' : 'route'}]
As discussed, this seems like the right approach, but I would try to think of a way of helping the user reuse this proven and tested workflow without having to re-write it from scratch. In other words, setting up a function (happy to write it) that replaces all the Groundwater steps with MyGroundwater steps if I'm using a custom groundwater node.
@liuly12 @barneydobson , any comment on this?
Agree. I'm also wondering whether it might be worth printing this orchestration list (i.e., the sequence of calculation order in the model) after the replacement, to let users fully notice and reconfirm the change of calculation order. Changes of calculation order are often the source of bugs after adding new node types.
I guess some insert/replace functionality or something should do the job. If we're thinking this is all in config, you'd have something like:

    orchestration:
      replace_nodetype:
        Groundwater: MyGroundwater
      at:
        WWTW: make_discharge
      insert_after:
        MyGroundwater: do_a_flip
OR (would achieve the same thing):

    orchestration:
      replace_orchestration:
        FWTW: treat_water
        Demand: create_demand
        Land: run
        MyGroundwater: infiltrate
        Sewer: make_discharge
        Foul: make_discharge
        WWTW: calculate_discharge
        MyGroundwater: distribute
        River: calculate_discharge
        Reservoir: make_abstractions
        Land: apply_irrigation
        WWTW: make_discharge
        MyGroundwater: do_a_flip
        Catchment: route
I think both options should be available, because someone might want to do something indecently complicated, like having a mix of Groundwater node types performing different functions at different points in the orchestration - with something like that you'll definitely get to a point where you just need replace_orchestration.
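A rough sketch of helpers that could implement those two config options. The function names and the exact config shape are assumptions, not existing wsimod API:

```python
# Sketch: edit an orchestration list of {node_type: function_name} dicts.
def replace_nodetype(orchestration, old, new):
    """Swap every {old: func} step for {new: func}."""
    return [{new if t == old else t: f for t, f in step.items()}
            for step in orchestration]


def insert_after(orchestration, at, new_step):
    """Insert new_step immediately after the step equal to `at`."""
    result = []
    for step in orchestration:
        result.append(step)
        if step == at:
            result.append(dict(new_step))
    return result


orchestration = [{"Groundwater": "infiltrate"}, {"WWTW": "make_discharge"}]
orchestration = replace_nodetype(orchestration, "Groundwater", "MyGroundwater")
orchestration = insert_after(orchestration, {"WWTW": "make_discharge"},
                             {"MyGroundwater": "do_a_flip"})
print(orchestration)
# [{'MyGroundwater': 'infiltrate'}, {'WWTW': 'make_discharge'},
#  {'MyGroundwater': 'do_a_flip'}]
```

A config loader could then translate the replace_nodetype / insert_after sections of the YAML into calls to helpers like these, and print the resulting list for the user to confirm.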
    from wsimod.nodes.storage import Storage
    from wsimod.core import constants

    class Groundwater_h(Storage):
No idea but, given that this is a custom version of Groundwater, would it be helpful to inherit from Groundwater and not Storage? That way we can somewhat guarantee that your custom GroundwaterH can be used wherever the built-in Groundwater is used, just doing things differently.
It can be. In some cases, the default node is too complicated (e.g., Land) to be revised. A user may want to create a much simpler new representation from scratch, which would preferably be based on the primary classes of nodes. I think it is more of a choice for the users, and here I just provide an example that may represent a not very efficient user's choice :).
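The two inheritance choices being discussed can be illustrated with a toy example. These are stand-in classes only; the real Storage/Groundwater hierarchy in wsimod is richer:

```python
# Toy stand-ins for the wsimod class hierarchy discussed above.
class Storage:
    def distribute(self):
        return "generic storage routing"


class Groundwater(Storage):
    def distribute(self):
        return "default groundwater routing"


# Option 1: inherit from Groundwater and tweak - the custom node is
# usable anywhere the built-in Groundwater is expected.
class GroundwaterH1(Groundwater):
    def distribute(self):
        return "head-driven routing"


# Option 2: start from the simpler primary class when the default node
# is too complicated to revise - but it is no longer a Groundwater.
class GroundwaterH2(Storage):
    def distribute(self):
        return "head-driven routing from scratch"


print(isinstance(GroundwaterH1(), Groundwater))  # True
print(isinstance(GroundwaterH2(), Groundwater))  # False
```

The isinstance difference is the crux: code (or orchestration steps) that dispatches on Groundwater will pick up option 1 automatically but not option 2.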
I'll work on this first thing tomorrow morning.
I've left a few comments on the code.

Having said that, I have to admit that the decorator approach might be simpler to implement and to understand, compiling into a few lines what would otherwise need a whole subclass.

Keeping that aside, on a totally separate topic, there are a few aspects that could be improved for readability purposes, like the names of some nodes - a long string of numbers and letters is not that informative - or the (visual) structure of the config file. Being so extremely long, I'm tempted to suggest breaking it down into multiple files, but that's another story.
Thanks @dalonsoa ! I agree that most of the comments can be addressed by setting up a new subclass, especially regarding changes to specific functions of a class. Two points are specifically listed here (I saw the comments existing Friday night but they somehow disappeared Saturday morning ...): (1) change of parameter values, and (2) change of specific functions.

So I guess you might have seen that parameter/attribute changes for a class are very handy via the current approach. So what do you think? Shall we keep both, or only the new subclass one?
Yeah, to be honest I'm not sure what to suggest. I like the more rigorous approach of sub-classing, but the simplicity of the decorators also has enormous value. Maybe what I don't really like about the decorators option is the lack of structure in how wsimod is modified: you can change absolutely everything in wsimod without restriction or control. Maybe using the decorators approach in full would be an option? Something like the following content for the model extensions file:

    import wsimod.extensions as extend

    @extend.node("1823-land", method="run")
    def land_1823_run(...):
        ...

    @extend.surface("1823-land", index=-1, method="atmospheric_deposition")
    def adjust_atmospheric_deposition(...):
        ...

    @extend.model_attribute("river_discharge_order")
    def river_discharge_order(...):
        return ['1823-river']

I don't know - it might be overkill. @barneydobson , @liuly12 what do you think?
Can I just clarify that we're all in agreement that custom subclasses are definitely something that we need in extensions? If so, maybe we can start with that?

My intention was that new model development is achieved by new subclasses, and customisations to accommodate specific circumstances that are unlikely to be widely reused are achieved by decorators. What I would want to avoid is a new subclass that also requires another node type to be decorated. I think in this case the small change to the other node type should also be subclassed, so that it is properly packaged with the new behaviour. Of course this is a blurred line, and because decorators have a bit less overhead, people will probably prefer them. Though if custom subclasses are well supported by extensions, I think the overhead is reduced and people may equally use them.

I take @liuly12 's point that using a subclass for a tiny change seems excessive, and for sure this could not just be excessive but could lead to whatever those things are called with multiple inheritance issues. For me, @dalonsoa 's suggestion to use decorators 'properly' may be a satisfactory solution for this (provided subclasses are enabled too, which I hope we are agreed on), and in particular in a study with lots of decorators that are applied differently to different nodes, validation feels essential.

A question for @dalonsoa is whether the decorator method you propose above can pass other objects as arguments that can exist within the scope of the decorated function. For example, it is very common that the rules for an abstraction licence (i.e., how much water can I take out of a river and put in my reservoir) depend on the state of something that is disconnected (e.g., the flow at a distant gauging station).

A situation such as this is a good use of a decorator, since the rules for that licence will not be reused in any other situation; however, to evaluate the rule, the reservoir object must have state information on another object that it is not directly connected to.
@dalonsoa little reminder for above (sorry)
I don't understand what's going on with this repo and the other one such that I do not receive notifications...

I think the use case you suggest should be possible to arrange. It is a question of providing the decorators with the right flexibility. I guess it could be something along the lines of:

    @extend.node("some_node", method="discharge", depends_on="other_node")
    def some_node_discharge(..., other_node):
        ...

In this case, the decorator will fetch other_node from the model and pass it as an argument to the decorated function.

Ok, so the conclusion is:

- custom subclasses are enabled in extensions for new model development;
- decorators are enabled, in a more regulated form, for study-specific customisations.

Is this correct?
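One possible shape for such a decorator registry with dependency injection is sketched below. All names here (node, depends_on, apply_extensions, the model layout) are hypothetical, not existing wsimod API:

```python
# Sketch: a registry-based extension mechanism. Decorated functions are
# recorded, then patched onto model nodes, with optional dependencies
# fetched from the model and bound as extra arguments.
import functools

_REGISTRY = []


def node(name, method, depends_on=None):
    """Register func as a replacement for `method` on the node `name`."""
    def decorator(func):
        _REGISTRY.append((name, method, depends_on, func))
        return func
    return decorator


def apply_extensions(model):
    for name, method, depends_on, func in _REGISTRY:
        target = model.nodes[name]
        if depends_on is not None:
            # Fetch the disconnected object and bind it as an argument.
            other = model.nodes[depends_on]
            patched = functools.partial(func, target, other)
        else:
            patched = functools.partial(func, target)
        setattr(target, method, patched)


# Tiny usage example with dummy stand-ins for real nodes/model.
class DummyNode:
    flow = 7.0

    def discharge(self):
        return 0.0


class DummyModel:
    nodes = {"some_node": DummyNode(), "other_node": DummyNode()}


@node("some_node", method="discharge", depends_on="other_node")
def some_node_discharge(self, other_node):
    # Abstraction rule depending on a distant gauge's flow.
    return other_node.flow / 2


model = DummyModel()
apply_extensions(model)
print(model.nodes["some_node"].discharge())  # 3.5
```

Because every patch goes through the registry, the extensions module could also validate each entry (does the node exist? does the method exist?) before applying it, which addresses the 'regulated decorators' point above.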
Agree. An alternative could be to enable sub-classing and free decorators for now (as I feel that is very flexible) and regulate the latter in the future. @barneydobson What do you think?
Are you planning to create an example or something on that 'regulated' decorator? @barneydobson
We can have examples/notebooks as a separate PR (so that they don't hold things up)
This branch aims to test and enable customisation. The customisation covers two scenarios: decorating existing nodes/arcs/model objects, and introducing new nodes developed by the users.

The main changes include:
(1) code added to the load() function in model.py to allow reading extension files;
(2) added example yaml (settings_saved.yaml) and data (timeseries_data_saved.csv) files; the yaml file provides the addresses of the model extension (for decoration) and new node (Groundwater_h) scripts;
(3) added outputs (flows/tanks/surfaces.csv) after running the example, which have been validated by liuly12 to show the customisations are working as expected.

Please review the changes, test the examples, and let me know your further thoughts for improvement.