-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow setting params with command-line arguments #267
Conversation
For example, you can now run: $ vr_run --param hydrology.initial_soil_moisture=0.2 You can specify multiple parameters with multiple --param flags. Closes #266.
1d2d12c
to
3733e95
Compare
Codecov Report
@@ Coverage Diff @@
## develop #267 +/- ##
===========================================
- Coverage 95.28% 94.27% -1.02%
===========================================
Files 43 43
Lines 1781 1816 +35
===========================================
+ Hits 1697 1712 +15
- Misses 84 104 +20
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sensible to me, just had a query about how parameters defined in both config files and the command line will be handled.
virtual_rainforest/core/config.py
Outdated
@@ -432,7 +438,7 @@ def build_config(self) -> None: | |||
""" | |||
|
|||
# Get the config dictionaries | |||
input_dicts = list(self.toml_contents.values()) | |||
input_dicts = list(self.toml_contents.values()) + self.extra_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in thinking that this step adds the extra parameters to the existing config dictionaries? If so is there anything here that handles the case where the config files define a parameter which is then supplied as an extra parameter? I'm not actually 100% sure what the desired behaviour would be in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a parameter is set in one of the config files and you then also try to set it via a command-line arg, then you'll get an error, just as you would if you tried to set it twice in different config files.
I can see that there might be a case where a user wants to override a parameter set in a config file, but I don't think that should be the default behaviour as it's a bit error prone. If we want to add this feature later we can do (e.g. we could have a separate --param-override
flag for this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yeah that makes sense! Also agree that overriding parameters seems error prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is dangerous and I think there is a case for making the override behaviour the explicit mechanism for this approach.
-
The configuration process at the moment takes a bunch of files and uses those to build the configuration tree, substituting default values for any settings that aren't explicitly mentioned. That gives us a base configuration, with everything set.
-
If we want to tweak those settings, we shouldn't need to worry about whether the base files set a value or if it is just being populated by default. That would put us in a position of having to maintain multiple sets of base configuration files, each one omitting the setting that we want to tweak. It would be neater to have a default config that can just be adjusted as needed.
-
Explicitly providing extra params would then be a way to substitute values in a particular run for a sensitivity analysis, by updating the
Config
object created from the base configuration for a given run. Those extra params aren't then simply added to the TOML contents, but are fed into a separateConfig
method. That does then need some kind of a validation step though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that method would be something like:
def override_config(self, extra_params) -> None:
"""Adjust an already created config to update settings
Raises:
ValidationError: if duplicate config definitions occur across files.
"""
# Copy the current dictionary
master = deepcopy(self.__dict__)
# Update that dictionary
# TODO - don't know if this works to replace values in master even when conflicts found,
# or whether we need an explicit switch or method to do override.
master, conflicts = config_merge(master, extra_params, conflicts=tuple())
# Ignore any conflicts - explicitly overriding settings - so now revalidate against the
# schema defined in the base config.
self.update(master)
self.validate_config()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, none of the constants are populated by default in the config (i.e. not using default values set within model schema). It's only after config validation when a set of constants is used to create a model specific constants class that default values come in, These defaults are hardcoded into the various constants classes, and just aren't replaced if there is no value stored in the config to replace them with.
So I don't really see a danger the current approach, unless we change how default constants are handled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config_merge
does indeed return the updated dictionary when conflicts are present:
In [11]: from virtual_rainforest.core.config import Config, config_merge
In [12]: cfg = Config('tests/core/data/all_config.toml')
In [13]: updated, conflicts = config_merge(c, {'core': {'layers': {'soil_layers': 3}}})
In [14]: updated['core']['layers']['soil_layers']
Out[14]: 3
So, this should work as:
def override_config(self, extra_params) -> None:
"""Adjust an already created config to update settings
Raises:
ValidationError: if an invalid value is provided for any configuration settings.
"""
# Update the dictionary
updated, conflicts = config_merge(self, extra_params, conflicts=tuple())
# Ignore any conflicts - explicitly overriding settings - so now revalidate against the
# schema defined in the base config.
self.update(updated)
self.validate_config()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously, if changing the grid size or another part of the config that does have a default setting through this method would be risky, but I don't think we would want to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So configuration settings that are then checked against other inputs/data could cause all sorts of chaos.
But this process should only (?) ever be invoked when the Config
is built and validated and bad overrides would cause exactly the same errors as bad initial settings - they can be JSONSchema invalid and get trapped early - or bad/inconsistent in setting up the rest of the simulation and cause a later error.
Exposing a Config.override_config
method does raise a route where programmatic use could change the Config
half way through a simulation (and we should very clearly say that this is extremely ill-advised!), but in vr_run
it would only ever be something like:
config = Config(toml_files)
config.override_config(extra_params)
# Now leave config alone!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works - and that is probably the thing that matters now - but I think having to avoid duplicating configuration settings in TOML files and extra_params
is another route to piles of slightly differing configuration TOML files. This approach definitely avoids having multiple files that differ in just one (or two or three...) values, but does require unique config files for each param set being altered in a specific sensitivity analysis.
What do you think? @jacobcook1995 - what do you think the dangers here? We'd have to explicitly validate any overridden values, but we have the mechanism to do that.
Incidentally, @alexdewar, my son Thomas has just gotten quite into Exploding Kittens - the Beard Cat icon was instantly familiar 😄
virtual_rainforest/core/config.py
Outdated
self, cfg_paths: Union[str, Path, list[Union[str, Path]]], auto: bool = True | ||
self, | ||
cfg_paths: Union[str, Path, list[Union[str, Path]]], | ||
extra_params: Optional[list[dict[str, Any]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious as to why this uses Optional
and None
when the custom attribute is immediately set to be an empty list if it isn't provided. As opposed to doing:
extra_params: Optional[list[dict[str, Any]]] = None, | |
extra_params: list[dict[str, Any]] = [], |
That is then a mutable default, which is dangerous, but it could be a tuple of dicts instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We're deepcopy
ing it anyway, so it doesn't matter if it's mutable.
virtual_rainforest/core/config.py
Outdated
@@ -432,7 +438,7 @@ def build_config(self) -> None: | |||
""" | |||
|
|||
# Get the config dictionaries | |||
input_dicts = list(self.toml_contents.values()) | |||
input_dicts = list(self.toml_contents.values()) + self.extra_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is dangerous and I think there is a case for making the override behaviour the explicit mechanism for this approach.
-
The configuration process at the moment takes a bunch of files and uses those to build the configuration tree, substituting default values for any settings that aren't explicitly mentioned. That gives us a base configuration, with everything set.
-
If we want to tweak those settings, we shouldn't need to worry about whether the base files set a value or if it is just being populated by default. That would put us in a position of having to maintain multiple sets of base configuration files, each one omitting the setting that we want to tweak. It would be neater to have a default config that can just be adjusted as needed.
-
Explicitly providing extra params would then be a way to substitute values in a particular run for a sensitivity analysis, by updating the
Config
object created from the base configuration for a given run. Those extra params aren't then simply added to the TOML contents, but are fed into a separateConfig
method. That does then need some kind of a validation step though.
virtual_rainforest/core/config.py
Outdated
@@ -432,7 +438,7 @@ def build_config(self) -> None: | |||
""" | |||
|
|||
# Get the config dictionaries | |||
input_dicts = list(self.toml_contents.values()) | |||
input_dicts = list(self.toml_contents.values()) + self.extra_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that method would be something like:
def override_config(self, extra_params) -> None:
"""Adjust an already created config to update settings
Raises:
ValidationError: if duplicate config definitions occur across files.
"""
# Copy the current dictionary
master = deepcopy(self.__dict__)
# Update that dictionary
# TODO - don't know if this works to replace values in master even when conflicts found,
# or whether we need an explicit switch or method to do override.
master, conflicts = config_merge(master, extra_params, conflicts=tuple())
# Ignore any conflicts - explicitly overriding settings - so now revalidate against the
# schema defined in the base config.
self.update(master)
self.validate_config()
Is there any real danger in terms of the validation? Because the new parameter gets added to the list of config files to validate, i.e. the change is made before any validation takes place |
@davidorme I guess my view is that overriding existing config settings introduces a fair bit of complexity to this pull request without really addressing any immediate problem? Like if down the line it becomes very annoying to have to ensure that options have not been populated in config files when doing sensitivity analysis then we can always take the approach you've suggested here, but I'm not sure there's any great need to address it now as a potential future problem (because there are an awful lot of those). |
This method is a vast improvement on having to write multiple files to run variations for a given SA but I think having to create a separate compatible set of base config files for any given sensitivity analysis is an immediate problem. If there isn't a danger to the override behaviour and if the resulting user experience is better, then I'm not sure why we don't just implement it, given that we already have the mechanisms in place to do so. |
To give a concrete example:
|
@davidorme I take your point that that it could be useful to be able to override params so you can reuse config files between simulation runs. I think it would be a mistake to silently override the value the user has put in a config file, though, in case it was accidental; we should at least warn them. I'm happy to have a go at this, but if it turns out to be finicky, then perhaps we could merge this as is and open a separate issue for the "allow users to override parameters" feature? My main goal atm is to get a minimal version of the SA working so that we can all have a discussion about it. Once that's done we can of course polish things. |
That sounds good to me - we shouldn't lose the bigger SA picture - but I think this is worth fixing here if we can. We can warn users - no harm in that - but the context of sensitivity analyses and also the documentation for |
Ok, I've had a go. Let me know what you think. I haven't written tests yet, but can do if people are happy with the implementation. |
It turns out override_config() doesn't raise ValidationError after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That implementation looks good to me. I'd thought of Config.override_config
as something to call post __init__
but embedding it in __init__
also makes sense. I briefly considered suggesting we make it 'private' as Config._override_config
to further discourage use, but honestly none of those Config
methods are expected to be used except via __init__
in 99% of use cases.
I hadn't thought that the conflicts
object provides an automatic list of altered settings - so that is pretty seamless for reporting changes. I guess the thing there is that the conflicts will be against things set in files and things set by default - but honestly the whole point of this mechanism is to make it easy to change settings, so I think that distinction is largely artificial and recording in logs which settings have been changed is useful.
Thanks @davidorme. I think we could probably safely make all of those methods private and remove the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I'll merge this now so that I can rebase my other work on top of it. |
Description
This PR adds the option to set additional parameters when invoking
vr_run
by using command-line arguments, without having to create additional config files.For example, it allows you to do something like this:
The main motivation for this is that I want to be able to invoke
vr_run
in parallel with different parameters as part of my work on the SA (see #239). While that is currently possible, it requires generating config files as an intermediate step, which makes the process substantially more complex.Closes #266.
Type of change
Key checklist
pre-commit
checks:$ pre-commit run -a
$ poetry run pytest
Further checks