Allow setting params with command-line arguments #267

alexdewar · 2023-07-31T16:20:39Z

Description

This PR adds the option to set additional parameters when invoking vr_run by using command-line arguments, without having to create additional config files.

For example, it allows you to do something like this:

vr_run --param hydrology.initial_soil_moisture=0.2 --param some.other.param=value dummy_data/

The main motivation for this is that I want to be able to invoke vr_run in parallel with different parameters as part of my work on the SA (see #239). While that is currently possible, it requires generating config files as an intermediate step, which makes the process substantially more complex.
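
For illustration, here is a minimal sketch of how a dotted --param argument could be turned into a nested dictionary before it is merged into the config. It is not the code in this PR: parse_param is a hypothetical helper, and trying int, then float, then falling back to a string is an assumption made for the example.

```python
from typing import Any


def parse_param(arg: str) -> dict[str, Any]:
    """Turn 'a.b.c=value' into {'a': {'b': {'c': value}}} (hypothetical helper)."""
    key, sep, raw = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {arg!r}")

    # Assumption: try numeric types first, otherwise keep the raw string.
    value: Any = raw
    for cast in (int, float):
        try:
            value = cast(raw)
            break
        except ValueError:
            continue

    # Wrap the value in one dict per dotted key component, innermost first.
    nested: Any = value
    for part in reversed(key.split(".")):
        nested = {part: nested}
    return nested


# The two arguments from the example command above.
params = [
    parse_param(arg)
    for arg in ("hydrology.initial_soil_moisture=0.2", "some.other.param=value")
]
```

Each parsed argument is its own small dictionary, so a list of them has the same shape as a list of parsed config files.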

Closes #266.

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

codecov-commenter · 2023-07-31T16:27:46Z

Codecov Report

Merging #267 (1ec1fb0) into develop (382daa1) will decrease coverage by 1.02%.
The diff coverage is 43.24%.

@@             Coverage Diff             @@
##           develop     #267      +/-   ##
===========================================
- Coverage    95.28%   94.27%   -1.02%     
===========================================
  Files           43       43              
  Lines         1781     1816      +35     
===========================================
+ Hits          1697     1712      +15     
- Misses          84      104      +20

Files Changed                          Coverage Δ
virtual_rainforest/entry_points.py     51.02% <33.33%> (-23.98%) ⬇️
virtual_rainforest/core/config.py      98.52% <83.33%> (-0.47%) ⬇️
virtual_rainforest/main.py             81.19% <100.00%> (ø)

jacobcook1995

Looks sensible to me, just had a query about how parameters defined in both config files and the command line will be handled.

jacobcook1995 · 2023-08-01T08:23:01Z

virtual_rainforest/core/config.py

@@ -432,7 +438,7 @@ def build_config(self) -> None:
        """

        # Get the config dictionaries
-        input_dicts = list(self.toml_contents.values())
+        input_dicts = list(self.toml_contents.values()) + self.extra_params


Am I right in thinking that this step adds the extra parameters to the existing config dictionaries? If so is there anything here that handles the case where the config files define a parameter which is then supplied as an extra parameter? I'm not actually 100% sure what the desired behaviour would be in that case?

alexdewar

If a parameter is set in one of the config files and you then also try to set it via a command-line arg, then you'll get an error, just as you would if you tried to set it twice in different config files.

I can see that there might be a case where a user wants to override a parameter set in a config file, but I don't think that should be the default behaviour as it's a bit error-prone. If we want to add this feature later we can do (e.g. we could have a separate --param-override flag for this case).

jacobcook1995

Ahh yeah that makes sense! Also agree that overriding parameters seems error-prone
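
To make the behaviour described above concrete, here is a minimal sketch of a merge that rejects a setting defined twice. It is a stand-in written for this example, not the project's config_merge: merge_strict, the ValueError, and the some_other_setting key are all assumptions.

```python
from copy import deepcopy
from typing import Any


def merge_strict(
    base: dict[str, Any], extra: dict[str, Any], path: str = ""
) -> dict[str, Any]:
    """Merge extra into a copy of base, raising if a setting is defined in both."""
    merged = deepcopy(base)
    for key, value in extra.items():
        dotted = f"{path}.{key}" if path else key
        if key not in merged:
            merged[key] = deepcopy(value)
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are sections, so recurse and look for clashing leaves.
            merged[key] = merge_strict(merged[key], value, dotted)
        else:
            raise ValueError(f"duplicate definition of {dotted}")
    return merged


from_file = {"hydrology": {"initial_soil_moisture": 0.5}}

# A setting that the file does not mention merges cleanly.
ok = merge_strict(from_file, {"hydrology": {"some_other_setting": 1}})

# A setting that the file already defines is rejected.
try:
    merge_strict(from_file, {"hydrology": {"initial_soil_moisture": 0.2}})
    clash = None
except ValueError as err:
    clash = str(err)
```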

davidorme · 2023-08-01T09:26:45Z

I'm not sure it is dangerous and I think there is a case for making the override behaviour the explicit mechanism for this approach.

The configuration process at the moment takes a bunch of files and uses those to build the configuration tree, substituting default values for any settings that aren't explicitly mentioned. That gives us a base configuration, with everything set.

If we want to tweak those settings, we shouldn't need to worry about whether the base files set a value or if it is just being populated by default. That would put us in a position of having to maintain multiple sets of base configuration files, each one omitting the setting that we want to tweak. It would be neater to have a default config that can just be adjusted as needed.

Explicitly providing extra params would then be a way to substitute values in a particular run for a sensitivity analysis, by updating the Config object created from the base configuration for a given run. Those extra params aren't then simply added to the TOML contents, but are fed into a separate Config method. That does then need some kind of a validation step though.

davidorme · 2023-08-01T09:48:35Z

I think that method would be something like:

    def override_config(self, extra_params) -> None:
        """Adjust an already created config to update settings

        Raises:
            ValidationError: if duplicate config definitions occur across files.
        """

        # Copy the current dictionary
        master = deepcopy(self.__dict__)

        # Update that dictionary
        # TODO - don't know if this works to replace values in master even when conflicts found,
        #        or whether we need an explicit switch or method to do override.
        master, conflicts = config_merge(master, extra_params, conflicts=tuple())

        # Ignore any conflicts - explicitly overriding settings - so now revalidate against the
        # schema defined in the base config.
        self.update(master)
        self.validate_config()

jacobcook1995

At the moment, none of the constants are populated by default in the config (i.e. not using default values set within model schema). It's only after config validation, when a set of constants is used to create a model-specific constants class, that default values come in. These defaults are hardcoded into the various constants classes, and just aren't replaced if there is no value stored in the config to replace them with.

So I don't really see a danger in the current approach, unless we change how default constants are handled.

davidorme

config_merge does indeed return the updated dictionary when conflicts are present:

    In [11]: from virtual_rainforest.core.config import Config, config_merge

    In [12]: cfg = Config('tests/core/data/all_config.toml')

    In [13]: updated, conflicts = config_merge(c, {'core': {'layers': {'soil_layers': 3}}})

    In [14]: updated['core']['layers']['soil_layers']
    Out[14]: 3

So, this should work as:

    def override_config(self, extra_params) -> None:
        """Adjust an already created config to update settings

        Raises:
            ValidationError: if an invalid value is provided for any configuration settings.
        """

        # Update the dictionary
        updated, conflicts = config_merge(self, extra_params, conflicts=tuple())

        # Ignore any conflicts - explicitly overriding settings - so now revalidate against the
        # schema defined in the base config.
        self.update(updated)
        self.validate_config()

Obviously, changing the grid size or another part of the config that does have a default setting through this method would be risky, but I don't think we would want to do that?

jacobcook1995

So configuration settings that are then checked against other inputs/data could cause all sorts of chaos.

davidorme

But this process should only (?) ever be invoked when the Config is built and validated and bad overrides would cause exactly the same errors as bad initial settings - they can be JSONSchema invalid and get trapped early - or bad/inconsistent in setting up the rest of the simulation and cause a later error.

Exposing a Config.override_config method does raise a route where programmatic use could change the Config half way through a simulation (and we should very clearly say that this is extremely ill-advised!), but in vr_run it would only ever be something like:

    config = Config(toml_files)
    config.override_config(extra_params)
    # Now leave config alone!

davidorme

This works - and that is probably the thing that matters now - but I think having to avoid duplicating configuration settings in TOML files and extra_params is another route to piles of slightly differing configuration TOML files. This approach definitely avoids having multiple files that differ in just one (or two or three...) values, but does require unique config files for each param set being altered in a specific sensitivity analysis.

What do you think? @jacobcook1995 - what do you think the dangers are here? We'd have to explicitly validate any overridden values, but we have the mechanism to do that.

Incidentally, @alexdewar, my son Thomas has just gotten quite into Exploding Kittens - the Beard Cat icon was instantly familiar 😄

davidorme · 2023-08-01T09:00:02Z

virtual_rainforest/core/config.py

-        self, cfg_paths: Union[str, Path, list[Union[str, Path]]], auto: bool = True
+        self,
+        cfg_paths: Union[str, Path, list[Union[str, Path]]],
+        extra_params: Optional[list[dict[str, Any]]] = None,


I'm curious as to why this uses Optional and None when the custom attribute is immediately set to be an empty list if it isn't provided. As opposed to doing:

Suggested change

-        extra_params: Optional[list[dict[str, Any]]] = None,
+        extra_params: list[dict[str, Any]] = [],

That is then a mutable default, which is dangerous, but it could be a tuple of dicts instead.

alexdewar

Good point. We're deepcopying it anyway, so it doesn't matter if it's mutable.
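
As a small illustration of the trade-off discussed above, the sketch below first shows why a mutable default is usually considered dangerous, then shows a tuple default combined with a deep copy. ConfigSketch and risky are hypothetical names used only here; this is not the Config class from the PR.

```python
from copy import deepcopy
from typing import Any, Sequence


def risky(extra_params: list[dict[str, Any]] = []) -> list[dict[str, Any]]:
    """The default list is created once and shared by every call."""
    extra_params.append({"seen": True})
    return extra_params


first = risky()
second = risky()  # appends to the same list object as the first call


class ConfigSketch:
    """Hypothetical stand-in for Config, showing only the extra_params handling."""

    def __init__(self, extra_params: Sequence[dict[str, Any]] = ()) -> None:
        # An empty tuple is immutable, and the deep copy means later changes to
        # the caller's dictionaries cannot leak into this object.
        self.extra_params = deepcopy(list(extra_params))


caller_params = [{"core": {"layers": {"soil_layers": 3}}}]
cfg = ConfigSketch(caller_params)
caller_params[0]["core"]["layers"]["soil_layers"] = 99  # does not affect cfg
```

With a list default that is only ever deep copied and never mutated in place, the shared-default problem does not arise, which is the point made in the reply above.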

jacobcook1995 · 2023-08-01T10:12:48Z

Is there any real danger in terms of the validation? The new parameter gets added to the list of config files to validate, i.e. the change is made before any validation takes place.

jacobcook1995 · 2023-08-01T10:57:37Z

@davidorme I guess my view is that overriding existing config settings introduces a fair bit of complexity to this pull request without really addressing any immediate problem?

Like if down the line it becomes very annoying to have to ensure that options have not been populated in config files when doing sensitivity analysis then we can always take the approach you've suggested here, but I'm not sure there's any great need to address it now as a potential future problem (because there are an awful lot of those).

davidorme · 2023-08-01T11:10:12Z

This method is a vast improvement on having to write multiple files to run variations for a given SA but I think having to create a separate compatible set of base config files for any given sensitivity analysis is an immediate problem. If there isn't a danger to the override behaviour and if the resulting user experience is better, then I'm not sure why we don't just implement it, given that we already have the mechanisms in place to do so.

davidorme · 2023-08-01T11:23:10Z

To give a concrete example:

Let's say we run a big long analysis that saves a whole bunch of stuff with some non-default configured settings like plants.beta_value = 0.5 and soil.zeta_value=0.95.
Now we want to run an SA to see how much GPP varies if we change plants.beta_value = range(0.4, 0.6, 0.01) and soil.zeta_value= range(0.85, 1.05, 0.1).
Those SA runs have to have different config files to our base run, because they cannot contain plants.beta_value = 0.5 and soil.zeta_value=0.95.
That is less clear and less reproducible.
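
A rough sketch of what that SA could look like if overriding is allowed: every run reuses the same base config directory and differs only in its --param flags. The frange helper is hypothetical, the half-open reading of the range(...) notation above is an assumption, and dummy_data/ is simply the path from the PR description.

```python
import itertools
import shlex


def frange(start: float, stop: float, step: float) -> list[float]:
    """Half-open float range, mirroring the range(start, stop, step) notation above."""
    count = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(count)]


beta_values = frange(0.4, 0.6, 0.01)
zeta_values = frange(0.85, 1.05, 0.1)

# One argument list per combination of the two parameters.
commands = [
    [
        "vr_run",
        "--param", f"plants.beta_value={beta}",
        "--param", f"soil.zeta_value={zeta}",
        "dummy_data/",
    ]
    for beta, zeta in itertools.product(beta_values, zeta_values)
]

# Print the first few as shell command lines; running them is left to the caller.
for command in commands[:3]:
    print(shlex.join(command))
```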

alexdewar · 2023-08-01T11:41:50Z

@davidorme I take your point that it could be useful to be able to override params so you can reuse config files between simulation runs. I think it would be a mistake to silently override the value the user has put in a config file, though, in case it was accidental; we should at least warn them.

I'm happy to have a go at this, but if it turns out to be finicky, then perhaps we could merge this as is and open a separate issue for the "allow users to override parameters" feature? My main goal atm is to get a minimal version of the SA working so that we can all have a discussion about it. Once that's done we can of course polish things.

davidorme · 2023-08-01T11:49:15Z

That sounds good to me - we shouldn't lose the bigger SA picture - but I think this is worth fixing here if we can.

We can warn users - no harm in that - but the context of sensitivity analyses and also the documentation for extra_params should be very clear that we are adjusting configs. Maybe calling it alter_params or override_params to give a clear hint that this is what it will do!

alexdewar · 2023-08-01T12:24:06Z

Ok, I've had a go. Let me know what you think.

I haven't written tests yet, but can do if people are happy with the implementation.

davidorme

That implementation looks good to me. I'd thought of Config.override_config as something to call post __init__ but embedding it in __init__ also makes sense. I briefly considered suggesting we make it 'private' as Config._override_config to further discourage use, but honestly none of those Config methods are expected to be used except via __init__ in 99% of use cases.

I hadn't thought that the conflicts object provides an automatic list of altered settings - so that is pretty seamless for reporting changes. I guess the thing there is that the conflicts will be against things set in files and things set by default - but honestly the whole point of this mechanism is to make it easy to change settings, so I think that distinction is largely artificial and recording in logs which settings have been changed is useful.
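
As a sketch of that idea, an overriding merge can return the list of settings it replaced, and the caller can log each one. This is a stand-in written for illustration, not the project's config_merge or the implementation in this PR: merge_override, the logger name, and the canopy_layers key are assumptions.

```python
import logging
from copy import deepcopy
from typing import Any

LOGGER = logging.getLogger("config_override_sketch")


def merge_override(
    base: dict[str, Any], extra: dict[str, Any], path: str = ""
) -> tuple[dict[str, Any], list[str]]:
    """Merge extra into a copy of base; extra wins, and replaced settings are listed."""
    merged = deepcopy(base)
    replaced: list[str] = []
    for key, value in extra.items():
        dotted = f"{path}.{key}" if path else key
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides are sections, so recurse and collect nested replacements.
            merged[key], nested = merge_override(merged[key], value, dotted)
            replaced.extend(nested)
        else:
            if key in merged:
                replaced.append(dotted)
            merged[key] = deepcopy(value)
    return merged, replaced


base = {"core": {"layers": {"soil_layers": 2, "canopy_layers": 10}}}
updated, replaced = merge_override(base, {"core": {"layers": {"soil_layers": 3}}})

# Warn about every setting whose value was replaced.
for name in replaced:
    LOGGER.warning("Setting %s was overridden by an extra parameter", name)
```

The list does not distinguish between values that came from a file and values that came from a default, which matches the point above that the distinction matters little for this use.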

alexdewar · 2023-08-01T12:42:09Z

Thanks @davidorme.

I think we could probably safely make all of those methods private and remove the auto parameter to __init__. That way we wouldn't have to worry about users using them at all. There aren't any current uses of them outside of __init__ and I doubt anyone will want to invoke them directly anyway.

jacobcook1995

LGTM!

alexdewar · 2023-08-02T08:42:25Z

I'll merge this now so that I can rebase my other work on top of it.

Fix typo

29cf0a4

alexdewar requested review from davidorme, jaideep777, TaranRallings, jacobcook1995, vgro and LivDaniel July 31, 2023 16:20

vr_run: Allow for passing params as command-line arguments

3733e95

For example, you can now run:

    $ vr_run --param hydrology.initial_soil_moisture=0.2

You can specify multiple parameters with multiple --param flags.

Closes #266.

alexdewar force-pushed the command_line_params branch from 1d2d12c to 3733e95 Compare July 31, 2023 16:22

alexdewar changed the title ~~Command line params~~ Allow setting params with command-line arguments Jul 31, 2023

jacobcook1995 reviewed Aug 1, 2023

davidorme reviewed Aug 1, 2023

No need to make extra_params Optional

c034d08

Allow users to override parameters with command-line arguments

1ec1fb0

alexdewar requested review from davidorme and jacobcook1995 August 1, 2023 12:23

Fix docstring

cf7178b

It turns out override_config() doesn't raise ValidationError after all.

davidorme approved these changes Aug 1, 2023

alexdewar mentioned this pull request Aug 1, 2023

Allow setting path for merged config file with a parameter #270

Closed

jacobcook1995 approved these changes Aug 1, 2023

alexdewar merged commit cf74b38 into develop Aug 2, 2023

alexdewar deleted the command_line_params branch August 2, 2023 08:42
