exp run: validate params filename and parameter names #5477

Closed
pmrowla opened this issue Feb 16, 2021 · 14 comments · Fixed by #6521
Labels: A: experiments (Related to dvc exp) · discussion (requires active participation to reach a conclusion) · p3-nice-to-have (It should be done this or next sprint) · ui (user interface / interaction)

Comments

@pmrowla (Contributor) commented Feb 16, 2021

One thing I noticed when trying the changes from #5302 is that if the user specifies a parameter or params file that doesn't exist (i.e. typos a param name), we just silently accept it and create the new file or new entry in the existing params file (this behavior existed before the -S/--set-param change as well).

If the specified parameter doesn't exist in the target stage/pipeline we should probably throw an error instead of silently ignoring it.

Reproduce

git clone https://github.com/iterative/example-get-started
cd example-get-started
dvc pull
virtualenv .venv
. .venv/bin/activate
pip install -r src/requirements.txt
dvc exp run -S featurize.n_grams=3
git diff params.yaml

The experiment is run as normal and params.yaml has a new parameter:

diff --git a/params.yaml b/params.yaml
index e5a636f..aed2663 100644
--- a/params.yaml
+++ b/params.yaml
@@ -6,6 +6,7 @@ featurize:
   max_features: 3000
   ngrams: 2
 
+  n_grams: 3
 train:
   seed: 20170428
   n_est: 100

Note the typo in the parameter name.

The user thinks that they have changed a parameter, and no error is reported.

Expected

I'd expect a parameter's existence to be checked before making any modifications. So the command

dvc exp run -S featurize.n_grams=3

should report that featurize.n_grams is not a valid parameter.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.4.3 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
pmrowla added the p3-nice-to-have, ui, and A: experiments labels on Feb 16, 2021
@dberenbaum (Collaborator)

Need to consider what's expected here. It could be useful for new users with no prior dvc experience to set up experiments without ever touching params.yaml by adding parameters this way. Maybe a warning is best?

@mattlbeck (Contributor)

I have started to use dvc exp run in earnest recently. I have found this behaviour particularly unexpected and had assumed that misspelling a parameter would result in the experiment not running.

In my opinion the current behaviour is pretty dangerous and a major source of experiment error if users do not themselves validate afterwards that the params were correctly applied.

I appreciate the value of letting users stay away from params files, but at the very least some --verify-params flag would be very welcome.

@mmeendez8 (Contributor)

I understand @mattlbeck's and @pmrowla's point here.

I would like to add another viewpoint. When dealing with DL frameworks, most of the time the model config and training parameters are already defined in the framework configuration.

So if all parameters have to exist in params.yaml, I am forced to duplicate all my config parameters!

@dberenbaum (Collaborator)

@mattlbeck Do you think a warning would be sufficient, or do you think it's important for the experiment to fail?

@mmeendez8 You only need to add to params.yaml those parameters that you change via experiments, right? So basically validating the parameters means you have to edit params.yaml directly instead of using --set-param if you are introducing a new parameter?

@mattlbeck (Contributor)

If it is important to maintain a route to modifying parameters solely via the CLI, then I would suggest a new --verify-params flag that fails the experiment when new params are declared via a dvc exp run.

This is in keeping with the idea that more experienced users are more likely to maintain parameters via the config file, as they will also be the users who discover additional flags such as this one.

I would not suggest warning by default, as that may contradict the kind of best practice you are attempting to encourage.

@mmeendez8 (Contributor)

@dberenbaum yes, you are right. The problem here is when you are using the --queue flag. Would this work without the -S flag, simply changing params.yaml and running multiple dvc exp run --queue?

@mmeendez8 (Contributor)

I am checking this now in a local project for running an experiment with a different learning rate. I will try to give the best feedback I can.

I am using mmdetection, so the learning rate (and most hyperparams) are located in a config file (see example here). I like the option of loading my params.yaml and modifying the config file contents "dynamically"; that was really useful for me. Otherwise I have to:

  1. Create the mmdetection config file
  2. Add the learning rate parameter to params.yaml with the same value as in the config file
  3. Run dvc exp run -S learning_rate=new_value

So step 2 above essentially repeats the configuration. I have to know beforehand that I am going to modify the learning rate and add it to my params.yaml. Without step 2 this would simply be:

  1. Create the mmdetection config file
  2. Run dvc exp run -S learning_rate=new_value

Following your comment:

@mmeendez8 You only need to add to params.yaml those parameters that you change via experiments, right? So basically validating the parameters means you have to edit params.yaml directly instead of using --set-param if you are introducing a new parameter?

For this, I would have to know beforehand which parameters I am going to modify and add them to params.yaml with the same values as in my mmdetection config file (as I would do with the learning_rate).

The problem is that I might modify the learning rate, momentum, architecture details... so I will end up repeating most of my config file in params.yaml. That's why I found it useful to simply set my parameters with the -S flag directly, even though they do not exist in params.yaml.

I understand that this might not be the general use case for what you are trying to build, but I clearly vote for raising a warning 😸

@dberenbaum (Collaborator)

@mattlbeck I think the warning/error is most useful for new users, since more experienced users like you can probably figure it out on your own 😉, which is why I'm not crazy about a flag that won't be obvious to new users.

@dberenbaum yes, you are right. The problem here is when you are using the --queue flag. Would this work without the -S flag, simply changing params.yaml and running multiple dvc exp run --queue?

Yes, the queue will pick up whatever is in params.yaml. -S will simply overwrite that parameter's value in params.yaml.
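
For example, a queued sweep could be driven either by editing params.yaml between queue calls or by letting -S overwrite the value for each queued run. A sketch using the example-get-started parameter names (the values are illustrative, not from this thread):

# option 1: edit params.yaml, then queue a run with the edited value
dvc exp run --queue
# option 2: let -S overwrite the value in params.yaml for each queued run
dvc exp run --queue -S featurize.max_features=4000
dvc exp run --queue -S featurize.max_features=5000
# execute everything in the queue
dvc exp run --run-all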

2. Add the learning rate parameter to params.yaml with the same value as in the config file

Sorry, I don't follow. Are you hard-coding the values in both the config file and params.yaml? How are the values in params.yaml being read into your model training code? I thought the example config you provided would be reading values from params.yaml. If you already have to provide values for all parameters, can they all be solely in params.yaml instead of in both files?

@mmeendez8 (Contributor) commented Aug 30, 2021

@dberenbaum

Are you hard-coding the values in both the config file and params.yaml?

Yes, I am hard-coding them in both. mmdetection uses Python files as config, so this is the only way.

How are the values in params.yaml being read into your model training code?

The YAML params file is loaded programmatically and merged with my config files (example here).

If you already have to provide values for all parameters, can they all be solely in params.yaml instead of in both files?

That would be a problem because mmdetection does not work with YAML files.

I have written a post on the current state of training a model with the mmdetection framework using dvc experiments, which might help visualize the problem.
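
(For illustration only, a rough sketch of the kind of merge described above, assuming mmcv's Config API; this is not the author's actual script, and the paths and keys are made up.)

import yaml
from mmcv import Config  # mmdetection configs are plain Python files loaded via mmcv

# load the framework config and the dvc params file (illustrative paths)
cfg = Config.fromfile("configs/faster_rcnn_r50_fpn_1x.py")
with open("params.yaml") as f:
    params = yaml.safe_load(f)

# flatten nested params.yaml keys into dotted keypaths,
# e.g. {"optimizer": {"lr": 0.3}} -> {"optimizer.lr": 0.3}
def flatten(d, prefix=""):
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value

# overwrite the framework config with whatever params.yaml (and hence -S) provides
cfg.merge_from_dict(dict(flatten(params)))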

@mattlbeck (Contributor)

@mmeendez8 I see where you are coming from. You are using --set-param to populate your config file "on the fly" with parameters that are being modified to avoid having to maintain the dvc parameter files yourself.

Another consideration is whether a novel parameter added with --set-param should also update dvc.yaml to track that param, or whether you can work around this by adding some parent reference as a catch-all parameter dependency in this scenario.

@dberenbaum I have an additional suggestion: throw an error when attempting to set a new param, reading something like "ERROR: Parameter <param> does not exist in any parameter file. If you would like to set a new param, use --no-validate", and provide a new flag like --no-validate as an extra confirmation a user must make to set new parameters via this command. This is akin to how --force flags are used in both dvc and git.

@daavoo (Contributor) commented Aug 31, 2021

Yes, I am hard-coding them in both. mmdetection uses Python files as config, so this is the only way.

There is built-in support for using Python parameter files.

The YAML params file is loaded programmatically and merged with my config files (example here).

I reviewed the linked example and it looks like you might be able to get a similar behavior using dvc's built-in functionality (no need for the custom merging of params.yaml and mmdetection configs), although it would require some non-intuitive changes:

With those changes you should be able to run (see dvc exp run options for syntax):

dvc exp run -S configs/schedule_1x.py:optimizer.lr=0.3

And it should behave as originally intended, without duplicating params.

The issue here would be that you would either have to make all params changes with this syntax, or manually add sections to dvc.yaml's params in order to support both --set-param and manual updates of config files.
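
For reference, tracking params from a non-default file in dvc.yaml looks roughly like this (a sketch: the stage name, command, and keys are illustrative, not taken from the linked project):

stages:
  train:
    cmd: python tools/train.py configs/schedule_1x.py
    params:
      - configs/schedule_1x.py:
          - optimizer.lr
          - optimizer.momentum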

Regardless, this seems to be getting out of the scope of the original issue, so feel free to open a new issue/discussion to continue with the use case.

@mattlbeck (Contributor) commented Aug 31, 2021

Another option: provide an --add-param flag in addition to --set-param. --set-param raises an error if the parameter is new and refers the user to --add-param, which ensures that their intention is explicit. --add-param in turn raises an error if the parameter already exists.

@mmeendez8 (Contributor)

@daavoo I did not know I could use Python files directly as params... that's quite interesting. I followed your advice and that is definitely a valid approach, although I find it quite cumbersome.

Thanks for pointing out that issue, I will be following the updates there. In the end it would be super useful to just use my Python config files as params directly. At the moment I cannot afford to remove the type constructors from all my config files.

I got the point and I understand the implications here now, thank you all!

@mattlbeck (Contributor)

I have a draft PR #6521 that implements basic bailing out when new parameters are found. Additional functionality can be added on top of this, depending on how we decide to add flexibility for introducing new params through exp run.

mattlbeck pushed a commit to mattlbeck/dvc.org that referenced this issue Sep 13, 2021
Covers feature change iterative/dvc#6521. Parameters will now be validated.

Related issue iterative/dvc#5477
mattlbeck added a commit to mattlbeck/dvc that referenced this issue Oct 1, 2021
Throws an error if dvc exp run --set-param references a parameter key path
that is not in the target config file. This is done to prevent silent errors
in experimentation when new params are accidentally added instead of modifying
current ones due to typos.

This implementation is based on initial suggestions in issue iterative#5477.
pmrowla pushed a commit that referenced this issue Oct 19, 2021
* throw error if new parameters are added by dvc exp run -S

Throws an error if dvc exp run --set-param references a parameter key path
that is not in the target config file. This is done to prevent silent errors
in experimentation when new params are accidentally added instead of modifying
current ones due to typos.

This implementation is based on initial suggestions in issue #5477.

* fix mypy confusion from overwriting a var

* tests: remove modify param test that was adding params

One of the tests in test_modify_params was still adding a parameter

* tests: remove param modification during exp run in test_untracked

Removes a spurious parameter modification in test_untracked, where params.yaml does not yet exist.

* remove benedict dependency for verifying params

Instead, the keypath util module from benedict is adapted into the dvc codebase and used
natively to verify that a keypath exists within a dict.

* exp: list all invalid param modifications in error message

* exp-run: more specific exception for new params via --set-param

Uses the established MissingParamsError instead of DvcException. Modified the message to be more similar to other uses of this exception.

* Revert "remove benedict dependency for verifying params"

This reverts commit 6170f16.

* exp: use internal benedict import for validating params

* exp: combine new param validation into merge_params

Moves new param validation into existing merge_params function as an optional check

* exp: minor amendments to error message

* Revert "tests: remove param modification during exp run in test_untracked"

This reverts commit c9eb1a6.

* exp: fix test_untracked to not fail due to new params

Fixed by creating the params file with the new parameter already present.
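
(As an aside, the keypath check mentioned in the commit messages above can be pictured roughly like this; a simplified sketch, not DVC's actual implementation.)

# walk a dotted keypath such as "featurize.max_features" through nested dicts
def keypath_exists(data: dict, keypath: str) -> bool:
    node = data
    for part in keypath.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True

params = {"featurize": {"max_features": 3000, "ngrams": 2}}
assert keypath_exists(params, "featurize.ngrams")       # existing param
assert not keypath_exists(params, "featurize.n_grams")  # the typo from this issue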
jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Oct 20, 2021
…dation (#2944)

* experiment-management: remove warning about no parameter validation

Covers feature change iterative/dvc#6521. Parameters will now be validated.

Related issue iterative/dvc#5477

* experiment-management: sprinkle 'existing param' qualifiers

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Apply suggestions from code review

* Restyled by prettier

Co-authored-by: mattlbeck <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Restyled.io <[email protected]>
jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Oct 20, 2021
…2824)

* experiment-management: remove warning about no parameter validation

Covers feature change iterative/dvc#6521. Parameters will now be validated.

Related issue iterative/dvc#5477

* experiment-management: sprinkle 'existing param' qualifiers

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Apply suggestions from code review

* Update content/docs/command-reference/exp/run.md

Co-authored-by: Jorge Orpinel <[email protected]>