exp run: validate params filename and parameter names #5477

Closed
pmrowla opened this issue Feb 16, 2021 · 14 comments · Fixed by #6521
Labels: A: experiments (Related to dvc exp) · discussion (requires active participation to reach a conclusion) · p3-nice-to-have (It should be done this or next sprint) · ui (user interface / interaction)

Comments

@pmrowla (Contributor) commented Feb 16, 2021

One thing I noticed when trying the changes from #5302 is that if the user specifies a parameter or params file that doesn't exist (i.e. typos a param name), we just silently accept it and create the new file or new entry in the existing params file (this behavior existed before the -S/--set-param change as well).

If the specified parameter doesn't exist in the target stage/pipeline we should probably throw an error instead of silently ignoring it.

Reproduce

git clone https://github.com/iterative/example-get-started
cd example-get-started
dvc pull
virtualenv .venv
. .venv/bin/activate
pip install -r src/requirements.txt
dvc exp run -S featurize.n_grams=3
git diff params.yaml

The experiment is run as normal and params.yaml has a new parameter:

diff --git a/params.yaml b/params.yaml
index e5a636f..aed2663 100644
--- a/params.yaml
+++ b/params.yaml
@@ -6,6 +6,7 @@ featurize:
   max_features: 3000
   ngrams: 2
 
+  n_grams: 3
 train:
   seed: 20170428
   n_est: 100

Note the typo in the parameter name.

The user thinks that they have changed a parameter, and no error is reported.

Expected

I'd expect a parameter's existence to be checked before making any modifications. So the command

dvc exp run -S featurize.n_grams=3

should report that featurize.n_grams is not a valid parameter.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.4.3 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
pmrowla added the p3-nice-to-have, ui, and A: experiments labels on Feb 16, 2021
@dberenbaum (Collaborator)

Need to consider what's expected here. It could be useful for new users with no prior dvc experience to set up experiments without ever touching params.yaml by adding parameters this way. Maybe a warning is best?

@mattlbeck (Contributor)

I have started to use dvc exp run in earnest recently. I have found this behaviour particularly unexpected and had assumed that misspelling a parameter would result in the experiment not running.

In my opinion the current behaviour is pretty dangerous and a major source of experiment error if users do not themselves validate afterwards that the params were correctly applied.

I appreciate the value of letting users stay away from params files, but at the very least some --verify-params flag would be very welcome.

@mmeendez8 (Contributor)

I understand @mattlbeck's and @pmrowla's point here.

I would like to add another viewpoint. When dealing with DL frameworks, most of the time the model config and training parameters are already defined in the framework configuration.

So if all parameters have to exist in params.yaml, I am forced to duplicate all my config parameters!

@dberenbaum (Collaborator)

@mattlbeck Do you think a warning would be sufficient, or do you think it's important for the experiment to fail?

@mmeendez8 You only need to add to params.yaml those parameters that you change via experiments, right? So basically validating the parameters means you have to edit params.yaml directly instead of using --set-param if you are introducing a new parameter?

@mattlbeck (Contributor)

If it is important to maintain a route to modifying parameters solely via the CLI, then I would suggest a new --verify-params flag that fails the experiment when new params are declared via a dvc exp run.

This is in keeping with the idea that more experienced users are more likely to maintain parameters via the config file, as they will also be the users who discover additional flags such as this one.

I would not suggest warning by default, as that may contradict the kind of best practice you are attempting to encourage.

@mmeendez8 (Contributor)

@dberenbaum yes, you are right. The problem here is when you are using the --queue flag. Would this work without the -S flag, simply changing params.yaml and running multiple dvc exp run --queue?

@mmeendez8 (Contributor)

I am checking this now in a local project for running an experiment with a different learning rate. I will try to give the best feedback I can.

I am using mmdetection, so the learning rate (and most hyperparams) are located in a config file (see example here). I like the option of loading my params.yaml and modifying the config file contents "dynamically"; that was really useful for me. Otherwise I have to:

  1. Create the mmdetection config file
  2. Add the learning rate parameter to params.yaml with the same value as in the config file
  3. Run dvc exp run -S learning_rate=new_value

So step 2 above essentially repeats the configuration. I have to know beforehand that I am going to modify the learning rate and add it to my params.yaml. Without step 2 this would simply be:

  1. Create the mmdetection config file
  2. Run dvc exp run -S learning_rate=new_value

Following your comment:

@mmeendez8 You only need to add to params.yaml those parameters that you change via experiments, right? So basically validating the parameters means you have to edit params.yaml directly instead of using --set-param if you are introducing a new parameter?

For this, I would have to know beforehand which parameters I am going to modify and add them to params.yaml with the same values as in my mmdetection config file (as I would do with the learning_rate).

The problem is that I might modify the learning rate, momentum, architecture details... so I will end up repeating most of my config file in params.yaml. That's why I found it useful to simply set my parameters with the -S flag directly, even though they do not exist in params.yaml.

I understand that this might not be the general use case for what you are trying to build, but I clearly vote for raising a warning 😸

@dberenbaum (Collaborator)

@mattlbeck I think the warning/error is most useful for new users, since more experienced users like you can probably figure it out on your own 😉, which is why I'm not crazy about a flag that won't be obvious to new users.

@dberenbaum yes, you are right. The problem here is when you are using the --queue flag. Would this work without the -S flag, simply changing params.yaml and running multiple dvc exp run --queue?

Yes, the queue will pick up whatever is in params.yaml. -S will simply overwrite that parameter's value in params.yaml.
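
For example, a queued sweep could be driven either by editing params.yaml between queue calls or by letting -S overwrite the value for each queued run. A sketch using the example-get-started parameter names (the values are illustrative, not from this thread):

# option 1: edit params.yaml, then queue a run with the edited value
dvc exp run --queue
# option 2: let -S overwrite the value in params.yaml for each queued run
dvc exp run --queue -S featurize.max_features=4000
dvc exp run --queue -S featurize.max_features=5000
# execute everything in the queue
dvc exp run --run-all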

2. Add the learning rate parameter to params.yaml with the same value as in the config file

Sorry, I don't follow. Are you hard-coding the values in both the config file and params.yaml? How are the values in params.yaml being read into your model training code? I thought the example config you provided would be reading values from params.yaml. If you already have to provide values for all parameters, can they all be solely in params.yaml instead of in both files?

@mmeendez8 (Contributor) commented Aug 30, 2021

@dberenbaum

Are you hard-coding the values in both the config file and params.yaml?

Yes, I am hard-coding them in both. mmdetection uses Python files as config, so this is the only way.

How are the values in params.yaml being read into your model training code?

The YAML params file is loaded programmatically and merged with my config files (example here).

If you already have to provide values for all parameters, can they all be solely in params.yaml instead of in both files?

That would be a problem because mmdetection does not work with YAML files.

I have written a post on the current state of training a model with the mmdetection framework using dvc experiments, which might help visualize the problem.
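
(For illustration only, a rough sketch of the kind of merge described above, assuming mmcv's Config API; this is not the author's actual script, and the paths and keys are made up.)

import yaml
from mmcv import Config  # mmdetection configs are plain Python files loaded via mmcv

# load the framework config and the dvc params file (illustrative paths)
cfg = Config.fromfile("configs/faster_rcnn_r50_fpn_1x.py")
with open("params.yaml") as f:
    params = yaml.safe_load(f)

# flatten nested params.yaml keys into dotted keypaths,
# e.g. {"optimizer": {"lr": 0.3}} -> {"optimizer.lr": 0.3}
def flatten(d, prefix=""):
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value

# overwrite the framework config with whatever params.yaml (and hence -S) provides
cfg.merge_from_dict(dict(flatten(params)))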

@mattlbeck (Contributor)

@mmeendez8 I see where you are coming from. You are using --set-param to populate your config file "on the fly" with parameters that are being modified to avoid having to maintain the dvc parameter files yourself.

Another consideration is whether a novel parameter added with --set-param should also update dvc.yaml to track that param, or whether you can work around this by adding some parent reference as a catch-all parameter dependency in this scenario.

@dberenbaum I have an additional suggestion: throw an error when attempting to set a new param, reading something like "ERROR: Parameter <param> does not exist in any parameter file. If you would like to set a new param, use --no-validate", and provide a new flag like --no-validate as an extra confirmation a user must make to set new parameters via this command. This is akin to how --force flags are used in both dvc and git.

@daavoo (Contributor) commented Aug 31, 2021

Yes, I am hard-coding them in both. mmdetection uses Python files as config, so this is the only way.

There is built-in support for using Python parameter files.

The YAML params file is loaded programmatically and merged with my config files (example here).

I reviewed the linked example and it looks like you might be able to get a similar behavior using dvc's built-in functionality (no need for the custom merging of params.yaml and mmdetection configs), although it would require some non-intuitive changes:

With those changes you should be able to run (see dvc exp run options for syntax):

dvc exp run -S configs/schedule_1x.py:optimizer.lr=0.3

And it should behave as originally intended, without duplicating params.

The issue here would be that you would either have to make all params changes with this syntax, or manually add sections to dvc.yaml's params in order to support both --set-param and manual updates of config files.
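
For reference, tracking params from a non-default file in dvc.yaml looks roughly like this (a sketch: the stage name, command, and keys are illustrative, not taken from the linked project):

stages:
  train:
    cmd: python tools/train.py configs/schedule_1x.py
    params:
      - configs/schedule_1x.py:
          - optimizer.lr
          - optimizer.momentum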

Regardless, this seems to be getting out of the scope of the original issue, so feel free to open a new issue/discussion to continue with the use case.

@mattlbeck (Contributor) commented Aug 31, 2021

Another option: provide an --add-param flag in addition to --set-param. --set-param raises an error if the parameter is new and refers the user to --add-param, which ensures that their intention is explicit. --add-param in turn raises an error if the parameter already exists.

@mmeendez8 (Contributor)

@daavoo I did not know I could use Python files directly as params... that's quite interesting. I followed your advice and that is definitely a valid approach, although I find it quite cumbersome.

Thanks for pointing out that issue, I will be following the updates there. In the end it would be super useful to just use my Python config files as params directly. At the moment I cannot afford to remove the type constructors from all my config files.

I got the point and I understand the implications here now, thank you all!

@mattlbeck (Contributor)

I have a draft PR #6521 that implements basic bailing out when new parameters are found. Additional functionality can be added on top of this, depending on how we decide to add flexibility for introducing new params through exp run.

mattlbeck pushed a commit to mattlbeck/dvc.org that referenced this issue Sep 13, 2021
Covers feature change iterative/dvc#6521. Parameters will now be validated.

Related issue iterative/dvc#5477
mattlbeck added a commit to mattlbeck/dvc that referenced this issue Oct 1, 2021
Throws an error if dvc exp run --set-param references a parameter key path
that is not in the target config file. This is done to prevent silent errors
in experimentation when new params are accidentally added instead of modifying
current ones due to typos.

This implementation is based on initial suggestions in issue iterative#5477.
pmrowla pushed a commit that referenced this issue Oct 19, 2021
* throw error if new parameters are added by dvc exp run -S

Throws an error if dvc exp run --set-param references a parameter key path
that is not in the target config file. This is done to prevent silent errors
in experimentation when new params are accidentally added instead of modifying
current ones due to typos.

This implementation is based on initial suggestions in issue #5477.

* fix mypy confusion from overwriting a var

* tests: remove modify param test that was adding params

One of the tests in test_modify_params was still adding a parameter

* tests: remove param modification during exp run in test_untracked

Removes a spurious parameter modification in test_untracked, where params.yaml does not yet exist.

* remove benedict dependency for verifying params

Instead, the keypath util module from benedict is adapted into the dvc codebase and used
natively to verify that a keypath exists within a dict.

* exp: list all invalid param modifications in error message

* exp-run: more specific exception for new params via --set-param

Uses the established MissingParamsError instead of DvcException. Modified the message to be more similar to other uses of this exception.

* Revert "remove benedict dependency for verifying params"

This reverts commit 6170f16.

* exp: use internal benedict import for validating params

* exp: combine new param validation into merge_params

Moves new param validation into existing merge_params function as an optional check

* exp: minor amendments to error message

* Revert "tests: remove param modification during exp run in test_untracked"

This reverts commit c9eb1a6.

* exp: fix test_untracked to not fail due to new params

Fixed by creating the params file with the new parameter already present.
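
(As an aside, the keypath check mentioned in the commit messages above can be pictured roughly like this; a simplified sketch, not DVC's actual implementation.)

# walk a dotted keypath such as "featurize.max_features" through nested dicts
def keypath_exists(data: dict, keypath: str) -> bool:
    node = data
    for part in keypath.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True

params = {"featurize": {"max_features": 3000, "ngrams": 2}}
assert keypath_exists(params, "featurize.ngrams")       # existing param
assert not keypath_exists(params, "featurize.n_grams")  # the typo from this issue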
jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Oct 20, 2021
…dation (#2944)

* experiment-management: remove warning about no parameter validation

Covers feature change iterative/dvc#6521. Parameters will now be validated.

Related issue iterative/dvc#5477

* experiment-management: sprinkle 'existing param' qualifiers

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Apply suggestions from code review

* Restyled by prettier

Co-authored-by: mattlbeck <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Restyled.io <[email protected]>
jorgeorpinel added a commit to iterative/dvc.org that referenced this issue Oct 20, 2021
…2824)

* experiment-management: remove warning about no parameter validation

Covers feature change iterative/dvc#6521. Parameters will now be validated.

Related issue iterative/dvc#5477

* experiment-management: sprinkle 'existing param' qualifiers

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Update content/docs/user-guide/experiment-management/running-experiments.md

* Apply suggestions from code review

* Update content/docs/command-reference/exp/run.md

Co-authored-by: Jorge Orpinel <[email protected]>