exp run: validate params filename and parameter names #5477
Need to consider what's expected here. It could be useful for new users doing experiments, who have no prior dvc experience, to set up experiments without ever touching a params file.
I have started to use -S/--set-param. In my opinion the current behaviour is pretty dangerous and a major source of experiment error if the user does not themselves validate afterwards that the params were correctly applied. I appreciate allowing users to stay away from params files, but I think some validation of the supplied parameter names is needed.
I understand @mattlbeck's and @pmrowla's point here. I would like to add another viewpoint. When dealing with DL frameworks, most of the time the model config and training parameters are already defined in the framework's own configuration (see, for example, how frameworks like mmdetection define them). So if all parameters have to exist in params.yaml, that configuration ends up being duplicated there.
@mattlbeck Do you think a warning would be sufficient, or do you think it's important for the experiment to fail? @mmeendez8 You only need to add to params.yaml the parameters you actually plan to modify.
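For illustration, a minimal sketch of that workflow; the parameter name and values here are hypothetical, not taken from the thread:

```
# Keep only the parameters you actually plan to tune in params.yaml
# (hypothetical key and value, for illustration only):
cat > params.yaml << 'EOF'
train:
  learning_rate: 0.01
EOF

# An experiment can then override just that existing key:
dvc exp run -S train.learning_rate=0.001
```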
If it is important to maintain a route to modifying parameters solely via the CLI, then I would suggest a new flag that explicitly allows new params. This is in keeping with the idea that more experienced users are more likely to maintain parameters via the config file, as they will also be the same users who discover additional flags such as this one. I would not suggest a warning by default, as that may contradict the kind of best practice you are attempting to encourage.
@dberenbaum yes, you are right. The problem here is when you are using the queue.
I am checking this now in a local project, running an experiment with a different learning rate. I will try to give the best feedback I can. I am using mmdetection, so the learning rate (and most hyperparams) are located in a config file (see example here). I like the option of loading my params.yaml and modifying the config file contents "dynamically"; that was really useful for me. Otherwise I have to:
1. Know beforehand which parameters I am going to modify.
2. Add them to my params.yaml with the same value they already have in the config file.
So step 2 above is kind of repeating the configuration twice: I have to know beforehand that I am going to modify the learning rate and add it to my params.yaml. Avoiding step two, this would simply be something like:
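The exact command from the original comment was lost in extraction; presumably it was along these lines (the key name is a hypothetical stand-in):

```
# Set the learning rate directly from the CLI even though it is not declared
# in params.yaml; this relied on the old behaviour of silently creating the key:
dvc exp run -S learning_rate=0.001
```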
Following your comment that I only need to add to params.yaml the parameters I plan to modify:
For this, I would have to know beforehand which parameters I am going to modify and add them to params.yaml with the same value as in my mmdetection config file (as I would do with the learning rate). The problem here is that I may want to modify the learning rate, momentum, architecture details... so I would end up repeating most of my config file in params.yaml. That's why I found it useful to simply set my parameters with the -S flag directly, even though they do not exist in params.yaml. I understand that this might not be the general use case for what you are trying to build, but I clearly vote for raising a warning 😸
@mattlbeck I think the warning/error is most useful for new users, since you more experienced users probably can figure it out on your own 😉, which is why I'm not crazy about a flag that won't be obvious for new users.
Yes, the queue will pick up whatever is in params.yaml.
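As a rough sketch of how that plays out with the queue (the parameter name and values are hypothetical):

```
# Each queued run is stashed with the params as modified at queue time:
dvc exp run --queue -S train.learning_rate=0.01
dvc exp run --queue -S train.learning_rate=0.1

# Execute everything that was queued:
dvc exp run --run-all
```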
Sorry, I don't follow. Are you hard-coding the values in both the config file and params.yaml? How are the values in params.yaml being read into your model training code? I thought the example config you provided would be reading values from params.yaml. If you already have to provide values for all parameters, can they all be solely in params.yaml instead of in both files?
Yes, I am hard-coding them in both. mmdetection uses Python files as config, so this is the only way.
The YAML params file is loaded programmatically and merged with my config files (example here).
That would be a problem because mmdetection does not work with YAML files. I have created a post on the current state of training a model with the mmdetection framework with dvc experiments, which might help visualize the problem.
@mmeendez8 I see where you are coming from. You are using -S to set values that live only in your framework config and not in params.yaml. Another consideration is whether a novel parameter added with -S is actually consumed by your pipeline at all. @dberenbaum I have an additional suggestion: throw an error when attempting to set a new param, with a message that makes it clear the parameter does not already exist.
There is built-in support for using Python parameters files.
I reviewed the linked example and it looks like you might be able to get a similar behavior using dvc's built-in functionality (no need for the custom merging of params.yaml into your mmdetection config). A sketch of what the changes might look like is below.
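Roughly, the suggestion seems to be to point the stage's params at the mmdetection config file itself; the stage name, command, and key paths below are assumptions, not the actual changes from the linked example:

```
# Hypothetical dvc.yaml sketch declaring params read straight from the
# Python config file instead of from params.yaml:
cat > dvc.yaml << 'EOF'
stages:
  train:
    cmd: python tools/train.py configs/schedule_1x.py
    params:
      - configs/schedule_1x.py:
          - optimizer.lr
          - optimizer.momentum
EOF
```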
With those changes you should be able to run: dvc exp run -S configs/schedule_1x.py:optimizer.lr=0.3 and it should behave as originally intended, without duplicating params. The issue here would be that you would either need to make all the values in your config plain Python literals (dvc can't evaluate custom type constructors when parsing a Python params file) or keep those parts out of the tracked params. Regardless, this seems to be getting out of the scope of the original issue, so feel free to add a new issue/discussion for continuing with the use case.
Another option: provide an explicit opt-in (for example, a flag) for adding new params.
@daavoo I did not know I could use Python files directly as params... that's quite interesting. I followed your advice and that is definitely a valid approach, although I find it quite cumbersome. Thanks for pointing out that issue, I will be following the updates there. In the end it would be super useful to just use my Python config files as params directly, but at this moment I cannot afford to remove the type constructors from all my config files. I got the point and I understand the implications here now, thank you all!
I have a draft PR #6521 that implements basic bailing on finding new parameters. Additional functionality can be added on top of this based on however we decide to add flexibility for new params to be added through -S.
Covers feature change iterative/dvc#6521. Parameters will now be validated. Related issue iterative/dvc#5477
Throws an error if dvc exp run --set-param references a parameter key path that is not in the target config file. This is done to prevent silent errors in experimentation when new params are accidentally added, instead of existing ones being modified, due to typos. This implementation is based on initial suggestions in issue iterative#5477.
* throw error if new parameters are added by dvc exp run -S. Throws an error if dvc exp run --set-param references a parameter key path that is not in the target config file. This is done to prevent silent errors in experimentation when new params are accidentally added, instead of existing ones being modified, due to typos. This implementation is based on initial suggestions in issue #5477.
* fix mypy confusion from overwriting a var
* tests: remove modify param test that was adding params. One of the tests in test_modify_params was still adding a parameter.
* tests: remove param modification during exp run in test_untracked. Spurious modification of a parameter when params.yaml does not yet exist in test_untracked.
* remove benedict dependency for verifying params. Instead, the keypath util module from benedict is adapted into the dvc codebase and used natively to verify that a keypath exists within a dict.
* exp: list all invalid param modifications in error message
* exp-run: more specific exception for new params via --set-param. Uses the established MissingParamsError instead of DvcException. Modified the message to be more similar to other uses of this exception.
* Revert "remove benedict dependency for verifying params". This reverts commit 6170f16.
* exp: use internal benedict import for validating params
* exp: combine new param validation into merge_params. Moves new param validation into the existing merge_params function as an optional check.
* exp: minor amendments to error message
* Revert "tests: remove param modification during exp run in test_untracked". This reverts commit c9eb1a6.
* exp: fix test_untracked to not fail due to new params. Fixed by creating the params file with the new parameter already present.
…dation (#2944)
* experiment-management: remove warning about no parameter validation. Covers feature change iterative/dvc#6521. Parameters will now be validated. Related issue iterative/dvc#5477
* experiment-management: sprinkle 'existing param' qualifiers
* Update content/docs/user-guide/experiment-management/running-experiments.md
* Update content/docs/user-guide/experiment-management/running-experiments.md
* Apply suggestions from code review
* Restyled by prettier
Co-authored-by: mattlbeck <[email protected]> Co-authored-by: Jorge Orpinel <[email protected]> Co-authored-by: Restyled.io <[email protected]>
…2824)
* experiment-management: remove warning about no parameter validation. Covers feature change iterative/dvc#6521. Parameters will now be validated. Related issue iterative/dvc#5477
* experiment-management: sprinkle 'existing param' qualifiers
* Update content/docs/user-guide/experiment-management/running-experiments.md
* Update content/docs/user-guide/experiment-management/running-experiments.md
* Apply suggestions from code review
* Update content/docs/command-reference/exp/run.md
Co-authored-by: Jorge Orpinel <[email protected]>
One thing I noticed when trying the changes from #5302 is that if the user specifies a parameter or params file that doesn't exist (i.e. typos a param name), we just silently accept it and create the new file or a new entry in the existing params file (this behavior existed before the -S/--set-param change as well). If the specified parameter doesn't exist in the target stage/pipeline, we should probably throw an error instead of silently ignoring it.
Reproduce
The experiment is run as normal and params.yaml has a new parameter (see the sketch below). Note the typo in the parameter name.
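A hedged reconstruction of what this probably looked like; the command and values are assumptions, since the original snippet was not preserved:

```
# The parameter name contains a typo, so instead of modifying an existing key,
# pre-validation DVC silently creates a brand-new one:
dvc exp run -S train.n_grams=2

# params.yaml then ends up with an extra entry along the lines of:
#   train:
#     n_grams: 2
```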
The user thinks that they have changed a parameter, and no error is reported.
Expected
I'd expect parameters' existence to be checked before making any modifications, so the line setting train.n_grams should report that train.n_grams is not a valid parameter.

Environment information

Output of dvc doctor: