Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiments, parameters, and the CLI: remove comma-separated parsing #5302

Closed
sjawhar opened this issue Jan 20, 2021 · 14 comments · Fixed by #5310
Closed

Experiments, parameters, and the CLI: remove comma-separated parsing #5302

sjawhar opened this issue Jan 20, 2021 · 14 comments · Fixed by #5310
Labels
p3-nice-to-have It should be done this or next sprint ui user interface / interaction

Comments

@sjawhar
Copy link
Contributor

sjawhar commented Jan 20, 2021

Bringing this conversation over from Discord:

For v2.0, are the devs open to changing the CLI for dvc exp run so that --params are expected space-separated instead of comma-separated (e.g. dvc exp run --params value1=1 value2=foo)? I feel this would be a much more typical UX, and it would allow for more powerful functionality (e.g. being able to pass in an arbitrary YAML object). Also, it would completely remove all logic around how to split the parameters from your sphere of concern, because it's all handled by argparse and properly quoting things when writing your command (e.g. dvc exp run --params 'value1={"foo": "look at these space, commas too!"}')

After feedback from @efiop and @dberenbaum, we thought a slightly different approach would be better so we could even get rid of the parsing of the equals sign:

dvc exp run --params param_name1 param_value1 --params param_name_2 param_value_2

A simple proof of concept is provided below:

# test.py
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--params',
    action='append',
    metavar=('param_name', 'param_value'),
    nargs=2,
)
print(parser.parse_args())
$ python test.py --params foo bar --params goo '{"baz": "maybe?"}'
Namespace(params=[['foo', 'bar'], ['goo', '{"baz": "maybe?"}']])

I know there's #4883 for a full-on hydra approach, but what I'm suggesting is much simpler and actually reduces complexity. With 2.0 coming up and this being a breaking change, I thought now would be the perfect time to do it. I'm happy to work on this if there are no objections.

@sjawhar
Copy link
Contributor Author

sjawhar commented Jan 20, 2021

One thing worth considering is that sticking with a key=value format might be the better choice if you do want to eventually use hydra. Since hydra expects key=value, if you use that you could potentially drop it in later without the change being breaking. Splitting on the first equals sign in a string should be much safer than the comma splitting that's happening now.

>>> args = 'params.yaml:nested.param={"complex": "value with spaces, and commas", "query": "test == 3"}'
>>> param_name, _, param_value = args.partition('=')
>>> param_file, _, param_name = param_name.partition(':')
>>> param_file, param_name, param_value
('params.yaml', 'nested.param', '{"complex": "value with spaces, and commas", "query": "test == 3"}')

if param_name is empty, set it param_file and set param_file to the default. If param_value is empty, raise an error.

@pmrowla pmrowla added ui user interface / interaction p3-nice-to-have It should be done this or next sprint labels Jan 21, 2021
@skshetry
Copy link
Member

skshetry commented Jan 21, 2021

@sjawhar, great suggestion. The problem I see with it is that files other than params.yaml will become harder to type as you'll need to keep specifying it again and again. But, I guess it's okay, as it might not be a good idea to encourage users to override a lot of params through the cmd itself.

@dberenbaum
Copy link
Collaborator

Thanks for your diligence on this, @sjawhar !

Stepping back from Hydra integration, consistency and simplicity are important. @efiop mentioned that we should have a guide. Repeating the --params option each time is consistent with the POSIX standard (see 12.1.9), and it makes it easier to parse, so it seems like a good change.

I suggested key value but key=value is probably a better choice. Otherwise, it may look like key and value are each independent arguments passed to --params.

@pmrowla
Copy link
Contributor

pmrowla commented Feb 1, 2021

One other thing we should consider is that I think exp run --params really needs a short option, especially if users now need to specify --params more than once per run. The problem being that -p/-P are both already taken (because of repro --pipeline and --all-pipelines). I think we should consider dropping one (or both) of the short pipeline options for exp run, but I'm not sure how commonly used either pipeline option is for current dvc repro. We could also consider using -a/-A for --all-pipelines which would free up either capital or lowercase -p for params.

@dmpetrov @dberenbaum thoughts?

@dberenbaum
Copy link
Collaborator

Good point, @pmrowla. Some questions/thoughts:

  1. Should dvc encourage dvc exp run --params, especially if users are changing multiple parameters? Editing params.yaml may be easier for those users, and I think it makes the behavior of dvc exp run more transparent (in other words, it makes clear that exp picks up any workspace changes to the dependencies of the stage being run).
  2. Should dvc exp run mimic dvc repro flags? Mimicking dvc repro is a reasonable default, but it does make dvc exp run fairly complicated already, and I'm struggling to think of a great use case for -p/-P in the context of experiments.
  3. Should overriding params values be specified by --params even though that flag has a different meaning in dvc run/dvc stage add? Using something like -o, --override could work, although I don't like using -o for something besides --output. Maybe someone else has a better naming suggestion?

@sjawhar
Copy link
Contributor Author

sjawhar commented Feb 1, 2021

@dberenbaum @pmrowla I think you might not be considering use cases outside a data scientist on their laptop. Being able to specify --params multiple times is hugely useful when running automated jobs, for example. It makes it possible to easily integrate with tools like Weights & Biases for parameter searches. And even if it is just me and my laptop, sometimes it's much easier to code up a quick script to loop through and queue a few different parameter combinations rather than having to edit params.yaml manually.

@dberenbaum
Copy link
Collaborator

@sjawhar Good point. Having an option like --params is essential for such use cases. I imagine users will want to programmatically build a list of commands that invoke dvc exp run --queue --params with different combinations of values to perform a grid search (although brevity may be less important here). I admit that I use it over editing params.yaml even when just playing around with dvc exp run also, so point 1 above is pretty weak.

I'm personally most convinced by 3, but I'm curious to hear what others think.

@sjawhar
Copy link
Contributor Author

sjawhar commented Feb 1, 2021

@dberenbaum Especially now that the syntax has changed, I can definitely see changing the argument flag to something else, but I'm similarly stuck on what it should actually be 😆

@pmrowla
Copy link
Contributor

pmrowla commented Feb 2, 2021

  1. Should dvc exp run mimic dvc repro flags? Mimicking dvc repro is a reasonable default, but it does make dvc exp run fairly complicated already, and I'm struggling to think of a great use case for -p/-P in the context of experiments.

Long term, the current thinking is that dvc exp run will eventually replace dvc repro entirely, given that exp run in the workspace is just repro with the added benefit of automatically saving results locally. So preserving existing repro functionality is important (at least for now).

@dberenbaum
Copy link
Collaborator

So the remaining options are:

  1. Leave as is (no short flag).
  2. Add a short flag that's not -p/-P (???).
  3. Change the flag name (my best idea right now is -S, --set-params, but open to better suggestions!).

@pmrowla
Copy link
Contributor

pmrowla commented Feb 11, 2021

@dberenbaum has any decision been made on this?

@dberenbaum
Copy link
Collaborator

Thanks for checking back on this, @pmrowla! Unless a better name comes to you suddenly, let's go with -S, --set-param (let's go with singular and omit the "s" at the end since only one param can be set at a time). Making it distinct from dvc run --params seems like a good idea since they aren't equivalent.

@sjawhar
Copy link
Contributor Author

sjawhar commented Feb 11, 2021

On it!

@dberenbaum
Copy link
Collaborator

Thanks, @sjawhar! Sorry for the delay here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p3-nice-to-have It should be done this or next sprint ui user interface / interaction
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants