parsing: Support dict unpacking in `cmd`. #7907

daavoo · 2022-06-16T19:00:58Z

Allow dictionaries to be used as values for template interpolation, but only inside the `cmd` key.

Given the following params.yaml and dvc.yaml:

# params.yaml
dict:
  foo: foo
  bar: bar

# dvc.yaml
stages:
  stage1:
    cmd: python script.py ${dict}

The dictionary will be unpacked with the following syntax:

# dvc.yaml
stages:
  stage1:
    cmd: python script.py --foo foo --bar bar
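
A minimal sketch of the unpacking shown above, assuming each key simply becomes a `--key value` pair. The helper name `unpack_dict` is hypothetical and is not DVC's actual implementation, which (per the diff under review) also flattens nested dictionaries and treats booleans and lists specially.

```python
def unpack_dict(params: dict) -> str:
    """Render a flat dict as `--key value` pairs (hypothetical helper)."""
    return " ".join(f"--{key} {value}" for key, value in params.items())


# Same values as the params.yaml example above.
params = {"foo": "foo", "bar": "bar"}
cmd = f"python script.py {unpack_dict(params)}"
print(cmd)  # python script.py --foo foo --bar bar
```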

Closes #6107

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

skshetry · 2022-06-17T07:09:07Z

Any recent motivations for this? 🙂

daavoo · 2022-06-17T07:32:01Z

> Any recent motivations for this? 🙂

While exploring how to better integrate dvc params with ML frameworks, this showed up as low-hanging fruit for frameworks like Yolov5 (linked issue), HuggingFace, PyTorch Lightning, etc.
In addition, the default syntax is good enough for custom training scripts that use simple argparse, and it gives more flexibility when configuring cmd calls to CLI tools, without having to edit dvc.yaml on each iteration (adding or removing arguments).

tests/func/parsing/test_interpolated_entry.py

karajan1001 · 2022-06-22T02:45:36Z

dvc/parsing/interpolate.py

+def _(obj: dict):
+    result = ""
+    for k, v in flatten(obj).items():
+        if isinstance(v, bool):


This only works for argparse.BooleanOptionalAction, not for store_true/store_false. Maybe we need a better way to handle this?

The same issue about different options, as commented below #7907 (comment)

Which option do you think we should consider as the most appropriate default?

Note that this can be used for things beyond argparse. For example, you can use the interpolation to pass flags to arbitrary executables that you call inside cmd.

Added config.parsing.bool with store_true and boolean_optional options.

Not sure about store_false because the interaction with dvc params would look kind of strange, IMO

I also have no idea about it.

I think that's more than enough to support for now.
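
To make the two boolean styles discussed in this thread concrete, here is a rough sketch of how a boolean value might be rendered under each one. The function `render_bool` and its `style` argument are hypothetical names for illustration; only the option names `store_true` and `boolean_optional` come from the discussion, and the actual DVC code may differ.

```python
import argparse


def render_bool(key: str, value: bool, style: str = "store_true") -> str:
    """Render a boolean as a CLI flag (hypothetical helper, not DVC's code)."""
    if style == "store_true":
        # Emit the flag only when the value is True; omit it otherwise.
        return f"--{key}" if value else ""
    if style == "boolean_optional":
        # Matches argparse.BooleanOptionalAction: --key / --no-key.
        return f"--{key}" if value else f"--no-{key}"
    raise ValueError(f"unknown style: {style}")


# Check the rendered flags against real argparse parsers.
store_true_parser = argparse.ArgumentParser()
store_true_parser.add_argument("--shuffle", action="store_true")

# BooleanOptionalAction requires Python 3.9 or later.
optional_parser = argparse.ArgumentParser()
optional_parser.add_argument("--shuffle", action=argparse.BooleanOptionalAction)

on = store_true_parser.parse_args(render_bool("shuffle", True).split())
off = optional_parser.parse_args(
    render_bool("shuffle", False, "boolean_optional").split()
)
print(on.shuffle, off.shuffle)  # True False
```

Note that under the `store_true` style a `False` value has no flag at all, which is why `store_false` would interact awkwardly with params.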

karajan1001 · 2022-06-22T02:51:19Z

dvc/parsing/interpolate.py

+                    raise ParseError(
+                        f"Cannot interpolate nested iterable in '{k}'"
+                    )
+                result += f"--{k} {i} "


>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='*')
>>> parser.parse_args('--foo x y'.split())
Namespace(foo=['x', 'y'])
>>> parser.parse_args('--foo x --foo y'.split())
Namespace(foo=['y'])

It looks like passing the same argument multiple times overwrites the previous value.

The idea was to cover the action='append' mode:

>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(foo=['1', '2'])

Maybe it's a matter of clearly stating it in the docs.
Ideally, we should support all options but I am not sure how to do it. We could add a bunch of config options or just allow users to register a custom to_str method for interpolation.

Let's pick one behavior for now. Is it possible to see which one is more widely used in existing ML CLIs? I would guess the nargs --foo 1 2 behavior is more common, even though I find the explicit append behavior easier to read.

Added config.parsing.list with nargs and append options. It doesn't look that bad to have the options.

Just noticed this: it's a bad idea to make dvc.yaml parsing conditional; it should be self-contained.
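
The two list styles from this thread can be compared side by side. This is only a sketch: `render_list` and its `style` argument are hypothetical names, and only the option names `nargs` and `append` are taken from the discussion.

```python
import argparse


def render_list(key: str, values: list, style: str = "nargs") -> str:
    """Render a list as CLI arguments (hypothetical helper, not DVC's code)."""
    if style == "nargs":
        # One flag followed by all values: --key 1 2
        return f"--{key} " + " ".join(str(v) for v in values)
    if style == "append":
        # The flag is repeated for each value: --key 1 --key 2
        return " ".join(f"--{key} {v}" for v in values)
    raise ValueError(f"unknown style: {style}")


# Each rendering only round-trips with the matching argparse setup.
nargs_parser = argparse.ArgumentParser()
nargs_parser.add_argument("--foo", nargs="*")

append_parser = argparse.ArgumentParser()
append_parser.add_argument("--foo", action="append")

nargs_result = nargs_parser.parse_args(render_list("foo", [1, 2]).split())
append_result = append_parser.parse_args(
    render_list("foo", [1, 2], "append").split()
)
print(nargs_result.foo, append_result.foo)  # ['1', '2'] ['1', '2']
```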

karajan1001 · 2022-06-28T10:53:48Z

tests/func/parsing/test_interpolated_entry.py

+
+
+@pytest.mark.parametrize(
+    "bool_config", [None, "store_true", "boolean_optional"]


Maybe we only need to test two different conditions? The None case can be tested some other way (for example, by testing that the default value is set correctly).

Because these two configurations are independent, maybe we can test them in pairs (I'm not sure how to describe it) instead of using a product: ["a1b1", "a2b2"] instead of ["a1", "a2"] x ["b1", "b2"].
The two suggestions above can reduce the number of test cases from 9 to 2.

I updated it to 3 test cases (["a1b1", "a2b2"]). I still included the default in the same test; I'm not sure it's worth separating it, as the body of the test would be the same.
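
The difference between the product and the paired approach can be illustrated with a short sketch. The list option values `nargs` and `append` are taken from the other review thread; the variable names are illustrative only.

```python
from itertools import product

bool_options = [None, "store_true", "boolean_optional"]
list_options = [None, "nargs", "append"]

# Full product: every bool option combined with every list option.
full = list(product(bool_options, list_options))

# Paired: the i-th bool option is tested together with the i-th list option.
paired = list(zip(bool_options, list_options))

print(len(full), len(paired))  # 9 3
```

With pytest, the paired cases could be passed as `pytest.mark.parametrize("bool_config, list_config", paired)`, so that each independent option is still exercised once.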

karajan1001 · 2022-06-28T10:57:27Z

LGTM for the most part. Another concern is that the two new configurations still can't solve every problem, as two different types of argument might exist in one command.

Allow to use dictionaries as values for template interpolation but only inside the `cmd` key. See tests/func/parsing/test_interpolated_entry.py::test_cmd_dict for detailed syntax. Add `config.parsing` section for configuring behavior of ambiguous data types like booleans and lists.

daavoo · 2022-07-04T08:05:49Z

> another concern is that the two new configurations still can't solve all of the problems as two different types of arg might exist in one command.

Indeed. There are many unsolved issues, but the idea for this initial PR is to cover basic scenarios and avoid overcomplicating the logic until users ask for specific cases.

config: Add `parsing` section. Per iterative/dvc#7907

daavoo linked an issue Jun 16, 2022 that may be closed by this pull request

More flexible dvc.yaml parameterisation #6107

Closed

daavoo requested a review from dberenbaum June 16, 2022 19:01

daavoo self-assigned this Jun 17, 2022

daavoo added the `A: templating` and `feature` labels Jun 17, 2022

dberenbaum reviewed Jun 17, 2022

daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from cc627fb to b9daa31 Compare June 20, 2022 09:08

daavoo commented Jun 20, 2022

daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch 3 times, most recently from 6a76372 to a2ea47d Compare June 21, 2022 09:47

daavoo marked this pull request as ready for review June 21, 2022 10:49

daavoo requested a review from a team as a code owner June 21, 2022 10:49

daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from a2ea47d to e4af3f4 Compare June 21, 2022 10:49

daavoo requested a review from a team as a code owner June 21, 2022 10:49

daavoo requested review from karajan1001 and efiop June 21, 2022 10:49

karajan1001 reviewed Jun 22, 2022

daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from e4af3f4 to b41e148 Compare June 23, 2022 15:07

dberenbaum approved these changes Jun 27, 2022

karajan1001 reviewed Jun 28, 2022

daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from b41e148 to 7fac181 Compare July 4, 2022 08:02

daavoo requested a review from karajan1001 July 4, 2022 08:04

karajan1001 approved these changes Jul 5, 2022

daavoo merged commit 9099413 into main Jul 5, 2022

daavoo deleted the 6107-more-flexible-dvcyaml-parameterisation branch July 5, 2022 07:02

daavoo added a commit to iterative/dvc.org that referenced this pull request Jul 5, 2022

templating: Add dict unpacking section.

a795da5

config: Add `parsing` section. Per iterative/dvc#7907

daavoo mentioned this pull request Jul 5, 2022

templating: Add dict unpacking section. iterative/dvc.org#3730

Merged

daavoo added a commit to iterative/dvc.org that referenced this pull request Jul 6, 2022

templating: Add dict unpacking section. (#3730)

aee09a8

config: Add `parsing` section. Per iterative/dvc#7907
