WIP dvc: implement vars templating in multistage dvc file #4463
Changes from all commits
b2e001d
e65ed9e
b408c53
6731896
ae9051b
1702ecf
bc1956a
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
import argparse | ||
import logging | ||
|
||
from dvc.command.base import CmdBase, append_doc_link | ||
from dvc.utils.serialize._yaml import dumps_yaml, render_dvc_template | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
class CmdRender(CmdBase): | ||
def run(self): | ||
with open(self.args.path, encoding="utf-8") as fd: | ||
text = fd.read() | ||
dvc_dict, vars_dict = render_dvc_template(text) | ||
vars_dict = {"vars": vars_dict} | ||
vars_str = dumps_yaml(vars_dict) | ||
|
||
if self.args.only_vars: | ||
Too many return points in one function. Can we make it more compact?
Yeah definitely. This was tired Josh coding :-p |
||
logger.info(vars_str) | ||
return 0 | ||
|
||
if self.args.stage is not None: | ||
dvc_dict = dvc_dict["stages"][self.args.stage] | ||
dvc_str = dumps_yaml({"stages": {self.args.stage: dvc_dict}}) | ||
|
||
if self.args.only_stages: | ||
logger.info(dvc_str) | ||
return 0 | ||
|
||
logger.info("\n".join([vars_str, dvc_str])) | ||
return 0 | ||
|
||
|
||
def add_parser(subparsers, parent_parser): | ||
RENDER_HELP = "Render templated dvc.yaml" | ||
render_parser = subparsers.add_parser( | ||
"render", | ||
parents=[parent_parser], | ||
description=append_doc_link(RENDER_HELP, "render"), | ||
help=RENDER_HELP, | ||
formatter_class=argparse.RawDescriptionHelpFormatter, | ||
) | ||
render_parser.add_argument( | ||
"--path", default="dvc.yaml", help="Path to dvc.yaml file", | ||
) | ||
render_parser.add_argument( | ||
"--stage", "-s", default=None, help="Only render a specified stage", | ||
) | ||
render_parser.add_argument( | ||
"--only-vars", | ||
action="store_true", | ||
default=False, | ||
help="Only render the `vars` component of the dvc.yaml", | ||
) | ||
render_parser.add_argument( | ||
"--only-stages", | ||
action="store_true", | ||
default=False, | ||
help="Only render the `stages` component of the dvc.yaml", | ||
) | ||
|
||
render_parser.set_defaults(func=CmdRender) |
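A minimal sketch of one way to address the "too many return points" comment on `CmdRender.run`: pick the text to print in a single branch and return once. The helper name `select_output` and its parameters are hypothetical and not part of this PR; it assumes `vars_str` and `dvc_str` have already been produced with `dumps_yaml` as in the diff above.

```python
def select_output(vars_str, dvc_str, only_vars=False, only_stages=False):
    """Pick which rendered YAML text to show, with a single return point.

    Hypothetical helper: mirrors the precedence in CmdRender.run, where
    --only-vars wins over --only-stages when both are given.
    """
    if only_vars:
        parts = [vars_str]
    elif only_stages:
        parts = [dvc_str]
    else:
        parts = [vars_str, dvc_str]
    return "\n".join(parts)


# Usage: run() could then log the result once and return 0.
combined = select_output("vars: {}", "stages: {}")
```

One difference to keep in mind: in the original `run`, `--only-vars` returns before the `--stage` lookup, so a caller of this helper would need to decide whether to skip that lookup in the same case.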
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,6 +3,7 @@ | |
from contextlib import contextmanager | ||
|
||
from funcy import reraise | ||
from jinja2 import Template | ||
Hm, not quite sure that a Jinja template is the best format we can use in dvc files; it still feels a bit unusual, at least coming from Python. It is mature and powerful though, no doubt about it.
Yeah, I went with this because it was relatively easy to use and mature. We also touched on this in issue #3633. If you/anyone has any recommendations I'd be open to taking a crack at implementing it.
Hey @efiop Would a custom syntax be more acceptable? I have been using this fork quite a lot recently. If we were to go down the path of a custom templating syntax, I think the following features are important. I welcome any feedback :-)
vars:
  repeated_arg:
    - thing
    - another_thing
    - another_other_thing
stages:
  foo:
    wdir: .
    cmd: >-
      python foo.py
      {% for x in repeated_arg %}
      -a "{{ x }}"
      {% endfor %}
# Renders as
vars:
  repeated_arg:
    - thing
    - another_thing
    - another_other_thing
stages:
  foo:
    wdir: .
    cmd: >-
      python foo.py
      -a "thing"
      -a "another_thing"
      -a "another_other_thing"
stages:
  one:
    wdir: .
    cmd: "echo Bailey"
  two:
    wdir: .
    cmd: "echo {{ stages.one.stdout }} is the greatest doggo in the universe!!!"
# After stage one completes stage two renders as
two:
  wdir: .
  cmd: "echo Bailey is the greatest doggo in the universe!!!"
Though maybe this is getting a bit distracted from the initial goal.
Also pulling in interested parties from #3633 @dsuess @dmpetrov @karajan1001 @skshetry @elgehelge @jcpsantiago
Haha, I only had used |
||
from ruamel.yaml import YAML | ||
from ruamel.yaml.error import YAMLError | ||
|
||
|
@@ -14,14 +15,77 @@ def __init__(self, path): | |
super().__init__(path, "YAML file structure is corrupted") | ||
|
||
|
||
def load_yaml(path, tree=None): | ||
return _load_data(path, parser=parse_yaml, tree=tree) | ||
def recursive_render(tpl, values, max_passes=100): | ||
This is for nested interpolation, right? {{ var {{ var }} }}
Yep. Not the most efficient thing, just trying to get something working. My team and I have been using this feature in our internal projects. It's been quite nice! |
||
"""This is a bit of black magic to recursively render a | ||
template. Adapted from: | ||
|
||
https://stackoverflow.com/questions/8862731/jinja-nested-rendering-on-variable-content | ||
|
||
Args: | ||
tpl: Template string | ||
values: dict of values. Importantly this dict can contain | ||
values that are themselves {{ placeholders }} | ||
max_passes: Limits the number of times we loop over the | ||
template. | ||
|
||
Returns: | ||
rendered template. | ||
""" | ||
prev = tpl | ||
for _ in range(max_passes): | ||
curr = Template(prev).render(**values) | ||
if curr != prev: | ||
prev = curr | ||
else: | ||
return curr | ||
raise RecursionError("Max recursion depth reached") | ||
|
||
|
||
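The repeat-until-stable loop in `recursive_render` can be tried in isolation. The sketch below is not the PR's code: to stay dependency-free it swaps Jinja for a small regex substitution that only understands `{{ name }}` placeholders, and the names `render_once` and `render_until_stable` are made up for illustration. Unlike Jinja's default behaviour, this stand-in leaves unknown placeholders untouched instead of rendering them as empty strings.

```python
import re

# Matches simple placeholders such as "{{ name }}" (assumption: no filters,
# attribute access, or control blocks, which real Jinja would support).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_once(text, values):
    """Replace each known {{ name }} with str(values[name]) in one pass."""

    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, text)


def render_until_stable(text, values, max_passes=100):
    """Re-render until the output stops changing, as recursive_render does."""
    prev = text
    for _ in range(max_passes):
        curr = render_once(prev, values)
        if curr == prev:
            return curr
        prev = curr
    raise RecursionError("Max recursion depth reached")


values = {
    "my_labeling_project": 123,
    "data_out": "/storage/data/{{ my_labeling_project }}_data",
}
result = render_until_stable("--out {{ data_out }}", values)
```

The `max_passes` cap matters for values that reference each other in a cycle: the output never stabilises, so the loop ends with `RecursionError` rather than spinning forever.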
def render_vars(dvc_dict): | ||
@tall-josh, this interpolates values in vars:
  foo: 3
  bar: "{{ foo }}"
How important is this scenario?
Sorry for the delay. Yeah it does. I have used this a lot. ie:
vars:
  my_labeling_project: 123
  data_out: /storage/data/{{ my_labeling_project }}_data
  src: s3://labeled-data/projects/{{ my_labeling_project }}
stages:
  download:
    cmd: python download_data.py --src {{src}} --out {{ data_out }}
    outs:
      - {{ data_out }}
Without this I would have to repeat the |
||
vars_dict = dvc_dict["vars"] | ||
vars_template = dumps_yaml(vars_dict) | ||
|
||
rendered_vars = recursive_render(vars_template, vars_dict) | ||
return rendered_vars | ||
|
||
|
||
def render_dvc(dvc_dict, vars_dict): | ||
dvc_template = dumps_yaml(dvc_dict) | ||
rendered_dvc = Template(dvc_template).render(**vars_dict) | ||
return rendered_dvc | ||
|
||
|
||
def render_dvc_template(text): | ||
|
||
yaml = YAML(typ="safe") | ||
dvc_dict = yaml.load(text) or {} | ||
if "vars" in dvc_dict: | ||
vars_dict = yaml.load(render_vars(dvc_dict)) | ||
|
||
del dvc_dict["vars"] | ||
rendered_dvc = yaml.load(render_dvc(dvc_dict, vars_dict)) or {} | ||
|
||
return rendered_dvc, vars_dict | ||
|
||
|
||
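One thing worth checking in `render_dvc_template`: `vars_dict` is only assigned inside the `if "vars" in dvc_dict` branch. Indentation is lost in this view, so it is unclear which of the following lines sit inside that branch, but `CmdRender.run` calls the function directly and may hit an unbound name on a file without `vars`. A hedged sketch of a defensive split, using a hypothetical helper `split_vars` that is not part of the PR and works on plain dicts:

```python
def split_vars(dvc_dict):
    """Return (rest, vars_dict) without mutating the input.

    Hypothetical helper: a missing or empty "vars" key yields an empty
    dict, so callers always get both values back.
    """
    vars_dict = dvc_dict.get("vars") or {}
    rest = {key: value for key, value in dvc_dict.items() if key != "vars"}
    return rest, vars_dict


with_vars = {"vars": {"foo": 3}, "stages": {"a": {"cmd": "echo {{ foo }}"}}}
rest, found_vars = split_vars(with_vars)

# A file without a vars section still returns a usable (empty) mapping.
no_vars_rest, no_vars = split_vars({"stages": {}})
```

Returning a copy instead of using `del dvc_dict["vars"]` also leaves the loaded document intact for any later use.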
def parse_yaml(text, path, typ="safe"): | ||
yaml = YAML(typ=typ) | ||
with reraise(YAMLError, YAMLFileCorruptedError(path)): | ||
return yaml.load(text) or {} | ||
result = yaml.load(text) or {} | ||
|
||
if "vars" in result: | ||
try: | ||
result, _ = render_dvc_template( | ||
text | ||
) # yaml.load(text, Loader=SafeLoader) or {} | ||
except Exception as exc: | ||
raise YAMLFileCorruptedError(path) from exc | ||
|
||
return result | ||
|
||
|
||
def load_yaml(path, tree=None): | ||
return _load_data(path, parser=parse_yaml, tree=tree) | ||
|
||
|
||
def parse_yaml_for_update(text, path): | ||
|
What use case do you see for a separate render command? I see that you've already implemented automatic rendering, so I'm not quite sure what the use case for a separate command is.
Thanks for the feedback! Ahhh, this is mainly for a user to debug their dvc.yaml. Just to check if the rendered stage and variables are doing what you want them to.