Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QA: params/vars and loops in dvc.yaml (2.0) round #1 #4854

Closed
4 of 5 tasks
jorgeorpinel opened this issue Nov 6, 2020 · 4 comments
Closed
4 of 5 tasks

QA: params/vars and loops in dvc.yaml (2.0) round #1 #4854

jorgeorpinel opened this issue Nov 6, 2020 · 4 comments
Labels
A: templating Related to the templating feature discussion requires active participation to reach a conclusion enhancement Enhances DVC

Comments

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Nov 6, 2020

Last tested with DVC 1.9.1+e21142, last commit in #4734
p.s. the items with check boxes are the most relevant things I found.

Bugs

  • The "Looping through a dictionary" (given under foreach, not read from vars or params file) example from Implement foreach ... in loop in dvc.yaml #4734 doesn't loop. dvc repro just runs a single stage using the given dict as the only item in the loop.

  • Edge: When use: none (where none is an invalid/inexistent path) is given, no params file is used. But it seems that params.yaml is still parsed because if vars has a key that already exists in params.yaml, an error is thrown (since overwriting keys is not allowed). However, if the value is removed from vars, the value is not actually read from the params file (also resulting in an error).

Usability

Mainly from #4734 (review)

  • Foreach loops: since in: is always followed by cmd: what about just avoid in and go straight to cmd, etc.? Like this:

    stages:
      something:
        foreach: ...
        cmd: python script.py --thresh ${item...}
        deps: ...
  • It's not possible to dvc repro stage groups, since the final stage name is built from vars/params values. E.g. in the "Looping through a list" example of Implement foreach ... in loop in dvc.yaml #4734, dvc repro build doesn't work. You can however dvc repro build-foo (which is fine). I think that entire groups should be valid targets too.

  • Should use be called params-file or import? "use" seems like it has a different meaning i.e. pipelines "use" all sorts of things (e.g. dependencies) and the keyword use in some languages has to do with namespaces (e.g. in PHP).

  • use accepts inexistent paths, resulting in silently not loading any params file, even if params.yaml is present. I like this as use: none could be a useful trick, but should it at warn that the given file doesn't exist?

  • If use is given a directory path, it prints a generic unexpected error - [Errno 13] Permission denied: '{path}'. How about a check that the given path is a valid YAML file?

  • When vars used in dvc.yaml don't exist in the params file or vars, you get ERROR: unexpected error - Could not find '{key}' in {}: '{key}' - improve error messageHow about a more informative message like "{key} not found in 'dvc.yaml'vars` or '{params-file}'." ?

  • vars cannot overwrite values in the params file, if both are used. Is this a desired limitation?

@jorgeorpinel jorgeorpinel changed the title QA: variables and loops in dvc.yaml (2.0) [WIP] QA: variables and loops in dvc.yaml (2.0) Nov 6, 2020
@jorgeorpinel jorgeorpinel added bug Did we break something? discussion requires active participation to reach a conclusion enhancement Enhances DVC labels Nov 6, 2020
@jorgeorpinel
Copy link
Contributor Author

This is no longer WIP cc @skshetry @efiop thanks

@jorgeorpinel jorgeorpinel changed the title QA: variables and loops in dvc.yaml (2.0) QA: params/vars and loops in dvc.yaml (2.0) Nov 6, 2020
@skshetry
Copy link
Member

skshetry commented Nov 6, 2020

The "Looping through a dictionary" (given under foreach, not read from vars or params file) example from #4734 doesn't loop. dvc repro just runs a single stage using the given dict as the only item in the loop.

Ahh, I posted the wrong data inside dvc.yaml.

It should be:

stages:
  build:
    foreach:
        us:
          thresh: 10
        gb:
          thresh: 15
    in:
      cmd: echo ${item.thresh}

Edge: When use: none (where none is an invalid/inexistent path) is given, no params file is used. But it seems that params.yaml is still parsed because if vars has a key that already exists in params.yaml, an error is thrown (since overwriting keys is not allowed). However, if the value is removed from vars, the value is not actually read from the params file (also resulting in an error).

Thanks, this is a good one. use is mostly used for interpolating data for foreach. The stage read variables directly from params.yaml already (even without parametrization) by default on params section. Maybe, the behavior you mentioned: changing the behavior of use to change the default file stages look to, is more intuitive.

Foreach loops: since in: is always followed by cmd: what about just avoid in and go straight to cmd? Like this:

Not sure, as I wanted to keep the separation between foreach and the generated stages (this is complicated by the fact of wdir which is a stage-level thing that also changes the outs/params/deps location if wdir is templatized). But, I'll consider and discuss your suggestion.

It's not possible to dvc repro stage groups

I guess this goes to the convenience part, which we can focus later. Let's focus on the design and behavior of the features and not focus too much on the error messages and cli part. As, I think the design and behavior are not quite easy to grasp for users and might not be intuitive (as seen from lots of your suggestion). That's be greatly appreciated.

vars cannot overwrite values in the params file, if both are used. Is this a desired limitation?

That's a good design question. What should the syntax be? Is it always clear if we overwrite by default rather than erroring out?
Initially we did thought of ${params.yaml:x} and ${params.json:y} syntax but it's a bit verbose.


Thanks, @jorgeorpinel for the feedback. I'll focus on improving messages at a later stage, though I'll introduce a few patches for overwriting and set so that I can ask for the feedback on the issue tickets as well.

@jorgeorpinel
Copy link
Contributor Author

OK. So mainly here the edge bug is more or less important to address first. I agree everything else is secondary but let's keep it here to follow up when the time comes, preferably before this is released to users, as that's the purpose of QA.

I wanted to keep the separation between foreach and the generated stages

I don't understand how removing in in the foreach syntax relates to this idea. It's just a better (I think) syntax but nothing really changes?

vars cannot overwrite values in the params file

What should the syntax be? Is it always clear if we overwrite by default rather than erroring out?

I think permitting overwriting is more intuitive, because merging params and vars only if they're non-overlapping seems pretty random too.

Thanks!

@jorgeorpinel jorgeorpinel changed the title QA: params/vars and loops in dvc.yaml (2.0) QA: params/vars and loops in dvc.yaml (2.0) round #1 Dec 27, 2020
@jorgeorpinel jorgeorpinel removed the bug Did we break something? label Dec 27, 2020
@jorgeorpinel
Copy link
Contributor Author

This one is mostly addressed and outdated so let's continue in #5165.

@skshetry skshetry added the A: templating Related to the templating feature label May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: templating Related to the templating feature discussion requires active participation to reach a conclusion enhancement Enhances DVC
Projects
None yet
Development

No branches or pull requests

2 participants