run: reference dependencies and outputs in command #2437

ghost · 2019-08-25T01:53:38Z

There have already been many threads discussing different ways to specify
outputs and dependencies in the command invocation:

This issue intends to summarize the different approaches so we can discuss
them and hopefully make a decision. (cc: @iterative/engineering, et al.)

Reducing duplication while defining the run command will help you to:

- Make the cmd more readable (meaningful names instead of relative paths)
- Avoid mistakes (typos)
- Feel that you are doing the right thing by "not repeating yourself" (DRY principle) 😬

Ideas

1. Introduce --pass-params to pass all the deps/outs to the command, in the same order they were declared:

Pros:
- Minimal overhead
Cons:
- Not flexible
- Implicit dvc run command (cryptic)
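
A minimal sketch of what --pass-params might do, assuming deps come before outs and each path is appended as a shell-quoted positional argument. The flag does not exist, and append_params is a hypothetical helper. It also shows the "cryptic" con: the script has to know that its third argument is the model path.

```python
import shlex

def append_params(cmd: str, deps: list[str], outs: list[str]) -> str:
    """Hypothetical --pass-params behaviour: append every dep, then every out,
    as shell-quoted positional arguments, in declaration order."""
    return " ".join([cmd, *(shlex.quote(p) for p in deps + outs)])

full = append_params("python train.py", deps=["data/raw", "params.yaml"], outs=["model.pkl"])
print(full)  # python train.py data/raw params.yaml model.pkl
```

Inside train.py the paths would only be reachable as sys.argv[1..3], which is exactly the implicitness listed above.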

2. Introduce named deps/outs and pass them as env vars (e.g. dvc run -d raw=data/raw "for file in $raw ..."):

Pros:
- Flexibility
Cons:
- Not that straightforward to implement (maybe PathInfo could store it in an alias attribute)
- The shell will try to expand the variable before dvc does, so we may need to implement a special syntax
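
A rough sketch of the named-deps idea, assuming a hypothetical name=path syntax for -d and assuming the stage command runs through /bin/sh. parse_named and run_with_named are illustrative helpers, not dvc code. Typed in a real shell, the command would need single quotes (dvc run -d raw=data/raw 'echo "processing $raw"') so the calling shell leaves $raw for the stage's shell.

```python
import os
import subprocess

def parse_named(specs: list[str]) -> dict[str, str]:
    """Split hypothetical 'name=path' specs into {name: path}; plain paths are skipped."""
    named = {}
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep:
            continue  # unnamed dep such as "data/raw": nothing to export
        if not name.isidentifier():
            raise ValueError(f"expected NAME=PATH, got {spec!r}")
        named[name] = path
    return named

def run_with_named(cmd: str, deps: list[str]) -> str:
    """Run cmd with sh, exporting each named dep as an environment variable."""
    env = {**os.environ, **parse_named(deps)}
    result = subprocess.run(["sh", "-c", cmd], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout

out = run_with_named('echo "processing $raw"', ["raw=data/raw", "params.yaml"])
print(out, end="")  # processing data/raw
```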

3. Add special syntax, expanding input and output (e.g. dvc run -d foo -o bar "cp {input} > {output}")

Pros:
- Explicitness
Cons:
- No way to split several outputs (e.g. dvc run -o 1.txt -o 2.txt)
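
A sketch of the {input}/{output} expansion, assuming each placeholder becomes the space-joined list of all deps or all outs (expand_io is hypothetical). A single regex pass avoids re-expanding a placeholder that happens to appear inside a path. The second call shows the con: with two outputs, the command cannot refer to 1.txt or 2.txt on its own.

```python
import re
import shlex

def expand_io(cmd: str, deps: list[str], outs: list[str]) -> str:
    """Replace {input} with all deps and {output} with all outs, shell-quoted."""
    values = {"input": deps, "output": outs}
    return re.sub(
        r"\{(input|output)\}",
        lambda m: " ".join(shlex.quote(p) for p in values[m.group(1)]),
        cmd,
    )

print(expand_io("cat {input} > {output}", ["foo"], ["bar"]))
# cat foo > bar
print(expand_io("python split.py {input} {output}", ["data.txt"], ["1.txt", "2.txt"]))
# python split.py data.txt 1.txt 2.txt  (both outputs land in one slot)
```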

4. Introduce a build matrix (#1018) and use wildcards:

Pros:
- This will solve some problems
Cons:
- This will raise more questions 😅

5. Makefile shenanigans (e.g. $<, $^, $%, rules, etc.)
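
A rough sketch of how Make's automatic variables could map onto dvc run, purely as an assumption: $< becomes the first dep, $^ all deps with duplicates removed (as GNU Make does), and $@ the first out. $% (archive member) has no obvious counterpart and is left out. Since $@ is also a shell special parameter, such a command would have to be single-quoted when typed. expand_make_vars is hypothetical.

```python
import re
import shlex

def expand_make_vars(cmd: str, deps: list[str], outs: list[str]) -> str:
    """Hypothetical Make-style expansion: $< first dep, $^ all deps, $@ first out."""
    unique_deps = list(dict.fromkeys(deps))  # keep order, drop duplicates like GNU Make's $^
    values = {
        "<": shlex.quote(deps[0]) if deps else "",
        "^": " ".join(shlex.quote(d) for d in unique_deps),
        "@": shlex.quote(outs[0]) if outs else "",
    }
    return re.sub(r"\$([<^@])", lambda m: values[m.group(1)], cmd)

print(expand_make_vars("python train.py $< -o $@ --all $^",
                       ["data/raw", "params.yaml", "data/raw"], ["model.pkl"]))
# python train.py data/raw -o model.pkl --all data/raw params.yaml
```
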

6. Use environment variables and let the shell do the job (e.g. raw=data/raw; dvc run -d $raw "for file in $raw"):

Pros:
- No code required 🎉
- You can use hip shell tricks when defining your command
Cons:
- Syntax is different for Windows users (%var% instead of $var)
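
One pitfall with this approach: in a prefix assignment such as raw=data/raw cmd $raw, the shell expands $raw in the command's arguments before the assignment applies, so the variable has to be set in a separate statement (raw=data/raw; cmd $raw) or exported first. A small check using POSIX sh, assuming /bin/sh is available; shell_output is a hypothetical helper.

```python
import subprocess

def shell_output(script: str) -> str:
    """Run a POSIX sh script and return its stdout."""
    return subprocess.run(["sh", "-c", script],
                          capture_output=True, text=True, check=True).stdout

# Prefix assignment: "$raw" in the arguments is expanded while raw is still unset.
prefixed = shell_output('unset raw; raw=data/raw echo "dep=[$raw]"')
# Separate statement: raw is set before the next command line is expanded.
separate = shell_output('unset raw; raw=data/raw; echo "dep=[$raw]"')
print(prefixed, end="")  # dep=[]
print(separate, end="")  # dep=[data/raw]
```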


shcheklein · 2019-08-25T02:31:02Z

@MrOutis amazing summary 🙏

jorgeorpinel · 2019-08-26T07:18:59Z

Enjoyed the summary too 🥂 I like point 3.

Add special syntax, expanding input and output (e.g. dvc run -d foo -o bar "cp {input} > {output}")
...
Cons: No way to split several outputs (e.g. dvc run -o 1.txt -o 2.txt)

Maybe something more like printf (widely known), e.g. dvc run "cp {%d} > {%o}" -d foo -o bar, could help overcome the mentioned con.
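
One possible reading of the printf idea, sketched under the assumption that each {%d} consumes the next dep and each {%o} the next out in declaration order, the way printf consumes its arguments. That would let a command address several outputs individually. expand_printf_style is a hypothetical helper.

```python
import re
import shlex

def expand_printf_style(cmd: str, deps: list[str], outs: list[str]) -> str:
    """Each {%d} takes the next dep and each {%o} the next out, in order."""
    queues = {"d": iter(deps), "o": iter(outs)}

    def next_path(match: re.Match) -> str:
        kind = match.group(1)
        try:
            return shlex.quote(next(queues[kind]))
        except StopIteration:
            what = "deps" if kind == "d" else "outs"
            raise ValueError(f"not enough {what} for {match.group(0)}") from None

    return re.sub(r"\{%([do])\}", next_path, cmd)

print(expand_printf_style("python split.py {%d} --train {%o} --test {%o}",
                          ["data.txt"], ["1.txt", "2.txt"]))
# python split.py data.txt --train 1.txt --test 2.txt
```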

BUT:

Are there any open discussions about the complexity of dvc run, though? I've heard it mentioned, so if we decide to break it up into several commands, this whole issue may need to be revisited after that redesign happens.

Here's at least one discussion about this: https://discuss.dvc.org/t/simplifying-dvc-run-and-pipelines/199

efiop · 2021-10-08T19:13:01Z

Closing as stale. We've switched to dvc.yaml-focused parametrization these days.

ghost added the research label Aug 25, 2019

efiop closed this as completed Oct 8, 2021
