Commit

cmd ref: clarify ... (argparse remainder) in dvc run command arg, and

- complete `dvc remote add` `name` arg help output same as in iterative/dvc@ac5a37c
- change "aka" for "a.k.a" throughout docs
jorgeorpinel committed Jun 27, 2019
1 parent 99bb9ff commit 81b4400
Showing 3 changed files with 22 additions and 13 deletions.
2 changes: 1 addition & 1 deletion static/docs/commands-reference/remote_add.md
@@ -17,7 +17,7 @@ usage: dvc remote add [-h] [--global] [--system] [--local] [-q | -v]
[-d] [-f] name url
positional arguments:
-name Name.
+name Name of the remote.
url URL. (See supported URLs below.)
```

31 changes: 20 additions & 11 deletions static/docs/commands-reference/run.md
@@ -12,24 +12,27 @@ usage: dvc run [-h] [-q | -v] [-d DEPS] [-o OUTS] [-O OUTS_NO_CACHE]
[--ignore-build-cache] [--remove-outs] [--no-commit]
[--outs-persist OUTS_PERSIST]
[--outs-persist-no-cache OUTS_PERSIST_NO_CACHE]
-command
+...
positional arguments:
command Command to execute.
```

## Description

-`dvc run` provides an interface to build a computational graph (aka pipeline).
-It's a way to describe commands, data inputs and intermediate results that went
-into a model (or other data results). By explicitly specifying a list of
-dependencies (with `-d` option) and outputs (with `-o`, `-O`, `-m`, or `-M`
-options) DVC can connect individual stages (commands) into a directed acyclic
-graph (DAG). `dvc repro` provides an interface to check state and reproduce this
-graph later. This concept is similar to the one of the `Makefile` but DVC
-captures data and caches data artifacts along the way. Check this
-[example](/doc/get-started/example-pipeline) to learn more and try to build a
-pipeline.
+`dvc run` provides an interface to build a computational graph (a.k.a.
+pipeline). It's a way to describe commands, data inputs and intermediate results
+that go into creating a ML model (or other data results). By explicitly
+specifying a list of dependencies (with `-d` option) and outputs (with `-o`,
+`-O`, `-m`, or `-M` options) DVC can connect each individual stage (command)
+into a directed acyclic graph (DAG). All the command-line input provided to
+`dvc run` after the optional arguments (`-` or `--` dashed options) will become
+the required `command` argument.
+
+> Remember to wrap the `command` with `"` quotes if there are special characters
+> in it like `|` (pipe) or `<`, `>` (redirection) that would otherwise apply to
+> the entire `dvc run` command. E.g.
+> `dvc run -d script.sh "script.sh > /dev/null 2>&1"`
Unless the `-f` options is used, by default the DVC-file name generated is
`<file>.dvc`, where `<file>` is file name of the first output (`-o`, `-O`, `-m`,
@@ -42,6 +45,12 @@ graph integrity properties before creating a new stage. For example, for every
output there should be only one stage that explicitly specifies it. There should
be no cycles, etc.

+Note that `dvc repro` provides an interface to check state and reproduce this
+graph later. This concept is similar to the one of the `Makefile` but DVC
+captures data and caches data artifacts along the way. Check this
+[example](/doc/get-started/example-pipeline) to learn more and try to build a
+pipeline.
+
## Options

- `-d`, `--deps` - specify a file or a directory the stage depends on. Multiple
2 changes: 1 addition & 1 deletion static/docs/user-guide/large-dataset-optimization.md
@@ -80,7 +80,7 @@ efficiency:
> instead deleted and then replaced with a new file, otherwise it might cause
> cache corruption – and automatic deletion of cached files by DVC.
-3. **`symlink`** - symbolic (aka "soft") links are the most efficient way to
+3. **`symlink`** - symbolic (a.k.a. "soft") links are the most efficient way to
link your data to cache if your repo and your cache directory are located on
different file systems/drives (i.e. repo is located on SSD for performance,
but cache dir is located on HDD for bigger storage).
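
The quoting note added to `run.md` in this commit can be checked with a plain shell sketch that does not need DVC installed. `show_args` is a hypothetical stand-in for `dvc run` (it is not part of DVC); it only prints each argument it receives in brackets, which is enough to see whether the shell or the command gets the redirection.

```shell
# Hypothetical stand-in for `dvc run`: prints every argument it receives.
show_args() { printf '[%s]' "$@"; echo; }

# Quoted: the redirection stays inside the last argument, so the stand-in
# receives it as part of the command string.
quoted=$(show_args -d script.sh "script.sh > /dev/null 2>&1")

# Unquoted: the shell applies the redirection to the whole invocation,
# so the stand-in's own output is discarded and nothing is captured.
unquoted=$(show_args -d script.sh script.sh > /dev/null 2>&1)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In the unquoted case the redirection never reaches the invoked program at all, which matches the note's warning that such characters "would otherwise apply to the entire `dvc run` command".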