replace dvc run in cmd-ref (#3223)
Dave Berenbaum authored and iesahin committed Apr 11, 2022
1 parent f758e26 commit f704dad
Showing 11 changed files with 95 additions and 81 deletions.
4 changes: 2 additions & 2 deletions content/docs/command-reference/dag.md
@@ -32,8 +32,8 @@ final model, as well as accuracy [metrics](/doc/command-reference/metrics).

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
written manually or built using the helper command `dvc run`. This allows DVC to
restore one or more pipelines later (see `dvc repro`).
written manually or built using the helper command `dvc stage add`. This allows
DVC to restore one or more pipelines later (see `dvc repro`).

> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.
26 changes: 15 additions & 11 deletions content/docs/command-reference/import-url.md
@@ -110,17 +110,19 @@ Instead of:
$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml
```

It is possible to use `dvc run`, for example (HTTP URL):
It is possible to use `dvc stage add`, for example (HTTP URL):

```dvc
$ dvc run -n download_data \
-d https://data.dvc.org/get-started/data.xml \
-o data.xml \
wget https://data.dvc.org/get-started/data.xml -O data.xml
$ dvc stage add -n download_data \
-d https://data.dvc.org/get-started/data.xml \
-o data.xml \
wget https://data.dvc.org/get-started/data.xml -O data.xml
$ dvc repro
```

`dvc import-url` generates an _import `.dvc` file_ and `dvc run` a regular stage
(in `dvc.yaml`).
`dvc import-url` generates an _import `.dvc` file_ and `dvc stage add` a regular
stage (in `dvc.yaml`).

## Options

@@ -297,10 +299,12 @@ $ pip install -r src/requirements.txt
</details>

```dvc
$ dvc run -n prepare \
-d src/prepare.py -d data/data.xml \
-o data/prepared \
python src/prepare.py data/data.xml
$ dvc stage add -n prepare \
-d src/prepare.py -d data/data.xml \
-o data/prepared \
python src/prepare.py data/data.xml
$ dvc repro
Running command:
python src/prepare.py data/data.xml
...
4 changes: 2 additions & 2 deletions content/docs/command-reference/index.md
@@ -15,8 +15,8 @@ does not change directories in your terminal).
- Copy data files or dataset directories for modeling into the repository, and
track them with DVC using the `dvc add` command.
- Process the data with your own source code, using `dvc.yaml` and/or the
`dvc run` command, specifying further <abbr>outputs</abbr> that should also be
tracked by DVC after the code is executed.
`dvc stage add` command to specify further <abbr>outputs</abbr> that should
also be tracked by DVC, and executing the code using `dvc repro`.
- Sharing a <abbr>DVC repository</abbr> with the codified data
[pipeline](/doc/command-reference/dag) will not include the project's
<abbr>cache</abbr>. Use [remote storage](/doc/command-reference/remote) and
6 changes: 3 additions & 3 deletions content/docs/command-reference/init.md
@@ -126,9 +126,9 @@ include:
automation like running a data pipeline using `cron`.

In this mode, DVC features related to versioning are not available. For example
automatic creation and updating of `.gitignore` files on `dvc add` or `dvc run`,
as well as `dvc diff` and `dvc metrics diff`, which require Git revisions to
compare.
automatic creation and updating of `.gitignore` files on `dvc add` or
`dvc stage add`, as well as `dvc diff` and `dvc metrics diff`, which require Git
revisions to compare.

DVC sets the `core.no_scm` config option value to `true` in the DVC
[config](/doc/command-reference/config) when initialized this way. This means
10 changes: 6 additions & 4 deletions content/docs/command-reference/metrics/diff.md
@@ -88,12 +88,14 @@ all the current metrics (without comparisons).

## Examples

Start by creating a metrics file and commit it (see the `-M` option of `dvc run`
for more details):
Start by creating a metrics file and commit it (see the `-M` option of
`dvc stage add` for more details):

```dvc
$ dvc run -n eval -M metrics.json \
'echo {"AUC": 0.9643, "TP": 527} > metrics.json'
$ dvc stage add -n eval -M metrics.json \
'echo {"AUC": 0.9643, "TP": 527} > metrics.json'
$ dvc repro
$ cat metrics.json
{"AUC": 0.9643, "TP": 527}
18 changes: 10 additions & 8 deletions content/docs/command-reference/metrics/index.md
@@ -34,7 +34,7 @@ positives, etc.

This type of metrics files are typically generated by user data processing code,
and are tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`)
options of `dvc run`.
options of `dvc stage add`.

In contrast to `dvc plots`, these metrics should be stored in hierarchical
files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the
@@ -64,9 +64,9 @@ stages:
```
> `cache: false` above specifies that `summary.json` is not tracked or
> <abbr>cached</abbr> by DVC (`-M` option of `dvc run`). These metrics files are
> normally committed with Git instead. See `dvc.yaml` for more information on
> the file format above.
> <abbr>cached</abbr> by DVC (`-M` option of `dvc stage add`). These metrics
> files are normally committed with Git instead. See `dvc.yaml` for more
> information on the file format above.

### Supported file formats

@@ -106,13 +106,15 @@ First, let's imagine we have a simple [stage](/doc/command-reference/run) that
produces an `eval.json` metrics file:

```dvc
$ dvc run -n evaluate -d code/evaluate.py -M eval.json \
python code/evaluate.py
$ dvc stage add -n evaluate -d code/evaluate.py -M eval.json \
python code/evaluate.py
$ dvc repro
```

> `-M` (`--metrics-no-cache`) tells DVC to mark `eval.json` as a metrics file,
> without tracking it directly (You can track it with Git). See `dvc run` for
> more info.
> without tracking it directly (You can track it with Git). See `dvc stage add`
> for more info.

Now let's print metrics values that we are tracking in this
<abbr>project</abbr>, using `dvc metrics show`:
10 changes: 5 additions & 5 deletions content/docs/command-reference/params/diff.md
@@ -26,7 +26,7 @@ repository history. The differences shown by this command include the old and
new param values, along with the param name.

> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g.
> with the the `-p` (`--params`) option of `dvc run`).
> with the the `-p` (`--params`) option of `dvc stage add`).
Without arguments, `dvc params diff` compares parameters currently present in
the <abbr>workspace</abbr> (uncommitted changes) with the latest committed
@@ -95,10 +95,10 @@ Define a pipeline [stage](/doc/command-reference/run) with parameter
dependencies:
```dvc
$ dvc run -n train \
-d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
$ dvc stage add -n train \
-d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
```

Let's now print parameter values that we are tracking in this
38 changes: 19 additions & 19 deletions content/docs/command-reference/params/index.md
@@ -22,7 +22,7 @@ dependencies: _parameters_. They usually have simple names like `epochs`,
`learning-rate`, `batch_size`, etc.

To start tracking parameters, list them under the `params` field of `dvc.yaml`
stages (manually or with the the `-p`/`--params` option of `dvc run`). For
stages (manually or with the the `-p`/`--params` option of `dvc stage add`). For
example:

```yaml
@@ -97,14 +97,14 @@ process:
bow: 15000
```

Using `dvc run`, define a [stage](/doc/command-reference/run) that depends on
params `lr`, `layers`, and `epochs` from the params file above. Full paths
Using `dvc stage add`, define a [stage](/doc/command-reference/run) that depends
on params `lr`, `layers`, and `epochs` from the params file above. Full paths
should be used to specify `layers` and `epochs` from the `train` group:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p lr,train.epochs,train.layers \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p lr,train.epochs,train.layers \
python train.py
```
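
The dotted addressing used by `-p lr,train.epochs,train.layers` can be illustrated with a small Python sketch. This is not DVC's implementation: the helper names `resolve` and `select` are hypothetical, and the values in `PARAMS` are made up (only the key names and `bow: 15000` appear in the text above).

```python
from typing import Any, Dict, Mapping

# Hypothetical parsed contents of params.yaml. Key names follow the text;
# the values for lr, epochs and layers are invented for illustration.
PARAMS: Dict[str, Any] = {
    "lr": 0.0041,
    "train": {"epochs": 70, "layers": 9},
    "process": {"bow": 15000},
}


def resolve(params: Mapping[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'train.epochs' through nested mappings."""
    node: Any = params
    for key in path.split("."):
        node = node[key]  # raises KeyError if the path does not exist
    return node


def select(params: Mapping[str, Any], spec: str) -> Dict[str, Any]:
    """Resolve a comma-separated list like 'lr,train.epochs,train.layers'."""
    return {path: resolve(params, path) for path in spec.split(",")}


if __name__ == "__main__":
    print(select(PARAMS, "lr,train.epochs,train.layers"))
    # Naming a group ('train') selects the whole nested mapping.
    print(select(PARAMS, "lr,train"))
```

A top-level name such as `lr` needs no prefix, while `epochs` and `layers` are only reachable through their `train` group, which is why the full paths are required.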

> Note that we could use the same parameter addressing with JSON, TOML, or
@@ -147,9 +147,9 @@ Alternatively, the entire group of parameters `train` can be referenced, instead
of specifying each of the params separately:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
```

```yaml
@@ -161,12 +161,12 @@ params:

In the examples above, the default parameters file name `params.yaml` was used.
Note that this file name can be redefined using a prefix in the `-p` argument of
`dvc run`. In our case:
`dvc stage add`. In our case:

```dvc
$ dvc run -n train -d train.py -d logs/ -o users.csv -f \
-p parse_params.yaml:threshold,classes_num \
python train.py
$ dvc stage add -n train -d train.py -d logs/ -o users.csv -f \
-p parse_params.yaml:threshold,classes_num \
python train.py
```
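
As a rough illustration of the `file:param,param` form shown above, the following Python sketch splits a `-p` value into a params file name and a list of parameter names, falling back to `params.yaml` when no prefix is given. The function `parse_params_arg` is a hypothetical helper written for this note, not DVC's own parser, and it ignores edge cases such as paths that themselves contain a colon.

```python
from typing import List, Tuple

DEFAULT_PARAMS_FILE = "params.yaml"  # default name stated in the text above


def parse_params_arg(arg: str) -> Tuple[str, List[str]]:
    """Split '[file:]name1,name2' into (file, [name1, name2])."""
    file_name, sep, names = arg.rpartition(":")
    if not sep:
        # No 'file:' prefix was given, so the default params file applies.
        file_name = DEFAULT_PARAMS_FILE
    return file_name, [name for name in names.split(",") if name]


if __name__ == "__main__":
    print(parse_params_arg("parse_params.yaml:threshold,classes_num"))
    print(parse_params_arg("lr,train"))
```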

## Examples: Print all parameters
@@ -234,9 +234,9 @@ The following [stage](/doc/command-reference/run) depends on params `BOOL`,
`INT`, as well as `TrainConfig`'s `EPOCHS` and `layers`:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \
python train.py
```

Resulting `dvc.yaml` and `dvc.lock` files (notice the `params` lists):
@@ -283,7 +283,7 @@ can be referenced
supported), instead of the parameters in it:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TestConfig \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TestConfig \
python train.py
```
8 changes: 4 additions & 4 deletions content/docs/command-reference/plots/modify.md
@@ -27,8 +27,8 @@ plots are generated with `dvc plot show` or `dvc plot diff`. This command sets
(or unsets) default display properties for a specific metrics file.

The path to the metrics file `target` is required. It must be listed in a
`dvc.yaml` file (see the `--plots` option of `dvc run`). `dvc plots modify` adds
the display properties to `dvc.yaml`.
`dvc.yaml` file (see the `--plots` option of `dvc stage add`).
`dvc plots modify` adds the display properties to `dvc.yaml`.

Property names are passed as [options](#options) to this command (prefixed with
`--`). These are based on the [Vega-Lite](https://vega.github.io/vega-lite/)
@@ -134,8 +134,8 @@ plots:

## Example: Template change

_dvc run --plots file.csv ..._ command assign the default template that needs to
be changed in many cases. A simple command changes the template:
_dvc stage add --plots file.csv ..._ command assign the default template that
needs to be changed in many cases. A simple command changes the template:

```dvc
$ dvc plots modify classes.csv --template confusion
50 changes: 28 additions & 22 deletions content/docs/command-reference/repro.md
@@ -30,7 +30,8 @@ are run one after the other in the order they are defined. The failure of any
command will halt the remaining stage execution, and raises an error.

> Pipeline stages are defined in `dvc.yaml` (either manually or by using
> `dvc run`) while initial data dependencies can be registered with `dvc add`.
> `dvc stage add`) while initial data dependencies can be registered with
> `dvc add`.
`dvc repro` is similar to [Make](https://www.gnu.org/software/make/) in software
build automation, but DVC captures build requirements
@@ -137,8 +138,8 @@ up-to-date and only execute the final stage.
`dvc commit` to finish the operation.

- `-m`, `--metrics` - show metrics after reproduction. The target pipelines must
have at least one metrics file defined either with `dvc metrics` or by the
`-M` or `-m` options of `dvc run`
have at least one [metrics](/doc/command-reference/metrics) file defined in
`dvc.yaml`.

- `--dry` - only print the commands that would be executed without actually
executing the commands.
@@ -170,10 +171,10 @@ up-to-date and only execute the final stage.
stages (`A` and below) depend on `requirements.txt`, we can specify it in `A`,
and omit it in `B` and `C`.

Like with the `--force` option on `dvc run`, this is a way to force-execute
stages without changes. This can also be useful for pipelines containing
stages that produce non-deterministic (semi-random) outputs, where outputs can
vary on each execution, meaning the cache cannot be trusted for such stages.
This is a way to force-execute stages without changes. This can also be useful
for pipelines containing stages that produce non-deterministic (semi-random)
outputs, where outputs can vary on each execution, meaning the cache cannot be
trusted for such stages.

- `--downstream` - only execute the stages after the given `targets` in their
corresponding pipelines, including the target stages themselves. This option
@@ -213,10 +214,10 @@ best
And runs a few simple transformations to filter and count numbers:

```dvc
$ dvc run -n filter -d text.txt -o numbers.txt \
$ dvc stage add -n filter -d text.txt -o numbers.txt \
"cat text.txt | egrep '[0-9]+' > numbers.txt"
$ dvc run -n count -d numbers.txt -d process.py -M count.txt \
$ dvc stage add -n count -d numbers.txt -d process.py -M count.txt \
"python process.py numbers.txt > count.txt"
```
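
To make the two transformations concrete, here is a minimal Python sketch of what the `filter` and `count` stages compute: keep the lines that contain at least one digit (the `egrep '[0-9]+'` step), then count them. The function names and the sample lines are assumptions made for illustration; the full contents of `text.txt` and `process.py` are not shown in this diff.

```python
import re
from typing import Iterable, List


def filter_numbers(lines: Iterable[str]) -> List[str]:
    """Keep lines containing at least one digit, like egrep '[0-9]+'."""
    return [line for line in lines if re.search(r"[0-9]+", line)]


def count_lines(lines: Iterable[str]) -> int:
    """Count the lines, which is what the 'count' stage writes out."""
    return sum(1 for _ in lines)


if __name__ == "__main__":
    # Hypothetical sample input, not the actual text.txt from the docs.
    sample = ["dvc", "1231", "is", "3", "the", "best"]
    numbers = filter_numbers(sample)
    print(numbers)               # the lines that would land in numbers.txt
    print(count_lines(numbers))  # the value that would land in count.txt
```

In the DVC version each step is a separate stage, so changing `process.py` alone makes only `count` stale while `filter` can be skipped.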

@@ -232,9 +233,24 @@ with open(sys.argv[1], 'r') as f:
print(num_lines)
```

The result of executing these `dvc run` commands should look like this:
The result of executing `dvc repro` should look like this (`cat` shows the
contents of a file and `tree` shows the contents of the working directory):

```dvc
$ dvc repro
Running stage 'filter':
> cat text.txt | egrep '[0-9]+' > numbers.txt
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
Running stage 'count':
> python process.py numbers.txt > count.txt
Updating lock file 'dvc.lock'
Use `dvc push` to send your updates to remote storage.
$ cat count.txt
2
$ tree
.
├── count.txt <---- result: "2"
@@ -248,18 +264,8 @@ $ tree
You may want to check the contents of `dvc.lock` and `count.txt` for later
reference.

Ok, now let's run `dvc repro`:

```dvc
$ dvc repro
Stage 'filter' didn't change, skipping
Stage 'count' didn't change, skipping
Data and pipelines are up to date.
```

It makes sense, since we haven't changed any of the dependencies of this
pipeline (`text.txt` and `process.py`). Now, let's imagine we want to print a
description and we add this line to the `process.py`:
Now, let's imagine we want to print a description and we add this line to the
`process.py`:

```python
...
2 changes: 1 addition & 1 deletion content/docs/command-reference/status.md
@@ -57,7 +57,7 @@ description_, as detailed below:

- _always changed_ means that this is a `.dvc` file with no dependencies (see
`dvc add`) or that the stage in `dvc.yaml` has the `always_changed: true`
value set (see `--always-changed` option in `dvc run`).
value set (see `--always-changed` option in `dvc stage add`).

- _changed deps_ or _changed outs_ means that there are changes in dependencies
or outputs tracked by the stage or `.dvc` file. Depending on the use case,