diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md index 08122c356b..a905c9b9aa 100644 --- a/content/docs/command-reference/dag.md +++ b/content/docs/command-reference/dag.md @@ -32,8 +32,8 @@ final model, as well as accuracy [metrics](/doc/command-reference/metrics). In DVC, pipeline stages and commands, their data I/O, interdependencies, and results (intermediate or final) are specified in `dvc.yaml`, which can be -written manually or built using the helper command `dvc run`. This allows DVC to -restore one or more pipelines later (see `dvc repro`). +written manually or built using the helper command `dvc stage add`. This allows +DVC to restore one or more pipelines later (see `dvc repro`). > DVC builds a dependency graph > ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this. diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 47a97d0cc7..1493e85b55 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -110,17 +110,19 @@ Instead of: $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml ``` -It is possible to use `dvc run`, for example (HTTP URL): +It is possible to use `dvc stage add`, for example (HTTP URL): ```dvc -$ dvc run -n download_data \ - -d https://data.dvc.org/get-started/data.xml \ - -o data.xml \ - wget https://data.dvc.org/get-started/data.xml -O data.xml +$ dvc stage add -n download_data \ + -d https://data.dvc.org/get-started/data.xml \ + -o data.xml \ + wget https://data.dvc.org/get-started/data.xml -O data.xml + +$ dvc repro ``` -`dvc import-url` generates an _import `.dvc` file_ and `dvc run` a regular stage -(in `dvc.yaml`). +`dvc import-url` generates an _import `.dvc` file_ and `dvc stage add` a regular +stage (in `dvc.yaml`). 
## Options @@ -297,10 +299,12 @@ $ pip install -r src/requirements.txt ```dvc -$ dvc run -n prepare \ - -d src/prepare.py -d data/data.xml \ - -o data/prepared \ - python src/prepare.py data/data.xml +$ dvc stage add -n prepare \ + -d src/prepare.py -d data/data.xml \ + -o data/prepared \ + python src/prepare.py data/data.xml + +$ dvc repro Running command: python src/prepare.py data/data.xml ... diff --git a/content/docs/command-reference/index.md b/content/docs/command-reference/index.md index 6f097d869f..9a3274647f 100644 --- a/content/docs/command-reference/index.md +++ b/content/docs/command-reference/index.md @@ -15,8 +15,8 @@ does not change directories in your terminal). - Copy data files or dataset directories for modeling into the repository, and track them with DVC using the `dvc add` command. - Process the data with your own source code, using `dvc.yaml` and/or the - `dvc run` command, specifying further outputs that should also be - tracked by DVC after the code is executed. + `dvc stage add` command to specify further outputs that should + also be tracked by DVC, and executing the code using `dvc repro`. - Sharing a DVC repository with the codified data [pipeline](/doc/command-reference/dag) will not include the project's cache. Use [remote storage](/doc/command-reference/remote) and diff --git a/content/docs/command-reference/init.md b/content/docs/command-reference/init.md index e67d4adc26..0f6277d988 100644 --- a/content/docs/command-reference/init.md +++ b/content/docs/command-reference/init.md @@ -126,9 +126,9 @@ include: automation like running a data pipeline using `cron`. In this mode, DVC features related to versioning are not available. For example -automatic creation and updating of `.gitignore` files on `dvc add` or `dvc run`, -as well as `dvc diff` and `dvc metrics diff`, which require Git revisions to -compare. 
+automatic creation and updating of `.gitignore` files on `dvc add` or +`dvc stage add`, as well as `dvc diff` and `dvc metrics diff`, which require Git +revisions to compare. DVC sets the `core.no_scm` config option value to `true` in the DVC [config](/doc/command-reference/config) when initialized this way. This means diff --git a/content/docs/command-reference/metrics/diff.md b/content/docs/command-reference/metrics/diff.md index 9474dfff79..95ef62e077 100644 --- a/content/docs/command-reference/metrics/diff.md +++ b/content/docs/command-reference/metrics/diff.md @@ -88,12 +88,14 @@ all the current metrics (without comparisons). ## Examples -Start by creating a metrics file and commit it (see the `-M` option of `dvc run` -for more details): +Start by creating a metrics file and commit it (see the `-M` option of +`dvc stage add` for more details): ```dvc -$ dvc run -n eval -M metrics.json \ - 'echo {"AUC": 0.9643, "TP": 527} > metrics.json' +$ dvc stage add -n eval -M metrics.json \ + 'echo {"AUC": 0.9643, "TP": 527} > metrics.json' + +$ dvc repro $ cat metrics.json {"AUC": 0.9643, "TP": 527} diff --git a/content/docs/command-reference/metrics/index.md b/content/docs/command-reference/metrics/index.md index 2e29872e41..72c7d19b2a 100644 --- a/content/docs/command-reference/metrics/index.md +++ b/content/docs/command-reference/metrics/index.md @@ -34,7 +34,7 @@ positives, etc. This type of metrics files are typically generated by user data processing code, and are tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`) -options of `dvc run`. +options of `dvc stage add`. In contrast to `dvc plots`, these metrics should be stored in hierarchical files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the @@ -64,9 +64,9 @@ stages: ``` > `cache: false` above specifies that `summary.json` is not tracked or -> cached by DVC (`-M` option of `dvc run`). These metrics files are -> normally committed with Git instead. 
See `dvc.yaml` for more information on -> the file format above. +> cached by DVC (`-M` option of `dvc stage add`). These metrics +> files are normally committed with Git instead. See `dvc.yaml` for more +> information on the file format above. ### Supported file formats @@ -106,13 +106,15 @@ First, let's imagine we have a simple [stage](/doc/command-reference/run) that produces an `eval.json` metrics file: ```dvc -$ dvc run -n evaluate -d code/evaluate.py -M eval.json \ - python code/evaluate.py +$ dvc stage add -n evaluate -d code/evaluate.py -M eval.json \ + python code/evaluate.py + +$ dvc repro ``` > `-M` (`--metrics-no-cache`) tells DVC to mark `eval.json` as a metrics file, -> without tracking it directly (You can track it with Git). See `dvc run` for -> more info. +> without tracking it directly (You can track it with Git). See `dvc stage add` +> for more info. Now let's print metrics values that we are tracking in this project, using `dvc metrics show`: diff --git a/content/docs/command-reference/params/diff.md b/content/docs/command-reference/params/diff.md index f40d0b7e3d..c6352608f7 100644 --- a/content/docs/command-reference/params/diff.md +++ b/content/docs/command-reference/params/diff.md @@ -26,7 +26,7 @@ repository history. The differences shown by this command include the old and new param values, along with the param name. > Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g. -> with the the `-p` (`--params`) option of `dvc run`). +> with the `-p` (`--params`) option of `dvc stage add`). 
Without arguments, `dvc params diff` compares parameters currently present in the workspace (uncommitted changes) with the latest committed @@ -95,10 +95,10 @@ Define a pipeline [stage](/doc/command-reference/run) with parameter dependencies: ```dvc -$ dvc run -n train \ - -d train.py -d users.csv -o model.pkl \ - -p lr,train \ - python train.py +$ dvc stage add -n train \ + -d train.py -d users.csv -o model.pkl \ + -p lr,train \ + python train.py ``` Let's now print parameter values that we are tracking in this diff --git a/content/docs/command-reference/params/index.md b/content/docs/command-reference/params/index.md index 7878cb3c48..285685c1d5 100644 --- a/content/docs/command-reference/params/index.md +++ b/content/docs/command-reference/params/index.md @@ -22,7 +22,7 @@ dependencies: _parameters_. They usually have simple names like `epochs`, `learning-rate`, `batch_size`, etc. To start tracking parameters, list them under the `params` field of `dvc.yaml` -stages (manually or with the the `-p`/`--params` option of `dvc run`). For +stages (manually or with the `-p`/`--params` option of `dvc stage add`). For example: ```yaml @@ -97,14 +97,14 @@ process: bow: 15000 ``` -Using `dvc run`, define a [stage](/doc/command-reference/run) that depends on -params `lr`, `layers`, and `epochs` from the params file above. Full paths +Using `dvc stage add`, define a [stage](/doc/command-reference/run) that depends +on params `lr`, `layers`, and `epochs` from the params file above. 
Full paths should be used to specify `layers` and `epochs` from the `train` group: ```dvc -$ dvc run -n train -d train.py -d users.csv -o model.pkl \ - -p lr,train.epochs,train.layers \ - python train.py +$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ + -p lr,train.epochs,train.layers \ + python train.py ``` > Note that we could use the same parameter addressing with JSON, TOML, or @@ -147,9 +147,9 @@ Alternatively, the entire group of parameters `train` can be referenced, instead of specifying each of the params separately: ```dvc -$ dvc run -n train -d train.py -d users.csv -o model.pkl \ - -p lr,train \ - python train.py +$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ + -p lr,train \ + python train.py ``` ```yaml @@ -161,12 +161,12 @@ params: In the examples above, the default parameters file name `params.yaml` was used. Note that this file name can be redefined using a prefix in the `-p` argument of -`dvc run`. In our case: +`dvc stage add`. In our case: ```dvc -$ dvc run -n train -d train.py -d logs/ -o users.csv -f \ - -p parse_params.yaml:threshold,classes_num \ - python train.py +$ dvc stage add -n train -d train.py -d logs/ -o users.csv -f \ + -p parse_params.yaml:threshold,classes_num \ + python train.py ``` ## Examples: Print all parameters @@ -234,9 +234,9 @@ The following [stage](/doc/command-reference/run) depends on params `BOOL`, `INT`, as well as `TrainConfig`'s `EPOCHS` and `layers`: ```dvc -$ dvc run -n train -d train.py -d users.csv -o model.pkl \ - -p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \ - python train.py +$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ + -p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \ + python train.py ``` Resulting `dvc.yaml` and `dvc.lock` files (notice the `params` lists): @@ -283,7 +283,7 @@ can be referenced supported), instead of the parameters in it: ```dvc -$ dvc run -n train -d train.py -d users.csv -o model.pkl \ - -p 
params.py:BOOL,INT,TestConfig \ - python train.py +$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ + -p params.py:BOOL,INT,TestConfig \ + python train.py ``` diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md index 2792ba0e92..e3579910ec 100644 --- a/content/docs/command-reference/plots/modify.md +++ b/content/docs/command-reference/plots/modify.md @@ -27,8 +27,8 @@ plots are generated with `dvc plot show` or `dvc plot diff`. This command sets (or unsets) default display properties for a specific metrics file. The path to the metrics file `target` is required. It must be listed in a -`dvc.yaml` file (see the `--plots` option of `dvc run`). `dvc plots modify` adds -the display properties to `dvc.yaml`. +`dvc.yaml` file (see the `--plots` option of `dvc stage add`). +`dvc plots modify` adds the display properties to `dvc.yaml`. Property names are passed as [options](#options) to this command (prefixed with `--`). These are based on the [Vega-Lite](https://vega.github.io/vega-lite/) @@ -134,8 +134,8 @@ plots: ## Example: Template change -_dvc run --plots file.csv ..._ command assign the default template that needs to -be changed in many cases. A simple command changes the template: +_dvc stage add --plots file.csv ..._ command assigns the default template that +needs to be changed in many cases. A simple command changes the template: ```dvc $ dvc plots modify classes.csv --template confusion diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md index 1a98a9774b..28bfa4c906 100644 --- a/content/docs/command-reference/repro.md +++ b/content/docs/command-reference/repro.md @@ -30,7 +30,8 @@ are run one after the other in the order they are defined. The failure of any command will halt the remaining stage execution, and raises an error. 
> Pipeline stages are defined in `dvc.yaml` (either manually or by using -> `dvc run`) while initial data dependencies can be registered with `dvc add`. +> `dvc stage add`) while initial data dependencies can be registered with +> `dvc add`. `dvc repro` is similar to [Make](https://www.gnu.org/software/make/) in software build automation, but DVC captures build requirements @@ -137,8 +138,8 @@ up-to-date and only execute the final stage. `dvc commit` to finish the operation. - `-m`, `--metrics` - show metrics after reproduction. The target pipelines must - have at least one metrics file defined either with `dvc metrics` or by the - `-M` or `-m` options of `dvc run` + have at least one [metrics](/doc/command-reference/metrics) file defined in + `dvc.yaml`. - `--dry` - only print the commands that would be executed without actually executing the commands. @@ -170,10 +171,10 @@ up-to-date and only execute the final stage. stages (`A` and below) depend on `requirements.txt`, we can specify it in `A`, and omit it in `B` and `C`. - Like with the `--force` option on `dvc run`, this is a way to force-execute - stages without changes. This can also be useful for pipelines containing - stages that produce non-deterministic (semi-random) outputs, where outputs can - vary on each execution, meaning the cache cannot be trusted for such stages. + This is a way to force-execute stages without changes. This can also be useful + for pipelines containing stages that produce non-deterministic (semi-random) + outputs, where outputs can vary on each execution, meaning the cache cannot be + trusted for such stages. - `--downstream` - only execute the stages after the given `targets` in their corresponding pipelines, including the target stages themselves. 
This option @@ -213,10 +214,10 @@ best And runs a few simple transformations to filter and count numbers: ```dvc -$ dvc run -n filter -d text.txt -o numbers.txt \ +$ dvc stage add -n filter -d text.txt -o numbers.txt \ "cat text.txt | egrep '[0-9]+' > numbers.txt" -$ dvc run -n count -d numbers.txt -d process.py -M count.txt \ +$ dvc stage add -n count -d numbers.txt -d process.py -M count.txt \ "python process.py numbers.txt > count.txt" ``` @@ -232,9 +233,24 @@ with open(sys.argv[1], 'r') as f: print(num_lines) ``` -The result of executing these `dvc run` commands should look like this: +The result of executing `dvc repro` should look like this (`cat` shows the +contents of a file and `tree` shows the contents of the working directory): ```dvc +$ dvc repro +Running stage 'filter': +> cat text.txt | egrep '[0-9]+' > numbers.txt +Generating lock file 'dvc.lock' +Updating lock file 'dvc.lock' + +Running stage 'count': +> python process.py numbers.txt > count.txt +Updating lock file 'dvc.lock' +Use `dvc push` to send your updates to remote storage. + +$ cat count.txt +2 + $ tree . ├── count.txt <---- result: "2" @@ -248,18 +264,8 @@ $ tree You may want to check the contents of `dvc.lock` and `count.txt` for later reference. -Ok, now let's run `dvc repro`: - -```dvc -$ dvc repro -Stage 'filter' didn't change, skipping -Stage 'count' didn't change, skipping -Data and pipelines are up to date. -``` - -It makes sense, since we haven't changed any of the dependencies of this -pipeline (`text.txt` and `process.py`). Now, let's imagine we want to print a -description and we add this line to the `process.py`: +Now, let's imagine we want to print a description and we add this line to the +`process.py`: ```python ... 
diff --git a/content/docs/command-reference/status.md b/content/docs/command-reference/status.md index c644873e92..7c7775336c 100644 --- a/content/docs/command-reference/status.md +++ b/content/docs/command-reference/status.md @@ -57,7 +57,7 @@ description_, as detailed below: - _always changed_ means that this is a `.dvc` file with no dependencies (see `dvc add`) or that the stage in `dvc.yaml` has the `always_changed: true` - value set (see `--always-changed` option in `dvc run`). + value set (see `--always-changed` option in `dvc stage add`). - _changed deps_ or _changed outs_ means that there are changes in dependencies or outputs tracked by the stage or `.dvc` file. Depending on the use case,