ref: exp init improvements (#3071)
* ref: first copy edits to exp init

* ref: clarify exp init explanations

* ref: clarify `exp init` option descriptions

* ref: re-describe `exp init` + reorder in nav and `exp` help
per #3071 (review)

* ref: clarify params.yaml is needed only with defaults in params.yaml
per #3071 (comment)

* ref: clarify what --interactive prompts user for
per #3071 (review)

* ref: link from exp init to config section and
mention --explicit avoids a params.yaml file too.

* ref: simplify exp init --explicit explanation

* ref: explain why params.yaml are required (by default) in exp init
per #3071 (comment)

* ref: copy edits to exp init

* ref: add simple example to exp init
rel. #3071 (comment)

* ref: use model training example in exp init
per #3071 (review)

* ref: shorten sample block
jorgeorpinel authored Dec 23, 2021
1 parent b41f6b0 commit 33a303c
Showing 4 changed files with 172 additions and 93 deletions.
14 changes: 7 additions & 7 deletions content/docs/command-reference/exp/index.md
@@ -26,19 +26,19 @@ usage: dvc exp [-h] [-q | -v]
positional arguments:
COMMAND
init Quickly setup any project to use DVC Experiments.
run Reproduce complete or partial experiment pipelines.
show Print experiments.
apply Apply the changes from an experiment to your
workspace.
diff Show changes between experiments in the DVC
repository.
run Reproduce complete or partial experiment pipelines.
gc Garbage collect unneeded experiments.
branch Promote an experiment to a Git branch.
list List local and remote experiments.
apply Apply the changes from an experiment to your
workspace.
branch Promote an experiment to a Git branch.
remove Remove local experiments.
gc Garbage collect unneeded experiments.
push Push a local experiment to a Git remote.
pull Pull an experiment from a Git remote.
remove Remove local experiments.
init Codify project using DVC metafiles to run experiments.
```

## Description
229 changes: 153 additions & 76 deletions content/docs/command-reference/exp/init.md
@@ -1,7 +1,6 @@
# exp init

Codify project using [DVC metafiles](/doc/user-guide/project-structure) to run
[experiments](/doc/user-guide/experiment-management).
Quickly set up any project to use [DVC Experiments].

> Requires a <abbr>DVC repository</abbr>, created with `git init` and
> `dvc init`.
@@ -19,43 +18,60 @@ usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f]

## Description

`dvc exp init` helps you quickly get started with experiments. It reduces
boilerplate for initializing [pipeline](/doc/command-reference/dag) stages in a
`dvc.yaml` file by assuming defaults about the location of your data,
[parameters](/doc/command-reference/params), source code, models,
[metrics](/doc/command-reference/metrics) and
[plots](/doc/command-reference/plots), which can be customized through config.
`dvc exp init` helps you get started with DVC Experiments quickly. It reduces
boilerplate DVC procedures by creating a `dvc.yaml` file that assumes standard
locations of your input data, <abbr>parameters</abbr>, source code, models,
<abbr>metrics</abbr> and [plots](/doc/command-reference/plots). These locations
can be customized through the [options](#options) below or via
[configuration](/doc/command-reference/config#exp).

It also offers guided `--interactive` mode for creating a stage to be
[`exp run`](/doc/command-reference/exp/run) later. `dvc exp init` supports
creating different types of stages, eg: `dl` if you are doing deep learning,
which uses [dvclive](/doc/dvclive) to monitor and checkpoint progress during
training of machine learning models.
Repository structure assumed by default:

This command is intended to be a quick way to start running experiments. To
create more complex stages and pipelines, use `dvc stage add`.
```
├── data/
├── metrics.json
├── models/
├── params.yaml # required
├── plots/
└── src/
```
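
The default layout above can be scaffolded by hand before calling `dvc exp init`. The following is a minimal sketch, not part of the DVC docs: the scratch directory and the `epochs: 10` value are assumptions for illustration, and only the inputs are created because the model, metrics, and plots paths are described as outputs of the experiment.

```shell
# Scratch directory so the sketch does not touch a real project (assumption).
project_dir=$(mktemp -d)

# Inputs that the default layout expects: data and source code.
mkdir -p "$project_dir/data" "$project_dir/src"

# params.yaml is required by default; `epochs: 10` is a placeholder value.
printf 'epochs: 10\n' > "$project_dir/params.yaml"

# models/, plots/ and metrics.json are outputs, so the experiment command
# is expected to write them; they are not created here.
ls -A "$project_dir"
```

In a real project you would run these steps in the repository root instead of a temporary directory.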

> Note that `dvc exp init` expects at least a `params.yaml` file present. DVC
> reads it to find parameters to include in the [stage definition]. It can
> however be omitted when using the `--explicit` and/or `-i` flags.
> 📖 More context in [Experiments Overview].
You must always provide a command that runs your experiment(s). It can be given
either directly [as an argument](#the-command-argument), or by using the
`--interactive` (`-i`) mode which will prompt you for it. This command will be
wrapped as a <abbr>stage</abbr> that `dvc exp run` can execute.

[experiments overview]:
/doc/user-guide/experiment-management/experiments-overview
Different types of stages are supported, such as `dl` (deep learning) which uses
[DVCLive](/doc/dvclive) to monitor [checkpoints] during training of ML models.

> `dvc exp init` is intended as a quick way to start running [DVC Experiments].
> See the `dvc.yaml` specification for complex data pipelines.
[stage definition]:
/doc/user-guide/project-structure/pipelines-files#stage-entries
[checkpoints]: /doc/user-guide/experiment-management/checkpoints
[dvc experiments]: /doc/user-guide/experiment-management/experiments-overview

### The `command` argument

The `command` argument is optional, if you are using `--interactive` mode. The
`command` sent to `dvc exp init` can be anything your terminal would accept and
run directly, for example a shell built-in, expression, or binary found in
`PATH`. Please remember that any flags sent after the `command` are interpreted
by the command itself, not by `dvc exp init`.
The command given to `dvc exp init` can be anything your system terminal would
accept and run directly, for example a shell built-in, an expression, or a
binary found in `PATH`. Please note that any flags sent after the `command`
argument will normally become part of that command itself and be ignored by
`dvc exp init` (so provide it last).

⚠️ While DVC is platform-agnostic, the commands defined in your
[pipeline](/doc/command-reference/dag) stages may only work on some operating
systems and require certain software packages to be installed.
⚠️ While DVC is platform-agnostic, commands defined in `dvc.yaml` (`cmd` field)
may only work on some operating systems and require certain software packages or
libraries in the environment.

Wrap the command with double quotes `"` if there are special characters in it
like `|` (pipe) or `<`, `>` (redirection), otherwise they would apply to
`dvc exp init` itself. Use single quotes `'` instead if there are environment
variables in it that should be evaluated dynamically. Examples:
Surround the command with double quotes `"` if it includes special characters
like `|` or `<`, `>` -- otherwise they would apply to `dvc exp init` itself. Use
single quotes `'` instead if there are environment variables in it that should
be evaluated dynamically.
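
To see what the quoting rules above mean in practice, here is a small sketch that uses a hypothetical `show_arg` function as a stand-in for `dvc exp init` (it only prints the single argument it receives). `MYENVVAR=hello` is an assumed value for illustration.

```shell
# Hypothetical stand-in for `dvc exp init`: prints the one argument it gets.
show_arg() { printf '%s\n' "$1"; }

MYENVVAR=hello

# Double quotes: redirection characters stay inside the argument...
show_arg "./a_script.sh > /dev/null 2>&1"

# ...but the calling shell expands $MYENVVAR before the call.
show_arg "./another_script.sh $MYENVVAR"

# Single quotes: the literal text $MYENVVAR is passed through untouched,
# so it can be evaluated later, when the command itself runs.
show_arg './another_script.sh $MYENVVAR'
```

The second and third calls differ only in the quote style, which is why single quotes are suggested when the variable should be resolved at run time.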

```dvc
$ dvc exp init "./a_script.sh > /dev/null 2>&1"
@@ -64,71 +80,132 @@ $ dvc exp init './another_script.sh $MYENVVAR'

## Options

- `-i`, `--interactive` - prompts user for the command to execute and different
paths for tracking outputs and dependencies, unless they are provided through
arguments explicitly. Interactive mode allows users to set those locations
from default values or omit them.
- `-i`, `--interactive` - prompts you for a command that runs your
experiment(s) (see [details](#the-command-argument)) and asks you to confirm
or define the paths that make up your repo's structure.

- `--explicit` - `dvc exp init` assumes default locations for your outputs and
dependencies (which can be overridden from the config). By using `--explicit`,
it will not use those default values while initializing experiments. In
`--interactive` mode, the prompt won't set default values, and all values
need to be explicitly provided or omitted.
- `-n <stage>`, `--name <stage>` - specify a custom name for the stage generated
by this command. The default is `train`. It can only contain letters, numbers,
dash `-` and underscore `_` (same as `dvc stage add --name`).

- `--code` - override the path to the source file or directory which your
experiment depends on. The default is `src` directory for your code.
- `--run` - automatically run the experiment after creating the stage (same as
`dvc exp run`).

- `--data` - override the path to your data file or directory to track, which
your experiment depends on. The default is `data` directory.
- `--type` - selects the type of the stage to create. Currently it provides two
alternatives: `dl` and `default` (no need to specify this one).

- `--params` - override the path to
[parameter dependencies](/doc/command-reference/params) which your experiment
depends on. The default parameters file name is `params.yaml`. Note that
`dvc exp init` may fail if the parameters file does not exist at the time of
the invocation, as DVC reads the file to find parameters to track for the
stage.
`dl` stages are intended for use in deep-learning scenarios, where metrics and
plots are tracked with [DVCLive](/doc/dvclive). This also supports logging
[checkpoints](/doc/command-reference/exp/run#checkpoints) during the training
of DL models.

- `--model` - override the path to your models file or directory to track, which
your experiment produces. `dvc exp init` assumes `models` directory by
default.
- `--code` - set the path to the file or directory where the source code that
your experiment depends on can be found (if any). Overrides other
configuration and default value (`src/`).

- `--metrics` - override the path to metrics file to track, which your
experiment produces. Default is `metrics.json` file.
- `--params` - set the path to the file or directory where the
<abbr>parameters</abbr> that your experiment depends on can be found.
Overrides other configuration and default value (`params.yaml`).

- `--plots` - override the path to plots file or directory, which your
experiment produces. The default is `plots`.
> Note that `dvc exp init` will fail if the params file does not exist. This
> is because DVC reads it to find params to include in the [stage definition].
- `--live` - override the directory `path` for [DVCLive](/doc/dvclive), which
your experiment will write logs to. The default is `dvclive` directory, which
only comes to effect when used with `--type=dl`.
- `--data` - set the path to the data file or directory that your experiment
depends on (if any). Overrides other configuration and default value
(`data/`).

- `--type` - selects the type of the stage to create. Currently it provides two
different kinds of stages: `default` and `dl`. If unspecified, `default` stage
is created.
- `--model` - set the path to the file or directory where the model(s) produced
by your experiment can be found (if any). Overrides other configuration and
default value (`models/`).

`default` stage creates a stage with `metrics` and `plots` tracked by DVC
itself, and does not track live-created artifacts (unless explicitly
specified).
> 💡 This could be used for any artifacts produced by your experiment.
`dl` stage is intended for use in deep-learning scenarios, where metrics and
plots are tracked by [dvclive](/doc/dvclive) and supports tracking progress
while training a deep-learning model with
[checkpoints](/doc/command-reference/exp/run#checkpoints).
- `--metrics` - set the path to the file or directory where the metrics produced
by your experiment can be found (if any). Overrides other configuration and
default value (`metrics.json`).

- `-n <stage>`, `--name <stage>` - specify a custom name for the stage generated
by this command (e.g. `-n train`). The default is `train`.
- `--plots` - set the path to the file or directory where the plots produced by
your experiment can be found (if any). Overrides other configuration and
default value (`plots/`).

Note that the stage name can only contain letters, numbers, dash `-` and
underscore `_`.
- `--live` - configure the `path` directory for [DVCLive](/doc/dvclive). This is
where experiment logs will be written. Overrides other configuration and
default value (`dvclive/`).

- `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without
asking for confirmation.
> This only has an effect when used with `--type=dl`.
- `--run` - runs the experiment after initializing it.
- `--explicit` - do not assume default locations of project dependencies and
outputs. You'll have to provide specific locations via other options or
`dvc config exp`. In `--interactive` mode, this removes default values from
prompts.

- `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without
asking for confirmation (same as `dvc stage add --force`).

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - displays detailed tracing information.
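
The naming rule given for `--name` above (letters, numbers, dash `-` and underscore `_`) can be checked before calling the command. This is a sketch with a hypothetical helper, `is_valid_stage_name`. It is not DVC's own validation, and rejecting an empty name is an assumption.

```shell
# Hypothetical helper: succeeds only if the name uses letters, numbers,
# dash and underscore. Empty names are rejected (assumption).
is_valid_stage_name() {
  case "$1" in
    '' | *[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

for name in train train-v2 'train stage' 'train:dl'; do
  if is_valid_stage_name "$name"; then
    printf '%s: ok\n' "$name"
  else
    printf '%s: invalid\n' "$name"
  fi
done
```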

## Example: interactive mode

Let's prepare an ML model training script to start running experiments on it.
The easiest route is using interactive mode and answering a few questions:

```dvc
$ dvc exp init --interactive
This command will guide you to set up a train stage in dvc.yaml...
Command to execute: python src/train.py
Enter the paths for dependencies and outputs of the command.
DVC assumes the following workspace structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src
Path to a code file/directory [src, n to omit]: src/train.py
Path to a data file/directory [data, n to omit]: data/features
Path to a model file/directory [models, n to omit]: models/predict.h5
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]: n
...
```

In this example, the code, data, and model locations are given explicitly
because the defaults are too broad. The default `params.yaml` and `metrics.json`
are accepted (by pressing Enter) for <abbr>parameters</abbr> and
<abbr>metrics</abbr>. Plots are omitted (by entering `n`) as none will be
written.

The resulting `dvc.yaml` file codifies the meta-information you provided in
DVC's format:

```yaml
train:
  cmd: python src/train.py
  deps:
    - data/features
    - src/train.py
  params:
    - epochs
  outs:
    - models/predict.h5
  metrics:
    - metrics.json:
        cache: false
```
> Notes:
>
> - `train` is the default stage name unless you provide one with the `--name`
> option.
> - The `epochs` param was obtained from the `params.yaml` file. Any other param
> keys found there would all be listed under `params:` automatically.

The next step would be to tune `params.yaml` or improve `src/train.py` directly,
and start [running experiments](/doc/command-reference/exp/run).
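
As noted above, param keys found in `params.yaml` end up under `params:`. To preview which keys a simple params file contains, a rough sketch like the following can help. It assumes a flat YAML file with plain top-level keys, and is not how DVC itself parses the file. The helper `list_top_level_keys` and the `lr` key are hypothetical.

```shell
# Sample params file in a temporary location (values are placeholders).
params_file=$(mktemp)
printf 'epochs: 10\nlr: 0.01\n# a comment\n' > "$params_file"

# Hypothetical helper: prints keys that start at column 1 and end with a colon.
# Nested keys, comments, and quoted keys are ignored by this simple pattern.
list_top_level_keys() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\):.*/\1/p' "$1"
}

list_top_level_keys "$params_file"
```

For anything beyond flat files, a real YAML parser is the safer choice.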
16 changes: 8 additions & 8 deletions content/docs/sidebar.json
@@ -254,6 +254,10 @@
"slug": "exp",
"source": "exp/index.md",
"children": [
{
"label": "exp init",
"slug": "init"
},
{
"label": "exp run",
"slug": "run"
@@ -262,14 +266,14 @@
"label": "exp show",
"slug": "show"
},
{
"label": "exp init",
"slug": "init"
},
{
"label": "exp diff",
"slug": "diff"
},
{
"label": "exp list",
"slug": "list"
},
{
"label": "exp apply",
"slug": "apply"
@@ -293,10 +297,6 @@
{
"label": "exp pull",
"slug": "pull"
},
{
"label": "exp list",
"slug": "list"
}
]
},
6 changes: 4 additions & 2 deletions content/docs/user-guide/project-structure/pipelines-files.md
@@ -20,8 +20,8 @@ so you may modify, write, or generate stages and pipelines on your own.
## Stages

The `stages` list contains a list of user-defined stages. Here's a simple one
named `transpose`:
The list of `stages` contains one or more user-defined stages. Here's a simple
one named `transpose`:

```yaml
stages:
@@ -33,6 +33,8 @@ stages:
- columns.txt
```
> See also `dvc stage add`, a helper command to write stages in `dvc.yaml`.

The most important part of a stage is the terminal command(s) it executes
(`cmd` field). This is what DVC runs when the stage is reproduced (see
`dvc repro`).
