ref: exp init improvements #3071

Merged
merged 18 commits into from
Dec 23, 2021
77 changes: 46 additions & 31 deletions content/docs/command-reference/exp/init.md
@@ -1,6 +1,7 @@
# exp init

Codify project using [DVC metafiles](/doc/user-guide/project-structure) to run
Codify an existing project using
[DVC metafiles](/doc/user-guide/project-structure) to run
[experiments](/doc/user-guide/experiment-management).

## Synopsis
@@ -17,37 +18,52 @@ usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f]
## Description
Contributor

Somewhere in the description, it might be useful to explain that dvc exp init by default expects that input data, parameters, and source code paths exist before running an experiment, and that the command is expected to generate models, metrics, and plots.

Contributor Author

@jorgeorpinel Dec 21, 2021

That's more of a problem for exp run though, @dberenbaum (already linked from the --run option). But should we state that exp run (and repro for that matter) expect that the stage definition and code are good? Hopefully it's evident.

Contributor

That's more of a problem for exp run though

Well that's the point of exp init, right? Better to have users understand what's needed up front than to have them run exp init only to fail on exp run.

Doesn't need to be part of this PR. It could also be handled by some of the suggested changes to the core command rather than the docs.

Contributor Author

@jorgeorpinel Dec 22, 2021

Better to have users understand what's needed up front than to have them run exp init only to fail on exp run.

I'm not sure. If users need to read that on the cmd ref., is exp init serving as a self-explanatory command? I think those notes should be in the command's output first (and added to this doc as a result of that product update).


`dvc exp init` helps you quickly get started with experiments. It reduces
boilerplate for initializing [pipeline](/doc/command-reference/dag) stages in a
`dvc.yaml` file by assuming defaults about the location of your data,
[parameters](/doc/command-reference/params), source code, models,
[metrics](/doc/command-reference/metrics) and
[plots](/doc/command-reference/plots), which can be customized through config.
boilerplate DVC procedures by creating a `dvc.yaml` file that assumes default
locations for your input data, <abbr>parameters</abbr>, source code, models,
<abbr>metrics</abbr> and [plots](/doc/command-reference/plots). These locations
can be customized through the options of this command or via config files.
Standard repository structure:

It also offers guided `--interactive` mode for creating a stage to be
[`exp run`](/doc/command-reference/exp/run) later. `dvc exp init` supports
creating different types of stages, eg: `dl` if you are doing deep learning,
which uses [dvclive](/doc/dvclive) to monitor and checkpoint progress during
training of machine learning models.
```
├── data/
├── metrics.json
├── models/
├── params.yaml # required
├── plots/
└── src/
```

> Note that `params.yaml` is the only required file (see `dvc params`).
Member

side note ... for the sake of my education: why is it required?

Contributor

To parse the parameters and put them into dvc.yaml.

Contributor Author

@jorgeorpinel Dec 8, 2021

Yeah but even if you don't have params an empty file is still required.

Related to iterative/dvc#6446 (comment)

Contributor

It's also related to iterative/dvc#6605. The params section of dvc.yaml currently expects keys from a file (see below), not a filename itself, so all the parameters need to be defined before creating the stage. If we want to stop requiring this file at the time of dvc exp init, we need to change params to accept filenames.

stages:
  train:
    cmd: python train.py
    deps:
    - data
    - src
    params:
    - params.py:
      - batch_size
      - data_path
      - epochs
      - latent_dim
      - num_samples

Contributor Author

@jorgeorpinel Dec 11, 2021

What about generating a dvc.yaml without params if there's no params.yaml file? This is a core discussion though! Moved to iterative/dvc#6446 (comment)

Contributor

Sorry, I misunderstood you, and now I think it's actually more of a docs issue 🤔 . It is possible to use dvc exp init without params. You can either use --explicit and not provide --params, or you can use dvc exp init -i and type n at the params prompt.

The note here is to indicate that if you do use either the default or some other params path, then that path must exist and be parseable to extract the parameters for the stage. So the current note probably needs to be reworded.

Contributor Author

@jorgeorpinel Dec 14, 2021

UPDATE: Looks like the discussion about empty params.yaml files is in a few core repo tickets now (e.g. iterative/dvc#7138) so I won't keep discussing here (still a bit confused though).

It is possible to use dvc exp init without params...
So the current note probably needs to be reworded.

Clarified in 2273431.
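
As a hedged illustration of where this thread landed (a params path that is used must exist and be parseable), the sketch below prepares only the inputs of the default layout before `dvc exp init` would be run. The `epochs` key and its value are hypothetical placeholders, and the temporary directory stands in for a real project root.

```shell
# Work in a throwaway directory; in practice this would be your project root.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Inputs that the default layout expects to exist up front.
mkdir -p data src

# A non-empty, parseable params file; `epochs: 10` is a made-up example value.
printf 'epochs: 10\n' > params.yaml

# models/, plots/ and metrics.json are left out on purpose: per the review
# discussion, the experiment command is expected to generate them.
ls
```

Whether an empty `params.yaml` is acceptable is left open here, since that question was moved to the core repo tickets mentioned above.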


To use this feature, provide a `command` argument or use the `--interactive`
(`-i`) mode and answer a few prompts (most of them optional). This wraps the
command that runs your experiment(s) as a <abbr>stage</abbr> that `dvc exp run`
can execute.

Different types of stages are supported, such as `dl` (deep learning) which uses
[DVCLive](/doc/dvclive) to monitor [checkpoints] during training of ML models.

This command is intended to be a quick way to start running experiments. To
create more complex stages and pipelines, use `dvc stage add`.
> This command is intended as a quick way to start running experiments. To
> codify complex data pipelines, see the `dvc.yaml` specification.

[checkpoints]: /doc/user-guide/experiment-management/checkpoints

### The `command` argument

The `command` argument is optional, if you are using `--interactive` mode. The
Comment on lines 59 to -45
Contributor Author

@jorgeorpinel Dec 10, 2021

BTW a question I have is whether to keep this section or move command under Options like we did for targets in https://dvc.org/doc/command-reference/repro#options. I personally like the section more, but I remember we discussed this (cc @shcheklein) and using Options was picked, so for consistency I'd move this under Options as well.

From #3015 (review)

`command` sent to `dvc exp init` can be anything your terminal would accept and
run directly, for example a shell built-in, expression, or binary found in
`PATH`. Please remember that any flags sent after the `command` are interpreted
by the command itself, not by `dvc exp init`.
The `command` given to `dvc exp init` can be anything your system terminal would
accept and run directly, for example a shell built-in, an expression, or a
binary found in `PATH`. Please note that any flags sent after the `command` will
typically become part of the command itself and be ignored by `dvc exp init` (so
put the command last).

⚠️ While DVC is platform-agnostic, the commands defined in your
[pipeline](/doc/command-reference/dag) stages may only work on some operating
systems and require certain software packages to be installed.
systems and require certain software packages or libraries.

Wrap the command with double quotes `"` if there are special characters in it
like `|` (pipe) or `<`, `>` (redirection), otherwise they would apply to
`dvc exp init` itself. Use single quotes `'` instead if there are environment
variables in it that should be evaluated dynamically. Examples:
Surround the command with double quotes `"` if it includes special characters
like `|` or `<`, `>` -- otherwise they would apply to `dvc exp init` itself. Use
single quotes `'` instead if there are environment variables in it that should
be evaluated dynamically.
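
The difference between the two quoting styles can be checked without DVC. In this sketch, `show_cmd` is a hypothetical stand-in that only prints the single argument it receives (it is not part of DVC), and `MYENVVAR` is given an arbitrary value.

```shell
# Hypothetical stand-in for `dvc exp init`: prints the one argument it was given.
show_cmd() { printf '%s\n' "$1"; }

MYENVVAR=hello

# Double quotes: redirection characters stay inside the argument,
# but the shell expands $MYENVVAR before the stand-in sees it.
redirect=$(show_cmd "./a_script.sh > /dev/null 2>&1")
double=$(show_cmd "./another_script.sh $MYENVVAR")

# Single quotes: the literal text $MYENVVAR is passed through unexpanded.
single=$(show_cmd './another_script.sh $MYENVVAR')

printf '%s\n' "$redirect" "$double" "$single"
```

With double quotes the variable is already expanded when the argument is received, while single quotes pass the literal `$MYENVVAR` through, which is what lets it be evaluated later when the stage runs.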

```dvc
$ dvc exp init "./a_script.sh > /dev/null 2>&1"
@@ -56,10 +72,13 @@ $ dvc exp init './another_script.sh $MYENVVAR'

## Options

- `-i`, `--interactive` - prompts user for the command to execute and different
paths for tracking outputs and dependencies, unless they are provided through
arguments explicitly. Interactive mode allows users to set those locations
from default values or omit them.
- `-i`, `--interactive` - prompts user for the `command` to execute and for the
different paths where dependencies and outputs can be found, unless they are
provided through arguments explicitly. Interactive mode allows users to set
those locations from default values or omit them.

- `--type` - selects the type of the stage to create. Currently it provides two
alternatives: `dl` and `default` (no need to specify this one).

- `--explicit` - `dvc exp init` assumes default location of your outputs and
dependencies (which can be overridden from the config). By using `--explicit`,
@@ -94,10 +113,6 @@ $ dvc exp init './another_script.sh $MYENVVAR'
your experiment will write logs to. The default is `dvclive` directory, which
only comes to effect when used with `--type=dl`.

- `--type` - selects the type of the stage to create. Currently it provides two
different kinds of stages: `default` and `dl`. If unspecified, `default` stage
is created.

`default` stage creates a stage with `metrics` and `plots` tracked by DVC
itself, and does not track live-created artifacts (unless explicitly
specified).
8 changes: 4 additions & 4 deletions content/docs/sidebar.json
@@ -252,6 +252,10 @@
"slug": "exp",
"source": "exp/index.md",
"children": [
{
"label": "exp init",
"slug": "init"
},
{
"label": "exp run",
"slug": "run"
@@ -260,10 +264,6 @@
"label": "exp show",
"slug": "show"
},
{
"label": "exp init",
"slug": "init"
},
{
"label": "exp diff",
"slug": "diff"
6 changes: 4 additions & 2 deletions content/docs/user-guide/project-structure/pipelines-files.md
@@ -20,8 +20,8 @@ so you may modify, write, or generate stages and pipelines on your own.

## Stages

The `stages` list contains a list of user-defined stages. Here's a simple one
named `transpose`:
The list of `stages` contains one or more user-defined stages. Here's a simple
one named `transpose`:

```yaml
stages:
@@ -33,6 +33,8 @@ stages:
- columns.txt
```

> See also `dvc stage add`, a helper command to write stages in `dvc.yaml`.

The most important part of a stage is the terminal command(s) it executes
(`cmd` field). This is what DVC runs when the stage is reproduced (see
`dvc repro`).