ref: misc. improvements around "stage" concept #3235

Merged
merged 4 commits into from
Jan 29, 2022
Changes from 2 commits
4 changes: 2 additions & 2 deletions content/docs/command-reference/freeze.md
@@ -1,6 +1,6 @@
# freeze

Freeze [stages](/doc/command-reference/stage) until `dvc unfreeze` is used on
Freeze [stages](/doc/command-reference/run) until `dvc unfreeze` is used on
Contributor Author

These are for consistency for now (this was the only place where the stage ref was linked instead of run). This will be properly addressed in #2883.

them. Frozen stages are never executed by `dvc repro`.

## Synopsis
@@ -14,7 +14,7 @@ positional arguments:

## Description

`dvc freeze` causes the [stages](/doc/command-reference/stage) indicated as
`dvc freeze` causes the [stages](/doc/command-reference/run) indicated as
`targets` to be considered _not changed_ by `dvc status` and `dvc repro`. Stage
reproduction will not regenerate <abbr>outputs</abbr> of frozen stages, even if
their <abbr>dependencies</abbr> have changed, and even if `--force` is used.
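
The skipping rule described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Stage` and `stages_to_run` are hypothetical names invented for illustration, not part of DVC, and the real `dvc repro` also walks the dependency graph, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    # Hypothetical stand-in for a pipeline stage (not a DVC class).
    name: str
    deps_changed: bool = False
    frozen: bool = False


def stages_to_run(stages, force=False):
    """Return the names of the stages a reproduction would execute."""
    selected = []
    for stage in stages:
        if stage.frozen:
            # Frozen stages are skipped, even when force is set.
            continue
        if force or stage.deps_changed:
            selected.append(stage.name)
    return selected


pipeline = [
    Stage("prepare", deps_changed=True, frozen=True),
    Stage("train", deps_changed=True),
    Stage("evaluate"),
]

print(stages_to_run(pipeline))              # ['train']
print(stages_to_run(pipeline, force=True))  # ['train', 'evaluate']
```

In this model `prepare` is never selected while it is frozen, which mirrors the statement that frozen stages are not regenerated even if their dependencies changed or `--force` is used.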
13 changes: 8 additions & 5 deletions content/docs/command-reference/import-url.md
@@ -100,9 +100,12 @@ DVC supports several types of external locations (protocols):
necessary to track if the specified URL changed.

Another way to understand the `dvc import-url` command is as a shortcut for
generating a pipeline stage with and external dependency. This is discussed in
the [External Dependencies](/doc/user-guide/external-dependencies)
documentation, where an alternative is demonstrated for each of these schemes.
generating a pipeline [stage](/doc/command-reference/run) with and external
dependency.

> This is discussed in the
> [External Dependencies](/doc/user-guide/external-dependencies) documentation,
> where an alternative is demonstrated for each of these schemes.

Instead of:

@@ -121,8 +124,8 @@ $ dvc stage add -n download_data \
$ dvc repro
```

`dvc import-url` generates an _import `.dvc` file_ and `dvc stage add` a regular
stage (in `dvc.yaml`).
`dvc import-url` generates an _import `.dvc` file_ while `dvc stage add`
produces a regular stage in `dvc.yaml`.

## Options

24 changes: 11 additions & 13 deletions content/docs/command-reference/index.md
@@ -10,16 +10,14 @@ does not change directories in your terminal).

## Typical DVC workflow

- In an existing Git repository, initialize a <abbr>DVC project</abbr> with
`dvc init`.
- Copy data files or dataset directories for modeling into the repository, and
track them with DVC using the `dvc add` command.
- Process the data with your own source code, using `dvc.yaml` and/or the
`dvc stage add` command to specify further <abbr>outputs</abbr> that should
Comment on lines 11 to -18
Contributor Author

Ended up rewriting this whole section 😅

also be tracked by DVC, and executing the code using `dvc repro`.
- Sharing a <abbr>DVC repository</abbr> with the codified data
[pipeline](/doc/command-reference/dag) will not include the project's
<abbr>cache</abbr>. Use [remote storage](/doc/command-reference/remote) and
`dvc push` to share this cache (data tracked by DVC).
- Use `dvc repro` to automatically reproduce your full pipeline iteratively as
input data or source code change.
- Initialize a <abbr>DVC project</abbr> in a Git repo with `dvc init`.
- Copy data files or dataset directories for modeling into the project and use
`dvc add` to tell DVC to <abbr>cache</abbr> and track them.
- Create a simple `dvc.yaml` file to codify a data processing
[pipeline](/doc/command-reference/dag). It uses your own source code and
specifies further data <abbr>outputs</abbr> for DVC to control.
- Execute or restore any version of your pipeline using `dvc repro`, or
experiment on it with `dvc exp` features.
- Sharing the <abbr>repository</abbr> will not include locally cached data. Use
[remote storage](/doc/command-reference/remote) with `dvc push` and `dvc pull`
to share data artifacts.
2 changes: 1 addition & 1 deletion content/docs/user-guide/experiment-management/index.md
@@ -79,7 +79,7 @@ until your experiments are made [persistent].

Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it
logs the unique signature of each stage run (in `.dvc/cache/runs` by default).
If it never happened before, the stage command(s) are executed normally. Every
If it never happened before, its command(s) are executed normally. Every
subsequent time a [stage](/doc/command-reference/run) runs under the same
conditions, the previous results can be restored instantly, without wasting time
or computing resources.
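
The idea of keying results by a stage's signature can be sketched as a simple memoization. This is a minimal illustration, not DVC's implementation: `run_cache`, `stage_signature` and `run_stage` are hypothetical names, the cache here is an in-memory dict rather than files on disk, and the choice of SHA-256 over a JSON payload is an assumption made for the example.

```python
import hashlib
import json

# Hypothetical in-memory stand-in for the run cache.
run_cache = {}


def stage_signature(cmd, dep_hashes):
    """Hash the command together with the hashes of its dependencies."""
    payload = json.dumps({"cmd": cmd, "deps": dep_hashes}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_stage(cmd, dep_hashes, execute):
    """Call `execute` only if this signature was never seen before.

    Returns a tuple (outputs, restored), where `restored` is True when
    the outputs came from the cache instead of a fresh execution.
    """
    sig = stage_signature(cmd, dep_hashes)
    if sig in run_cache:
        return run_cache[sig], True
    outputs = execute()
    run_cache[sig] = outputs
    return outputs, False
```

Calling `run_stage` twice with the same command and dependency hashes executes the stage once and restores the stored outputs the second time. Changing either the command or any dependency hash produces a new signature, so the stage runs again.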