docs: add "machine learning pipeline" references, expand context in cmd ref, user guide [SEO] #1915

Merged 4 commits on Nov 10, 2020
11 changes: 6 additions & 5 deletions content/docs/command-reference/dag.md
@@ -20,9 +20,9 @@ A data pipeline, in general, is a series of data processing
input and produce an <abbr>output</abbr>). A pipeline may produce intermediate
data, and has a final result.

-Data processing or ML pipelines typically start with large raw datasets, include
-intermediate featurization and training stages, and produce a final model, as
-well as accuracy [metrics](/doc/command-reference/metrics).
+Data science and machine learning pipelines typically start with large raw
+datasets, include intermediate featurization and training stages, and produce a
+final model, as well as accuracy [metrics](/doc/command-reference/metrics).
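
Reviewer aside, not part of the diff: the stage sequence this paragraph describes can be pictured as a dependency graph. The sketch below is only an illustration in plain Python, not DVC's implementation. The stage names follow the prepare, featurize, train, and evaluate stages mentioned later in this PR; every file path and the `stage_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: name -> (dependencies, outputs). All paths are made up.
stages = {
    "prepare": (["data/raw.csv"], ["data/prepared"]),
    "featurize": (["data/prepared"], ["data/features"]),
    "train": (["data/features"], ["model.pkl"]),
    "evaluate": (["model.pkl", "data/features"], ["metrics.json"]),
}


def stage_order(stages):
    """Return stage names in an order that respects their data dependencies."""
    # Map each output path to the stage that produces it.
    producer = {out: name for name, (_, outs) in stages.items() for out in outs}
    # A stage depends on whichever stages produce its inputs; raw inputs
    # such as data/raw.csv have no producer and add no edge.
    graph = {
        name: {producer[dep] for dep in deps if dep in producer}
        for name, (deps, _) in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = stage_order(stages)
print(order)  # ['prepare', 'featurize', 'train', 'evaluate']
```

Because each stage here consumes an output of the previous one, only one valid order exists; a pipeline with independent branches would admit several.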

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
@@ -78,9 +78,10 @@ example in Bash, we could add the following line to `~/.bashrc`:
export DVC_PAGER=more
```

-## Examples
+## Example: Visualize a DVC Pipeline

-Visualize DVC pipeline:
+Visualize the prepare, featurize, train, and evaluate stages of a pipeline as
+defined in `dvc.yaml`:

```dvc
$ dvc dag
7 changes: 4 additions & 3 deletions content/docs/command-reference/repro.md
@@ -30,6 +30,9 @@ results.
> (either manually or by using `dvc run`) while initial data dependencies can be
> registered with `dvc add`.

+To get hands-on experience with data science and machine learning pipelines, see
+[Get Started: Data Pipelines](/doc/start/data-pipelines).
+
This command is similar to [Make](https://www.gnu.org/software/make/) in
software build automation, but DVC captures build requirements
([dependencies and outputs](/doc/command-reference/run#dependencies-and-outputs))
@@ -175,9 +178,7 @@ up-to-date and only execute the final stage.

## Examples

-For simplicity, let's build a pipeline defined below. (If you want get your
-hands-on something more real, see this short
-[pipeline tutorial](/doc/start/data-pipelines)). It takes this `text.txt` file:
+Let's build and reproduce a simple pipeline. It takes this `text.txt` file:

```
dvc
7 changes: 5 additions & 2 deletions content/docs/command-reference/run.md
@@ -23,8 +23,11 @@ positional arguments:

`dvc run` is a helper for creating or updating
[pipeline](/doc/command-reference/dag) stages in a `dvc.yaml` file (located in
-the current working directory). _Stages_ represent individual data processes,
-including their input and resulting outputs.
+the current working directory).
+
+_Stages_ represent individual data processes, including their input and
+resulting outputs. They can be combined to capture simple data workflows,
+organize data science projects, or build detailed machine learning pipelines.

A stage name is required and can be provided using the `-n` (`--name`) option.
The other available [options](#options) are mostly meant to describe different
4 changes: 2 additions & 2 deletions content/docs/user-guide/dvc-files-and-directories.md
@@ -100,8 +100,8 @@ and `dvc commit` commands, but not when a `.dvc` file is overwritten by

## `dvc.yaml` file

-`dvc.yaml` files describe data pipelines, similar to how
-[Makefiles](https://www.gnu.org/software/make/manual/make.html#Introduction)
+`dvc.yaml` files describe data science or machine learning pipelines, similar to
+how [Makefiles](https://www.gnu.org/software/make/manual/make.html#Introduction)
work for building software. Its YAML structure contains a list of stages which
can be written manually or generated by user code.
