From 4a110afc5e7f84fed082afe235811a4e9c36d2ad Mon Sep 17 00:00:00 2001
From: jeremydesroches <18587991+jeremydesroches@users.noreply.github.com>
Date: Sun, 8 Nov 2020 22:21:04 -0700
Subject: [PATCH 1/4] Add "machine learning pipeline" references, outline of
changes.
---
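Note for reviewers: a minimal sketch of the kind of `dvc.yaml` the reworded
docs describe, assuming the four stage names mentioned in the dag.md example
(prepare, featurize, train, evaluate). Every script name, data path, and the
metrics file below is a hypothetical placeholder for illustration and is not
taken from the DVC docs.

```yaml
# Hypothetical layout; only the four stage names come from the patch text.
stages:
  prepare:
    cmd: python src/prepare.py data/raw.csv
    deps:
      - src/prepare.py
      - data/raw.csv
    outs:
      - data/prepared
  featurize:
    cmd: python src/featurize.py data/prepared data/features
    deps:
      - src/featurize.py
      - data/prepared      # output of the prepare stage
    outs:
      - data/features
  train:
    cmd: python src/train.py data/features model.pkl
    deps:
      - src/train.py
      - data/features      # output of the featurize stage
    outs:
      - model.pkl
  evaluate:
    cmd: python src/evaluate.py model.pkl data/features scores.json
    deps:
      - src/evaluate.py
      - model.pkl          # output of the train stage
      - data/features
    metrics:
      - scores.json:
          cache: false     # keep the small metrics file in Git, not the DVC cache
```

With a file along these lines, `dvc dag` should show the stages connected
through their shared outputs and dependencies, and `dvc repro` should rerun
only the stages whose dependencies have changed.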
content/docs/command-reference/dag.md | 7 ++++---
content/docs/command-reference/repro.md | 8 +++++---
content/docs/command-reference/run.md | 7 +++++--
content/docs/user-guide/dvc-files-and-directories.md | 3 ++-
4 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md
index 23da655edf..94ec019d17 100644
--- a/content/docs/command-reference/dag.md
+++ b/content/docs/command-reference/dag.md
@@ -20,7 +20,7 @@ A data pipeline, in general, is a series of data processing
input and produce an output). A pipeline may produce intermediate
data, and has a final result.
-Data processing or ML pipelines typically start with large raw datasets, include
+Data science and ML pipelines typically start with large raw datasets, include
intermediate featurization and training stages, and produce a final model, as
well as accuracy [metrics](/doc/command-reference/metrics).
@@ -78,9 +78,10 @@ example in Bash, we could add the following line to `~/.bashrc`:
export DVC_PAGER=more
```
-## Examples
+## Example: Visualize a DVC Pipeline
-Visualize DVC pipeline:
+Visualize the prepare, featurize, train, and evaluate stages of a machine
+learning pipeline as defined in `dvc.yaml`:
```dvc
$ dvc dag
diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md
index 6ddf7788a4..15f32adbb7 100644
--- a/content/docs/command-reference/repro.md
+++ b/content/docs/command-reference/repro.md
@@ -175,9 +175,11 @@ up-to-date and only execute the final stage.
## Examples
-For simplicity, let's build a pipeline defined below. (If you want get your
-hands-on something more real, see this short
-[pipeline tutorial](/doc/start/data-pipelines)). It takes this `text.txt` file:
+To get hands-on experience with data science and machine learning pipelines, see
+[Get Started: Data Pipelines](/doc/start/data-pipelines).
+
+To demonstrate `dvc repro`, let's build and reproduce the simple pipeline below.
+It takes this `text.txt` file:
```
dvc
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index e9e19633cf..5ea3e2b8bc 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -23,8 +23,11 @@ positional arguments:
`dvc run` is a helper for creating or updating
[pipeline](/doc/command-reference/dag) stages in a `dvc.yaml` file (located in
-the current working directory). _Stages_ represent individual data processes,
-including their input and resulting outputs.
+the current working directory).
+
+_Stages_ represent individual data processes, including their input and
+resulting outputs. Combine stages to capture simple data workflows, organize
+data science projects, or build detailed machine learning pipelines.
A stage name is required and can be provided using the `-n` (`--name`) option.
The other available [options](#options) are mostly meant to describe different
diff --git a/content/docs/user-guide/dvc-files-and-directories.md b/content/docs/user-guide/dvc-files-and-directories.md
index 09e2ff8ee7..ba366b7d22 100644
--- a/content/docs/user-guide/dvc-files-and-directories.md
+++ b/content/docs/user-guide/dvc-files-and-directories.md
@@ -109,7 +109,8 @@ can be written manually or generated by user code.
> `dvc.yaml`. Additionally, a `dvc.lock` file is also created or updated by
> `dvc run` and `dvc repro`, to record the pipeline state.
-Here's a comprehensive `dvc.yaml` example:
+Here's a comprehensive example of a machine learning pipeline, described in
+`dvc.yaml`:
```yaml
stages:
From aacd352aeeb22576aac7f941b66f5319ddfd3bc1 Mon Sep 17 00:00:00 2001
From: jeremydesroches <18587991+jeremydesroches@users.noreply.github.com>
Date: Mon, 9 Nov 2020 22:24:36 -0700
Subject: [PATCH 2/4] Update content/docs/command-reference/repro.md
Co-authored-by: Jorge Orpinel
---
content/docs/command-reference/repro.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md
index 15f32adbb7..c4b84ecd8b 100644
--- a/content/docs/command-reference/repro.md
+++ b/content/docs/command-reference/repro.md
@@ -178,7 +178,7 @@ up-to-date and only execute the final stage.
To get hands-on experience with data science and machine learning pipelines, see
[Get Started: Data Pipelines](/doc/start/data-pipelines).
-To demonstrate `dvc repro`, let's build and reproduce the simple pipeline below.
+Let's build and reproduce a simple pipeline.
It takes this `text.txt` file:
```
From f273885765665637f619abde3118edd346d46f4f Mon Sep 17 00:00:00 2001
From: jeremydesroches <18587991+jeremydesroches@users.noreply.github.com>
Date: Mon, 9 Nov 2020 22:33:00 -0700
Subject: [PATCH 3/4] Update content/docs/command-reference/run.md
Co-authored-by: Jorge Orpinel
---
content/docs/command-reference/run.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index 5ea3e2b8bc..8efa5ae8c1 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -26,7 +26,7 @@ positional arguments:
the current working directory).
_Stages_ represent individual data processes, including their input and
-resulting outputs. Combine stages to capture simple data workflows, organize
+resulting outputs. They can be combined to capture simple data workflows, organize
data science projects, or build detailed machine learning pipelines.
A stage name is required and can be provided using the `-n` (`--name`) option.
From f5b5a9a5e4bd5648a16f4d87bba7afed0abf9ed3 Mon Sep 17 00:00:00 2001
From: jeremydesroches <18587991+jeremydesroches@users.noreply.github.com>
Date: Mon, 9 Nov 2020 23:00:24 -0700
Subject: [PATCH 4/4] Update placements and revert edits based on feedback.
---
content/docs/command-reference/dag.md | 10 +++++-----
content/docs/command-reference/repro.md | 9 ++++-----
content/docs/command-reference/run.md | 4 ++--
content/docs/user-guide/dvc-files-and-directories.md | 7 +++----
4 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md
index 94ec019d17..2fb7e9c5b4 100644
--- a/content/docs/command-reference/dag.md
+++ b/content/docs/command-reference/dag.md
@@ -20,9 +20,9 @@ A data pipeline, in general, is a series of data processing
input and produce an output). A pipeline may produce intermediate
data, and has a final result.
-Data science and ML pipelines typically start with large raw datasets, include
-intermediate featurization and training stages, and produce a final model, as
-well as accuracy [metrics](/doc/command-reference/metrics).
+Data science and machine learning pipelines typically start with large raw
+datasets, include intermediate featurization and training stages, and produce a
+final model, as well as accuracy [metrics](/doc/command-reference/metrics).
In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
@@ -80,8 +80,8 @@ export DVC_PAGER=more
## Example: Visualize a DVC Pipeline
-Visualize the prepare, featurize, train, and evaluate stages of a machine
-learning pipeline as defined in `dvc.yaml`:
+Visualize the prepare, featurize, train, and evaluate stages of a pipeline as
+defined in `dvc.yaml`:
```dvc
$ dvc dag
diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md
index c4b84ecd8b..27eb2c6651 100644
--- a/content/docs/command-reference/repro.md
+++ b/content/docs/command-reference/repro.md
@@ -30,6 +30,9 @@ results.
> (either manually or by using `dvc run`) while initial data dependencies can be
> registered with `dvc add`.
+To get hands-on experience with data science and machine learning pipelines, see
+[Get Started: Data Pipelines](/doc/start/data-pipelines).
+
This command is similar to [Make](https://www.gnu.org/software/make/) in
software build automation, but DVC captures build requirements
([dependencies and outputs](/doc/command-reference/run#dependencies-and-outputs))
@@ -175,11 +178,7 @@ up-to-date and only execute the final stage.
## Examples
-To get hands-on experience with data science and machine learning pipelines, see
-[Get Started: Data Pipelines](/doc/start/data-pipelines).
-
-Let's build and reproduce a simple pipeline.
-It takes this `text.txt` file:
+Let's build and reproduce a simple pipeline. It takes this `text.txt` file:
```
dvc
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index 8efa5ae8c1..b897b2a5db 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -26,8 +26,8 @@ positional arguments:
the current working directory).
_Stages_ represent individual data processes, including their input and
-resulting outputs. They can be combined to capture simple data workflows, organize
-data science projects, or build detailed machine learning pipelines.
+resulting outputs. They can be combined to capture simple data workflows,
+organize data science projects, or build detailed machine learning pipelines.
A stage name is required and can be provided using the `-n` (`--name`) option.
The other available [options](#options) are mostly meant to describe different
diff --git a/content/docs/user-guide/dvc-files-and-directories.md b/content/docs/user-guide/dvc-files-and-directories.md
index ba366b7d22..ea235edacb 100644
--- a/content/docs/user-guide/dvc-files-and-directories.md
+++ b/content/docs/user-guide/dvc-files-and-directories.md
@@ -100,8 +100,8 @@ and `dvc commit` commands, but not when a `.dvc` file is overwritten by
## `dvc.yaml` file
-`dvc.yaml` files describe data pipelines, similar to how
-[Makefiles](https://www.gnu.org/software/make/manual/make.html#Introduction)
+`dvc.yaml` files describe data science or machine learning pipelines, similar to
+how [Makefiles](https://www.gnu.org/software/make/manual/make.html#Introduction)
work for building software. Its YAML structure contains a list of stages which
can be written manually or generated by user code.
@@ -109,8 +109,7 @@ can be written manually or generated by user code.
> `dvc.yaml`. Additionally, a `dvc.lock` file is also created or updated by
> `dvc run` and `dvc repro`, to record the pipeline state.
-Here's a comprehensive example of a machine learning pipeline, described in
-`dvc.yaml`:
+Here's a comprehensive `dvc.yaml` example:
```yaml
stages: