diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 3d0b22198a..03ead92b40 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -6,21 +6,21 @@ version, and reproduce your data science and machine learning workflows.'
# Get Started: Data Pipelines
->> ⁉️ It may be worthwhile to start with the question: "Why pipelines?"
+> > ⁉️ It may be worthwhile to start with the question: "Why pipelines?"
Versioning large data files and directories for data science is great, but not
enough. How is data filtered, transformed, or used to train ML models? DVC
introduces a mechanism to capture _data pipelines_ — series of data processes
that produce a final result.
->> ⁉️ What is data process? Why do we tie "pipelines" with "code" or "data"
->> here? They are more general ideas, we can have a pipeline that downloads data
->> from a URL using `wget` and checks whether it has changed, for example. (like
->> `dvc get` or `dvc import-url`, but simpler.)
+> > ⁉️ What is a data process? Why do we tie "pipelines" with "code" or "data"
+> > here? They are more general ideas: we can have a pipeline that downloads
+> > data from a URL using `wget` and checks whether it has changed, for example
+> > (like `dvc get` or `dvc import-url`, but simpler).
->> I see that we are introducing pipelines to an ML/DS audience, but the idea is
->> more general and I believe we can tell this here. It's also possible to tell
->> this within ML/DS context in broader terms.
+> > I see that we are introducing pipelines to an ML/DS audience, but the idea
+> > is more general and I believe we can say so here. It's also possible to
+> > explain this within an ML/DS context in broader terms.
DVC pipelines and their data can also be easily versioned (using Git). This
allows you to better organize projects, and reproduce your workflow and results
@@ -30,38 +30,38 @@ machine learning pipeline.
[etl]: https://en.wikipedia.org/wiki/Extract,_transform,_load
->> We need a figure here.
+> > We need a figure here.
Watch and learn, or follow along with the code example below!
https://youtu.be/71IGzyH95UY
->> ✍️ DVC has features to handle pipelines easily. You can create stages
->> associated with commands, code, data and (hyper)parameters. It can run the
->> commands, and cache the outputs. DVC handles relationships
->> between these stages, so when these associated
->> elements change, the stage is invalidated and run. If no dependencies are
->> changed, it can report this and reuse the cached results.
+> > ✍️ DVC has features to handle pipelines easily. You can create stages
+> > associated with commands, code, data and (hyper)parameters. It can run the
+> > commands and cache the outputs. DVC handles relationships between these
+> > stages, so when these associated elements change, the stage is invalidated
+> > and rerun. If no dependencies have changed, it can report this and reuse
+> > the cached results.
Use `dvc stage add` to create _stages_. These represent processes (source code
tracked with Git) which form the steps of a _pipeline_.
->> ⁉️ Adding _data process_ to the concepts doesn't seem to serve well. Instead
->> we can continue like: "Stages represent commands to run, along with their
->> dependencies like data and code files, and outputs like model and plot files."
+> > ⁉️ Adding _data process_ to the concepts doesn't seem to serve well. Instead
+> > we can continue like: "Stages represent commands to run, along with their
+> > dependencies like data and code files, and outputs like model and plot
+> > files."
->> ⁉️ I believe we don't need the following sentence if we write as the previous
->> one.
+> > ⁉️ I believe we don't need the following sentence if we write as the
+> > previous one.
-Stages also connect code
-to its corresponding data _input_ and _output_.
+Stages also connect code to its corresponding data _input_ and _output_.
### ⚙️ Expand to download example code.
->> ⁉️ I think it might be easier to grasp the concept if we use a simpler
->> pipeline with 3 stages, with not many parameters, metrics and such.
+> > ⁉️ I think it might be easier to grasp the concept if we use a simpler
+> > pipeline with 3 stages, with not many parameters, metrics and such.
Get the sample code like this:
@@ -94,8 +94,7 @@ Please also add or commit the source code directory with Git at this point.
-
->> ⁉️ The first stage we create may be a simpler one.
+> > ⁉️ The first stage we create may be a simpler one.
```dvc
$ dvc stage add -n prepare \
@@ -105,7 +104,7 @@ $ dvc stage add -n prepare \
python src/prepare.py data/data.xml
```
->> ⁉️ We can move `dvc.yaml` discussion in a hidden section.
+> > ⁉️ We can move the `dvc.yaml` discussion into a hidden section.
A `dvc.yaml` file is generated. It includes information about the command we run
(`python src/prepare.py data/data.xml`), its dependencies, and
@@ -115,10 +114,10 @@ A `dvc.yaml` file is generated. It includes information about the command we run
### 💡 Expand to see what happens under the hood.
->> ⁉️ I think, the short descriptions of options can be in the main text instead
->> of `dvc.yaml` above. Also, the project should contain a simple pipeline that
->> starts with `-d` and `-o`, then add `-p`, `-m` to the mix in a later stage.
->> The first example of `stage add` is too complex here.
+> > ⁉️ I think the short descriptions of options can be in the main text
+> > instead of `dvc.yaml` above. Also, the project should contain a simple
+> > pipeline that starts with `-d` and `-o`, then adds `-p`, `-m` to the mix in
+> > a later stage. The first example of `stage add` is too complex here.
The command options used above mean the following:
@@ -181,39 +180,39 @@ stages:
->> ⁉️ The following information can also be hidden, or deleted. We assume this
->> GS trail will be standalone, no need to mention `dvc add` here.
+> > ⁉️ The following information can also be hidden, or deleted. We assume this
+> > GS trail will be standalone, no need to mention `dvc add` here.
There's no need to use `dvc add` for DVC to track stage outputs (`data/prepared`
-in this case); `dvc stage add` and `dvc exp run` takes care of this. You only need
-to run `dvc push` if you want to save them to
+in this case); `dvc stage add` and `dvc exp run` take care of this. You only
+need to run `dvc push` if you want to save them to
[remote storage](/doc/start/data-and-model-versioning#storing-and-sharing),
(usually along with `git commit` to version `dvc.yaml` itself).
->> ⁉️ Here, it may be more natural to tell the Run-Cache and `dvc push` as
->> pushing "pipeline artifacts" instead of "storing and sharing".
+> > ⁉️ Here, it may be more natural to describe the Run-Cache and `dvc push` as
+> > pushing "pipeline artifacts" instead of "storing and sharing".
->> `dvc push` can
->> push the individual stages, and their associated code and data, so you don't
->> have to re-run them in other machines.
+> > `dvc push` can push the individual stages, and their associated code and
+> > data, so you don't have to re-run them on other machines.
## Dependency graphs (DAGs)
-By using `dvc stage add` multiple times, and specifying outputs of a
-stage as dependencies of another one, we can describe a sequence of
-commands which gets to a desired result. This is what we call a _data pipeline_
-or [_dependency graph_](https://en.wikipedia.org/wiki/Directed_acyclic_graph).
+By using `dvc stage add` multiple times, and specifying outputs of
+a stage as dependencies of another one, we can describe a sequence
+of commands which gets to a desired result. This is what we call a _data
+pipeline_ or
+[_dependency graph_](https://en.wikipedia.org/wiki/Directed_acyclic_graph).
->> ⁉️ All pipelines are DAGs, but not all DAGs are pipelines, so these two are
->> not identical. DAG reference seems complicating, rather than simplifying to
->> me.
+> > ⁉️ All pipelines are DAGs, but not all DAGs are pipelines, so these two are
+> > not identical. The DAG reference seems to complicate rather than simplify
+> > things, to me.
Let's create a second stage chained to the outputs of `prepare`, to perform
feature extraction:
->> ⁉️ The second stage is almost identical with the first. It may be necessary
->> for the project, but pedagogically we're spending reader's attention here
->> unnecessarily here.
+> > ⁉️ The second stage is almost identical to the first. It may be necessary
+> > for the project, but pedagogically we're spending the reader's attention
+> > unnecessarily here.
```dvc
$ dvc stage add -n featurize \
@@ -227,7 +226,6 @@ The `dvc.yaml` file is updated automatically and should include two stages now.
-
### 💡 Expand to see what happens under the hood.
The changes to the `dvc.yaml` should look like this:
@@ -262,11 +260,11 @@ The changes to the `dvc.yaml` should look like this:
### ⚙️ Expand to add more stages.
->> ⁉️ Another pipeline from the same. The first three stages look almost
->> identical.
+> > ⁉️ Another pipeline from the same. The first three stages look almost
+> > identical.
-Let's add the training itself. Nothing new this time; just the same `dvc stage add`
-command with the same set of options:
+Let's add the training itself. Nothing new this time; just the same
+`dvc stage add` command with the same set of options:
```dvc
$ dvc stage add -n train \
@@ -276,8 +274,8 @@ $ dvc stage add -n train \
python src/train.py data/features model.pkl
```
->> ⁉️ The wording below is a bit _distrustful._ In case of an error, DVC should
->> report it.
+> > ⁉️ The wording below is a bit _distrustful._ In case of an error, DVC should
+> > report it.
Please check the `dvc.yaml` again, it should have one more stage now.
@@ -305,7 +303,7 @@ parameters for the training stage:
1. Open `params.yaml` and change `n_est` to `100`, and
2. (re)run `dvc exp run`.
->> Link to experiments trail here
+> > Link to experiments trail here
You should see:
@@ -321,8 +319,8 @@ the intermediate results are being reused.
Now, let's change it back to `50` and run `dvc exp run` again:
->> It looks these manual changes are a bit tedious. We can replace these with
->> code or data changes that can't be captured with `dvc exp run -S`
+> > It looks like these manual changes are a bit tedious. We can replace these
+> > with code or data changes that can't be captured with `dvc exp run -S`
```dvc
$ dvc exp run
@@ -371,8 +369,8 @@ stages:
-DVC pipelines (`dvc.yaml` file, `dvc stage add`, and `dvc exp run` commands) solve a few
-important problems:
+DVC pipelines (`dvc.yaml` file, `dvc stage add`, and `dvc exp run` commands)
+solve a few important problems:
- _Automation_: run a sequence of steps in a "smart" way which makes iterating
on your project faster. DVC automatically determines which parts of a project