diff --git a/content/docs/start/data-management/data-pipelines.md b/content/docs/start/data-management/data-pipelines.md
index dd271b1b85..4bcf2b518b 100644
--- a/content/docs/start/data-management/data-pipelines.md
+++ b/content/docs/start/data-management/data-pipelines.md
@@ -14,38 +14,40 @@ https://youtu.be/71IGzyH95UY

-Versioning large data files and directories for data science is great, but not
-enough. How is data filtered, transformed, or used to train ML models? DVC
-introduces a mechanism to capture _data pipelines_ — series of data processes
-that produce a final result.
-
-DVC pipelines and their data can also be easily versioned (using Git). This
-allows you to better organize projects, and reproduce your workflow and results
-later — exactly as they were built originally! For example, you could capture a
-simple ETL workflow, organize a data science project, or build a detailed
-machine learning pipeline.
-
-Later on, we will find DVC manages the execution of
+Versioning large data files and directories for data science is powerful, but
+often not enough. Data needs to be filtered, cleaned, and transformed before
+training ML models. For that purpose, DVC introduces a build system to define,
+execute, and track _data pipelines_ — a series of data processing stages that
+produce a final result.
+
+**💫 DVC is a "Makefile" system for machine learning projects!**
+
+DVC pipelines are versioned using Git and allow you to better organize projects
+and reproduce complete workflows and results at will. You could capture a simple
+ETL workflow, organize your project, or build a complex DAG (Directed Acyclic
+Graph) pipeline.
+
+Later, we will find DVC allows you to manage
[machine learning experiments](/doc/start/experiments/experiment-pipelines) on
top of these pipelines: controlling their execution, injecting parameters, etc.

-## Pipeline stages
+## Setup

-Use `dvc stage add` to create _stages_. These represent processes (source code
-tracked with Git) which form the steps of a _pipeline_. Stages also connect code
-to its corresponding data _input_ and _output_. Let's transform a Python script
-into a [stage](/doc/command-reference/stage):
+Working inside an [initialized DVC project](/doc/start#initializing-a-project),
+let's get some sample code for the next steps:
+
+```cli
+$ wget https://code.dvc.org/get-started/code.zip
+$ unzip code.zip && rm -f code.zip
+```
-### ⚙️ Expand to download example code.
+### 💡 Expand to inspect project structure

-Get the sample code like this:
+The project contents should now look like this:

```cli
-$ wget https://code.dvc.org/get-started/code.zip
-$ unzip code.zip
-$ rm -f code.zip
$ tree
.
├── params.yaml
@@ -57,28 +59,47 @@ $ tree
└── train.py
```

-Now let's install the requirements:
+
-> We **strongly** recommend creating a
-> [virtual environment](https://python.readthedocs.io/en/stable/library/venv.html)
-> first.
+The DVC-tracked data needed to run this example can be downloaded using
+`dvc get`:

```cli
-$ pip install -r src/requirements.txt
+$ dvc get https://github.com/iterative/dataset-registry \
+          get-started/data.xml -o data/data.xml
```

-Please also add or commit the source code directory with Git at this point.
+Now, let's go through the usual project setup steps (virtualenv, requirements,
+Git).

-

+First, create and use a
+[virtual environment](https://python.readthedocs.io/en/stable/library/venv.html)
+(it's not a must, but we **strongly** recommend it):

-The data needed to run this example can be found [in a previous page].
+```cli
+$ virtualenv venv && echo "venv" > .gitignore
+$ source venv/bin/activate
+```

-

+Next, install the Python requirements:

-[in a previous page]:
-  /doc/start/data-management/data-versioning#expand-to-get-an-example-dataset
+```cli
+$ pip install -r src/requirements.txt
+```

-

+Finally, this is a good time to commit our code to Git:
+
+```cli
+$ git add .github/ data/ params.yaml src .gitignore
+$ git commit -m "Initial commit"
+```
+
+## Pipeline stages
+
+Use `dvc stage add` to create _stages_. These represent processing steps
+(usually scripts/code tracked with Git) that combine to form the _pipeline_.
+Stages connect code to its corresponding data _input_ and _output_. Let's
+transform a Python script into a [stage](/doc/command-reference/stage):

```cli
$ dvc stage add -n prepare \
@@ -92,24 +113,29 @@ A `dvc.yaml` file is generated. It includes information about the command we
want to run (`python src/prepare.py data/data.xml`), its dependencies, and
outputs.

-DVC uses these metafiles to track the data used and produced by the stage, so
-there's no need to use `dvc add` on `data/prepared`
-[manually](/doc/start/data-management/data-versioning).
+

+DVC uses the pipeline definition to **automatically track** the data used and
+produced by any stage, so there's no need to manually run `dvc add` for
+`data/prepared`!
+
+
-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood

-The command options used above mean the following:
+Details on the command options used above:

- `-n prepare` specifies a name for the stage. If you open the `dvc.yaml` file
  you will see a section named `prepare`.

- `-p prepare.seed,prepare.split` defines special types of dependencies —
-  [parameters](/doc/command-reference/params). We'll get to them later in the
+  [parameters](/doc/command-reference/params). Any stage can depend on parameter
+  values from a parameters file (`params.yaml` by default). We'll discuss those
+  more in the
  [Metrics, Parameters, and Plots](/doc/start/data-management/metrics-parameters-plots)
-  page, but the idea is that the stage can depend on field values from a
-  parameters file (`params.yaml` by default):
+  page.

  ```yaml
  prepare:
@@ -118,13 +144,15 @@ prepare:
  ```

- `-d src/prepare.py` and `-d data/data.xml` mean that the stage depends on
-  these files to work. Notice that the source code itself is marked as a
-  dependency. If any of these files change later, DVC will know that this stage
-  needs to be [reproduced](#reproduce).
+  these files (dependencies) to work. Notice that the source code itself is
+  marked as a dependency as well. If any of these files change, DVC will know
+  that this stage needs to be [reproduced](#reproduce) when the pipeline is
+  executed.

- `-o data/prepared` specifies an output directory for this script, which writes
-  two files in it. This is how the workspace should look like after
-  the run:
+  two files in it.
+
+  This is what the workspace looks like after the run:

```git
.
@@ -162,22 +190,17 @@ stages:
-Once you added a stage, you can run the pipeline with `dvc repro`. Next, you can
-use `dvc push` if you wish to save all the data [to remote storage] (usually
-along with `git commit` to version DVC metafiles).
-
-[to remote storage]:
-  /doc/start/data-management/data-versioning#storing-and-sharing
+Once you've added a stage, you can run the pipeline with `dvc repro`.

## Dependency graphs

By using `dvc stage add` multiple times, defining
outputs of a stage as dependencies of another, we can describe a sequence of
-commands which gets to some desired result. This is what we call a [dependency
-graph] and it's what forms a cohesive pipeline.
+dependent commands that leads to a desired result. This is what we call a
+[dependency graph], and it is what forms a cohesive pipeline.

Let's create a second stage chained to the outputs of `prepare`, to perform
feature extraction:

```cli
$ dvc stage add -n featurize \
@@ -187,49 +210,9 @@ $ dvc stage add -n featurize \
    python src/featurization.py data/prepared data/features
```

-The `dvc.yaml` file is updated automatically and should include two stages now.
-
-
- -### 💡 Expand to see what happens under the hood. - -The changes to the `dvc.yaml` should look like this: - -```git - stages: - prepare: - cmd: python src/prepare.py data/data.xml - deps: - - data/data.xml - - src/prepare.py - params: - - prepare.seed - - prepare.split - outs: - - data/prepared -+ featurize: -+ cmd: python src/featurization.py data/prepared data/features -+ deps: -+ - data/prepared -+ - src/featurization.py -+ params: -+ - featurize.max_features -+ - featurize.ngrams -+ outs: -+ - data/features -``` - -Note that you can create and edit `dvc.yaml` files manually instead of using -helper `dvc stage add`. +The `dvc.yaml` file will now be updated to include the two stages. -
- -
-

-### ⚙️ Expand to add more stages.

-Let's add the training itself. Nothing new this time; just the same
-`dvc stage add` command with the same set of options:
+And finally, let's add a 3rd `train` stage:

```cli
$ dvc stage add -n train \
@@ -239,61 +222,36 @@ $ dvc stage add -n train \
    python src/train.py data/features model.pkl
```

-Please check the `dvc.yaml` again, it should have one more stage now.
+Our `dvc.yaml` should now include all 3 stages.
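+
+If you want to double-check what DVC now sees, `dvc stage list` prints the
+stages defined in `dvc.yaml` (a quick sanity check; output not shown here):
+
+```cli
+$ dvc stage list
+```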
- -This should be a good time to commit the changes with Git. These include -`.gitignore`, `dvc.lock`, and `dvc.yaml` — which describe our pipeline. + -## Reproduce - -The whole point of creating this `dvc.yaml` file is the ability to easily -reproduce a pipeline: +This would be a good time to commit the changes with Git. These include +`.gitignore`(s) and `dvc.yaml` — which describes our pipeline. ```cli -$ dvc repro +$ git add .gitignore data/.gitignore dvc.yaml +$ git commit -m "pipeline defined" ``` -
-### ⚙️ Expand to have some fun with it.
-
-Let's try to play a little bit with it. First, let's try to change one of the
-parameters for the training stage:
-
-1. Open `params.yaml` and change `n_est` to `100`, and
-2. (re)run `dvc repro`.
-
-You should see:
+

-```cli
-$ dvc repro
-Stage 'prepare' didn't change, skipping
-Stage 'featurize' didn't change, skipping
-Running stage 'train' with command: ...
-```
+Great! Now we're ready to run the pipeline.

-DVC detected that only `train` should be run, and skipped everything else! All
-the intermediate results are being reused.
+## Reproducing

-Now, let's change it back to `50` and run `dvc repro` again:
+The pipeline definition in `dvc.yaml` allows us to easily reproduce the pipeline:

```cli
$ dvc repro
-Stage 'prepare' didn't change, skipping
-Stage 'featurize' didn't change, skipping
```

-As before, there was no need to rerun `prepare`, `featurize`, etc. But this time
-it also doesn't rerun `train`! The previous run with the same set of inputs
-(parameters & data) was saved in DVC's run cache, and reused here.
-
-
+You'll notice a `dvc.lock` (a "state file") was created to capture the +reproduction's results.
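+
+You can also reproduce just part of the graph by giving `dvc repro` a stage
+name (a quick aside; DVC will still check and run that stage's upstream
+dependencies as needed):
+
+```cli
+$ dvc repro train
+```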
-### 💡 Expand to see what happens under the hood. +### 💡 Expand to get a peek under the hood `dvc repro` relies on the [dependency graph] of stages defined in `dvc.yaml`, and uses `dvc.lock` to determine what exactly needs to be run. @@ -336,26 +294,54 @@ state of the workspace.
-DVC pipelines (`dvc.yaml` file, `dvc stage add`, and `dvc repro` commands) solve -a few important problems: +It's good practice to immediately commit `dvc.lock` to Git after its creation or +modification, to record the current state & results: -- _Automation_: run a sequence of steps in a "smart" way which makes iterating - on your project faster. DVC automatically determines which parts of a project - need to be run, and it caches "runs" and their results to avoid unnecessary - reruns. -- _Reproducibility_: `dvc.yaml` and `dvc.lock` files describe what data to use - and which commands will generate the pipeline results (such as an ML model). - Storing these files in Git makes it easy to version and share. -- [_Continuous Delivery and Continuous Integration (CI/CD) for ML_](/doc/use-cases/ci-cd-for-machine-learning): - describing projects in way that can be reproduced (built) is the first - necessary step before introducing CI/CD systems. See our sister project - [CML](https://cml.dev) for some examples. +```cli +$ git add dvc.lock && git commit -m "first pipeline repro" +``` + +
+ +### ⚙️ Learn how to parametrize and use cached results + +Let's try to have a little bit of fun with it. First, change one of the +parameters for the training stage: + +1. Open `params.yaml` and change `n_est` to `100`, and +2. (re)run `dvc repro`. + +You will see: + +```cli +$ dvc repro +Stage 'prepare' didn't change, skipping +Stage 'featurize' didn't change, skipping +Running stage 'train' with command: ... +``` + +DVC detected that only `train` should be run, and skipped everything else! All +the intermediate results are being reused. + +Now, let's change it back to `50` and run `dvc repro` again: + +```cli +$ dvc repro +Stage 'prepare' didn't change, skipping +Stage 'featurize' didn't change, skipping +``` + +As before, there was no need to rerun `prepare`, `featurize`, etc. But this time +it also doesn't rerun `train`! The previous run with the same set of inputs +(parameters & data) was saved in DVC's run cache, and was reused. -## Visualize +
+
+## Visualizing

Having built our pipeline, we need a good way to understand its structure.
-Seeing a graph of connected stages would help. DVC lets you do so without
-leaving the terminal!
+Visualizing it as a graph of connected stages helps with that. DVC lets you do
+so without leaving the terminal!

```cli
$ dvc dag
@@ -376,5 +362,25 @@ $ dvc dag
          +-------+
```

-> Refer to `dvc dag` to explore other ways this command can visualize a
-> pipeline.
+

+Refer to `dvc dag` to explore other ways this command can visualize a pipeline.

+

+## Summary
+
+DVC pipelines (`dvc.yaml` file, `dvc stage add`, and `dvc repro` commands) solve
+a few important problems:
+
+- _Automation_: run a sequence of steps in a "smart" way which makes iterating
+  on your project faster. DVC automatically determines which parts of a project
+  need to be run, and it caches "runs" and their results to avoid unnecessary
+  reruns.
+- _Reproducibility_: `dvc.yaml` and `dvc.lock` files describe what data to use
+  and which commands will generate the pipeline results (such as an ML model).
+  Storing these files in Git makes it easy to version and share.
+- [_Continuous Delivery and Continuous Integration (CI/CD) for ML_](/doc/use-cases/ci-cd-for-machine-learning):
+  describing projects in a way that can be built and reproduced is the first
+  necessary step before introducing CI/CD systems. See our sister project
+  [CML](https://cml.dev) for some examples.
diff --git a/content/docs/start/data-management/data-versioning.md b/content/docs/start/data-management/data-versioning.md
index a64c788cbf..062fb1be07 100644
--- a/content/docs/start/data-management/data-versioning.md
+++ b/content/docs/start/data-management/data-versioning.md
@@ -15,18 +15,20 @@ https://youtu.be/kLKBcPonMYw

-How cool would it be to make Git handle arbitrarily large files and directories
-with the same performance it has with small code files? Imagine cloning a
-repository and seeing data files and machine learning models in the workspace.
-Or switching to a different version of a 100Gb file in less than a second with a
-`git checkout`. Think "Git for data".
+How cool would it be to track large datasets and machine learning models
+alongside your code, sidestepping all the limitations of storing them in Git?
+Imagine cloning a repository and immediately seeing your datasets, checkpoints,
+and models staged in your workspace. Imagine switching to a different version of
+a 100Gb file in less than a second with a `git checkout`.

-
+**💫 DVC is your _"Git for data"_!** -### ⚙️ Expand to get an example dataset. +## Tracking data -Having initialized a project in the previous section, we can get the data file -(which we'll be using later) like this: +Working inside an [initialized project](/doc/start#initializing-a-project) +directory, let's pick a piece of data to work with. We'll use an example +`data.xml` file, though any text or binary file (or directory) will do. Start by +running: ```cli $ dvc get https://github.com/iterative/dataset-registry \ @@ -35,42 +37,41 @@ $ dvc get https://github.com/iterative/dataset-registry \ -We use the fancy `dvc get` command to jump ahead a bit and show how a Git repo -becomes a source for datasets or models — what we call a [data registry]. -`dvc get` can download any file or directory tracked in a DVC +We used `dvc get` above to show how DVC can turn any Git repo into a "[data +registry]". `dvc get` can download any file or directory tracked in a DVC repository. [data registry]: /doc/use-cases/data-registry -
-

-To start tracking a file or directory, use `dvc add`:
+Use `dvc add` to start tracking the dataset file:

```cli
$ dvc add data/data.xml
```

DVC stores information about the added file in a special `.dvc` file named
-`data/data.xml.dvc` -- a small text file with a human-readable [format]. This
-metadata file is a placeholder for the original data, and can be easily
-versioned like source code with Git:
+`data/data.xml.dvc`. This small, human-readable metadata file acts as a
+placeholder for the original data, so that it can be versioned with Git.
+
+Next, run the following commands to track changes in Git:

```cli
$ git add data/data.xml.dvc data/.gitignore
$ git commit -m "Add raw data"
```

-The data, meanwhile, is listed in `.gitignore`.
+Now the _metadata about your data_ is versioned alongside your source code,
+while the original data file was added to `.gitignore`.

-
+
-### 💡 Click to see what happens under the hood. +### 💡 Expand to get a peek under the hood `dvc add` moved the data to the project's cache, and -linked it back to the workspace. The `.dvc/cache` -should look like this: +linked it back to the workspace. The `.dvc/cache` will +look like this: ``` .dvc/cache @@ -90,28 +91,21 @@ outs:
-[format]: /doc/user-guide/project-structure/dvc-files - ## Storing and sharing -You can upload DVC-tracked data or models with `dvc push`. This requires setting -up [remote storage] first, for example on Amazon S3: - -[remote storage]: /doc/user-guide/data-management/remote-storage - -```cli -$ dvc remote add -d storage s3://mybucket/dvcstore -$ dvc push -``` - -
+You can upload DVC-tracked data to a variety of storage systems (remote or +local) referred to as +["remotes"](/doc/user-guide/data-management/remote-storage). For simplicity, we +will use a "local remote" for this guide, which is just a directory in the local +file system. -### ⚠️ That didn't work! +### Configuring a remote -Instead of the S3 remote in the next block, use this "local remote" (another -directory in the local file system) to try `dvc push`: +Before pushing data to a remote we need to set it up using the `dvc remote add` +command: + ```cli @@ -130,21 +124,42 @@ $ dvc remote add -d myremote %TEMP%\dvcstore - + -DVC supports many remote [storage types], including Amazon S3, SSH, Google +DVC supports many remote [storage types], including Amazon S3, NFS,SSH, Google Drive, Azure Blob Storage, and HDFS. +An example for a common use case is configuring an [Amazon S3] remote: + +```cli +$ dvc remote add -d storage s3://mybucket/dvcstore +``` + +For this to work, you'll need an AWS account and credentials set up to allow +access. + +To learn more about storage remotes, see the [Remote Storage Guide]. + +[Amazon S3]: /doc/user-guide/data-management/remote-storage/amazon-s3 [storage types]: /doc/user-guide/data-management/remote-storage#supported-storage-types +[Remote Storage Guide]: /doc/user-guide/data-management/remote-storage
-
+### Uploading data + +Now that a storage remote was configured, run `dvc push` to upload data: -### 💡 Click to see what happens under the hood. +```cli +$ dvc push +``` + +
+ +#### 💡 Expand to get a peek under the hood `dvc push` copied the data cached locally to the remote storage we set up earlier. The remote storage directory should look like this: @@ -161,21 +176,30 @@ If you prefer to keep human-readable filenames, you can use [cloud versioning].
-Usually, we also want to `git commit` (and `git push`) the project config
-changes.
+Usually, we would also want to commit to Git any code changes that led to the
+data change (`git add`, `git commit`, and `git push`).
+
+### Retrieving data
+
+Once DVC-tracked data and models are stored remotely, they can be downloaded
+with `dvc pull` when needed (e.g. in other copies of this project).
+Usually, we run it after `git pull` or `git clone`.

-## Retrieving
+Let's try this now:

-Having DVC-tracked data and models stored remotely, it can be downloaded with
-`dvc pull` when needed (e.g. in other copies of this project).
-Usually, we run it after `git clone` and `git pull`.
+```cli
+$ dvc pull
+```
-### ⚙️ Expand to delete locally cached data.
+#### Expand to simulate a "fresh pull"

-If you've run `dvc push` successfully, empty the cache and delete
-`data/data.xml` for `dvc pull` to have an effect:
+After running `dvc push` above, the `dvc pull` command was short-circuited by
+DVC for efficiency: the project's `data/data.xml` file, our cache, and the
+remote storage were all already in sync. To see DVC actually move data around,
+we need to empty the cache and delete `data/data.xml` from our project. Let's
+do that now:

@@ -196,29 +220,18 @@

$ del data\data.xml

-
+Now we can run `dvc pull` to retrieve the data from the remote: ```cli $ dvc pull ``` - - -See [Remote Storage] for more information on remote storage. - - - -## Making changes - -When you make a change to a file or directory, run `dvc add` again to track the -latest version: - -
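+
+To double-check that everything is in sync again (a quick sketch; the
+`-c`/`--cloud` flag compares the cache against the default remote):
+
+```cli
+$ dvc status -c
+```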
+
-### ⚙️ Expand to make some changes. +## Making local changes -Let's say we obtained more data from some external source. We can pretend this -is the case by doubling the dataset: +Next, let's say we obtained more data from some external source. We will +simulate this by doubling the dataset contents: @@ -239,13 +252,14 @@ $ type %TEMP%\data.xml >> data\data.xml -
+After modifying the data, run `dvc add` again to track the latest version: ```cli $ dvc add data/data.xml ``` -Usually you would also run `dvc push` and `git commit` to save the changes: +Now we can run `dvc push` to upload the changes to the remote storage, followed +by a `git commit` to track them: ```cli $ dvc push @@ -254,17 +268,16 @@ $ git commit data/data.xml.dvc -m "Dataset updates" ## Switching between versions -The regular workflow is to use `git checkout` first (to switch a branch or -checkout a `.dvc` file version) and then run `dvc checkout` to sync data: +A commonly used workflow is to use `git checkout` to switch to a branch or +checkout a specific `.dvc` file revision, followed by a `dvc checkout` to sync +data into your workspace: ```cli $ git checkout <...> $ dvc checkout ``` -
- -### ⚙️ Expand to get the previous version of the dataset. +## Return to a previous version of the dataset Let's go back to the original version of the data: @@ -280,33 +293,20 @@ of the dataset was already saved): $ git commit data/data.xml.dvc -m "Revert dataset updates" ``` -
-

-Yes, DVC is technically not a version control system! Git itself provides that
-layer. DVC in turn manipulates `.dvc` files, whose contents define the data file
-versions. DVC also synchronizes DVC-tracked data in the workspace
-efficiently to match them.
-
-## Discovering and accessing data
-
-DVC helps you with accessing and using your data artifacts from outside of the
-project where they are versioned, and your tracked data can be imported and
-fetched from anywhere. For example, you may want to download a specific version
-of an ML model to a deployment server or import a dataset into another project.
-To learn about how DVC allows you to do this, see the
-[discovering and accessing data guide](/doc/user-guide/data-management/discovering-and-accessing-data).
+

-## Large datasets versioning
+As you can see, DVC is technically not a version control system by itself! It
+manipulates `.dvc` files, whose contents define the data file versions. Git is
+already used to version your code, and now it can also version your data
+alongside it.

-In cases where you process very large datasets, you need an efficient mechanism
-(in terms of space and performance) to share a lot of data, including different
-versions. Do you use network attached storage (NAS)? Or a large external volume?
-You can learn more about advanced workflows using these links:
+

-- A [shared cache](/doc/user-guide/how-to/share-a-dvc-cache) can be set up to
-  store, version and access a lot of data on a large shared volume efficiently.
-- An advanced scenario is to track and version data directly on the remote
-  storage (e.g. S3, SSH). See [Managing External Data] to learn more.
+### Discovering and accessing data

-[managing external data]:
-  https://dvc.org/doc/user-guide/data-management/managing-external-data
+Your tracked data can be imported and fetched from anywhere using DVC. For
+example, you may want to download a specific version of an ML model to a
+deployment server or import a dataset into another project, like we did at the
+[top of this chapter](/doc/start/data-management/data-versioning?tab=Mac-Linux#tracking-data).
+To learn how DVC allows you to do this, see the
+[Discovering and Accessing Data guide](/doc/user-guide/data-management/discovering-and-accessing-data).
diff --git a/content/docs/start/data-management/metrics-parameters-plots.md b/content/docs/start/data-management/metrics-parameters-plots.md
index 41278d2b82..9975b6a8bc 100644
--- a/content/docs/start/data-management/metrics-parameters-plots.md
+++ b/content/docs/start/data-management/metrics-parameters-plots.md
@@ -46,7 +46,7 @@ $ dvc repro

-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood

The `-O` option here specifies an output that will not be cached by DVC, and
`-M` specifies a metrics file (that will also not be cached).

@@ -117,7 +117,7 @@ eval/live/metrics.json 0.94496 0.97723 0.96191 0.987

## Visualizing plots

-The stage also writes different files with data that can be graphed:
+The `evaluate` stage also writes different files with data that can be graphed:

- [DVCLive]-generated [`roc_curve`] and [`confusion_matrix`] values in the
  `eval/live/plots` directory.
@@ -160,9 +160,9 @@ plots:
  - eval/importance.png
```

-To render them, you can run `dvc plots show` (shown below), which generates an
-HTML file you can open in a browser. Or you can load your project in VS Code and
-use the [DVC Extension]'s [Plots Dashboard].
+To render them, run `dvc plots show` (shown below), which generates an HTML file
+you can open in a browser. Or you can load your project in VS Code and use the
+[DVC Extension]'s [Plots Dashboard].

```cli
$ dvc plots show
diff --git a/content/docs/start/index.md b/content/docs/start/index.md
index bde323bc70..7d371e7c1a 100644
--- a/content/docs/start/index.md
+++ b/content/docs/start/index.md
@@ -11,7 +11,8 @@ pipelines and metrics, and manage experiments.'
## Get Started with DVC
-->

-Before we begin, let's prepare a project for this guide
+Before we begin, settle on a directory for this guide. Everything we will do
+will be self-contained there.
@@ -35,8 +36,11 @@ This directory name is used in our
-Assuming DVC is already [installed](/doc/install), initialize it by running
-`dvc init` inside a Git project:
+## Initializing a project
+
+Assuming DVC is already [installed](/doc/install), let's turn the directory you
+chose (our current working directory) into a DVC project by running `dvc init`
+inside a Git project:

```cli
$ dvc init
diff --git a/content/docs/user-guide/data-management/discovering-and-accessing-data.md b/content/docs/user-guide/data-management/discovering-and-accessing-data.md
index 86845415a8..8083eaec7d 100644
--- a/content/docs/user-guide/data-management/discovering-and-accessing-data.md
+++ b/content/docs/user-guide/data-management/discovering-and-accessing-data.md
@@ -81,7 +81,7 @@ bring in changes from the data source later using `dvc update`.
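+For example (a sketch; the `.dvc` file name assumes the `dvc import` example
+above):
+
+```cli
+$ dvc update data.xml.dvc
+```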
-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood

diff --git a/content/docs/user-guide/data-management/remote-storage/index.md b/content/docs/user-guide/data-management/remote-storage/index.md
index aa0bbcf255..060aeb67f8 100644
--- a/content/docs/user-guide/data-management/remote-storage/index.md
+++ b/content/docs/user-guide/data-management/remote-storage/index.md
@@ -1,9 +1,10 @@
# Remote Storage

-_DVC remotes_ provide optional/additional storage to back up and share your data
-and ML models. For example, you can download data artifacts created by
-colleagues without spending time and resources to regenerate them locally. See
-also `dvc push` and `dvc pull`.
+_DVC remotes_ provide access to external storage locations to track and share
+your data and ML models. Usually, a remote is shared between devices or team
+members working on the same project. For example, you can download data
+artifacts created by colleagues without spending time and resources to
+regenerate them locally. See also `dvc push` and `dvc pull`.

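+For example, a minimal sketch (the remote name and bucket URL are
+placeholders):
+
+```cli
+$ dvc remote add -d myremote s3://mybucket/path
+$ dvc push
+```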