diff --git a/content/docs/start/data-management/data-pipelines.md b/content/docs/start/data-management/data-pipelines.md
index dd271b1b85..4bcf2b518b 100644
--- a/content/docs/start/data-management/data-pipelines.md
+++ b/content/docs/start/data-management/data-pipelines.md
@@ -14,38 +14,40 @@ https://youtu.be/71IGzyH95UY
-Versioning large data files and directories for data science is great, but not
-enough. How is data filtered, transformed, or used to train ML models? DVC
-introduces a mechanism to capture _data pipelines_ — series of data processes
-that produce a final result.
-
-DVC pipelines and their data can also be easily versioned (using Git). This
-allows you to better organize projects, and reproduce your workflow and results
-later — exactly as they were built originally! For example, you could capture a
-simple ETL workflow, organize a data science project, or build a detailed
-machine learning pipeline.
-
-Later on, we will find DVC manages the execution of
+Versioning large data files and directories for data science is powerful, but
+often not enough. Data needs to be filtered, cleaned, and transformed before
+training ML models. For that purpose, DVC introduces a build system to define,
+execute, and track _data pipelines_ — a series of data processing stages that
+produce a final result.
+
+**💫 DVC is a "Makefile" system for machine learning projects!**
+
+DVC pipelines are versioned using Git, and allow you to better organize projects
+and reproduce complete workflows and results at will. You could capture a simple
+ETL workflow, organize your project, or build a complex DAG (Directed Acyclic
+Graph) pipeline.
+
+Later, we will see that DVC allows you to manage
[machine learning experiments](/doc/start/experiments/experiment-pipelines) on
top of these pipelines - controlling their execution, injecting parameters, etc.
-## Pipeline stages
+## Setup
-Use `dvc stage add` to create _stages_. These represent processes (source code
-tracked with Git) which form the steps of a _pipeline_. Stages also connect code
-to its corresponding data _input_ and _output_. Let's transform a Python script
-into a [stage](/doc/command-reference/stage):
+Working inside an [initialized DVC project](/doc/start#initializing-a-project),
+let's get some sample code for the next steps:
+
+```cli
+$ wget https://code.dvc.org/get-started/code.zip
+$ unzip code.zip && rm -f code.zip
+```
-### ⚙️ Expand to download example code.
+### 💡 Expand to inspect project structure
-Get the sample code like this:
+The project contents should look like this:
```cli
-$ wget https://code.dvc.org/get-started/code.zip
-$ unzip code.zip
-$ rm -f code.zip
$ tree
.
├── params.yaml
@@ -57,28 +59,47 @@ $ tree
└── train.py
```
-Now let's install the requirements:
+
-> We **strongly** recommend creating a
-> [virtual environment](https://python.readthedocs.io/en/stable/library/venv.html)
-> first.
+The DVC-tracked data needed to run this example can be downloaded using
+`dvc get`:
```cli
-$ pip install -r src/requirements.txt
+$ dvc get https://github.com/iterative/dataset-registry \
+ get-started/data.xml -o data/data.xml
```
-Please also add or commit the source code directory with Git at this point.
+Now, let's go through some usual project setup steps (virtualenv, requirements,
+Git).
-
+First, create and use a
+[virtual environment](https://python.readthedocs.io/en/stable/library/venv.html)
+(it's not a must, but we **strongly** recommend it):
-The data needed to run this example can be found [in a previous page].
+```cli
+$ virtualenv venv && echo "venv" > .gitignore
+$ source venv/bin/activate
+```
-
+Next, install the Python requirements:
-[in a previous page]:
- /doc/start/data-management/data-versioning#expand-to-get-an-example-dataset
+```cli
+$ pip install -r src/requirements.txt
+```
-
+Finally, this is a good time to commit our code to Git:
+
+```cli
+$ git add .github/ data/ params.yaml src .gitignore
+$ git commit -m "Initial commit"
+```
+
+## Pipeline stages
+
+Use `dvc stage add` to create _stages_. These represent processing steps
+(usually scripts/code tracked with Git) and combine to form the _pipeline_.
+Stages allow connecting code to its corresponding data _input_ and _output_.
+Let's transform a Python script into a [stage](/doc/command-reference/stage):
```cli
$ dvc stage add -n prepare \
@@ -92,24 +113,29 @@ A `dvc.yaml` file is generated. It includes information about the command we
want to run (`python src/prepare.py data/data.xml`), its
dependencies, and outputs.
-DVC uses these metafiles to track the data used and produced by the stage, so
-there's no need to use `dvc add` on `data/prepared`
-[manually](/doc/start/data-management/data-versioning).
+
+
+DVC uses the pipeline definition to **automatically track** the data used and
+produced by any stage, so there's no need to manually run `dvc add` for
+`data/prepared`!
+
+
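+You can verify this by peeking at `data/.gitignore`, which `dvc stage add`
+updates automatically (a quick check; the exact entries may vary):
+
+```cli
+$ cat data/.gitignore
+/prepared
+```
+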
-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood
-The command options used above mean the following:
+Details on the command options used above:
- `-n prepare` specifies a name for the stage. If you open the `dvc.yaml` file
you will see a section named `prepare`.
- `-p prepare.seed,prepare.split` defines special types of dependencies —
- [parameters](/doc/command-reference/params). We'll get to them later in the
+ [parameters](/doc/command-reference/params). Any stage can depend on parameter
+ values from a parameters file (`params.yaml` by default). We'll discuss those
+ more in the
[Metrics, Parameters, and Plots](/doc/start/data-management/metrics-parameters-plots)
- page, but the idea is that the stage can depend on field values from a
- parameters file (`params.yaml` by default):
+ page.
```yaml
prepare:
@@ -118,13 +144,15 @@ prepare:
```
- `-d src/prepare.py` and `-d data/data.xml` mean that the stage depends on
- these files to work. Notice that the source code itself is marked as a
- dependency. If any of these files change later, DVC will know that this stage
- needs to be [reproduced](#reproduce).
+ these files (dependencies) to work. Notice that the source code itself is
+ marked as a dependency as well. If any of these files change, DVC will know
+ that this stage needs to be [reproduced](#reproduce) when the pipeline is
+ executed.
- `-o data/prepared` specifies an output directory for this script, which writes
- two files in it. This is how the workspace should look like after
- the run:
+ two files in it.
+
+ This is what the workspace looks like after the run:
```git
.
@@ -162,22 +190,17 @@ stages:
-Once you added a stage, you can run the pipeline with `dvc repro`. Next, you can
-use `dvc push` if you wish to save all the data [to remote storage] (usually
-along with `git commit` to version DVC metafiles).
-
-[to remote storage]:
- /doc/start/data-management/data-versioning#storing-and-sharing
+Once you've added a stage, you can run the pipeline with `dvc repro`.
## Dependency graphs
By using `dvc stage add` multiple times, defining outputs of a
stage as dependencies of another, we can describe a sequence of
-commands which gets to some desired result. This is what we call a [dependency
-graph] and it's what forms a cohesive pipeline.
+dependent commands that produces a desired result. This is what we call a
+[dependency graph], which forms a complete, cohesive pipeline.
-Let's create a second stage chained to the outputs of `prepare`, to perform
-feature extraction:
+Let's create a second stage chained to the outputs of `prepare`, to perform
+feature extraction:
```cli
$ dvc stage add -n featurize \
@@ -187,49 +210,9 @@ $ dvc stage add -n featurize \
python src/featurization.py data/prepared data/features
```
-The `dvc.yaml` file is updated automatically and should include two stages now.
-
-
-
-### 💡 Expand to see what happens under the hood.
-
-The changes to the `dvc.yaml` should look like this:
-
-```git
- stages:
- prepare:
- cmd: python src/prepare.py data/data.xml
- deps:
- - data/data.xml
- - src/prepare.py
- params:
- - prepare.seed
- - prepare.split
- outs:
- - data/prepared
-+ featurize:
-+ cmd: python src/featurization.py data/prepared data/features
-+ deps:
-+ - data/prepared
-+ - src/featurization.py
-+ params:
-+ - featurize.max_features
-+ - featurize.ngrams
-+ outs:
-+ - data/features
-```
-
-Note that you can create and edit `dvc.yaml` files manually instead of using
-helper `dvc stage add`.
+The `dvc.yaml` file is updated automatically and now includes both stages.
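+
+You can open or `cat` the file to check. (Note that you can also write or edit
+`dvc.yaml` files by hand, instead of using the `dvc stage add` helper.)
+
+```cli
+$ cat dvc.yaml
+```
+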
-
-
-
-
-### ⚙️ Expand to add more stages.
-
-Let's add the training itself. Nothing new this time; just the same
-`dvc stage add` command with the same set of options:
+And finally, let's add a third `train` stage:
```cli
$ dvc stage add -n train \
@@ -239,61 +222,36 @@ $ dvc stage add -n train \
python src/train.py data/features model.pkl
```
-Please check the `dvc.yaml` again, it should have one more stage now.
+Our `dvc.yaml` should now list all three stages.
-
-
-This should be a good time to commit the changes with Git. These include
-`.gitignore`, `dvc.lock`, and `dvc.yaml` — which describe our pipeline.
+
-## Reproduce
-
-The whole point of creating this `dvc.yaml` file is the ability to easily
-reproduce a pipeline:
+This would be a good time to commit the changes with Git. These include the
+`.gitignore` files and `dvc.yaml`, which describes our pipeline.
```cli
-$ dvc repro
+$ git add .gitignore data/.gitignore dvc.yaml
+$ git commit -m "pipeline defined"
```
-
-
-### ⚙️ Expand to have some fun with it.
-
-Let's try to play a little bit with it. First, let's try to change one of the
-parameters for the training stage:
-
-1. Open `params.yaml` and change `n_est` to `100`, and
-2. (re)run `dvc repro`.
-
-You should see:
+
-```cli
-$ dvc repro
-Stage 'prepare' didn't change, skipping
-Stage 'featurize' didn't change, skipping
-Running stage 'train' with command: ...
-```
+Great! Now we're ready to run the pipeline.
-DVC detected that only `train` should be run, and skipped everything else! All
-the intermediate results are being reused.
+## Reproducing
-Now, let's change it back to `50` and run `dvc repro` again:
+The pipeline definition in `dvc.yaml` allows us to easily reproduce it:
```cli
$ dvc repro
-Stage 'prepare' didn't change, skipping
-Stage 'featurize' didn't change, skipping
```
-As before, there was no need to rerun `prepare`, `featurize`, etc. But this time
-it also doesn't rerun `train`! The previous run with the same set of inputs
-(parameters & data) was saved in DVC's run cache, and reused here.
-
-
+You'll notice that a `dvc.lock` file (a "state file") was created to capture
+the reproduction's results.
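+
+For example, Git will report it as a new, untracked file right after the run
+(a quick check, shown assuming you haven't staged it yet):
+
+```cli
+$ git status --short dvc.lock
+?? dvc.lock
+```
+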
-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood
`dvc repro` relies on the [dependency graph] of stages defined in `dvc.yaml`,
and uses `dvc.lock` to determine what exactly needs to be run.
@@ -336,26 +294,54 @@ state of the workspace.
-DVC pipelines (`dvc.yaml` file, `dvc stage add`, and `dvc repro` commands) solve
-a few important problems:
+It's good practice to immediately commit `dvc.lock` to Git after its creation or
+modification, to record the current state & results:
-- _Automation_: run a sequence of steps in a "smart" way which makes iterating
- on your project faster. DVC automatically determines which parts of a project
- need to be run, and it caches "runs" and their results to avoid unnecessary
- reruns.
-- _Reproducibility_: `dvc.yaml` and `dvc.lock` files describe what data to use
- and which commands will generate the pipeline results (such as an ML model).
- Storing these files in Git makes it easy to version and share.
-- [_Continuous Delivery and Continuous Integration (CI/CD) for ML_](/doc/use-cases/ci-cd-for-machine-learning):
- describing projects in way that can be reproduced (built) is the first
- necessary step before introducing CI/CD systems. See our sister project
- [CML](https://cml.dev) for some examples.
+```cli
+$ git add dvc.lock && git commit -m "first pipeline repro"
+```
+
+
+
+### ⚙️ Learn how to parametrize and use cached results
+
+Let's try to have a little bit of fun with it. First, change one of the
+parameters for the training stage:
+
+1. Open `params.yaml` and change `n_est` to `100`, and
+2. (re)run `dvc repro`.
+
+You will see:
+
+```cli
+$ dvc repro
+Stage 'prepare' didn't change, skipping
+Stage 'featurize' didn't change, skipping
+Running stage 'train' with command: ...
+```
+
+DVC detected that only `train` should be run, and skipped everything else! All
+the intermediate results are being reused.
+
+Now, let's change it back to `50` and run `dvc repro` again:
+
+```cli
+$ dvc repro
+Stage 'prepare' didn't change, skipping
+Stage 'featurize' didn't change, skipping
+```
+
+As before, there was no need to rerun `prepare`, `featurize`, etc. But this time
+it also doesn't rerun `train`! The previous run with the same set of inputs
+(parameters & data) was saved in DVC's run cache, and was reused.
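+
+If you ever need to bypass the run cache and re-execute stages regardless,
+`dvc repro` accepts a `--force` flag (use sparingly; it re-runs all stages even
+if nothing changed):
+
+```cli
+$ dvc repro --force
+```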
-## Visualize
+
+
+## Visualizing
Having built our pipeline, we need a good way to understand its structure.
-Seeing a graph of connected stages would help. DVC lets you do so without
-leaving the terminal!
+Visualizing it as a graph of connected stages helps with that. DVC lets you do
+so without leaving the terminal!
```cli
$ dvc dag
@@ -376,5 +362,25 @@ $ dvc dag
+-------+
```
-> Refer to `dvc dag` to explore other ways this command can visualize a
-> pipeline.
+
+
+Refer to `dvc dag` to explore other ways this command can visualize a pipeline.
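+
+For example, you can export the graph in Graphviz DOT format to render it
+elsewhere (a sketch; the output filename is arbitrary):
+
+```cli
+$ dvc dag --dot > pipeline.dot
+```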
+
+
+
+## Summary
+
+DVC pipelines (`dvc.yaml` file, `dvc stage add`, and `dvc repro` commands) solve
+a few important problems:
+
+- _Automation_: run a sequence of steps in a "smart" way which makes iterating
+ on your project faster. DVC automatically determines which parts of a project
+ need to be run, and it caches "runs" and their results to avoid unnecessary
+ reruns.
+- _Reproducibility_: `dvc.yaml` and `dvc.lock` files describe what data to use
+ and which commands will generate the pipeline results (such as an ML model).
+ Storing these files in Git makes it easy to version and share.
+- [_Continuous Delivery and Continuous Integration (CI/CD) for ML_](/doc/use-cases/ci-cd-for-machine-learning):
+ describing projects in a way that can be built and reproduced is the first
+ necessary step before introducing CI/CD systems. See our sister project
+ [CML](https://cml.dev) for some examples.
diff --git a/content/docs/start/data-management/data-versioning.md b/content/docs/start/data-management/data-versioning.md
index a64c788cbf..062fb1be07 100644
--- a/content/docs/start/data-management/data-versioning.md
+++ b/content/docs/start/data-management/data-versioning.md
@@ -15,18 +15,20 @@ https://youtu.be/kLKBcPonMYw
-How cool would it be to make Git handle arbitrarily large files and directories
-with the same performance it has with small code files? Imagine cloning a
-repository and seeing data files and machine learning models in the workspace.
-Or switching to a different version of a 100Gb file in less than a second with a
-`git checkout`. Think "Git for data".
+How cool would it be to track large datasets and machine learning models
+alongside your code, sidestepping all the limitations of storing them in Git?
+Imagine cloning a repository and immediately seeing your datasets, checkpoints,
+and models staged in your workspace. Imagine switching to a different version
+of a 100Gb file in less than a second with a `git checkout`.
-
+**💫 DVC is your _"Git for data"_!**
-### ⚙️ Expand to get an example dataset.
+## Tracking data
-Having initialized a project in the previous section, we can get the data file
-(which we'll be using later) like this:
+Working inside an [initialized project](/doc/start#initializing-a-project)
+directory, let's pick a piece of data to work with. We'll use an example
+`data.xml` file, though any text or binary file (or directory) will do. Start by
+running:
```cli
$ dvc get https://github.com/iterative/dataset-registry \
@@ -35,42 +37,41 @@ $ dvc get https://github.com/iterative/dataset-registry \
-We use the fancy `dvc get` command to jump ahead a bit and show how a Git repo
-becomes a source for datasets or models — what we call a [data registry].
-`dvc get` can download any file or directory tracked in a DVC
+We used `dvc get` above to show how DVC can turn any Git repo into a "[data
+registry]". `dvc get` can download any file or directory tracked in a DVC
repository.
[data registry]: /doc/use-cases/data-registry
-
-
-To start tracking a file or directory, use `dvc add`:
+Use `dvc add` to start tracking the dataset file:
```cli
$ dvc add data/data.xml
```
DVC stores information about the added file in a special `.dvc` file named
-`data/data.xml.dvc` -- a small text file with a human-readable [format]. This
-metadata file is a placeholder for the original data, and can be easily
-versioned like source code with Git:
+`data/data.xml.dvc`. This small, human-readable metadata file acts as a
+placeholder for the original data for the purpose of Git tracking.
+
+Next, run the following commands to track changes in Git:
```cli
$ git add data/data.xml.dvc data/.gitignore
$ git commit -m "Add raw data"
```
-The data, meanwhile, is listed in `.gitignore`.
+Now the _metadata about your data_ is versioned alongside your source code,
+while the original data file is listed in `.gitignore`.
-
+
-### 💡 Click to see what happens under the hood.
+### 💡 Expand to get a peek under the hood
`dvc add` moved the data to the project's cache, and
-linked it back to the workspace. The `.dvc/cache`
-should look like this:
+linked it back to the workspace. The `.dvc/cache` directory will
+look like this:
```
.dvc/cache
@@ -90,28 +91,21 @@ outs:
-[format]: /doc/user-guide/project-structure/dvc-files
-
## Storing and sharing
-You can upload DVC-tracked data or models with `dvc push`. This requires setting
-up [remote storage] first, for example on Amazon S3:
-
-[remote storage]: /doc/user-guide/data-management/remote-storage
-
-```cli
-$ dvc remote add -d storage s3://mybucket/dvcstore
-$ dvc push
-```
-
-
+You can upload DVC-tracked data to a variety of storage systems (remote or
+local) referred to as
+["remotes"](/doc/user-guide/data-management/remote-storage). For simplicity, we
+will use a "local remote" for this guide, which is just a directory in the local
+file system.
-### ⚠️ That didn't work!
+### Configuring a remote
-Instead of the S3 remote in the next block, use this "local remote" (another
-directory in the local file system) to try `dvc push`:
+Before pushing data to a remote, we need to set it up using the
+`dvc remote add` command:
+
```cli
@@ -130,21 +124,42 @@ $ dvc remote add -d myremote %TEMP%\dvcstore
-
+
-DVC supports many remote [storage types], including Amazon S3, SSH, Google
+DVC supports many remote [storage types], including Amazon S3, NFS, SSH, Google
Drive, Azure Blob Storage, and HDFS.
+For example, a common use case is configuring an [Amazon S3] remote:
+
+```cli
+$ dvc remote add -d storage s3://mybucket/dvcstore
+```
+
+For this to work, you'll need an AWS account and credentials set up to allow
+access.
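+
+One way to provide credentials is per project, kept out of Git with the
+`--local` flag (a sketch; replace the placeholder values with your own):
+
+```cli
+$ dvc remote modify --local storage access_key_id 'mykey'
+$ dvc remote modify --local storage secret_access_key 'mysecret'
+```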
+
+To learn more about storage remotes, see the [Remote Storage Guide].
+
+[Amazon S3]: /doc/user-guide/data-management/remote-storage/amazon-s3
[storage types]:
/doc/user-guide/data-management/remote-storage#supported-storage-types
+[Remote Storage Guide]: /doc/user-guide/data-management/remote-storage
-
+### Uploading data
+
+Now that a storage remote is configured, run `dvc push` to upload data:
-### 💡 Click to see what happens under the hood.
+```cli
+$ dvc push
+```
+
+
+
+#### 💡 Expand to get a peek under the hood
`dvc push` copied the data cached locally to the remote storage we
set up earlier. The remote storage directory should look like this:
@@ -161,21 +176,30 @@ If you prefer to keep human-readable filenames, you can use [cloud versioning].
-Usually, we also want to `git commit` (and `git push`) the project config
-changes.
+Usually, we would also want to track with Git any code changes that led to the
+data change (`git add`, `git commit`, and `git push`).
+
+### Retrieving data
+
+Once DVC-tracked data and models are stored remotely, they can be downloaded
+with `dvc pull` when needed (e.g. in other copies of this project).
+Usually, we run it after `git pull` or `git clone`.
-## Retrieving
+Let's try this now:
-Having DVC-tracked data and models stored remotely, it can be downloaded with
-`dvc pull` when needed (e.g. in other copies of this project).
-Usually, we run it after `git clone` and `git pull`.
+```cli
+$ dvc pull
+```
-### ⚙️ Expand to delete locally cached data.
+#### ⚙️ Expand to simulate a "fresh pull"
-If you've run `dvc push` successfully, empty the cache and delete
-`data/data.xml` for `dvc pull` to have an effect:
+Because we ran `dvc push` above, the `dvc pull` command right after it was
+short-circuited by DVC for efficiency. The project's `data/data.xml` file, our
+cache, and the remote storage were all already in sync. We need to empty the
+cache and delete `data/data.xml` from our project if we want DVC to actually
+move data around. Let's do that now:
@@ -196,29 +220,18 @@ $ del data\data.xml
-
+Now we can run `dvc pull` to retrieve the data from the remote:
```cli
$ dvc pull
```
-
-
-See [Remote Storage] for more information on remote storage.
-
-
-
-## Making changes
-
-When you make a change to a file or directory, run `dvc add` again to track the
-latest version:
-
-
+
-### ⚙️ Expand to make some changes.
+## Making local changes
-Let's say we obtained more data from some external source. We can pretend this
-is the case by doubling the dataset:
+Next, let's say we obtained more data from some external source. We will
+simulate this by doubling the dataset contents:
@@ -239,13 +252,14 @@ $ type %TEMP%\data.xml >> data\data.xml
-
+After modifying the data, run `dvc add` again to track the latest version:
```cli
$ dvc add data/data.xml
```
-Usually you would also run `dvc push` and `git commit` to save the changes:
+Now we can run `dvc push` to upload the changes to the remote storage, followed
+by a `git commit` to track them:
```cli
$ dvc push
@@ -254,17 +268,16 @@ $ git commit data/data.xml.dvc -m "Dataset updates"
## Switching between versions
-The regular workflow is to use `git checkout` first (to switch a branch or
-checkout a `.dvc` file version) and then run `dvc checkout` to sync data:
+A common workflow is to use `git checkout` to switch a branch or check out a
+specific `.dvc` file revision, followed by `dvc checkout` to sync data into
+your workspace:
```cli
$ git checkout <...>
$ dvc checkout
```
-
-
-### ⚙️ Expand to get the previous version of the dataset.
+## Returning to a previous version of the dataset
Let's go back to the original version of the data:
@@ -280,33 +293,20 @@ of the dataset was already saved):
$ git commit data/data.xml.dvc -m "Revert dataset updates"
```
-
-
-Yes, DVC is technically not a version control system! Git itself provides that
-layer. DVC in turn manipulates `.dvc` files, whose contents define the data file
-versions. DVC also synchronizes DVC-tracked data in the workspace
-efficiently to match them.
-
-## Discovering and accessing data
-
-DVC helps you with accessing and using your data artifacts from outside of the
-project where they are versioned, and your tracked data can be imported and
-fetched from anywhere. For example, you may want to download a specific version
-of an ML model to a deployment server or import a dataset into another project.
-To learn about how DVC allows you to do this, see the
-[discovering and accessing data guide](/doc/user-guide/data-management/discovering-and-accessing-data).
+
-## Large datasets versioning
+As you can see, DVC is technically not a version control system by itself! It
+manipulates `.dvc` files, whose contents define the data file versions. Git is
+already used to version your code, and now it can also version your data
+alongside it.
-In cases where you process very large datasets, you need an efficient mechanism
-(in terms of space and performance) to share a lot of data, including different
-versions. Do you use network attached storage (NAS)? Or a large external volume?
-You can learn more about advanced workflows using these links:
+
-- A [shared cache](/doc/user-guide/how-to/share-a-dvc-cache) can be set up to
- store, version and access a lot of data on a large shared volume efficiently.
-- An advanced scenario is to track and version data directly on the remote
- storage (e.g. S3, SSH). See [Managing External Data] to learn more.
+### Discovering and accessing data
-[managing external data]:
- https://dvc.org/doc/user-guide/data-management/managing-external-data
+Your tracked data can be imported and fetched from anywhere using DVC. For
+example, you may want to download a specific version of an ML model to a
+deployment server or import a dataset into another project, as we did at the
+[top of this chapter](/doc/start/data-management/data-versioning?tab=Mac-Linux#tracking-data).
+To learn about how DVC allows you to do this, see
+[Discovering and Accessing Data Guide](/doc/user-guide/data-management/discovering-and-accessing-data).
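+
+As a quick taste, importing a DVC-tracked dataset into another project looks
+like this (a sketch, run from that project's root; `dvc import` also records
+the source, so you can bring in upstream changes later with `dvc update`):
+
+```cli
+$ dvc import https://github.com/iterative/dataset-registry \
+             get-started/data.xml -o data/data.xml
+```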
diff --git a/content/docs/start/data-management/metrics-parameters-plots.md b/content/docs/start/data-management/metrics-parameters-plots.md
index 41278d2b82..9975b6a8bc 100644
--- a/content/docs/start/data-management/metrics-parameters-plots.md
+++ b/content/docs/start/data-management/metrics-parameters-plots.md
@@ -46,7 +46,7 @@ $ dvc repro
-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood
The `-O` option here specifies an output that will not be cached by
DVC, and `-M` specifies a metrics file (that will also not be cached).
@@ -117,7 +117,7 @@ eval/live/metrics.json 0.94496 0.97723 0.96191 0.987
## Visualizing plots
-The stage also writes different files with data that can be graphed:
+The `evaluate` stage also writes different files with data that can be graphed:
- [DVCLive]-generated [`roc_curve`] and [`confusion_matrix`] values in the
`eval/live/plots` directory.
@@ -160,9 +160,9 @@ plots:
- eval/importance.png
```
-To render them, you can run `dvc plots show` (shown below), which generates an
-HTML file you can open in a browser. Or you can load your project in VS Code and
-use the [DVC Extension]'s [Plots Dashboard].
+To render them, run `dvc plots show` (shown below), which generates an HTML file
+you can open in a browser. Or you can load your project in VS Code and use the
+[DVC Extension]'s [Plots Dashboard].
```cli
$ dvc plots show
diff --git a/content/docs/start/index.md b/content/docs/start/index.md
index bde323bc70..7d371e7c1a 100644
--- a/content/docs/start/index.md
+++ b/content/docs/start/index.md
@@ -11,7 +11,8 @@ pipelines and metrics, and manage experiments.'
## Get Started with DVC
-->
-Before we begin, let's prepare a project for this guide
+Before we begin, settle on a directory for this guide. Everything we do will be
+self-contained there.
@@ -35,8 +36,11 @@ This directory name is used in our
-Assuming DVC is already [installed](/doc/install), initialize it by running
-`dvc init` inside a Git project:
+## Initializing a project
+
+We will use your chosen directory as a DVC project. Assuming DVC is already
+[installed](/doc/install), let's initialize it by running `dvc init` inside a
+Git project:
```cli
$ dvc init
diff --git a/content/docs/user-guide/data-management/discovering-and-accessing-data.md b/content/docs/user-guide/data-management/discovering-and-accessing-data.md
index 86845415a8..8083eaec7d 100644
--- a/content/docs/user-guide/data-management/discovering-and-accessing-data.md
+++ b/content/docs/user-guide/data-management/discovering-and-accessing-data.md
@@ -81,7 +81,7 @@ bring in changes from the data source later using `dvc update`.
-### 💡 Expand to see what happens under the hood.
+### 💡 Expand to get a peek under the hood
diff --git a/content/docs/user-guide/data-management/remote-storage/index.md b/content/docs/user-guide/data-management/remote-storage/index.md
index aa0bbcf255..060aeb67f8 100644
--- a/content/docs/user-guide/data-management/remote-storage/index.md
+++ b/content/docs/user-guide/data-management/remote-storage/index.md
@@ -1,9 +1,10 @@
# Remote Storage
-_DVC remotes_ provide optional/additional storage to back up and share your data
-and ML models. For example, you can download data artifacts created by
-colleagues without spending time and resources to regenerate them locally. See
-also `dvc push` and `dvc pull`.
+_DVC remotes_ provide access to external storage locations to track and share
+your data and ML models. Usually, those are shared between devices or team
+members working on a project. For example, you can download data
+artifacts created by colleagues without spending time and resources to
+regenerate them locally. See also `dvc push` and `dvc pull`.
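+
+For example, setting up an S3 remote and uploading to it can be as simple as
+this (a sketch; assumes a bucket you can access):
+
+```cli
+$ dvc remote add -d storage s3://mybucket/dvcstore
+$ dvc push
+```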