iterative · jorgeorpinel · Nov 20, 2019 · Nov 21, 2019
diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md
@@ -163,7 +163,7 @@ different names, and not currently tracked by Git:
 $ git status
 ...
 Untracked files:
-  (use "git add <file>..." to include in what will be committed)
+  (use "git add <file> ..." to include in what will be committed)
 
 	model.bigrams.pkl
 	model.monograms.pkl

diff --git a/static/docs/command-reference/install.md b/static/docs/command-reference/install.md
@@ -155,7 +155,7 @@ checkout the `6-featurization` tag:
 $ git checkout 6-featurization
 Note: checking out '6-featurization'.
 
-You are in 'detached HEAD' state.  ...
+You are in 'detached HEAD' state...
 
 $ dvc status
 
@@ -216,7 +216,7 @@ We can now repeat the command run earlier, to see the difference.
 $ git checkout 6-featurization
 Note: checking out '6-featurization'.
 
-You are in 'detached HEAD' state. ...
+You are in 'detached HEAD' state...
 
 HEAD is now at d13ba9a add featurization stage
 
@@ -257,8 +257,7 @@ helpfully informs us the workspace is out of sync. We should therefore run the
 
 ```dvc
 $ dvc repro evaluate.dvc
-
-... much output
+...
 To track the changes with git run:
 
     git add featurize.dvc train.dvc evaluate.dvc

diff --git a/static/docs/tutorials/deep/reproducibility.md b/static/docs/tutorials/deep/reproducibility.md
@@ -34,7 +34,7 @@ $ dvc repro model.p.dvc
 $ dvc repro
 ```
 
-Tries to reproduce the same pipeline... But there is still nothing to reproduce.
+Tries to reproduce the same pipeline, but there is still nothing to reproduce.
 
 ## Adding bigrams
 

diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md
@@ -7,30 +7,24 @@ tracking of datasets and any other <abbr>data artifacts</abbr>.
 
 With the aim to enable reusability of these versioned artifacts between
 different projects (similar to package management systems, but for data), DVC
-also includes the `dvc get`, `dvc import`, and `dvc update` commands. For
-example, project A may use a data file to begin its data
-[pipeline](/doc/command-reference/pipeline), but project B also requires this
-same file; Instead of
-[adding it](/doc/command-reference/add#example-single-file) it to both projects,
-B can simply import it from A. Furthermore, the version of the data file
-imported to B can be an older iteration than what's currently used in A.
+also includes the `dvc get`, `dvc import`, and `dvc update` commands. This means
+that a project can depend on data from an external <abbr>DVC project</abbr>.
 
 Keeping this in mind, we could build a <abbr>DVC project</abbr> dedicated to
 tracking and versioning datasets (or any kind of large files). This way we would
-have a repository that has all the metadata and change history for the project's
-data. We can see who updated what, and when; use pull requests to update data
-the same way you do with code; and we don't need ad-hoc conventions to store
-different data versions. Other projects can share the data in the registry by
-downloading (`dvc get`) or importing (`dvc import`) them for use in different
-data processes.
+have a repository with all the metadata and history of changes in the project's
+data. We could see who updated what, and when; use pull requests to update data
+(the same way we do with code); and avoid ad-hoc conventions to store different
+data versions. This is what we call a data registry. Other projects can share
+datasets in a registry by downloading (`dvc get`) or importing (`dvc import`)
+them for use in different data processes.
 
-The advantages of using a DVC **data registry** project are:
+Advantages of using a DVC **data registry** project:
 
 - Data as code: Improve _lifecycle management_ with versioning of simple
   directory structures (like Git for your cloud storage), without ad-hoc
-  conventions. Leverage Git and Git hosting features such as change history,
-  branching, pull requests, reviews, and even continuous deployment of ML
-  models.
+  conventions. Leverage Git and Git hosting features such as commits, branching,
+  pull requests, reviews, and even continuous deployment of ML models.
 - Reusability: Reproduce and organize _feature stores_ with a simple CLI
   (`dvc get` and `dvc import` commands, similar to software package management
   systems like `pip`).
@@ -49,29 +43,30 @@ The advantages of using a DVC **data registry** project are:
 
 ## Example
 
-A dataset we use for several of our examples and tutorials is one containing
-2800 images of cats and dogs. We partitioned the dataset in two for our
-[Versioning Tutorial](/doc/tutorials/versioning), and backed up the parts on a
-storage server, downloading them with `wget` in our examples. This setup was
-then revised to download the dataset with `dvc get` instead, so we created the
-[dataset-registry](https://github.com/iterative/dataset-registry)) repository, a
-<abbr>DVC project</abbr> hosted on GitHub, to version the dataset (see its
+A dataset we commonly use for several of our examples and tutorials contains
+2800 images of cats and dogs, which was split in two for our
+[Versioning Tutorial](/doc/tutorials/versioning). Originally, the parts were
+backed up on a storage server, and downloaded with
+[`wget`](https://www.gnu.org/software/wget/). This was then revised in order to
+download the parts with `dvc get` instead, so we created the
+[dataset-registry](https://github.com/iterative/dataset-registry)
+<abbr>project</abbr> to version the dataset (in the
 [`tutorial/ver`](https://github.com/iterative/dataset-registry/tree/master/tutorial/ver)
 directory).
 
-However, there are a few problems with the way this dataset is structured. Most
-importantly, this single dataset is tracked by 2 different
-[DVC-files](/doc/user-guide/dvc-file-format), instead of 2 versions of the same
-one, which would better reflect the intentions of this dataset... Fortunately,
-we have also prepared an improved alternative in the
+However, there are a few problems with the way that dataset is versioned. Most
+importantly, this split dataset is tracked by 2 different
+[DVC-files](/doc/user-guide/dvc-file-format) (one for each part), instead of 2
+versions of a single DVC-file. An initial version could have the first part
+only, while an update would have the entire, unified dataset. Fortunately, we
+have also prepared this improved alternative in the
 [`use-cases/`](https://github.com/iterative/dataset-registry/tree/master/use-cases)
 directory of the same <abbr>DVC repository</abbr>.
 
-To create a
-[first version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases)
+To create the
+[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases)
 of our dataset, we extracted the first part into the `use-cases/cats-dogs`
-directory (illustrated below), and ran `dvc add use-cases/cats-dogs` to
-[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory).
+directory, illustrated below:
 
 ```dvc
 $ tree use-cases/cats-dogs --filelimit 3
@@ -85,7 +80,10 @@ use-cases/cats-dogs
         └── dogs [400 image files]
 ```
 
-In a local DVC project, we could have obtained this dataset at this point with
+Then we ran `dvc add use-cases/cats-dogs` to
+[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory).
+
+At this point, we could have obtained this dataset in another DVC project with
 the following command:
 
 ```dvc
@@ -95,15 +93,16 @@ $ dvc import git@github.com:iterative/dataset-registry.git \
 
 > Note that unlike `dvc get`, which can be used from any directory, `dvc import`
 > always needs to run from an [initialized](/doc/command-reference/init) DVC
-> project.
+> project. Remember also that with both commands, the data comes from the source
+> project's remote storage, not from the Git repository itself.
 
 <details>
 
 ### Expand for actionable command (optional)
 
 The command above is meant for informational purposes only. If you actually run
-it in a DVC project, although it should work, it will import the latest version
-of `use-cases/cats-dogs` from `dataset-registry`. The following command would
+it, although it will work, it will import the latest version of
+`use-cases/cats-dogs` from `dataset-registry`. The following command would
 actually bring in the version in question:
 
 ```dvc
@@ -117,54 +116,52 @@ See the `dvc import` command reference for more details on the `--rev`
 
 </details>
 
-Importing keeps the connection between the local project and the source data
-registry where we are downloading the dataset from. This is achieved by creating
-a particular kind of [DVC-file](/doc/user-guide/dvc-file-format) that uses the
-`repo` field (a.k.a. _import stage_). (This file can be used for versioning the
-import with Git.)
+Importing keeps the connection between the local <abbr>project</abbr> and the
+data source (registry <abbr>repository</abbr>). This is achieved by creating a
+particular kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import
+stage_) that includes a `repo` field. (This file can be staged and
+committed with Git.)
 
 > For a sample DVC-file resulting from `dvc import`, refer to
 > [this example](/doc/command-reference/import#example-data-registry).
 
-Back in our **dataset-registry** project, a
+Back in our **dataset-registry** project, the
 [second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases)
 of our dataset was created by extracting the second part, with 1000 additional
-images (500 cats, 500 dogs), into the same directory structure. Then, we simply
-ran `dvc add use-cases/cats-dogs` again.
+images (500 cats, 500 dogs) on top of the existing directory structure. Then, we
+simply ran `dvc add use-cases/cats-dogs` again.
 
-In our local project, all we have to do in order to obtain this latest version
-of the dataset is to run:
+To obtain this latest version in another project where the first version was
+previously imported, all we would have to do is run:
 
 ```dvc
 $ dvc update cats-dogs.dvc
 ```
 
-This is possible because of the connection that the import stage saved among
-local and source projects, as explained earlier.
-
 <details>
 
 ### Expand for actionable command (optional)
 
-As with the previous hidden note, actually trying the commands above should
-produced the expected results, but not for obvious reasons. Specifically, the
-initial `dvc import` command would have already obtained the latest version of
-the dataset (as noted before), so this `dvc update` is unnecessary and won't
-have an effect.
+As with the previous hidden note, actually trying the command above will produce
+the desired results, but not for obvious reasons. The initial `dvc import`
+command would have already obtained the latest version of the dataset (as noted
+before), so this `dvc update` is unnecessary and won't have any effect.
 
-If you ran the `dvc import --rev cats-dogs-v1 ...` command instead, its import
-stage (DVC-file) would be fixed to that Git tag (`cats-dogs-v1`). In order to
-update it, do not use `dvc update`. Instead, re-import the data by using the
-original import command (without `--rev`). Refer to
-[this example](http://localhost:3000/doc/command-reference/import#example-fixed-revisions-re-importing)
-for more information.
+And if you ran the `dvc import --rev cats-dogs-v1 ...` command instead, its
+import stage (DVC-file) would be
+[fixed to that revision](/doc/command-reference/import#example-fixed-revisions-re-importing)
+(`cats-dogs-v1` tag), so `dvc update` would also be ineffective. In order to
+actually "update" it, re-import the data instead, by now running the initial
+import command (the one without `--rev`):
 
-</details>
+```dvc
+$ dvc import git@github.com:iterative/dataset-registry.git \
+             use-cases/cats-dogs
+```
 
-This downloads new and changed files in `cats-dogs/` from the source project,
-and updates the metadata in the import stage DVC-file.
+</details>
 
-As an extra detail, notice that so far our local project is working only with a
-local <abbr>cache</abbr>. It has no need to setup a
-[remotes](/doc/command-reference/remote) to [pull](/doc/command-reference/pull)
-or [push](/doc/command-reference/push) this dataset.
+This is possible because of the connection that the import stage saved between
+local and source projects, as explained earlier. The update downloads new and
+changed files in `cats-dogs/` based on the source project, and updates the
+metadata in the import stage DVC-file.