iterative · kurianbenoy · Jun 5, 2020 · Jun 5, 2020 · Jun 8, 2020 · Jun 5, 2020
diff --git a/content/blog/2019-03-05-march-19-dvc-heartbeat.md b/content/blog/2019-03-05-march-19-dvc-heartbeat.md
@@ -144,9 +144,9 @@ liking and see your data files listed there.
 ### Q: [Managing data and pipelines with DVC on HDFS](https://discordapp.com/channels/485586884165107732/485596304961962003/545562334983356426)
 
 With DVC, you could connect your data sources from HDFS with your pipeline in
-your local project, by simply specifying it as an external dependency. For
-example let’s say your script `process.cmd` works on an input file on HDFS and
-then downloads a result to your local workspace, then with DVC it could look
+your local project, by specifying it as an external dependency. For example,
+if your script `process.cmd` works on an input file on HDFS and then downloads
+a result to your local workspace, with DVC it could look
 something like:
 
 ```dvc

diff --git a/content/blog/2019-05-21-may-19-dvc-heartbeat.md b/content/blog/2019-05-21-may-19-dvc-heartbeat.md
@@ -256,9 +256,9 @@ $ dvc metrics show metrics.json \
 
 There are a few options to add a new dependency:
 
-- simply opening a file with your favorite editor and adding a dependency there
-  without md5. DVC will understand that that stage is changed and will re-run
-  and re-calculate md5 checksums during the next DVC repro;
+- opening a file with your favorite editor and adding a dependency there without
+  md5. DVC will detect that the stage has changed, and will re-run it and
+  re-calculate md5 checksums during the next `dvc repro`;
 
 - use `dvc run --no-exec` is another option. It will rewrite the existing file
   for you with new parameters.

diff --git a/content/blog/2020-02-17-a-public-reddit-dataset.md b/content/blog/2020-02-17-a-public-reddit-dataset.md
@@ -110,7 +110,7 @@ you'll need to [install DVC](https://dvc.org/doc/install); one of the simplest
 ways is `pip install dvc`.
 
 Say you have a directory on your local machine where you plan to build some
-analysis scripts. Simply run
+analysis scripts. Run:
 
 ```dvc
 $ dvc get https://github.com/iterative/aita_dataset \
@@ -225,7 +225,7 @@ $ dvc import https://github.com/iterative/aita_dataset \
 ```
 
 Then, because the dataset in your workspace is linked to our dataset repository,
-you can update it by simply running:
+you can update it by running:
 
 ```dvc
 $ dvc update aita_clean.csv
@@ -317,10 +317,10 @@ refine these existing methods. And there’s almost certainly room to push the
 state of the art in asshole detection!
 
 If you're interested in learning more about using Reddit data, check out
-[pushshift.io](https://pushshift.io/), a database that contains basically all of
-Reddit's content (so why make this dataset? I wanted to remove some of the
-barriers to analyzing text from r/AmItheAsshole by providing an
-already-processed and cleaned version of the data that can be downloaded with a
-line of code; pushshift takes some work). You might use pushshift's API and/or
-praw to augment this dataset in some way- perhaps to compare activity in this
-subreddit with another, or broader patterns on Reddit.
+[pushshift.io](https://pushshift.io/), a database that contains all of Reddit's
+content (so why make this dataset? I wanted to remove some of the barriers to
+analyzing text from r/AmItheAsshole by providing an already-processed and
+cleaned version of the data that can be downloaded with a line of code;
+pushshift takes some work). You might use pushshift's API and/or praw to augment
+this dataset in some way, perhaps to compare activity in this subreddit with
+another, or with broader patterns on Reddit.
diff --git a/content/blog/2020-04-16-april-20-community-gems.md b/content/blog/2020-04-16-april-20-community-gems.md
@@ -106,7 +106,7 @@ $ dvc pull process_data_stage.dvc
 You can also use `dvc pull` at the level of individual files. This might be
 needed if your DVC pipeline file creates 10 outputs, for example, and you only
 want to pull one (say, `model.pkl`, your trained model) from remote DVC storage.
-You'd simply run
+You'd run:
 
 ```dvc
 $ dvc pull model.pkl

diff --git a/content/docs/api-reference/open.md b/content/docs/api-reference/open.md
@@ -113,7 +113,7 @@ should handle the event-driven parsing of the document in this case.) This
 increases the performance of the code (minimizing memory usage), and is
 typically faster than loading the whole data into memory.
 
-> If you just needed to load the complete file contents into memory, you can use
+> If you want to load the complete file contents into memory, you can use
 > `dvc.api.read()` instead:
 >
 > ```py
@@ -127,7 +127,7 @@ typically faster than loading the whole data into memory.
 
 ## Example: Accessing private repos
 
-This is just a matter of using the right `repo` argument, for example an SSH URL
+The key for this is to use the right `repo` argument, for example an SSH URL
 (requires that the
 [credentials are configured](https://help.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh)
 locally):

diff --git a/content/docs/command-reference/checkout.md b/content/docs/command-reference/checkout.md
@@ -102,8 +102,8 @@ be pulled from remote storage using `dvc pull`.
 
 ## Examples
 
-Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
-pipeline stages, such as the <abbr>DVC project</abbr> created for the
+Let's use a <abbr>workspace</abbr> with some data, code, ML models, and pipeline
+stages, such as the <abbr>DVC project</abbr> created for the
 [Get Started](/doc/tutorials/get-started). Then we can see what happens with
 `git checkout` and `dvc checkout` as we switch from tag to tag.
 
@@ -151,8 +151,8 @@ baseline-experiment     <- First simple version of the model
 bigrams-experiment      <- Uses bigrams to improve the model
 ```
 
-We can now just run `dvc checkout` that will update the most recent `model.pkl`,
-`data.xml`, and other files that are tracked by DVC. The model file hash
+We can now run `dvc checkout` to update the most recent `model.pkl`, `data.xml`,
+and other files that are tracked by DVC. The model file hash
 `662eb7f64216d9c2c1088d0a5e2c6951` will be used in the `train.dvc`
 [stage file](/doc/command-reference/run):
 

diff --git a/content/docs/command-reference/commit.md b/content/docs/command-reference/commit.md
@@ -44,7 +44,7 @@ further detailed below.
   other change that doesn't cause changed stage outputs. However, DVC will
   notice that some <abbr>dependencies</abbr> and have changed, and expect you to
   reproduce the whole pipeline. If you're sure no pipeline results would change,
-  just use `dvc commit` to force update the related DVC-files and cache.
+  use `dvc commit` to force update the related DVC-files and cache.
 
 Let's take a look at what is happening in the first scenario closely. Normally
 DVC commands like `dvc add`, `dvc repro` or `dvc run` commit the data to the
@@ -95,8 +95,8 @@ reproducibility in those cases.
 
 ## Examples
 
-Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
-pipeline stages, such as the <abbr>DVC project</abbr> created for the
+Let's use a <abbr>workspace</abbr> with some data, code, ML models, and pipeline
+stages, such as the <abbr>DVC project</abbr> created for the
 [Get Started](/doc/tutorials/get-started). Then we can see what happens with
 `git commit` and `dvc commit` in different situations.
 

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
@@ -29,7 +29,7 @@ external data source changes. Example scenarios:
 - A shared dataset on a remote storage that is managed and updated outside DVC.
 
 > Note that `dvc get-url` corresponds to the first step this command performs
-> (just download the file or directory).
+> (just downloads the file or directory).
 
 The `dvc import-url` command helps the user create such an external data
 dependency without having to manually copying files from the supported remote
@@ -78,7 +78,7 @@ Specific explanations:
   is necessary to track if the specified remote file (URL) changed to download
   it again.
 
-- `remote://myremote/path/to/file` notation just means that a DVC
+- `remote://myremote/path/to/file` notation means that a DVC
   [remote](/doc/command-reference/remote) `myremote` is defined and when DVC is
   running. DVC automatically expands this URL into a regular S3, SSH, GS, etc
   URL by appending `/path/to/file` to the `myremote`'s configured base path.
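
The expansion of the `remote://` notation described in this hunk can be illustrated with a small Python sketch. This is not DVC's actual implementation: the function name `expand_remote_url` and the `s3://mybucket/dvcstore` base path are hypothetical, and the `remotes` mapping stands in for whatever DVC reads from its config.

```python
def expand_remote_url(url: str, remotes: dict) -> str:
    """Expand remote://<name>/<path> by appending <path> to the base path
    configured for remote <name> (illustrative only, not DVC's code)."""
    prefix = "remote://"
    if not url.startswith(prefix):
        return url  # already a regular S3, SSH, GS, etc. URL

    # Split "myremote/path/to/file" into the remote name and the rest.
    name, _, path = url[len(prefix):].partition("/")
    base = remotes[name]  # raises KeyError if the remote is not defined

    # Append the path to the configured base, avoiding a double slash.
    return base.rstrip("/") + "/" + path if path else base


# Hypothetical config: remote name mapped to its base path.
remotes = {"myremote": "s3://mybucket/dvcstore"}
print(expand_remote_url("remote://myremote/path/to/file", remotes))
# prints: s3://mybucket/dvcstore/path/to/file
```

Regular URLs pass through unchanged, so the same helper could be applied to any dependency path without first checking its scheme.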

diff --git a/content/docs/command-reference/install.md b/content/docs/command-reference/install.md
@@ -262,7 +262,7 @@ matching what is referenced by the DVC-files.
 To follow this example, start with the same workspace as before, making sure it
 is not in a _detached HEAD_ state by running `git checkout master`.
 
-If we simply edit one of the code files:
+Let's imagine we have modified the file `src/featurization.py`:
 
 ```dvc
 $ vi src/featurization.py

diff --git a/content/docs/command-reference/list.md b/content/docs/command-reference/list.md
@@ -19,8 +19,8 @@ positional arguments:
 DVC, by effectively replacing data files, models, directories with DVC-files
 (`.dvc`), hides actual locations and names. This means that you don't see data
 files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
-Github), you just see the DVC-files. This makes it hard to navigate the project
-to find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
+GitHub), you only see the DVC-files. This makes it hard to navigate the project
+to find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
 `dvc.api`.
 
 `dvc list` prints a virtual view of a DVC repository, as if files and

diff --git a/content/docs/command-reference/metrics/show.md b/content/docs/command-reference/metrics/show.md
@@ -32,7 +32,7 @@ compares them with a previous version.
 ## Options
 
 - `-a`, `--all-branches` - print metric file contents in all Git branches
-  instead of just those present in the current workspace. It can be used to
+  instead of only those present in the current workspace. It can be used to
   compare different experiments. Note that this can be combined with `-T` below,
   for example using the `-aT` flag.
 

diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md
@@ -35,7 +35,7 @@ The default remote is used (see `dvc config core.remote`) unless the `--remote`
 option is used. See `dvc remote` for more information on how to configure a
 remote.
 
-With no arguments, just `dvc pull` or `dvc pull --remote <name>`, it downloads
+With no arguments (`dvc pull` or `dvc pull --remote <name>`), it downloads
 only the files (or directories) missing from the workspace by searching all
 [DVC-files](/doc/user-guide/dvc-file-format) currently in the
 <abbr>project</abbr>. It will not download files associated with earlier commits
@@ -59,7 +59,7 @@ reflinks or hardlinks to put it in the workspace without copying. See
 ## Options
 
 - `-a`, `--all-branches` - determines the files to download by examining
-  DVC-files in all Git branches instead of just those present in the current
+  DVC-files in all Git branches instead of only those present in the current
   workspace. It's useful if branches are used to track experiments or project
   checkpoints. Note that this can be combined with `-T` below, for example using
   the `-aT` flag.
@@ -94,7 +94,7 @@ reflinks or hardlinks to put it in the workspace without copying. See
 
 - `-j <number>`, `--jobs <number>` - number of threads to run simultaneously to
   handle the downloading of files from the remote. The default value is
-  `4 * cpu_count()`. For SSH remotes, the default is just `4`. Using more jobs
+  `4 * cpu_count()`. For SSH remotes, the default value is `4`. Using more jobs
   may improve the total download speed if a combination of small and large files
   are being fetched.
 
@@ -136,7 +136,7 @@ The workspace looks almost like in this
 └── train.dvc
 ```
 
-We can now just run `dvc pull` to download the most recent `data/data.xml`,
+We can now run `dvc pull` to download the most recent `data/data.xml`,
 `model.pkl`, and other DVC-tracked files into the <abbr>workspace</abbr>:
 
 ```dvc

diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md
@@ -54,9 +54,9 @@ none are specified on the command line nor in the configuration. The default
 remote is used (see `dvc config core.remote`) unless the `--remote` option is
 used. See `dvc remote` for more information on how to configure a remote.
 
-With no arguments, just `dvc push` or `dvc push --remote REMOTE`, it uploads
-only the files (or directories) that are new in the local repository to remote
-storage. It will not upload files associated with earlier commits in the
+With no arguments (`dvc push` or `dvc push --remote REMOTE`), it uploads only
+the files (or directories) that are new in the local repository to remote
+storage. It will not upload files associated with earlier commits in the
 <abbr>repository</abbr> (if using Git), nor will it upload files that have not
 changed.
 
@@ -73,7 +73,7 @@ to push.
 ## Options
 
 - `-a`, `--all-branches` - determines the files to upload by examining DVC-files
-  in all Git branches instead of just those present in the current workspace.
+  in all Git branches instead of only those present in the current workspace.
   It's useful if branches are used to track experiments or project checkpoints.
   Note that this can be combined with `-T` below, for example using the `-aT`
   flag.
@@ -103,7 +103,7 @@ to push.
 
 - `-j <number>`, `--jobs <number>` - number of threads to run simultaneously to
   handle the uploading of files from the remote. The default value is
-  `4 * cpu_count()`. For SSH remotes, the default is just `4`. Using more jobs
+  `4 * cpu_count()`. For SSH remotes, the default value is `4`. Using more jobs
   may improve the total download speed if a combination of small and large files
   are being fetched.
 

diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md
@@ -197,9 +197,8 @@ $ dvc remote add -d myremote "azure://"
 
 To start using a GDrive remote, fist add it with a
 [valid URL format](/doc/user-guide/setup-google-drive-remote#url-format). Then
-simply use any DVC command that needs it (e.g. `dvc pull`, `dvc fetch`,
-`dvc push`), and follow the instructions to connect your Google Drive with DVC.
-For example:
+use any DVC command that needs it (e.g. `dvc pull`, `dvc fetch`, `dvc push`),
+and follow the instructions to connect your Google Drive with DVC. For example:
 
 ```dvc
 $ dvc remote add -d myremote gdrive://0AIac4JZqHhKmUk9PDA/dvcstore

diff --git a/content/docs/command-reference/status.md b/content/docs/command-reference/status.md
@@ -107,11 +107,10 @@ workspace) is different from remote storage. Bringing the two into sync requires
   (specified in the `core.remote` config option).
 
 - `-a`, `--all-branches` - compares cache content against all Git branches
-  instead of just the current workspace. This basically runs the same status
-  command in every branch of this repo. The corresponding branches are shown in
-  the status output. Applies only if `--cloud` or a `-r` remote is specified.
-  Note that this can be combined with `-T` below, for example using the `-aT`
-  flag.
+  instead of only the current workspace. This runs the same status command in
+  every branch of this repo. The corresponding branches are shown in the status
+  output. Applies only if `--cloud` or a `-r` remote is specified. Note that
+  this can be combined with `-T` below, for example using the `-aT` flag.
 
 - `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as
   the workspace. Note that both options can be combined, for example using the

diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md
@@ -70,8 +70,8 @@ Importing 'model.pkl ([email protected]:iterative/example-get-started)'
 As DVC mentions, the import stage (DVC-file) `model.pkl.dvc` is created. This
 [stage file](/doc/command-reference/run) is frozen by default though, so to
 [reproduce](/doc/command-reference/repro) it, we would need to run
-`dvc unfreeze` on it first, then `dvc repro` (and `dvc freeze` again). Let's
-just run `dvc update` on it instead:
+`dvc unfreeze` on it first, then `dvc repro` (and `dvc freeze` again). Let's run
+`dvc update` on it instead:
 
 ```dvc
 $ dvc update model.pkl.dvc

diff --git a/content/docs/tutorials/get-started/data-access.md b/content/docs/tutorials/get-started/data-access.md
@@ -25,11 +25,11 @@ cats-dogs.dvc
 The benefit of this command over browsing a Git hosting website is that the list
 includes files and directories tracked by **both Git and DVC**.
 
-## Just download it
+## Download it
 
-One way is to simply download the data with `dvc get`. This is useful when
-working outside of a <abbr>DVC project</abbr> environment, for example in an
-automated ML model deployment task:
+One way is to download the data with `dvc get`. This is useful when working
+outside of a <abbr>DVC project</abbr> environment, for example in an automated
+ML model deployment task:
 
 ```dvc
 $ dvc get https://github.com/iterative/dataset-registry \

diff --git a/content/docs/tutorials/get-started/data-pipelines.md b/content/docs/tutorials/get-started/data-pipelines.md
@@ -163,9 +163,9 @@ This would be a good point to commit the changes with Git. This includes any
 
 ## Reproduce
 
-Imagine you're just cloning the <abbr>repository</abbr> created so far, in
-another computer. It's extremely easy for anyone to reproduce the result
-end-to-end, by using `dvc repro`.
+Imagine you've cloned the <abbr>repository</abbr> created so far on another
+computer. It's easy for anyone to reproduce the result end-to-end by
+using `dvc repro`.
 
 <details>
 
@@ -198,7 +198,7 @@ executes the necessary commands to rebuild all the pipeline
 ## Visualize
 
 Having built our pipeline, we need a good way to understand its structure.
-Seeing a graph of connected stage files would help. DVC lets you do just that,
+Seeing a graph of connected stage files would help. DVC lets you do that,
 without leaving the terminal!
 
 ```dvc

diff --git a/content/docs/tutorials/get-started/data-versioning.md b/content/docs/tutorials/get-started/data-versioning.md
@@ -228,8 +228,8 @@ after `git clone` and `git pull`.
 
 ### 👉 Expand to simulate a fresh clone of this repo
 
-Let's just remove the directory added so far, both from <abbr>workspace</abbr>
-and <abbr>cache</abbr>:
+Let's remove the directory added so far, both from <abbr>workspace</abbr> and
+<abbr>cache</abbr>:
 
 ```dvc
 $ rm -f datadir .dvc/cache/a3/04afb96060aad90176268345e10355

diff --git a/content/docs/tutorials/get-started/experiments.md b/content/docs/tutorials/get-started/experiments.md
@@ -139,7 +139,7 @@ back and forth. To find the best-performing experiment or track the progress,
 described in one of the previous sections).
 
 Let's run evaluate for the latest `bigrams` experiment we created earlier. It
-mostly takes just running the `dvc repro`:
+mostly takes running `dvc repro`:
 
 ```dvc
 $ git checkout master

diff --git a/content/docs/tutorials/pipelines.md b/content/docs/tutorials/pipelines.md
@@ -183,9 +183,9 @@ outs:
     persist: false
 ```
 
-Just like the DVC-file we created earlier with `dvc add`, this stage file uses
-`md5` hashes (that point to the <abbr>cache</abbr>) to describe and version
-control dependencies and outputs. Output `data/Posts.xml` file is saved as
+Like the DVC-file we created earlier with `dvc add`, this stage file uses `md5`
+hashes (that point to the <abbr>cache</abbr>) to describe and version control
+dependencies and outputs. Output `data/Posts.xml` file is saved as
 `.dvc/cache/a3/04afb96060aad90176268345e10355` and linked (or copied) to the
 <abbr>workspace</abbr>, as well as added to `.gitignore`.
 
@@ -331,8 +331,8 @@ $ dvc metrics show
 
 It's time to save our [pipeline](/doc/command-reference/pipeline). You can
 confirm that we do not tack files or raw datasets with Git, by using the
-`git status` command. We are just saving a snapshot of the DVC-files that
-describe data, transformations (stages), and relationships between them.
+`git status` command. We are only saving a snapshot of the DVC-files that
+describe data, transformations (stages), and relationships between them.
 
 ```dvc
 $ git add *.dvc auc.metric data/.gitignore

diff --git a/content/docs/understanding-dvc/how-it-works.md b/content/docs/understanding-dvc/how-it-works.md
@@ -84,7 +84,7 @@
   $ cd myrepo
   $ git pull # download tracked data from remote storage
   $ dvc checkout # checkout data files
-  $ ls -l data/ # You just got gigabytes of data through Git and DVC:
+  $ ls -l data/ # You downloaded gigabytes of data through Git and DVC:
 
   total 1017488
   -r--------  2 501  staff   273M Jan 27 03:48 Posts-test.tsv

diff --git a/content/docs/use-cases/shared-development-server.md b/content/docs/use-cases/shared-development-server.md
@@ -103,7 +103,7 @@ $ git commit -m "process clean data"
 $ git push
 ```
 
-And now you can just as easily make their work appear in your workspace with:
+And now you can make their work appear in your workspace with:
 
 ```dvc
 $ git pull

diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md
@@ -35,8 +35,8 @@ directory.
 ## Examples
 
 As examples, let's take a look at a [stage](/doc/command-reference/run) that
-simply moves a local file from an external location, producing a `data.txt.dvc`
-stage file (DVC-file).
+moves a local file from an external location, producing a `data.txt.dvc` stage
+file (DVC-file).
 
 > Note that some of these commands use the `/home/shared` directory, typical in
 > Linux distributions.