Removed condescending language from settings #1396

Closed
wants to merge 9 commits into from
6 changes: 3 additions & 3 deletions content/blog/2019-03-05-march-19-dvc-heartbeat.md
Original file line number Diff line number Diff line change
@@ -144,9 +144,9 @@ liking and see your data files listed there.
### Q: [Managing data and pipelines with DVC on HDFS](https://discordapp.com/channels/485586884165107732/485596304961962003/545562334983356426)

With DVC, you could connect your data sources from HDFS with your pipeline in
-your local project, by simply specifying it as an external dependency. For
-example let’s say your script `process.cmd` works on an input file on HDFS and
-then downloads a result to your local workspace, then with DVC it could look
+your local project, by specifying it as an external dependency. For example
+let’s say your script `process.cmd` works on an input file on HDFS and then
+downloads a result to your local workspace, then with DVC it could look
something like:

```dvc
6 changes: 3 additions & 3 deletions content/blog/2019-05-21-may-19-dvc-heartbeat.md
@@ -256,9 +256,9 @@ $ dvc metrics show metrics.json \

There are a few options to add a new dependency:

-- simply opening a file with your favorite editor and adding a dependency there
-without md5. DVC will understand that that stage is changed and will re-run
-and re-calculate md5 checksums during the next DVC repro;
+- opening a file with your favorite editor and adding a dependency there without
+md5. DVC will understand that that stage is changed and will re-run and
+re-calculate md5 checksums during the next DVC repro;

- use `dvc run --no-exec` is another option. It will rewrite the existing file
for you with new parameters.
18 changes: 9 additions & 9 deletions content/blog/2020-02-17-a-public-reddit-dataset.md
@@ -110,7 +110,7 @@ you'll need to [install DVC](https://dvc.org/doc/install); one of the simplest
ways is `pip install dvc`.

Say you have a directory on your local machine where you plan to build some
-analysis scripts. Simply run
+analysis scripts. You run:

```dvc
$ dvc get https://github.com/iterative/aita_dataset \
@@ -225,7 +225,7 @@ $ dvc import https://github.com/iterative/aita_dataset \
```

Then, because the dataset in your workspace is linked to our dataset repository,
-you can update it by simply running:
+you can update it by running:

```dvc
$ dvc update aita_clean.csv
@@ -317,10 +317,10 @@ refine these existing methods. And there’s almost certainly room to push the
state of the art in asshole detection!

If you're interested in learning more about using Reddit data, check out
-[pushshift.io](https://pushshift.io/), a database that contains basically all of
-Reddit's content (so why make this dataset? I wanted to remove some of the
-barriers to analyzing text from r/AmItheAsshole by providing an
-already-processed and cleaned version of the data that can be downloaded with a
-line of code; pushshift takes some work). You might use pushshift's API and/or
-praw to augment this dataset in some way- perhaps to compare activity in this
-subreddit with another, or broader patterns on Reddit.
+[pushshift.io](https://pushshift.io/), a database that contains all of Reddit's
Contributor

This one is also trying to say something important. I still don't see many of these words as condescending but maybe I'm still wrong. I would agree to replace "basically" here with something more specific (as explained in #1396 (comment)).

Contributor

Actually, let's roll this one back and not touch blog posts to make this easier. Blog posts have specific authors and the tone in their posts is up to them 🙂

+content (so why make this dataset? I wanted to remove some of the barriers to
+analyzing text from r/AmItheAsshole by providing an already-processed and
+cleaned version of the data that can be downloaded with a line of code;
+pushshift takes some work). You might use pushshift's API and/or praw to augment
+this dataset in some way- perhaps to compare activity in this subreddit with
+another, or broader patterns on Reddit.
2 changes: 1 addition & 1 deletion content/blog/2020-04-16-april-20-community-gems.md
@@ -106,7 +106,7 @@ $ dvc pull process_data_stage.dvc
You can also use `dvc pull` at the level of individual files. This might be
needed if your DVC pipeline file creates 10 outputs, for example, and you only
want to pull one (say, `model.pkl`, your trained model) from remote DVC storage.
-You'd simply run
+You'd run:

```dvc
$ dvc pull model.pkl
4 changes: 2 additions & 2 deletions content/docs/api-reference/open.md
@@ -113,7 +113,7 @@ should handle the event-driven parsing of the document in this case.) This
increases the performance of the code (minimizing memory usage), and is
typically faster than loading the whole data into memory.

-> If you just needed to load the complete file contents into memory, you can use
+> If you wanted to load the complete file contents into memory, you can use
> `dvc.api.read()` instead:
>
> ```py
@@ -127,7 +127,7 @@ typically faster than loading the whole data into memory.

## Example: Accessing private repos

-This is just a matter of using the right `repo` argument, for example an SSH URL
+The key for this is to use the right `repo` argument, for example an SSH URL
jorgeorpinel marked this conversation as resolved.
(requires that the
[credentials are configured](https://help.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh)
locally):
8 changes: 4 additions & 4 deletions content/docs/command-reference/checkout.md
@@ -102,8 +102,8 @@ be pulled from remote storage using `dvc pull`.

## Examples

-Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
-pipeline stages, such as the <abbr>DVC project</abbr> created for the
+Let's create a <abbr>workspace</abbr> with some data, code, ML models, pipeline
+stages, such as the <abbr>DVC project</abbr> created for the
[Get Started](/doc/tutorials/get-started). Then we can see what happens with
`git checkout` and `dvc checkout` as we switch from tag to tag.

@@ -151,8 +151,8 @@ baseline-experiment <- First simple version of the model
bigrams-experiment <- Uses bigrams to improve the model
```

-We can now just run `dvc checkout` that will update the most recent `model.pkl`,
-`data.xml`, and other files that are tracked by DVC. The model file hash
+We can now run `dvc checkout` to update the most recent `model.pkl`, `data.xml`,
+and other files that are tracked by DVC. The model file hash
`662eb7f64216d9c2c1088d0a5e2c6951` will be used in the `train.dvc`
[stage file](/doc/command-reference/run):

6 changes: 3 additions & 3 deletions content/docs/command-reference/commit.md
@@ -44,7 +44,7 @@ further detailed below.
other change that doesn't cause changed stage outputs. However, DVC will
notice that some <abbr>dependencies</abbr> and have changed, and expect you to
reproduce the whole pipeline. If you're sure no pipeline results would change,
-just use `dvc commit` to force update the related DVC-files and cache.
+use `dvc commit` to force update the related DVC-files and cache.

Let's take a look at what is happening in the first scenario closely. Normally
DVC commands like `dvc add`, `dvc repro` or `dvc run` commit the data to the
@@ -95,8 +95,8 @@ reproducibility in those cases.

## Examples

-Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
-pipeline stages, such as the <abbr>DVC project</abbr> created for the
+Let's create a <abbr>workspace</abbr> with some data, code, ML models, pipeline
+stages, such as the <abbr>DVC project</abbr> created for the
[Get Started](/doc/tutorials/get-started). Then we can see what happens with
`git commit` and `dvc commit` in different situations.

4 changes: 2 additions & 2 deletions content/docs/command-reference/import-url.md
@@ -29,7 +29,7 @@ external data source changes. Example scenarios:
- A shared dataset on a remote storage that is managed and updated outside DVC.

> Note that `dvc get-url` corresponds to the first step this command performs
-> (just download the file or directory).
+> (just downloads the file or directory).

The `dvc import-url` command helps the user create such an external data
dependency without having to manually copying files from the supported remote
@@ -78,7 +78,7 @@ Specific explanations:
is necessary to track if the specified remote file (URL) changed to download
it again.

-- `remote://myremote/path/to/file` notation just means that a DVC
+- `remote://myremote/path/to/file` notation means that a DVC
[remote](/doc/command-reference/remote) `myremote` is defined and when DVC is
running. DVC automatically expands this URL into a regular S3, SSH, GS, etc
URL by appending `/path/to/file` to the `myremote`'s configured base path.
2 changes: 1 addition & 1 deletion content/docs/command-reference/install.md
@@ -262,7 +262,7 @@ matching what is referenced by the DVC-files.
To follow this example, start with the same workspace as before, making sure it
is not in a _detached HEAD_ state by running `git checkout master`.

-If we simply edit one of the code files:
+Let's imagine we have modified the file `src/featurization.py`:

```dvc
$ vi src/featurization.py
4 changes: 2 additions & 2 deletions content/docs/command-reference/list.md
@@ -19,8 +19,8 @@ positional arguments:
DVC, by effectively replacing data files, models, directories with DVC-files
(`.dvc`), hides actual locations and names. This means that you don't see data
files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
-Github), you just see the DVC-files. This makes it hard to navigate the project
-to find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
+Github), you see the DVC-files. This makes it hard to navigate the project to
+find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
`dvc.api`.

`dvc list` prints a virtual view of a DVC repository, as if files and
2 changes: 1 addition & 1 deletion content/docs/command-reference/metrics/show.md
@@ -32,7 +32,7 @@ compares them with a previous version.
## Options

- `-a`, `--all-branches` - print metric file contents in all Git branches
-instead of just those present in the current workspace. It can be used to
+instead of using those present in the current workspace. It can be used to
compare different experiments. Note that this can be combined with `-T` below,
for example using the `-aT` flag.

8 changes: 4 additions & 4 deletions content/docs/command-reference/pull.md
@@ -35,7 +35,7 @@ The default remote is used (see `dvc config core.remote`) unless the `--remote`
option is used. See `dvc remote` for more information on how to configure a
remote.

-With no arguments, just `dvc pull` or `dvc pull --remote <name>`, it downloads
+With no arguments, use `dvc pull` or `dvc pull --remote <name>`, it downloads
only the files (or directories) missing from the workspace by searching all
[DVC-files](/doc/user-guide/dvc-file-format) currently in the
<abbr>project</abbr>. It will not download files associated with earlier commits
@@ -59,7 +59,7 @@ reflinks or hardlinks to put it in the workspace without copying. See
## Options

- `-a`, `--all-branches` - determines the files to download by examining
-DVC-files in all Git branches instead of just those present in the current
+DVC-files in all Git branches instead of those present in the current
workspace. It's useful if branches are used to track experiments or project
checkpoints. Note that this can be combined with `-T` below, for example using
the `-aT` flag.
@@ -94,7 +94,7 @@ reflinks or hardlinks to put it in the workspace without copying. See

- `-j <number>`, `--jobs <number>` - number of threads to run simultaneously to
handle the downloading of files from the remote. The default value is
-`4 * cpu_count()`. For SSH remotes, the default is just `4`. Using more jobs
+`4 * cpu_count()`. For SSH remotes, the default value is `4`. Using more jobs
may improve the total download speed if a combination of small and large files
are being fetched.

@@ -136,7 +136,7 @@ The workspace looks almost like in this
└── train.dvc
```

-We can now just run `dvc pull` to download the most recent `data/data.xml`,
+We can now run `dvc pull` to download the most recent `data/data.xml`,
`model.pkl`, and other DVC-tracked files into the <abbr>workspace</abbr>:

```dvc
10 changes: 5 additions & 5 deletions content/docs/command-reference/push.md
@@ -54,9 +54,9 @@ none are specified on the command line nor in the configuration. The default
remote is used (see `dvc config core.remote`) unless the `--remote` option is
used. See `dvc remote` for more information on how to configure a remote.

-With no arguments, just `dvc push` or `dvc push --remote REMOTE`, it uploads
-only the files (or directories) that are new in the local repository to remote
-storage. It will not upload files associated with earlier commits in the
+With no arguments, `dvc push` or `dvc push --remote REMOTE`, it uploads only the
+files (or directories) that are new in the local repository to remote storage.
+It will not upload files associated with earlier commits in the
<abbr>repository</abbr> (if using Git), nor will it upload files that have not
changed.

@@ -73,7 +73,7 @@ to push.
## Options

- `-a`, `--all-branches` - determines the files to upload by examining DVC-files
-in all Git branches instead of just those present in the current workspace.
+in all Git branches instead of using files present in the current workspace.
It's useful if branches are used to track experiments or project checkpoints.
Note that this can be combined with `-T` below, for example using the `-aT`
flag.
@@ -103,7 +103,7 @@ to push.

- `-j <number>`, `--jobs <number>` - number of threads to run simultaneously to
handle the uploading of files from the remote. The default value is
-`4 * cpu_count()`. For SSH remotes, the default is just `4`. Using more jobs
+`4 * cpu_count()`. For SSH remotes, the default value is `4`. Using more jobs
may improve the total download speed if a combination of small and large files
are being fetched.

5 changes: 2 additions & 3 deletions content/docs/command-reference/remote/add.md
@@ -197,9 +197,8 @@ $ dvc remote add -d myremote "azure://"

To start using a GDrive remote, fist add it with a
[valid URL format](/doc/user-guide/setup-google-drive-remote#url-format). Then
-simply use any DVC command that needs it (e.g. `dvc pull`, `dvc fetch`,
-`dvc push`), and follow the instructions to connect your Google Drive with DVC.
-For example:
+use any DVC command that needs it (e.g. `dvc pull`, `dvc fetch`, `dvc push`),
+and follow the instructions to connect your Google Drive with DVC. For example:

```dvc
$ dvc remote add -d myremote gdrive://0AIac4JZqHhKmUk9PDA/dvcstore
9 changes: 4 additions & 5 deletions content/docs/command-reference/status.md
@@ -107,11 +107,10 @@ workspace) is different from remote storage. Bringing the two into sync requires
(specified in the `core.remote` config option).

- `-a`, `--all-branches` - compares cache content against all Git branches
-instead of just the current workspace. This basically runs the same status
-command in every branch of this repo. The corresponding branches are shown in
-the status output. Applies only if `--cloud` or a `-r` remote is specified.
-Note that this can be combined with `-T` below, for example using the `-aT`
-flag.
+instead of the current workspace. This basically runs the same status command
+in every branch of this repo. The corresponding branches are shown in the
+status output. Applies only if `--cloud` or a `-r` remote is specified. Note
+that this can be combined with `-T` below, for example using the `-aT` flag.

- `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as
the workspace. Note that both options can be combined, for example using the
4 changes: 2 additions & 2 deletions content/docs/command-reference/update.md
@@ -70,8 +70,8 @@ Importing 'model.pkl (git@github.com:iterative/example-get-started)'
As DVC mentions, the import stage (DVC-file) `model.pkl.dvc` is created. This
[stage file](/doc/command-reference/run) is frozen by default though, so to
[reproduce](/doc/command-reference/repro) it, we would need to run
-`dvc unfreeze` on it first, then `dvc repro` (and `dvc freeze` again). Let's
-just run `dvc update` on it instead:
+`dvc unfreeze` on it first, then `dvc repro` (and `dvc freeze` again). Let's run
+`dvc update` on it instead:

```dvc
$ dvc update model.pkl.dvc
8 changes: 4 additions & 4 deletions content/docs/tutorials/get-started/data-access.md
@@ -25,11 +25,11 @@ cats-dogs.dvc
The benefit of this command over browsing a Git hosting website is that the list
includes files and directories tracked by **both Git and DVC**.

-## Just download it
+## Download it

-One way is to simply download the data with `dvc get`. This is useful when
-working outside of a <abbr>DVC project</abbr> environment, for example in an
-automated ML model deployment task:
+One way is to download the data with `dvc get`. This is useful when working
+outside of a <abbr>DVC project</abbr> environment, for example in an automated
+ML model deployment task:

```dvc
$ dvc get https://github.com/iterative/dataset-registry \
8 changes: 4 additions & 4 deletions content/docs/tutorials/get-started/data-pipelines.md
@@ -163,9 +163,9 @@ This would be a good point to commit the changes with Git. This includes any

## Reproduce

-Imagine you're just cloning the <abbr>repository</abbr> created so far, in
-another computer. It's extremely easy for anyone to reproduce the result
-end-to-end, by using `dvc repro`.
+Imagine you're cloning the <abbr>repository</abbr> created so far, in another
+computer. It's extremely easy for anyone to reproduce the result end-to-end, by
+using `dvc repro`.

<details>

@@ -198,7 +198,7 @@ executes the necessary commands to rebuild all the pipeline
## Visualize

Having built our pipeline, we need a good way to understand its structure.
-Seeing a graph of connected stage files would help. DVC lets you do just that,
+Seeing a graph of connected stage files would help. DVC lets you do that,
without leaving the terminal!

```dvc
4 changes: 2 additions & 2 deletions content/docs/tutorials/get-started/data-versioning.md
@@ -228,8 +228,8 @@ after `git clone` and `git pull`.

### 👉 Expand to simulate a fresh clone of this repo

-Let's just remove the directory added so far, both from <abbr>workspace</abbr>
-and <abbr>cache</abbr>:
+Let's remove the directory added so far, both from <abbr>workspace</abbr> and
+<abbr>cache</abbr>:

```dvc
$ rm -f datadir .dvc/cache/a3/04afb96060aad90176268345e10355
2 changes: 1 addition & 1 deletion content/docs/tutorials/get-started/experiments.md
@@ -139,7 +139,7 @@ back and forth. To find the best-performing experiment or track the progress,
described in one of the previous sections).

Let's run evaluate for the latest `bigrams` experiment we created earlier. It
-mostly takes just running the `dvc repro`:
+mostly takes running the `dvc repro`:

```dvc
$ git checkout master
10 changes: 5 additions & 5 deletions content/docs/tutorials/pipelines.md
@@ -183,9 +183,9 @@ outs:
persist: false
```

-Just like the DVC-file we created earlier with `dvc add`, this stage file uses
-`md5` hashes (that point to the <abbr>cache</abbr>) to describe and version
-control dependencies and outputs. Output `data/Posts.xml` file is saved as
+Like the DVC-file we created earlier with `dvc add`, this stage file uses `md5`
+hashes (that point to the <abbr>cache</abbr>) to describe and version control
+dependencies and outputs. Output `data/Posts.xml` file is saved as
`.dvc/cache/a3/04afb96060aad90176268345e10355` and linked (or copied) to the
<abbr>workspace</abbr>, as well as added to `.gitignore`.

@@ -331,8 +331,8 @@ $ dvc metrics show

It's time to save our [pipeline](/doc/command-reference/pipeline). You can
confirm that we do not tack files or raw datasets with Git, by using the
-`git status` command. We are just saving a snapshot of the DVC-files that
-describe data, transformations (stages), and relationships between them.
+`git status` command. We are saving a snapshot of the DVC-files that describe
+data, transformations (stages), and relationships between them.

```dvc
$ git add *.dvc auc.metric data/.gitignore
2 changes: 1 addition & 1 deletion content/docs/understanding-dvc/how-it-works.md
@@ -84,7 +84,7 @@
$ cd myrepo
$ git pull # download tracked data from remote storage
$ dvc checkout # checkout data files
-$ ls -l data/ # You just got gigabytes of data through Git and DVC:
+$ ls -l data/ # You downloaded gigabytes of data through Git and DVC:

total 1017488
-r-------- 2 501 staff 273M Jan 27 03:48 Posts-test.tsv
2 changes: 1 addition & 1 deletion content/docs/use-cases/shared-development-server.md
@@ -103,7 +103,7 @@ $ git commit -m "process clean data"
$ git push
```

-And now you can just as easily make their work appear in your workspace with:
+And now you can make their previous work appear in your workspace with:

```dvc
$ git pull
4 changes: 2 additions & 2 deletions content/docs/user-guide/external-dependencies.md
@@ -35,8 +35,8 @@ directory.
## Examples

As examples, let's take a look at a [stage](/doc/command-reference/run) that
-simply moves a local file from an external location, producing a `data.txt.dvc`
-stage file (DVC-file).
+moves a local file from an external location, producing a `data.txt.dvc` stage
+file (DVC-file).

> Note that some of these commands use the `/home/shared` directory, typical in
> Linux distributions.