Regular updates (Feb 13) (#993)
* cmd ref: add space after emoji
intro in https://github.com/iterative/dvc.org/pull/951/files

* term: under X control -> tracked by X (or similar) (1)
for #719

* term: under X control -> tracked by X (or similar) (2)
strings from core repo

* term: add to -> add with (DVC)

* cmd ref: update import -o behavior explanation
per iterative/dvc/pull/3312
jorgeorpinel authored Feb 15, 2020
1 parent eb190be commit 022680b
Showing 25 changed files with 96 additions and 98 deletions.
2 changes: 1 addition & 1 deletion public/static/docs/changelog/0.35.md
@@ -19,7 +19,7 @@ improvements) we have done in the last few months:
download the whole project and reproduce all the models.

- **`dvc diff`** **command introduced**. Summary statistics for the
-directory/file under the DVC control. How many files were
+directory/file tracked by DVC. How many files were
added/deleted/modified/size:

```diff
20 changes: 10 additions & 10 deletions public/static/docs/command-reference/add.md
@@ -1,7 +1,7 @@
# add

-Take a data file or a directory under DVC control (by creating a corresponding
-[DVC-file](/doc/user-guide/dvc-file-format)).
+Track data files or directories with DVC, by creating a corresponding
+[DVC-file](/doc/user-guide/dvc-file-format).

## Synopsis

@@ -15,13 +15,13 @@ positional arguments:

## Description

-The `dvc add` command is analogous to the `git add` command. By default though,
-an added file or directory is also committed to the <abbr>cache</abbr>. (Use the
-`--no-commit` option to avoid this, and `dvc commit` as a separate step when
-ready.)
+The `dvc add` command is analogous to `git add`, in that it makes DVC aware of
+the target data, as a first step to version it. Data added with DVC is also
+committed to the <abbr>cache</abbr> (use the `--no-commit` option to avoid this,
+and `dvc commit` as a separate step when needed).

-The `targets` are files or directories to be places under DVC control. These are
-turned into <abbr>outputs<abbr> (`outs` field) in a resulting
+The `targets` are files or directories to be tracked with DVC. These are turned
+into <abbr>outputs</abbr> (`outs` field) in a resulting
[DVC-file](/doc/user-guide/dvc-file-format). (See steps below for more details.)
Note that target data outside the current <abbr>workspace</abbr> is supported,
that becomes [external outputs](/doc/user-guide/managing-external-data).
@@ -115,7 +115,7 @@ reproducible.

## Example: Single file

-Take a file under DVC control:
+Track a file with DVC:

```dvc
$ dvc add data.xml
@@ -184,7 +184,7 @@ pics
└── dogs [more image files]
```

-Taking a directory under DVC control as simple as with a single file:
+Tracking a directory with DVC is as simple as with a single file:

```dvc
$ dvc add pics
4 changes: 2 additions & 2 deletions public/static/docs/command-reference/checkout.md
@@ -17,7 +17,7 @@ positional arguments:
## Description

[DVC-files](/doc/user-guide/dvc-file-format) act as pointers to specific version
-of data files or directories under DVC control. This command synchronizes the
+of data files or directories tracked by DVC. This command synchronizes the
workspace data with the versions specified in the current DVC-files.

`dvc checkout` is useful, for example, when using Git in the
@@ -147,7 +147,7 @@ bigrams-experiment <- Uses bigrams to improve the model
This project comes with a predefined HTTP
[remote storage](/doc/command-reference/remote). We can now just run `dvc pull`
that will fetch and checkout the most recent `model.pkl`, `data.xml`, and other
-files that are under DVC control. The model file hash
+files that are tracked by DVC. The model file hash
`3863d0e317dee0a55c4e59d2ec0eef33` will be used in the `train.dvc`
[stage file](/doc/command-reference/run):

16 changes: 8 additions & 8 deletions public/static/docs/command-reference/config.md
@@ -22,9 +22,9 @@ takes a config option `name` (a section and a key, separated by a dot) and its
This command reads and updates the DVC configuration files. By default (if none
of `--local`, `--global`, or `--system` is provided) a project's config
(`.dvc/config`) file is read or modified. This file is by default meant to be
-under Git control and should not contain sensitive and/or user-specific
-information (passwords, SSH keys, etc). Use `--local`, `--global`, or `--system`
-options instead to override project's settings, for sensitive, or user-specific
+tracked by Git and should not contain sensitive and/or user-specific information
+(passwords, SSH keys, etc). Use `--local`, `--global`, or `--system` options
+instead to override project's settings, for sensitive, or user-specific
settings.

If the config option `value` is not provided and `--unset` option is not used,
@@ -95,7 +95,7 @@ remote. See `dvc remote` for more information.
### cache

A DVC project <abbr>cache</abbr> is the hidden storage (by default located in
-the `.dvc/cache` directory) for files that are under DVC control, and their
+the `.dvc/cache` directory) for files that are tracked by DVC, and their
different versions. (See `dvc cache` and
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
for more details.) This section contains the following options:
@@ -109,9 +109,9 @@ for more details.) This section contains the following options:
> option, properly transforming paths relative to the current working
> directory into paths relative to the config file location.
-- `cache.protected` - make files under DVC control read-only. Possible values
-are `true` or `false` (default). Run `dvc checkout` after changing the value
-of this option for the change to go into effect.
+- `cache.protected` - make DVC-tracked files read-only. Possible values are
+`true` or `false` (default). Run `dvc checkout` after changing the value of
+this option for the change to go into effect.

Due to the way DVC handles linking between the data files in the cache and
their counterparts in the <abbr>workspace</abbr>, it's easy to accidentally
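
As a rough illustration of what "read-only" means at the file-system level for the `cache.protected` option in this hunk, the sketch below clears the write permission bits on a file using only the Python standard library. It is not DVC's implementation: the `protect` helper name is hypothetical, and what DVC actually does also depends on how cache and workspace files are linked.

```python
import os
import stat


def protect(path):
    """Clear all write bits on `path`.

    Hypothetical helper for illustration only; this is not a DVC API.
    Returns the new permission bits.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Drop write permission for user, group, and others; keep everything else.
    read_only = mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    os.chmod(path, read_only)
    return read_only
```

Restoring write access (roughly the effect `dvc unprotect` is meant to have on a workspace file) would be the reverse operation, setting a write bit again.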
@@ -272,7 +272,7 @@ Set cache type: if `reflink` is not available, use `copy`:
$ dvc config cache.type reflink,copy
```

-Protect data files under DVC control by making them read-only:
+Protect DVC-tracked data files by making them read-only:

```dvc
$ dvc config cache.protected true
4 changes: 2 additions & 2 deletions public/static/docs/command-reference/diff.md
@@ -146,8 +146,8 @@ The output from this command confirms that there's a difference in the

Unlike Git, DVC features controlling entire directories without having to add
each individual file. See `dvc add` without `--recursive` for example. `dvc run`
-can also put whole directories under DVC control (when these are specified as
-command dependencies or <abbr>outputs</abbr>).
+can track entire directories (when these are specified as command dependencies
+or <abbr>outputs</abbr>).

We can use `dvc diff` to check for changes in a directory by specifying the
directory as the target (with option `-t`). Note that we skip the `b_ref`
12 changes: 6 additions & 6 deletions public/static/docs/command-reference/fetch.md
@@ -1,6 +1,6 @@
# fetch

-Get files that are under DVC control from
+Get files or directories tracked by DVC from
[remote storage](/doc/command-reference/remote) into the <abbr>cache</abbr>.

## Synopsis
@@ -43,8 +43,8 @@ project's cache ++ | dvc pull |
```

Fetching could be useful when first checking out a <abbr>DVC project</abbr>,
-since files under DVC control should already exist in remote storage, but won't
-be in the project's cache. (Refer to `dvc remote` for more information on DVC
+since files tracked by DVC should already exist in remote storage, but won't be
+in the project's cache. (Refer to `dvc remote` for more information on DVC
remotes.) These necessary data or model files are listed as dependencies or
outputs in a DVC-file (target [stage](/doc/command-reference/run)) so they are
required to [reproduce](/doc/get-started/reproduce) the corresponding
@@ -64,7 +64,7 @@ for more information on how to configure different remote storage providers.
`dvc fetch`, `dvc pull`, and `dvc push` are related in that these 3 commands
perform data synchronization among local and remote storage. The specific way in
which the set of files to push/fetch/pull is determined begins with calculating
-file hashes when these are [added](/doc/get-started/add-files) to DVC. File
+file hashes when these are [added](/doc/get-started/add-files) with DVC. File
hashes are stored in the corresponding DVC-files (typically versioned with Git).
Only the hashes specified in DVC-files currently in the workspace are considered
by `dvc fetch` (unless the `-a` or `-T` options are used).
@@ -161,8 +161,8 @@ bigrams-experiment <- use bigrams to improve the model

This project comes with a predefined HTTP
[remote storage](/doc/command-reference/remote). We can now just run `dvc fetch`
-to download the most recent `model.pkl`, `data.xml`, and other files that are
-under DVC control into our local <abbr>cache</abbr>.
+to download the most recent `model.pkl`, `data.xml`, and other DVC-tracked files
+into our local <abbr>cache</abbr>.

```dvc
$ dvc status --cloud
16 changes: 8 additions & 8 deletions public/static/docs/command-reference/import.md
@@ -47,10 +47,10 @@ actual data.
> such as S3, SSH, HTTP, etc.
After running this command successfully, the imported data is placed in the
-current working directory with its original file name e.g. `data.txt`. An
-_import stage_ (DVC-file) is then created, extending the full file or directory
-name of the imported data e.g. `data.txt.dvc` – similar to having used `dvc run`
-to generate the same output.
+current working directory (unless `-o` is used) with its original file name e.g.
+`data.txt`. An _import stage_ (DVC-file) is also created in the same location,
+extending the name of the imported data e.g. `data.txt.dvc` – similar to having
+used `dvc run` to generate the output.

DVC-files support references to data in an external DVC repository (hosted on a
Git server). In such a DVC-file, the `deps` section specifies the `repo`-`url`
@@ -69,10 +69,10 @@ data artifact from the source repo.
## Options

- `-o`, `--out` - specify a path (directory and/or file name) to the desired
-location to place the imported data in. The default value (when this option
-isn't used) is the current working directory (`.`) and original file name. If
-an existing directory is specified, then the output will be placed inside of
-it.
+location to place the imported data and import stage (DVC-file) in. The
+default value (when this option isn't used) is the current working directory
+(`.`) and original file name. If an existing directory is specified, then the
+output will be placed inside of it.

- `--rev` - commit hash, branch or tag name, etc. (any
[Git revision](https://git-scm.com/docs/revisions)) of the repository to
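
The `-o` rules described in this file's hunks can be sketched as a small path-resolution function. This is only an illustration of the documented behavior, not DVC's code: the function name `resolve_import_paths` and its `is_dir` parameter are hypothetical, and edge cases (for example, a path to a directory that does not exist yet) are left out.

```python
import posixpath


def resolve_import_paths(original_name, out=None, is_dir=posixpath.isdir):
    """Return (data_path, dvc_file_path) for an import.

    Hypothetical sketch of the documented rules:
    - without `out`, the data keeps its original name in the current directory;
    - if `out` is an existing directory, the data is placed inside it;
    - otherwise `out` is used as the target path itself.
    The DVC-file sits next to the data, with `.dvc` appended to its name.
    """
    if out is None:
        data_path = posixpath.join(".", original_name)
    elif is_dir(out):
        data_path = posixpath.join(out, original_name)
    else:
        data_path = out
    data_path = posixpath.normpath(data_path)
    return data_path, data_path + ".dvc"
```

For instance, under these assumptions `resolve_import_paths("data.txt")` gives `("data.txt", "data.txt.dvc")`, matching the `data.txt` / `data.txt.dvc` pair mentioned in the description.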
4 changes: 2 additions & 2 deletions public/static/docs/command-reference/init.md
@@ -22,8 +22,8 @@ learn more.
`.dvc/cache` is one of the most important
[DVC directories](/doc/user-guide/dvc-files-and-directories). It will hold all
the contents of tracked data files. Note that `.dvc/.gitignore` lists this
-directory, which means that the cache directory is not under Git control. This
-is a local cache and you cannot `git push` it.
+directory, which means that the cache directory is not tracked by Git. This is a
+local cache and you cannot `git push` it.

## Options

4 changes: 2 additions & 2 deletions public/static/docs/command-reference/install.md
@@ -40,7 +40,7 @@ This hook automates reminding the user to run either `dvc commit` or

**Push**: While publishing changes to the Git remote with `git push`, it's easy
to forget that the `dvc push` command is necessary to upload new or updated data
-files and directories under DVC control to
+files and directories tracked by DVC to
[remote storage](/doc/command-reference/remote).

This hook automates `dvc push`.
@@ -52,7 +52,7 @@ This hook automates `dvc push`.
- A `post-checkout` hook executes `dvc checkout` after `git checkout` to
automatically synchronize the data files with the new workspace state.
- A `pre-push` hook executes `dvc push` before `git push` to upload files and
-directories under DVC control to remote storage.
+directories tracked by DVC to remote storage.

If a hook already exists, DVC will raise an exception. In such case, user should
try to manually edit existing file or remove it and retry install.
2 changes: 1 addition & 1 deletion public/static/docs/command-reference/push.md
@@ -1,6 +1,6 @@
# push

-Uploads files and directories under DVC control to the
+Uploads files or directories tracked by DVC to
[remote storage](/doc/command-reference/remote).

## Synopsis
2 changes: 1 addition & 1 deletion public/static/docs/command-reference/remote/add.md
@@ -187,7 +187,7 @@ $ dvc remote add myremote "azure://"
The connection string can be found in the "Access Keys" pane of your Storage
Account resource in the Azure portal.

-> 💡Make sure the value is quoted to prevent shell from misprocessing the
+> 💡 Make sure the value is quoted to prevent shell from misprocessing the
> command.
- `container name` - this is the top-level container in your Azure Storage
4 changes: 2 additions & 2 deletions public/static/docs/command-reference/remove.md
@@ -23,7 +23,7 @@ Note that it does not remove files from the DVC cache or remote storage (see
want to use or share in the future.

Refer to [Updating Tracked Files](/doc/user-guide/updating-tracked-files) to see
-how it can be used to replace or modify files that are under DVC control.
+how it can be used to replace or modify files that are tracked by DVC.

## Options

@@ -43,7 +43,7 @@ how it can be used to replace or modify files that are under DVC control.

## Examples

-Let's imagine we have a `data.csv` under DVC control:
+Let's imagine we have a `data.csv` data file, and track it with DVC:

```dvc
$ dvc add data.csv
33 changes: 16 additions & 17 deletions public/static/docs/command-reference/run.md
@@ -91,27 +91,26 @@ data pipeline (e.g. random numbers, time functions, hardware dependency, etc.)
- `-o`, `--outs` - specify a file or directory that is the result of running the
`command`. Multiple outputs can be specified: `-o model.pkl -o output.log`.
DVC builds a dependency graph (pipeline) to connect different stages with each
-other based on this list of outputs and dependencies (see `-d`). DVC takes all
-output files and directories under its control and puts them into the cache
-(this is similar to what's happening when you use `dvc add`).
+other based on this list of outputs and dependencies (see `-d`). DVC tracks
+all output files and directories and puts them into the cache (this is similar
+to what's happening when you use `dvc add`).

-- `-O`, `--outs-no-cache` - the same as `-o` except outputs are not put
-automatically under DVC control. It means that they are not cached, and it's
-up to a user to save and version control them. This is useful if the outputs
-are small enough to be put into Git control, or if these files are not of
-future interest.
+- `-O`, `--outs-no-cache` - the same as `-o` except that outputs are not tracked
+by DVC. It means that they are not cached, and it's up to a user to save and
+version control them. This is useful if the outputs are small enough to be put
+into Git control, or if these files are not of future interest.

- `-m`, `--metrics` - specify a metric type of output. This option behaves like
`-o` but also adds `metric: true` in the output record of the resulting stage
file. Metrics are usually small, human readable files (e.g. JSON or CSV) with
numeric values or other information that describes a model (or any other
regular output). See `dvc metrics` to learn more about using metrics.

-- `-M`, `--metrics-no-cache` - the same as `-m` except files are not put
-automatically under DVC control. It means that they are not cached, and it's
-up to a user to save and version control them. This is typically desirable
-with metric files, because they are small enough to be put into Git control.
-See also the difference between `-o` and `-O`.
+- `-M`, `--metrics-no-cache` - the same as `-m` except that files are not
+tracked by DVC. It means that they are not cached, and it's up to a user to
+save and version control them. This is typically desirable with metric files,
+because they are small enough to be put into Git control. See also the
+difference between `-o` and `-O`.

- `-f`, `--file` - specify stage file name. By default the DVC-file name
generated is `<file>.dvc`, where `<file>` is file name of the first output
@@ -131,10 +130,10 @@ data pipeline (e.g. random numbers, time functions, hardware dependency, etc.)
`command`.

- `--no-exec` - create a stage file, but do not execute the `command` defined in
-it, nor take dependencies or outputs under DVC control. In the DVC-file
-contents, the file hash values will be empty; They will be populated the next
-time this stage is actually executed. This is useful if, for example, you need
-to build a pipeline (dependency graph) first, and then run it all at once.
+it, nor track dependencies or outputs with DVC. In the DVC-file contents, the
+file hash values will be empty; They will be populated the next time this
+stage is actually executed. This is useful if, for example, you need to build
+a pipeline (dependency graph) first, and then run it all at once.

- `-y`, `--yes` (_deprecated_) - See `--overwrite-dvcfile` below.

2 changes: 1 addition & 1 deletion public/static/docs/command-reference/unprotect.md
@@ -53,7 +53,7 @@ Enable cache protected mode is enabled:
$ dvc config cache.protected true
```

-Put a data file under DVC control:
+Track a data file with DVC:

```dvc
$ ls -lh
13 changes: 6 additions & 7 deletions public/static/docs/get-started/add-files.md
@@ -18,8 +18,7 @@ $ dvc get https://github.com/iterative/dataset-registry \
> [Data Registries](/doc/use-cases/data-registries) for more info about this
> setup.)
-To take a file (or a directory) under DVC control just run `dvc add` on it. For
-example:
+To track a file (or a directory) with DVC just run `dvc add` on it. For example:

```dvc
$ dvc add data/data.xml
@@ -35,7 +34,7 @@ $ git commit -m "Add raw data to project"
```

Committing DVC-files with Git allows us to track different versions of the
-<abbr>project</abbr> data as it evolves with the source code under Git control.
+<abbr>project</abbr> data as it evolves with the source code tracked by Git.

<details>

@@ -53,7 +52,7 @@ $ ls -R .dvc/cache
```

`a304afb96060aad90176268345e10355` above is the hash value of the `data.xml`
-file we just added to DVC. If you check the `data/data.xml.dvc` DVC-file, you
+file we just added with DVC. If you check the `data/data.xml.dvc` DVC-file, you
will see that it has this string inside.
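
To make the relationship between a file hash and its cache entry more concrete, here is a minimal sketch. It assumes the cache layout in which the first two characters of the hash name a subdirectory and the remainder names the file; that layout, the helper names `md5_of_file` and `cache_path`, and the use of a plain MD5 over the raw bytes are assumptions for illustration, and DVC's own hashing may differ in details.

```python
import hashlib
import posixpath


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(file_hash, cache_dir=".dvc/cache"):
    """Map a hash to an assumed cache location: <cache_dir>/<2 chars>/<rest>."""
    return posixpath.join(cache_dir, file_hash[:2], file_hash[2:])


# The hash quoted in the text above, mapped under the assumed layout.
print(cache_path("a304afb96060aad90176268345e10355"))
# prints ".dvc/cache/a3/04afb96060aad90176268345e10355"
```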

### Important note on cache performance
@@ -80,9 +79,9 @@ See [Large Dataset Optimization](/doc/user-guide/large-dataset-optimization) and
</details>

If your workspace uses Git, without DVC you would have to manually put each data
-file or directory into `.gitignore`. DVC commands that take or make files that
-will go under its control automatically takes care of this for you! (You just
-have to add the changes with Git.)
+file or directory into `.gitignore`. DVC commands that track data files
+automatically take care of this for you! (You just have to add the changes with
+Git.)

Refer to
[Versioning Data and Model Files](/doc/use-cases/versioning-data-and-model-files),
8 changes: 4 additions & 4 deletions public/static/docs/get-started/connect-code-and-data.md
@@ -150,10 +150,10 @@ learn the specific details about how they behave, and all of their options.

</details>

-You don't need to run `dvc add` to place output files (`prepared/train.tsv` and
-`prepared/test.tsv`) under DVC control. `dvc run` takes care of this. You only
-need to run `dvc push` (usually along with `git commit`) to save them to the
-remote when you are done.
+You don't need to run `dvc add` to track output files (`prepared/train.tsv` and
+`prepared/test.tsv`) with DVC. `dvc run` takes care of this. You only need to
+run `dvc push` (usually along with `git commit`) to save them to the remote when
+you are done.

Let's commit the changes to save the stage we built:

4 changes: 2 additions & 2 deletions public/static/docs/get-started/initialize.md
@@ -26,5 +26,5 @@ learn more.
> [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to
> learn about the DVC internal file and directory structure.
-The last command, `git commit`, puts the `.dvc/config` and `.dvc/.gitignore`
-files (DVC internals) under Git control.
+The last command, `git commit`, versions the `.dvc/config` and `.dvc/.gitignore`
+files (DVC internals) with Git.