docs: protected mode is now enabled by-default #1058

Merged
merged 9 commits into from
Mar 17, 2020
27 changes: 6 additions & 21 deletions public/static/docs/command-reference/config.md
@@ -114,18 +114,9 @@ for more details.) This section contains the following options:
> option, properly transforming paths relative to the current working
> directory into paths relative to the config file location.

- `cache.protected` - make DVC-tracked files read-only. Possible values are
`true` or `false` (default). Run `dvc checkout` after changing the value of
this option for the change to go into effect.

Due to the way DVC handles linking between the data files in the cache and
their counterparts in the <abbr>workspace</abbr>, it's easy to accidentally
corrupt the cached file by editing or overwriting it. Turning this config
option on forces you to run `dvc unprotect` before updating a file, providing
an additional layer of security to your data.

We highly recommend enabling `cache.protected` when `cache.type` is set to
`hardlink` or `symlink`.
- `cache.protected` (_deprecated_) - when using `hardlink` or `symlink` as
`cache.type`, the file links will automatically be protected (read-only). Use
`dvc unprotect` if you need to update them.

- `cache.type` - link type that DVC should use to link data files from cache to
the workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a
@@ -136,9 +127,9 @@ for more details.) This section contains the following options:
to protect user from accidental cache and repository corruption.

⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
corrupt the cache** if you modify tracked data files in the workspace. See the
`cache.protected` option above, and corresponding `dvc unprotect` command to
modify files safely.
corrupt the cache** if you modify tracked data files in the workspace. To help
prevent that, DVC automatically protects those file links (makes them
read-only). Run `dvc unprotect` before modifying them.
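
The corruption risk described above comes from how hardlinks work in general, and it can be reproduced without DVC. The following is a minimal sketch using only POSIX tools; the `cache/blob` and `workspace/data.txt` paths are made-up stand-ins, not DVC's actual cache layout.

```shell
#!/bin/sh
# Illustrative only: plain POSIX tools, no DVC involved.
# All paths and file names here are hypothetical.
set -eu
demo=$(mktemp -d)
mkdir "$demo/cache" "$demo/workspace"

# A "cached" file and a hardlink to it in the "workspace".
printf 'original\n' > "$demo/cache/blob"
ln "$demo/cache/blob" "$demo/workspace/data.txt"

# Overwriting the workspace file in place writes to the shared inode,
# so the "cached" copy changes as well.
printf 'edited\n' > "$demo/workspace/data.txt"
cache_after_edit=$(cat "$demo/cache/blob")
echo "cache now contains: $cache_after_edit"

rm -rf "$demo"
```

Because both names point at the same data, there is no pristine copy left after the edit, which is why making the links read-only is a useful guard.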

There are pros and cons to different link types. Refer to
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
@@ -276,9 +267,3 @@ Set cache type: if `reflink` is not available, use `copy`:
```dvc
$ dvc config cache.type reflink,copy
```

Protect DVC-tracked data files by making them read-only:

```dvc
$ dvc config cache.protected true
```
17 changes: 8 additions & 9 deletions public/static/docs/command-reference/unprotect.md
@@ -1,7 +1,7 @@
# unprotect

Unprotect tracked files or directories (when the <abbr>cache</abbr> protected
mode has been enabled with `dvc config cache`).
Unprotect tracked files or directories (when hardlinks or symlinks have been
enabled with `dvc config cache.type`).

## Synopsis

@@ -19,9 +19,9 @@ to link tracked data files from the cache to the <abbr>workspace</abbr>.
However, these types of file links can be enabled with `dvc config cache`
(`cache.type` config option).

Enabling hardlinks or symlinks also requires the `cache.protected` mode to be
turned on, which makes the tracked data files in the workspace read-only. (This
prevent users from accidentally corrupting the cache by modifying file links.)
Enabling hardlinks or symlinks makes the tracked data files in the workspace
read-only. (This prevents users from accidentally corrupting the cache by
modifying file links.)

Running `dvc unprotect` guarantees that the target files or directories
(`targets`) in the workspace are physically "unlinked" from the cache and can be
@@ -30,8 +30,7 @@ safely updated. Read the
more on this process.
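
As a rough illustration of what "unlinking" means here, the sketch below imitates the idea with POSIX tools: the read-only link is replaced by an independent, writable copy. This is an assumption-laden approximation, not DVC's implementation, and the paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: imitates "unlinking" with POSIX tools; not DVC's implementation.
# All paths and file names here are hypothetical.
set -eu
demo=$(mktemp -d)
mkdir "$demo/cache" "$demo/workspace"
printf 'original\n' > "$demo/cache/blob"

# Protected state: a read-only hardlink in the workspace.
ln "$demo/cache/blob" "$demo/workspace/data.txt"
chmod a-w "$demo/workspace/data.txt"

# "Unprotect": replace the link with an independent, writable copy.
cp "$demo/workspace/data.txt" "$demo/workspace/data.txt.tmp"
chmod u+w "$demo/workspace/data.txt.tmp"
mv -f "$demo/workspace/data.txt.tmp" "$demo/workspace/data.txt"

# Editing the workspace file no longer touches the cached one.
printf 'edited\n' > "$demo/workspace/data.txt"
cache_content=$(cat "$demo/cache/blob")
workspace_content=$(cat "$demo/workspace/data.txt")
echo "cache: $cache_content, workspace: $workspace_content"

rm -rf "$demo"
```

The copy step is what makes the real command potentially expensive for large files.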

`dvc unprotect` can be an expensive operation (involves copying data). Check
first whether your task matches one of the cases that are considered safe, even
when cache protected mode is enabled:
first whether your task matches one of the cases that are considered safe:

- Adding more files to a directory input dataset (say, images or videos)
- Deleting files from a directory dataset
@@ -47,10 +46,10 @@ when cache protected mode is enabled:

## Examples

Enable cache protected mode is enabled:
Enable symlinks:

```dvc
$ dvc config cache.protected true
$ dvc config cache.type symlink
```

Track a data file with DVC:
29 changes: 15 additions & 14 deletions public/static/docs/user-guide/large-dataset-optimization.md
@@ -38,11 +38,13 @@ Symbolic links, and Reflinks in more recent systems. While reflinks bring all
the benefits and none of the worries, they're not commonly supported in most
platforms yet. Hard/soft links optimize **speed** and **space** in the file
system, but may break your workflow since updating hard/sym-linked files tracked
by DVC in the <abbr>workspace</abbr> causes <abbr>cache</abbr> corruption. These
2 link types thus require using cache **protected mode** (see the
`cache.protected` config option in `dvc config cache`). Finally, a 4th "linking"
alternative is to actually copy files from/to the cache, which is safe but
inefficient – especially for large files (several GBs or more).
by DVC in the <abbr>workspace</abbr> causes <abbr>cache</abbr> corruption. To
protect against that, DVC makes hardlinks and symlinks read-only, which requires
the user to run `dvc unprotect` before modifying them.

Finally, a 4th "linking" alternative is to actually copy files from/to the
cache, which is safe but inefficient, especially for large files (several GBs
or more).

> Some versions of Windows (e.g. Windows Server 2012+ and Windows 10 Enterprise)
> support hard or soft links on the
@@ -53,12 +55,12 @@ inefficient – especially for large files (several GBs or more).

File link type benefits summary:

| `cache.type` | speed | space | no protected mode |
| ------------ | ----- | ----- | ----------------- |
| `reflink` | x | x | x |
| `hardlink` | x | x | |
| `symlink` | x | x | |
| `copy` | | | x |
| `cache.type` | speed | space | editable |
| ------------ | ----- | ----- | -------- |
| `reflink`    | x     | x     | x        |
| `hardlink`   | x     | x     |          |
| `symlink`    | x     | x     |          |
| `copy`       |       |       | x        |
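
Which of these link types is available depends on the file system. As a hedged sketch (not something DVC ships), the script below probes a temporary directory for each type. It assumes GNU coreutils for the reflink probe (`cp --reflink=always`); on systems without that option the probe simply reports `no`, even if the file system itself supports reflinks.

```shell
#!/bin/sh
# Sketch: probe which link types the file system under a temp dir accepts.
# `cp --reflink=always` is a GNU coreutils option (an assumption about the
# environment); elsewhere the reflink probe just reports "no".
set -u
probe=$(mktemp -d)
printf 'x\n' > "$probe/src"

supports() {
    # Run the given command quietly; print yes/no based on its exit status.
    if "$@" >/dev/null 2>&1; then echo yes; else echo no; fi
}

reflink=$(supports cp --reflink=always "$probe/src" "$probe/reflink")
hardlink=$(supports ln "$probe/src" "$probe/hardlink")
symlink=$(supports ln -s "$probe/src" "$probe/symlink")
copy=$(supports cp "$probe/src" "$probe/copy")

echo "reflink=$reflink hardlink=$hardlink symlink=$symlink copy=$copy"
rm -rf "$probe"
```

The result only describes the directory that was probed; a cache and a workspace on different file systems may behave differently.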

Each file linking method is detailed further below, ordered by
efficiency:
@@ -110,13 +112,12 @@ configure DVC like this:

```dvc
$ dvc config cache.type hardlink,symlink
$ dvc config cache.protected true
```

> Refer to `dvc config cache` for more details.

Setting `cache.protected` is important with `hardlink` and/or `symlink` cache
file link types. Please refer to the
Note that with this `cache.type`, your workspace files will be in read-only mode
in order to protect the cache from corruption. Please refer to the
[Update a Tracked File](/doc/user-guide/updating-tracked-files) guide on how to
manage tracked files under these cache configurations.
