docs: protected mode is now enabled by-default #1058

Merged
merged 9 commits into from
Mar 17, 2020
28 changes: 7 additions & 21 deletions public/static/docs/command-reference/config.md
@@ -114,18 +114,9 @@ for more details.) This section contains the following options:
> option, properly transforming paths relative to the current working
> directory into paths relative to the config file location.

- `cache.protected` - make DVC-tracked files read-only. Possible values are
`true` or `false` (default). Run `dvc checkout` after changing the value of
this option for the change to go into effect.

Due to the way DVC handles linking between the data files in the cache and
their counterparts in the <abbr>workspace</abbr>, it's easy to accidentally
corrupt the cached file by editing or overwriting it. Turning this config
option on forces you to run `dvc unprotect` before updating a file, providing
an additional layer of security to your data.

We highly recommend enabling `cache.protected` when `cache.type` is set to
`hardlink` or `symlink`.
- `cache.protected` - obsolete. When `cache.type` is set to `hardlink` or
`symlink`, the linked files in the workspace are automatically read-only. Use
`dvc unprotect` before updating a file.

- `cache.type` - link type that DVC should use to link data files from cache to
the workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a
@@ -136,9 +127,10 @@ for more details.) This section contains the following options:
to protect user from accidental cache and repository corruption.

⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
corrupt the cache** if you modify tracked data files in the workspace. See the
`cache.protected` option above, and corresponding `dvc unprotect` command to
modify files safely.
corrupt the cache** if you modify tracked data files in the workspace. To help
prevent that, DVC automatically makes those links read-only as an additional
barrier. Use the `dvc unprotect` command to modify files safely.
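
The warning above can be reproduced without DVC. The following is a minimal sketch, assuming a POSIX shell; plain `ln` stands in for a DVC hardlink, and the `cache/` and `workspace/` directories and file names are hypothetical, not DVC's real layout.

```shell
#!/bin/sh
# Illustration only: plain `ln` stands in for a DVC hardlink.
# All directory and file names here are hypothetical.
set -eu
demo=$(mktemp -d)
mkdir "$demo/cache" "$demo/workspace"

# A "cached" file and a hardlink to it in the "workspace".
printf 'original\n' > "$demo/cache/blob"
ln "$demo/cache/blob" "$demo/workspace/data.txt"

# Editing the workspace file in place also changes the cached copy,
# because both names refer to the same data on disk.
printf 'edited\n' > "$demo/workspace/data.txt"
cached=$(cat "$demo/cache/blob")
echo "cache now contains: $cached"

# Removing write permission is the kind of barrier described above.
chmod a-w "$demo/workspace/data.txt"
mode=$(ls -l "$demo/workspace/data.txt" | cut -c1-10)
echo "workspace file mode: $mode"

rm -rf "$demo"
```

Note that permission bits are only a barrier, not a guarantee: a privileged user or a tool that replaces permissions can still write to the file.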

There are pros and cons to different link types. Refer to
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
@@ -276,9 +268,3 @@ Set cache type: if `reflink` is not available, use `copy`:
```dvc
$ dvc config cache.type reflink,copy
```

Protect DVC-tracked data files by making them read-only:

```dvc
$ dvc config cache.protected true
```
17 changes: 8 additions & 9 deletions public/static/docs/command-reference/unprotect.md
@@ -1,7 +1,7 @@
# unprotect

Unprotect tracked files or directories (when the <abbr>cache</abbr> protected
mode has been enabled with `dvc config cache`).
Unprotect tracked files or directories (when hardlinks or symlinks have been
enabled with `dvc config cache.type`).

## Synopsis

@@ -19,9 +19,9 @@ to link tracked data files from the cache to the <abbr>workspace</abbr>.
However, these types of file links can be enabled with `dvc config cache`
(`cache.type` config option).

Enabling hardlinks or symlinks also requires the `cache.protected` mode to be
turned on, which makes the tracked data files in the workspace read-only. (This
prevent users from accidentally corrupting the cache by modifying file links.)
Enabling hardlinks or symlinks makes the tracked data files in the workspace
read-only. (This prevents users from accidentally corrupting the cache by
modifying file links.)

Running `dvc unprotect` guarantees that the target files or directories
(`targets`) in the workspace are physically "unlinked" from the cache and can be
@@ -30,8 +30,7 @@ safely updated. Read the
more on this process.
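
Conceptually, "unlinking" a file means replacing the link with an independent, writable copy. The sketch below illustrates that idea with plain POSIX tools; it is an assumption about the general technique, not DVC's actual implementation, and all file names are hypothetical.

```shell
#!/bin/sh
# Illustration only: shows the idea behind unlinking, not DVC internals.
# All file names here are hypothetical.
set -eu
demo=$(mktemp -d)

# A "cached" file, hardlinked into the "workspace" and made read-only.
printf 'v1\n' > "$demo/cached_blob"
ln "$demo/cached_blob" "$demo/data.txt"
chmod a-w "$demo/data.txt"

# "Unlink": swap the link for a separate copy, then make it writable.
# This copy step is why the operation can be expensive for large files.
cp "$demo/data.txt" "$demo/data.txt.tmp"
mv -f "$demo/data.txt.tmp" "$demo/data.txt"
chmod u+w "$demo/data.txt"

# The workspace file can now be edited without touching the cached copy.
printf 'v2\n' > "$demo/data.txt"
cached=$(cat "$demo/cached_blob")
workspace=$(cat "$demo/data.txt")
echo "cache: $cached, workspace: $workspace"

rm -rf "$demo"
```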

`dvc unprotect` can be an expensive operation (involves copying data). Check
first whether your task matches one of the cases that are considered safe, even
when cache protected mode is enabled:
first whether your task matches one of the cases that are considered safe:

- Adding more files to a directory input dataset (say, images or videos)
- Deleting files from a directory dataset
@@ -47,10 +46,10 @@ when cache protected mode is enabled:

## Examples

Enable cache protected mode is enabled:
Enable symlinks:

```dvc
$ dvc config cache.protected true
$ dvc config cache.type symlink
```

Track a data file with DVC:
12 changes: 6 additions & 6 deletions public/static/docs/user-guide/large-dataset-optimization.md
@@ -40,9 +40,10 @@ platforms yet. Hard/soft links optimize **speed** and **space** in the file
system, but may break your workflow since updating hard/sym-linked files tracked
by DVC in the <abbr>workspace</abbr> causes <abbr>cache</abbr> corruption. These
2 link types thus require using cache **protected mode** (see the
`cache.protected` config option in `dvc config cache`). Finally, a 4th "linking"
alternative is to actually copy files from/to the cache, which is safe but
inefficient – especially for large files (several GBs or more).
`cache.protected` config option in `dvc config cache`), which is enabled by
default and makes your files read-only. Finally, a 4th "linking" alternative is
to actually copy files from/to the cache, which is safe but inefficient –
especially for large files (several GBs or more).

> Some versions of Windows (e.g. Windows Server 2012+ and Windows 10 Enterprise)
> support hard or soft links on the
@@ -110,13 +111,12 @@ configure DVC like this:

```dvc
$ dvc config cache.type hardlink,symlink
$ dvc config cache.protected true
```

> Refer to `dvc config cache` for more details.

Setting `cache.protected` is important with `hardlink` and/or `symlink` cache
file link types. Please refer to the
Note that your workspace files will be read-only, in order to protect the cache
from corruption. Please refer to
[Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage
tracked files under these cache configurations.
