Commit

Merge pull request #1058 from iterative/protected-mode-by-default
docs: protected mode is now enabled by-default
efiop authored Mar 17, 2020
2 parents e4269d4 + 833ea3c commit 25f9652
Showing 3 changed files with 29 additions and 44 deletions.
public/static/docs/command-reference/config.md: 27 changes (6 additions, 21 deletions)
@@ -114,18 +114,9 @@ for more details.) This section contains the following options:
 > option, properly transforming paths relative to the current working
 > directory into paths relative to the config file location.
-- `cache.protected` - make DVC-tracked files read-only. Possible values are
-`true` or `false` (default). Run `dvc checkout` after changing the value of
-this option for the change to go into effect.
-
-Due to the way DVC handles linking between the data files in the cache and
-their counterparts in the <abbr>workspace</abbr>, it's easy to accidentally
-corrupt the cached file by editing or overwriting it. Turning this config
-option on forces you to run `dvc unprotect` before updating a file, providing
-an additional layer of security to your data.
-
-We highly recommend enabling `cache.protected` when `cache.type` is set to
-`hardlink` or `symlink`.
+- `cache.protected` (_deprecated_) - when using `hardlink` or `symlink` as
+`cache.type`, the file links will automatically be protected (read-only). Use
+`dvc unprotect` if you need to update them.

 - `cache.type` - link type that DVC should use to link data files from cache to
 the workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a
@@ -136,9 +127,9 @@ for more details.) This section contains the following options:
 to protect user from accidental cache and repository corruption.

 ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
-corrupt the cache** if you modify tracked data files in the workspace. See the
-`cache.protected` option above, and corresponding `dvc unprotect` command to
-modify files safely.
+corrupt the cache** if you modify tracked data files in the workspace. In an
+attempt to prevent that, DVC will automatically protect those file links (make
+them read-only). Use `dvc unprotect` to be able to modify them safely.

 There are pros and cons to different link types. Refer to
 [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
@@ -276,9 +267,3 @@ Set cache type: if `reflink` is not available, use `copy`:
 ```dvc
 $ dvc config cache.type reflink,copy
 ```
-
-Protect DVC-tracked data files by making them read-only:
-
-```dvc
-$ dvc config cache.protected true
-```
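The change above replaces the `cache.protected` option with automatic protection of hardlinks and symlinks. The Python sketch below shows why that protection matters, assuming a POSIX-style file system. A hardlink is a second name for the same inode as the cache file, so clearing its write bits also makes the cached copy read-only. `link_and_protect` and `is_writable` are hypothetical helpers for illustration. They are not part of DVC's code or API.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def link_and_protect(cache_path: str, workspace_path: str) -> None:
    """Hardlink a cache file into the workspace, then make it read-only.

    Hypothetical helper for illustration only; not DVC's implementation.
    """
    os.link(cache_path, workspace_path)
    # Both names now refer to the same inode, so the permission bits are
    # shared as well: clearing them protects the workspace file AND the cache.
    mode = stat.S_IMODE(os.stat(workspace_path).st_mode)
    os.chmod(workspace_path, mode & ~WRITE_BITS)


def is_writable(path: str) -> bool:
    """Return True if the owner write bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IWUSR)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cache = os.path.join(root, "cache_file")
        workspace = os.path.join(root, "data.csv")
        with open(cache, "w") as f:
            f.write("a,b\n1,2\n")
        link_and_protect(cache, workspace)
        print(os.path.samefile(cache, workspace))  # True: one file, two names
        print(is_writable(workspace), is_writable(cache))  # False False
```

Without the `chmod` step, opening `data.csv` for writing would silently rewrite the cached data. This is the corruption that the config docs warn about.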
public/static/docs/command-reference/unprotect.md: 17 changes (8 additions, 9 deletions)
@@ -1,7 +1,7 @@
 # unprotect

-Unprotect tracked files or directories (when the <abbr>cache</abbr> protected
-mode has been enabled with `dvc config cache`).
+Unprotect tracked files or directories (when hardlinks or symlinks have been
+enabled with `dvc config cache.type`).

 ## Synopsis

@@ -19,9 +19,9 @@ to link tracked data files from the cache to the <abbr>workspace</abbr>.
 However, these types of file links can be enabled with `dvc config cache`
 (`cache.type` config option).

-Enabling hardlinks or symlinks also requires the `cache.protected` mode to be
-turned on, which makes the tracked data files in the workspace read-only. (This
-prevent users from accidentally corrupting the cache by modifying file links.)
+Enabling hardlinks or symlinks makes the tracked data files in the workspace
+read-only. (This prevents users from accidentally corrupting the cache by
+modifying file links.)

 Running `dvc unprotect` guarantees that the target files or directories
 (`targets`) in the workspace are physically "unlinked" from the cache and can be
@@ -30,8 +30,7 @@ safely updated. Read the
 more on this process.

 `dvc unprotect` can be an expensive operation (involves copying data). Check
-first whether your task matches one of the cases that are considered safe, even
-when cache protected mode is enabled:
+first whether your task matches one of the cases that are considered safe:

 - Adding more files to a directory input dataset (say, images or videos)
 - Deleting files from a directory dataset
@@ -47,10 +46,10 @@ when cache protected mode is enabled:

 ## Examples

-Enable cache protected mode is enabled:
+Enable symlinks:

 ```dvc
-$ dvc config cache.protected true
+$ dvc config cache.type symlink
 ```

 Track a data file with DVC:
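`dvc unprotect` is described above as physically "unlinking" targets from the cache, which involves copying data. Below is a minimal Python sketch of that idea. It assumes POSIX rename semantics; on Windows, `os.replace` cannot overwrite a read-only file. The `unprotect` function is hypothetical and does not reproduce DVC's implementation. It copies the linked data into a new file next to the target and swaps it in with `os.replace`. Then it sets the owner write bit, leaving the cache file untouched.

```python
import os
import shutil
import stat
import tempfile


def unprotect(workspace_path: str) -> None:
    """Replace a hardlink/symlink with a private, writable copy of its data.

    Hypothetical helper sketching the idea behind `dvc unprotect`;
    assumes POSIX rename semantics and is not DVC's actual code.
    """
    directory = os.path.dirname(os.path.abspath(workspace_path))
    # Put the temporary copy in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # copyfile/copymode follow a symlink, so this copies the cached bytes
        # and permissions into a brand-new file (a new inode).
        shutil.copyfile(workspace_path, tmp_path)
        shutil.copymode(workspace_path, tmp_path)
        # Swap the link for the copy; the cache file itself is not modified.
        os.replace(tmp_path, workspace_path)
    except OSError:
        os.unlink(tmp_path)
        raise
    # The new file no longer shares data with the cache, so it may be writable.
    mode = stat.S_IMODE(os.stat(workspace_path).st_mode)
    os.chmod(workspace_path, mode | stat.S_IWUSR)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cache = os.path.join(root, "cache_file")
        workspace = os.path.join(root, "data.csv")
        with open(cache, "w") as f:
            f.write("original\n")
        os.chmod(cache, 0o444)  # protected cache file
        os.link(cache, workspace)  # the hardlink shares the read-only mode
        unprotect(workspace)
        with open(workspace, "w") as f:  # editing is now safe
            f.write("edited\n")
        with open(cache) as f:
            print(f.read(), end="")  # original
```

The copy is what makes the operation expensive for large files. This is consistent with the advice above to check first whether a task is one of the cases considered safe.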
public/static/docs/user-guide/large-dataset-optimization.md: 29 changes (15 additions, 14 deletions)
@@ -38,11 +38,13 @@ Symbolic links, and Reflinks in more recent systems. While reflinks bring all
 the benefits and none of the worries, they're not commonly supported in most
 platforms yet. Hard/soft links optimize **speed** and **space** in the file
 system, but may break your workflow since updating hard/sym-linked files tracked
-by DVC in the <abbr>workspace</abbr> causes <abbr>cache</abbr> corruption. These
-2 link types thus require using cache **protected mode** (see the
-`cache.protected` config option in `dvc config cache`). Finally, a 4th "linking"
-alternative is to actually copy files from/to the cache, which is safe but
-inefficient – especially for large files (several GBs or more).
+by DVC in the <abbr>workspace</abbr> causes <abbr>cache</abbr> corruption. To
+protect against that, DVC makes hardlinks and symlinks links read-only, which
+requires the user to use `dvc unprotect` before modifying them.
+
+Finally, a 4th "linking" alternative is to actually copy files from/to the
+cache, which is safe but inefficient – especially for large files (several GBs
+or more).

 > Some versions of Windows (e.g. Windows Server 2012+ and Windows 10 Enterprise)
 > support hard or soft links on the
@@ -53,12 +55,12 @@ inefficient – especially for large files (several GBs or more).
 File link type benefits summary:

-| `cache.type` | speed | space | no protected mode |
-| ------------ | ----- | ----- | ----------------- |
-| `reflink` | x | x | x |
-| `hardlink` | x | x | |
-| `symlink` | x | x | |
-| `copy` | | | x |
+| `cache.type` | speed | space | editable |
+| ------------ | ----- | ----- | -------- |
+| `reflink` | x | x | x |
+| `hardlink` | x | x | |
+| `symlink` | x | x | |
+| `copy` | | | x |

 Each file linking method is further detailed below, in function of their
 efficiency:
@@ -110,13 +112,12 @@ configure DVC like this:

 ```dvc
 $ dvc config cache.type hardlink,symlink
-$ dvc config cache.protected true
 ```

 > Refer to `dvc config cache` for more details.
-Setting `cache.protected` is important with `hardlink` and/or `symlink` cache
-file link types. Please refer to the
+Note that with this `cache.type`, your workspace files will be in read-only mode
+in order to protect the cache from corruption. Please refer to the
 [Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage
 tracked files under these cache configurations.

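The `cache.type` values used in these docs are comma-separated fallback lists, such as `hardlink,symlink` or `reflink,copy` ("if `reflink` is not available, use `copy`"). The sketch below shows one way such a list could be applied, together with the automatic read-only protection this commit documents. It is an illustration under stated assumptions, not DVC's checkout code. `checkout` and `protect` are hypothetical names. Python's standard library has no portable reflink call, so the sketch skips `reflink` and moves on to the next type.

```python
import os
import shutil
import stat
import tempfile
from typing import Sequence

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def protect(path: str) -> None:
    """Clear the write bits (follows symlinks, so the cache file is changed)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def checkout(cache_path: str, workspace_path: str, link_types: Sequence[str]) -> str:
    """Link `cache_path` to `workspace_path` using the first link type that works.

    Hypothetical sketch, not DVC's code. "reflink" is skipped because the
    Python standard library offers no portable reflink call.
    """
    for link_type in link_types:
        try:
            if link_type == "hardlink":
                os.link(cache_path, workspace_path)
            elif link_type == "symlink":
                os.symlink(os.path.abspath(cache_path), workspace_path)
            elif link_type == "copy":
                shutil.copyfile(cache_path, workspace_path)
            else:
                continue  # not supported by this sketch; try the next type
        except OSError:
            continue  # e.g. cross-device hardlink; fall back to the next type
        if link_type in ("hardlink", "symlink"):
            # The workspace entry shares data with the cache: make it read-only.
            protect(workspace_path)
        return link_type
    raise OSError(f"no usable link type in {list(link_types)}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cache = os.path.join(root, "cache_file")
        with open(cache, "w") as f:
            f.write("data\n")
        print(checkout(cache, os.path.join(root, "a.csv"), ["reflink", "copy"]))  # copy
        print(checkout(cache, os.path.join(root, "b.csv"), ["hardlink", "symlink"]))
```

In this sketch, `["reflink", "copy"]` always falls through to `copy`, and the copied file stays writable. This matches the `editable` mark for `copy` in the summary table.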
