From 21aab371a487acf6f6e6201b29cd832e7c55ed23 Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Mon, 16 Mar 2020 20:11:43 +0200
Subject: [PATCH 1/9] docs: protected mode is now enabled by-default

https://github.com/iterative/dvc/pull/3472
---
 .../static/docs/command-reference/config.md  | 28 +++++--------------
 .../docs/command-reference/unprotect.md      | 17 ++++++-----
 .../user-guide/large-dataset-optimization.md | 12 ++++----
 3 files changed, 21 insertions(+), 36 deletions(-)

diff --git a/public/static/docs/command-reference/config.md b/public/static/docs/command-reference/config.md
index f440ddab9c..84b06d20df 100644
--- a/public/static/docs/command-reference/config.md
+++ b/public/static/docs/command-reference/config.md
@@ -114,18 +114,9 @@ for more details.) This section contains the following options:
   > option, properly transforming paths relative to the current working
   > directory into paths relative to the config file location.
 
-- `cache.protected` - make DVC-tracked files read-only. Possible values are
-  `true` or `false` (default). Run `dvc checkout` after changing the value of
-  this option for the change to go into effect.
-
-  Due to the way DVC handles linking between the data files in the cache and
-  their counterparts in the workspace, it's easy to accidentally
-  corrupt the cached file by editing or overwriting it. Turning this config
-  option on forces you to run `dvc unprotect` before updating a file, providing
-  an additional layer of security to your data.
-
-  We highly recommend enabling `cache.protected` when `cache.type` is set to
-  `hardlink` or `symlink`.
+- `cache.protected` - obsoleted. When using `hardlink` or `symlink` as
+  `cache.type`, they will automatically be read-only. Use `dvc unprotect`
+  before updating a file.
 
 - `cache.type` - link type that DVC should use to link data files from cache to
   the workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a
@@ -127,9 +127,10 @@ for more details.) This section contains the following options:
   to protect user from accidental cache and repository corruption.
 
   ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
-  corrupt the cache** if you modify tracked data files in the workspace. See the
-  `cache.protected` option above, and corresponding `dvc unprotect` command to
-  modify files safely.
+  corrupt the cache** if you modify tracked data files in the workspace. In
+  an attempt to prevent that, DVC will automatically make those links
+  read-only and an additional barrier. Use `dvc unprotect` command to modify
+  files safely.
 
   There are pros and cons to different link types. Refer to
   [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
@@ -276,9 +268,3 @@ Set cache type: if `reflink` is not available, use `copy`:
 ```dvc
 $ dvc config cache.type reflink,copy
 ```
-
-Protect DVC-tracked data files by making them read-only:
-
-```dvc
-$ dvc config cache.protected true
-```
diff --git a/public/static/docs/command-reference/unprotect.md b/public/static/docs/command-reference/unprotect.md
index e59c94355a..6149fb488d 100644
--- a/public/static/docs/command-reference/unprotect.md
+++ b/public/static/docs/command-reference/unprotect.md
@@ -1,7 +1,7 @@
 # unprotect
 
-Unprotect tracked files or directories (when the cache protected
-mode has been enabled with `dvc config cache`).
+Unprotect tracked files or directories (when hardlinks or symlinks have been
+enabled with `dvc config cache.type`).
 
 ## Synopsis
 
@@ -19,9 +19,9 @@ to link tracked data files from the cache to the workspace.
 However, these types of file links can be enabled with `dvc config cache`
 (`cache.type` config option).
 
-Enabling hardlinks or symlinks also requires the `cache.protected` mode to be
-turned on, which makes the tracked data files in the workspace read-only. (This
-prevent users from accidentally corrupting the cache by modifying file links.)
+Enabling hardlinks or symlinks makes the tracked data files in the workspace
+read-only. (This prevent users from accidentally corrupting the cache by
+modifying file links.)
 
 Running `dvc unprotect` guarantees that the target files or directories
 (`targets`) in the workspace are physically "unlinked" from the cache and can be
@@ -30,8 +30,7 @@ safely updated. Read the
 more on this process.
 
 `dvc unprotect` can be an expensive operation (involves copying data). Check
-first whether your task matches one of the cases that are considered safe, even
-when cache protected mode is enabled:
+first whether your task matches one of the cases that are considered safe:
 
 - Adding more files to a directory input dataset (say, images or videos)
 - Deleting files from a directory dataset
@@ -47,10 +46,10 @@ when cache protected mode is enabled:
 
 ## Examples
 
-Enable cache protected mode is enabled:
+Enable symlinks:
 
 ```dvc
-$ dvc config cache.protected true
+$ dvc config cache.type symlink
 ```
 
 Track a data file with DVC:
diff --git a/public/static/docs/user-guide/large-dataset-optimization.md b/public/static/docs/user-guide/large-dataset-optimization.md
index 144ffd3aaf..2ce7ab7658 100644
--- a/public/static/docs/user-guide/large-dataset-optimization.md
+++ b/public/static/docs/user-guide/large-dataset-optimization.md
@@ -40,9 +40,10 @@ platforms yet. Hard/soft links optimize **speed** and **space** in the file
 system, but may break your workflow since updating hard/sym-linked files tracked
 by DVC in the workspace causes cache corruption. These
 2 link types thus require using cache **protected mode** (see the
-`cache.protected` config option in `dvc config cache`). Finally, a 4th "linking"
-alternative is to actually copy files from/to the cache, which is safe but
-inefficient – especially for large files (several GBs or more).
+`cache.protected` config option in `dvc config cache`), which is enabled
+by-default and makes your files read-only. Finally, a 4th "linking" alternative
+is to actually copy files from/to the cache, which is safe but inefficient –
+especially for large files (several GBs or more).
 
 > Some versions of Windows (e.g. Windows Server 2012+ and Windows 10 Enterprise)
 > support hard or soft links on the
@@ -110,13 +111,12 @@ configure DVC like this:
 
 ```dvc
 $ dvc config cache.type hardlink,symlink
-$ dvc config cache.protected true
 ```
 
 > Refer to `dvc config cache` for more details.
 
-Setting `cache.protected` is important with `hardlink` and/or `symlink` cache
-file link types. Please refer to the
+Note that your workspace files will be in read-only mode because of the efforts
+to protect cache from corruption. Please refer to the
 [Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage
 tracked files under these cache configurations.
 

From 6da96dc37776d1ec1d17aec99a5c90cd3856ef8c Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Tue, 17 Mar 2020 00:25:06 +0200
Subject: [PATCH 2/9] Update public/static/docs/command-reference/config.md

Co-Authored-By: Jorge Orpinel
---
 public/static/docs/command-reference/config.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/public/static/docs/command-reference/config.md b/public/static/docs/command-reference/config.md
index 84b06d20df..b1ac943b96 100644
--- a/public/static/docs/command-reference/config.md
+++ b/public/static/docs/command-reference/config.md
@@ -114,9 +114,9 @@ for more details.) This section contains the following options:
   > option, properly transforming paths relative to the current working
   > directory into paths relative to the config file location.
 
-- `cache.protected` - obsoleted. When using `hardlink` or `symlink` as
-  `cache.type`, they will automatically be read-only. Use `dvc unprotect`
-  before updating a file.
+- `cache.protected` (_deprecated_) - when using `hardlink` or `symlink` as
+  `cache.type`, the file links will automatically be protected (read-only). Use
+  `dvc unprotect` if you need to update them.
 
 - `cache.type` - link type that DVC should use to link data files from cache to
   the workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a

From 60769635a0226ff5caddf9226d5976a708365b8f Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Tue, 17 Mar 2020 00:26:09 +0200
Subject: [PATCH 3/9] Update public/static/docs/command-reference/config.md

Co-Authored-By: Jorge Orpinel
---
 public/static/docs/command-reference/config.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/public/static/docs/command-reference/config.md b/public/static/docs/command-reference/config.md
index b1ac943b96..cd58941a59 100644
--- a/public/static/docs/command-reference/config.md
+++ b/public/static/docs/command-reference/config.md
@@ -128,9 +128,8 @@ for more details.) This section contains the following options:
 
   ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
   corrupt the cache** if you modify tracked data files in the workspace. In
-  an attempt to prevent that, DVC will automatically make those links
-  read-only and an additional barrier. Use `dvc unprotect` command to modify
-  files safely.
+  an attempt to prevent that, DVC will automatically protect those file links
+  (make them read-only). Use `dvc unprotect` to be able to modify them safely.
 
   There are pros and cons to different link types. Refer to
   [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)

From b2ad46671fb5dd9556a28a23dc36fd070b6c0cbd Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Tue, 17 Mar 2020 00:26:26 +0200
Subject: [PATCH 4/9] Update public/static/docs/command-reference/unprotect.md

Co-Authored-By: Jorge Orpinel
---
 public/static/docs/command-reference/unprotect.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/public/static/docs/command-reference/unprotect.md b/public/static/docs/command-reference/unprotect.md
index 6149fb488d..bd4c56ff90 100644
--- a/public/static/docs/command-reference/unprotect.md
+++ b/public/static/docs/command-reference/unprotect.md
@@ -20,7 +20,7 @@ However, these types of file links can be enabled with `dvc config cache`
 (`cache.type` config option).
 
 Enabling hardlinks or symlinks makes the tracked data files in the workspace
-read-only. (This prevent users from accidentally corrupting the cache by
+read-only. (This prevents users from accidentally corrupting the cache by
 modifying file links.)
 
 Running `dvc unprotect` guarantees that the target files or directories

From 14ecc2603d13a2e8bd8e05e020b0b8be1d1dad9a Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Tue, 17 Mar 2020 18:29:37 +0200
Subject: [PATCH 5/9] user-guide: large dataset: remove `protected` mention

https://github.com/iterative/dvc.org/pull/1058/files#r393328615
---
 .../user-guide/large-dataset-optimization.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/public/static/docs/user-guide/large-dataset-optimization.md b/public/static/docs/user-guide/large-dataset-optimization.md
index 2ce7ab7658..5c965b574c 100644
--- a/public/static/docs/user-guide/large-dataset-optimization.md
+++ b/public/static/docs/user-guide/large-dataset-optimization.md
@@ -38,10 +38,9 @@ Symbolic links, and Reflinks in more recent systems. While reflinks bring all
 the benefits and none of the worries, they're not commonly supported in most
 platforms yet. Hard/soft links optimize **speed** and **space** in the file
 system, but may break your workflow since updating hard/sym-linked files tracked
-by DVC in the workspace causes cache corruption. These
-2 link types thus require using cache **protected mode** (see the
-`cache.protected` config option in `dvc config cache`), which is enabled
-by-default and makes your files read-only. Finally, a 4th "linking" alternative
+by DVC in the workspace causes cache corruption. To
+protect against that, DVC makes hard/soft links read-only, forcing the user to
+use `dvc unprotect` before modifying them. Finally, a 4th "linking" alternative
 is to actually copy files from/to the cache, which is safe but inefficient –
 especially for large files (several GBs or more).
 
@@ -54,12 +53,12 @@ especially for large files (several GBs or more).
 
 File link type benefits summary:
 
-| `cache.type` | speed | space | no protected mode |
-| ------------ | ----- | ----- | ----------------- |
-| `reflink`    | x     | x     | x                 |
-| `hardlink`   | x     | x     |                   |
-| `symlink`    | x     | x     |                   |
-| `copy`       |       |       | x                 |
+| `cache.type` | speed | space | no read-only links |
+| ------------ | ----- | ----- | ------------------ |
+| `reflink`    | x     | x     | x                  |
+| `hardlink`   | x     | x     |                    |
+| `symlink`    | x     | x     |                    |
+| `copy`       |       |       | x                  |
 
 Each file linking method is further detailed below, in function of their
 efficiency:

From 73447dce1f8f9c208e9a11e85fbcadfc4d7e84ae Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Tue, 17 Mar 2020 18:39:31 +0200
Subject: [PATCH 6/9] Update public/static/docs/user-guide/large-dataset-optimization.md

Co-Authored-By: Jorge Orpinel
---
 public/static/docs/command-reference/config.md              | 6 +++---
 public/static/docs/user-guide/large-dataset-optimization.md | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/public/static/docs/command-reference/config.md b/public/static/docs/command-reference/config.md
index cd58941a59..fa526efe14 100644
--- a/public/static/docs/command-reference/config.md
+++ b/public/static/docs/command-reference/config.md
@@ -127,9 +127,9 @@ for more details.) This section contains the following options:
   to protect user from accidental cache and repository corruption.
 
   ⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
-  corrupt the cache** if you modify tracked data files in the workspace. In
-  an attempt to prevent that, DVC will automatically protect those file links
-  (make them read-only). Use `dvc unprotect` to be able to modify them safely.
+  corrupt the cache** if you modify tracked data files in the workspace. In an
+  attempt to prevent that, DVC will automatically protect those file links (make
+  them read-only). Use `dvc unprotect` to be able to modify them safely.
 
   There are pros and cons to different link types. Refer to
   [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
diff --git a/public/static/docs/user-guide/large-dataset-optimization.md b/public/static/docs/user-guide/large-dataset-optimization.md
index 5c965b574c..ffba28f2ef 100644
--- a/public/static/docs/user-guide/large-dataset-optimization.md
+++ b/public/static/docs/user-guide/large-dataset-optimization.md
@@ -114,8 +114,8 @@ $ dvc config cache.type hardlink,symlink
 
 > Refer to `dvc config cache` for more details.
 
-Note that your workspace files will be in read-only mode because of the efforts
-to protect cache from corruption. Please refer to the
+Note that with this `cache.type`, your workspace files will be in read-only mode
+in order to protect the cache from corruption. Please refer to the
 [Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage
 tracked files under these cache configurations.
 
From a73a0f5264360338419c4d6899c5bf7c734f1093 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 17 Mar 2020 11:04:15 -0600
Subject: [PATCH 7/9] Update public/static/docs/user-guide/large-dataset-optimization.md

---
 public/static/docs/user-guide/large-dataset-optimization.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/public/static/docs/user-guide/large-dataset-optimization.md b/public/static/docs/user-guide/large-dataset-optimization.md
index ffba28f2ef..6e66ff7337 100644
--- a/public/static/docs/user-guide/large-dataset-optimization.md
+++ b/public/static/docs/user-guide/large-dataset-optimization.md
@@ -39,8 +39,10 @@ the benefits and none of the worries, they're not commonly supported in most
 platforms yet. Hard/soft links optimize **speed** and **space** in the file
 system, but may break your workflow since updating hard/sym-linked files tracked
 by DVC in the workspace causes cache corruption. To
-protect against that, DVC makes hard/soft links read-only, forcing the user to
-use `dvc unprotect` before modifying them. Finally, a 4th "linking" alternative
+protect against that, DVC makes hardlinks and symlinks links read-only, which requires the user to
+use `dvc unprotect` before modifying them.
+
+Finally, a 4th "linking" alternative
 is to actually copy files from/to the cache, which is safe but inefficient –
 especially for large files (several GBs or more).
 
From 2ed2158714d8d80b83081d3a705c14ac635be02d Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 17 Mar 2020 11:04:25 -0600
Subject: [PATCH 8/9] Update public/static/docs/user-guide/large-dataset-optimization.md

---
 public/static/docs/user-guide/large-dataset-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/public/static/docs/user-guide/large-dataset-optimization.md b/public/static/docs/user-guide/large-dataset-optimization.md
index 6e66ff7337..2ff1dfe1e1 100644
--- a/public/static/docs/user-guide/large-dataset-optimization.md
+++ b/public/static/docs/user-guide/large-dataset-optimization.md
@@ -55,7 +55,7 @@ especially for large files (several GBs or more).
 
 File link type benefits summary:
 
-| `cache.type` | speed | space | no read-only links |
+| `cache.type` | speed | space | editable |
 | ------------ | ----- | ----- | ------------------ |
 | `reflink`    | x     | x     | x                  |
 | `hardlink`   | x     | x     |                    |

From 833ea3c379f33f1d4260b1c959f1c7debc8be597 Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Tue, 17 Mar 2020 19:09:13 +0200
Subject: [PATCH 9/9] fix formatting

---
 .../user-guide/large-dataset-optimization.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/public/static/docs/user-guide/large-dataset-optimization.md b/public/static/docs/user-guide/large-dataset-optimization.md
index 2ff1dfe1e1..76e5d6d42d 100644
--- a/public/static/docs/user-guide/large-dataset-optimization.md
+++ b/public/static/docs/user-guide/large-dataset-optimization.md
@@ -39,12 +39,12 @@ the benefits and none of the worries, they're not commonly supported in most
 platforms yet. Hard/soft links optimize **speed** and **space** in the file
 system, but may break your workflow since updating hard/sym-linked files tracked
 by DVC in the workspace causes cache corruption. To
-protect against that, DVC makes hardlinks and symlinks links read-only, which requires the user to
-use `dvc unprotect` before modifying them.
+protect against that, DVC makes hardlinks and symlinks links read-only, which
+requires the user to use `dvc unprotect` before modifying them.
 
-Finally, a 4th "linking" alternative
-is to actually copy files from/to the cache, which is safe but inefficient –
-especially for large files (several GBs or more).
+Finally, a 4th "linking" alternative is to actually copy files from/to the
+cache, which is safe but inefficient – especially for large files (several GBs
+or more).
 
 > Some versions of Windows (e.g. Windows Server 2012+ and Windows 10 Enterprise)
 > support hard or soft links on the
@@ -56,11 +56,11 @@ especially for large files (several GBs or more).
 File link type benefits summary:
 
 | `cache.type` | speed | space | editable |
-| ------------ | ----- | ----- | ------------------ |
-| `reflink`    | x     | x     | x                  |
-| `hardlink`   | x     | x     |                    |
-| `symlink`    | x     | x     |                    |
-| `copy`       |       |       | x                  |
+| ------------ | ----- | ----- | -------- |
+| `reflink`    | x     | x     | x        |
+| `hardlink`   | x     | x     |          |
+| `symlink`    | x     | x     |          |
+| `copy`       |       |       | x        |
 
 Each file linking method is further detailed below, in function of their
 efficiency: