From 47e602f9ebc88b958374e510c954bbd6d2225d3e Mon Sep 17 00:00:00 2001 From: Alexander Schepanovski Date: Sat, 14 Dec 2019 17:06:05 +0700 Subject: [PATCH 01/42] cmd ref: add checkout --relink option --- static/docs/command-reference/checkout.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index b882bd8672..12f4e543b5 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -6,7 +6,7 @@ DVC-files. ## Synopsis ```usage -usage: dvc checkout [-h] [-q | -v] [-d] [-f] [-R] +usage: dvc checkout [-h] [-q | -v] [-d] [-R] [-f] [--relink] [targets [targets ...]] positional arguments: @@ -95,6 +95,10 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) +- `--relink` - recreates links or copies for all checked out files even ones + with unchanged checksums. This ensures that link types of all the files in a + workspace match configured `cache.type`. + - `-h`, `--help` - shows the help message and exit. - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no From 0d7e4f8d25216640fc0a5ece4bd0c832c4fe9073 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 27 Dec 2019 13:42:38 -0700 Subject: [PATCH 02/42] cmd ref: reword --relink option of checkout per https://github.com/iterative/dvc.org/pull/864#discussion_r361353886 --- static/docs/command-reference/checkout.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 12f4e543b5..c8ebf630a5 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -95,9 +95,12 @@ be pulled from remote storage using `dvc pull`. 
remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) -- `--relink` - recreates links or copies for all checked out files even ones - with unchanged checksums. This ensures that link types of all the files in a - workspace match configured `cache.type`. +- `--relink` - recreates links or copies from cache to workspace for all data + files in the workspace, including the ones normally checked out by this + command, as well as existing ones that already have matching checksums in + current DVC-files. This ensures the link types of all the data files in the + workspace match the project's + [`cache.type`](/doc/command-reference/config#cache). - `-h`, `--help` - shows the help message and exit. From 73bbb3af9b2a8cd231770431252252c48df16da5 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 27 Dec 2019 17:34:42 -0700 Subject: [PATCH 03/42] user-guide: link to `dvc checkout --relink` option per https://github.com/iterative/dvc.org/pull/864#issuecomment-567345548 --- .../user-guide/large-dataset-optimization.md | 23 ++++++++++++------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/static/docs/user-guide/large-dataset-optimization.md b/static/docs/user-guide/large-dataset-optimization.md index 5c29ed0f09..acf8fc7e9e 100644 --- a/static/docs/user-guide/large-dataset-optimization.md +++ b/static/docs/user-guide/large-dataset-optimization.md @@ -38,11 +38,11 @@ Symbolic links, and Reflinks in more recent systems. While reflinks bring all the benefits and none of the worries, they're not commonly supported in most platforms yet. Hard/soft links optimize **speed** and **space** in the file system, but may break your workflow since updating hard/sym-linked files tracked -by DVC in the workspace causes cache corruption. These 2 link types -thus require using cache **protected mode** (see the `cache.protected` config -option in `dvc config cache`). 
Finally, a 4th "linking" option is to actually -copy files from/to the cache, which is safe but inefficient – especially for -large files (several GBs or more). +by DVC in the workspace causes cache corruption. These +2 link types thus require using cache **protected mode** (see the +`cache.protected` config option in `dvc config cache`). Finally, a 4th "linking" +option is to actually copy files from/to the cache, which is safe but +inefficient – especially for large files (several GBs or more). > Some versions of Windows (e.g. Windows Server 2012+ and Windows 10 Enterprise) > support hard or soft links on the @@ -92,9 +92,9 @@ efficiency: 4. **`copy`**: An inefficient "linking" strategy, yet supported on all file systems. Using `copy` means there will be no file links, but that the tracked - files will be duplicated as copies existing in both the cache and workspace. - Suitable for scenarios with relatively small data files, where copying them - is not a storage performance concern. + files will be duplicated as copies existing in both the cache and + workspace. Suitable for scenarios with relatively small data + files, where copying them is not a storage performance concern. > DVC avoids `symlink` and `hardlink` types by default to protect user from > accidental cache corruption. Refer to the @@ -120,6 +120,13 @@ file link types. Please refer to the [Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage tracked files under these cache configurations. +### Re-linking data in the workspace + +To re-create the file links in the workspace, for example after changing the +`cache.type` option for a project, please use +`dvc checkout --relink`. See +[checkout options](/doc/command-reference/checkout#options) for more details. 
+ --- > \***copy-on-write links or "reflinks"** are a relatively new way to link files From 8e1fb170378e7e4c37f933b9bebe73a8ce210c3f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 27 Dec 2019 17:39:29 -0700 Subject: [PATCH 04/42] cmd ref: add link to `dvc checkout --relink` in cache.type option of config --- static/docs/command-reference/config.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/static/docs/command-reference/config.md b/static/docs/command-reference/config.md index 9e82b0de7e..6a0eebacfa 100644 --- a/static/docs/command-reference/config.md +++ b/static/docs/command-reference/config.md @@ -55,7 +55,7 @@ corresponding config file. ## Configuration sections -The following config sections are written by this command to the project config +The following config sections are written by this command to the project config file (in `.dvc/config` by default), and they support the options below: ### core @@ -133,6 +133,10 @@ for more details.) This section contains the following options: [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) for a full explanation of each one. + To re-create the file links in the workspace after this option, please use + `dvc checkout --relink`. See + [checkout options](/doc/command-reference/checkout#options) for more details. + - `cache.slow_link_warning` - used to turn off the warnings about having a slow cache link type. These warnings are thrown by `dvc pull` and `dvc checkout` when linking files takes longer than usual, to remind them that there are @@ -169,8 +173,8 @@ for more details.) This section contains the following options: ### state -See [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to learn -more about the state file (database) that is used for optimization. +See [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to +learn more about the state file (database) that is used for optimization. 
- `state.row_limit` - maximum number of entries in the state database, which affects the physical size of the state file itself, as well as the performance From d322dacd6bd1cc8ec4de92d913e0c1a5277a896d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 27 Dec 2019 18:28:32 -0700 Subject: [PATCH 05/42] cmd ref: rewrite description and --relink option in checkout per https://github.com/iterative/dvc.org/pull/864#discussion_r361741531 --- static/docs/command-reference/checkout.md | 71 ++++++++++------------- 1 file changed, 32 insertions(+), 39 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index c8ebf630a5..7cdea3d57d 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -17,39 +17,35 @@ positional arguments: ## Description [DVC-files](/doc/user-guide/dvc-file-format) in a project specify -which instance of each data file or directory is to be used, using the checksum -saved in the `outs` fields. The `dvc checkout` command updates the workspace -data to match with the cached files corresponding to those -checksums. - -Using an SCM like Git, the DVC-files are kept under version control. At a given -branch or tag of the SCM repository, the DVC-files will contain checksums for -the corresponding data files kept in the cache. After an SCM command like -`git checkout` is run, the DVC-files will change to the state at the specified -branch or commit or tag. Afterwards, the `dvc checkout` command is required in -order to synchronize the data files with the currently checked out DVC-files. - -This command must be executed after `git checkout` since Git doesn't track files -that are under DVC control. For convenience a Git hook is available, simply by -running `dvc install`, that will automate running `dvc checkout` after -`git checkout`. See `dvc install` for more information. 
+which instance of each data file or directory should be used, with the checksums +saved in the `outs` field. The `dvc checkout` command updates the workspace data +to match with the cached files that correspond to those checksums. + +When using Git, the DVC-files can be kept under version control. DVC-files in +different branches or tags may contain checksums for different data files, saved +by DVC in the project's cache. The DVC-files in the workspace, or the value of +their `outs` fields can change when using `git checkout`. The `dvc checkout` +command is required in this situation, in order to synchronize the data files +(tracked by DVC) with the checked out DVC-files. + +For convenience a Git hook is available to automate running `dvc checkout` after +`git checkout`. To install it, use `dvc install`. The execution of `dvc checkout` does: -- Scan the `outs` entries in DVC-files to compare with the currently checked out - data files. The scanned DVC-files is limited by the listed `targets` (if any) - on the command line. And if the `--with-deps` option is specified, it scans - backward from the given `targets` in the corresponding - [pipeline](/doc/command-reference/pipeline). +- Scan the `outs` entries in DVC-files to compare with the outputs + currently in the workspace. Scanning is limited to the given + `targets` (if any). -- For any data files where the checksum doesn't match their DVC-file entry, the - data file is restored from the cache. The link strategy used (`reflink`, - `hardlink`, `symlink`, or `copy`) depends on the OS and the configured value - for `cache.type` – See `dvc config cache`. +- Missing data files or directories, or those with checksums that don't match + any DVC-file, are restored from the cache. If the `--relink` option is used, + all outputs in the workspace are recreated (overwritten). The + file linking strategy used (`reflink`, `hardlink`, `symlink`, or `copy`) + depends on the OS, and on the configured value for `cache.type`. 
(See + `dvc config cache`.) -Note that this command by default tries NOT to copy files between the cache and -the workspace, using reflinks instead when supported by the file system. (Refer -to +By default, this command tries not to copy files between the cache and the +workspace, using reflinks instead when supported by the file system. (Refer to [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).) The next linking strategy default value is `copy` though, so unless other file link types are manually configured in `cache.type` (using `dvc config`), files @@ -64,13 +60,10 @@ restoring any file size will be almost instantaneous. > the faster link types available. These warnings can be turned off setting the > `cache.slow_link_warning` config option to `false` with `dvc config cache`. -The output of `dvc checkout` does not list which data files were restored. It -does report removed files and files that DVC was unable to restore because -they're missing from the cache. - This command will fail to checkout files that are missing from the cache. In -such a case, `dvc checkout` prints a warning message. Any files that can be -checked out without error will be restored. +such a case, `dvc checkout` prints a warning message. It also lists removed +files. Any files that can be checked out without error will be restored without +being reported individually. There are two methods to restore a file missing from the cache, depending on the situation. In some cases a pipeline must be reproduced (using `dvc repro`) to @@ -95,11 +88,11 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) 
-- `--relink` - recreates links or copies from cache to workspace for all data - files in the workspace, including the ones normally checked out by this - command, as well as existing ones that already have matching checksums in - current DVC-files. This ensures the link types of all the data files in the - workspace match the project's +- `--relink` - recreates (overwrites) links or copies from cache to workspace + for all outputs in the workspace, including the ones normally + checked out by this command, **as well as existing ones** with matching + checksums in current DVC-files. This ensures the link types of all the data + files in the workspace match the project's [`cache.type`](/doc/command-reference/config#cache). - `-h`, `--help` - shows the help message and exit. From 8530e9b0ebf5f05c4eb1309b73bf8bc189cee038 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 16:38:59 -0600 Subject: [PATCH 06/42] checkout: rephrase --relink option explanation per https://github.com/iterative/dvc.org/pull/864#issuecomment-569589010 --- static/docs/command-reference/checkout.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 7cdea3d57d..32562e5d4e 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -88,11 +88,10 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) -- `--relink` - recreates (overwrites) links or copies from cache to workspace - for all outputs in the workspace, including the ones normally - checked out by this command, **as well as existing ones** with matching - checksums in current DVC-files. 
This ensures the link types of all the data - files in the workspace match the project's +- `--relink` - recreates (overwrites) file links or copies, from cache to + workspace, of **all outputs** referenced in current DVC-files + (regardless of whether the checksums match a DVC-file). This ensures the link + types of all the data files in the workspace are consistent with the project's [`cache.type`](/doc/command-reference/config#cache). - `-h`, `--help` - shows the help message and exit. From 1c783b05659339015f05cf0bc826addf19859859 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 16:55:08 -0600 Subject: [PATCH 07/42] checkout: small impro per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337893097 --- static/docs/command-reference/checkout.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 32562e5d4e..6815ae9fa9 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -21,12 +21,12 @@ which instance of each data file or directory should be used, with the checksums saved in the `outs` field. The `dvc checkout` command updates the workspace data to match with the cached files that correspond to those checksums. -When using Git, the DVC-files can be kept under version control. DVC-files in -different branches or tags may contain checksums for different data files, saved -by DVC in the project's cache. The DVC-files in the workspace, or the value of -their `outs` fields can change when using `git checkout`. The `dvc checkout` -command is required in this situation, in order to synchronize the data files -(tracked by DVC) with the checked out DVC-files. +When using Git, the different DVC-files versioned in separate branches or tags +may contain checksums for different data files (saved by DVC in the project's +cache). 
So when using `git checkout`, the DVC-files in the +workspace, or the value of their `outs` fields, can change. The +`dvc checkout` command is required in this situation, in order to synchronize +the data files (tracked by DVC) with the checked out DVC-files. For convenience a Git hook is available to automate running `dvc checkout` after `git checkout`. To install it, use `dvc install`. From 101c400021583d755475a4855077f7ec1eb6608d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 20:30:16 -0600 Subject: [PATCH 08/42] cmd ref: improve link from config to checkout --relink per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337895358 --- static/docs/command-reference/config.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/static/docs/command-reference/config.md b/static/docs/command-reference/config.md index 5f38d4f030..92289a71cc 100644 --- a/static/docs/command-reference/config.md +++ b/static/docs/command-reference/config.md @@ -139,8 +139,8 @@ for more details.) This section contains the following options: [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) for a full explanation of each one. - To re-create the file links in the workspace after this option, please use - `dvc checkout --relink`. See + To apply changes to this option in the workspace, by recreating file + links/copies from cache, please use `dvc checkout --relink`. See [checkout options](/doc/command-reference/checkout#options) for more details. 
- `cache.slow_link_warning` - used to turn off the warnings about having a slow From 9ba0a4b1f0363e42c65e163cc9b43a4eebcd3ce0 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 21:28:05 -0600 Subject: [PATCH 09/42] cmd ref: update checkout description per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337893478 and https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337893629 --- static/docs/command-reference/checkout.md | 37 ++++++++++++----------- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 6815ae9fa9..45a0c51ef8 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -17,25 +17,26 @@ positional arguments: ## Description [DVC-files](/doc/user-guide/dvc-file-format) in a project specify -which instance of each data file or directory should be used, with the checksums -saved in the `outs` field. The `dvc checkout` command updates the workspace data -to match with the cached files that correspond to those checksums. +which data files or directories from the cache should be in use. +DVC saves data file checksums in the `outs` fields inside DVC-files for this. -When using Git, the different DVC-files versioned in separate branches or tags -may contain checksums for different data files (saved by DVC in the project's -cache). So when using `git checkout`, the DVC-files in the -workspace, or the value of their `outs` fields, can change. The -`dvc checkout` command is required in this situation, in order to synchronize -the data files (tracked by DVC) with the checked out DVC-files. +When using Git, different DVC-files versioned in separate +[revisions](https://git-scm.com/book/en/v2/Git-Internals-Git-References) +probably specify different data files from the cache. 
When switching to those +versions (with Git commands such as `git checkout`), the current DVC-files will +no longer match with all of the data in the workspace. -For convenience a Git hook is available to automate running `dvc checkout` after -`git checkout`. To install it, use `dvc install`. +The `dvc checkout` command synchronizes the workspace data to match with the +current DVC-files, using a mechanism described below. -The execution of `dvc checkout` does: +💡 For convenience, a Git hook is available to automate running `dvc checkout` +after `git checkout`. Use `dvc install` to install it. -- Scan the `outs` entries in DVC-files to compare with the outputs - currently in the workspace. Scanning is limited to the given - `targets` (if any). +The execution of `dvc checkout` does the following: + +- Scans the `outs` field values in DVC-files to compare with the + outputs currently in the workspace. Scanning is + limited to the given `targets` (if any). - Missing data files or directories, or those with checksums that don't match any DVC-file, are restored from the cache. If the `--relink` option is used, @@ -45,14 +46,14 @@ The execution of `dvc checkout` does: `dvc config cache`.) By default, this command tries not to copy files between the cache and the -workspace, using reflinks instead when supported by the file system. (Refer to +workspace, using reflinks instead, when supported by the file system. (Refer to [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).) The next linking strategy default value is `copy` though, so unless other file link types are manually configured in `cache.type` (using `dvc config`), files will be copied. Keep in mind that having file copies doesn't present much of a negative impact unless the project uses very large data (several GBs or more). 
-But leveraging file links is crucial for large files where checking out a 50Gb -by copying file might take a few minutes for example, whereas with links, +But leveraging file links is crucial with large files, for example when checking +out a 50Gb file by copying might take a few minutes whereas, with links, restoring any file size will be almost instantaneous. > When linking files takes longer than expected (10 seconds for any one file) From 56f0780a7efdde0cda506e041ee9e178994339d3 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 21:56:37 -0600 Subject: [PATCH 10/42] cmd ref: introduce "outputs" term (and tooltip) earlier in the description of checkout per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337893874 --- static/docs/command-reference/checkout.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 45a0c51ef8..4bed54ba7a 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -17,8 +17,9 @@ positional arguments: ## Description [DVC-files](/doc/user-guide/dvc-file-format) in a project specify -which data files or directories from the cache should be in use. -DVC saves data file checksums in the `outs` fields inside DVC-files for this. +which data files or directories from the cache should be in use. We +call these files outputs, and their checksums are saved in the +`outs` fields inside DVC-files to achieve this. When using Git, different DVC-files versioned in separate [revisions](https://git-scm.com/book/en/v2/Git-Internals-Git-References) @@ -34,16 +35,15 @@ after `git checkout`. Use `dvc install` to install it. The execution of `dvc checkout` does the following: -- Scans the `outs` field values in DVC-files to compare with the - outputs currently in the workspace. Scanning is - limited to the given `targets` (if any). 
+- Scans the `outs` field values in DVC-files to compare with the outputs + currently in the workspace. Scanning is limited to the given + `targets` (if any). - Missing data files or directories, or those with checksums that don't match any DVC-file, are restored from the cache. If the `--relink` option is used, - all outputs in the workspace are recreated (overwritten). The - file linking strategy used (`reflink`, `hardlink`, `symlink`, or `copy`) - depends on the OS, and on the configured value for `cache.type`. (See - `dvc config cache`.) + all outputs in the workspace are recreated (overwritten). The file linking + strategy used (`reflink`, `hardlink`, `symlink`, or `copy`) depends on the OS, + and on the configured value for `cache.type`. (See `dvc config cache`.) By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to From 837574625be2c6e089090260cdbf55a7671012b8 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 22:35:52 -0600 Subject: [PATCH 11/42] cmd ref: rewrite --relink option desc. in checkout (again) per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337896034 --- static/docs/command-reference/checkout.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 4bed54ba7a..f6fd140015 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -41,9 +41,9 @@ The execution of `dvc checkout` does the following: - Missing data files or directories, or those with checksums that don't match any DVC-file, are restored from the cache. If the `--relink` option is used, - all outputs in the workspace are recreated (overwritten). 
The file linking - strategy used (`reflink`, `hardlink`, `symlink`, or `copy`) depends on the OS, - and on the configured value for `cache.type`. (See `dvc config cache`.) + all outputs in the workspace are recreated. The file linking strategy used + (`reflink`, `hardlink`, `symlink`, or `copy`) depends on the OS, and on the + configured value for `cache.type`. (See `dvc config cache`.) By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to @@ -89,11 +89,11 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) -- `--relink` - recreates (overwrites) file links or copies, from cache to - workspace, of **all outputs** referenced in current DVC-files - (regardless of whether the checksums match a DVC-file). This ensures the link - types of all the data files in the workspace are consistent with the project's - [`cache.type`](/doc/command-reference/config#cache). +- `--relink` - recreates **all outputs** referenced in current + DVC-files (regardless of whether the checksums match a DVC-file). This means + overwriting the file links or copies from cache to workspace. This ensures the + link types of all the data files in the workspace are consistent with the + project's [`cache.type`](/doc/command-reference/config#cache). - `-h`, `--help` - shows the help message and exit. From f2a5d61254fe60ffb0c112ce2f4b0a1d2de1f7fb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 2 Jan 2020 22:40:56 -0600 Subject: [PATCH 12/42] cmd ref: switch order of sentences in checkout --relink option desc. 
per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-337896338 --- static/docs/command-reference/checkout.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index f6fd140015..e34d5d392e 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -89,11 +89,12 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) -- `--relink` - recreates **all outputs** referenced in current - DVC-files (regardless of whether the checksums match a DVC-file). This means - overwriting the file links or copies from cache to workspace. This ensures the - link types of all the data files in the workspace are consistent with the - project's [`cache.type`](/doc/command-reference/config#cache). +- `--relink` - ensures the link types of all the data files in the workspace are + consistent with the project's + [`cache.type`](/doc/command-reference/config#cache). This is achieved by + recreating **all outputs** referenced in current DVC-files + (regardless of whether the checksums match a DVC-file). This means overwriting + the file links or copies from cache to workspace. - `-h`, `--help` - shows the help message and exit. 
From 7deb5e29a748005b36be7a3e2bc3e6b78988b21d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 13:41:17 -0600 Subject: [PATCH 13/42] cmd ref: remove --relink sentence from description, mention all options per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338233895 --- static/docs/command-reference/checkout.md | 32 +++++++++++------------ 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index e34d5d392e..e13ba23c5e 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -37,13 +37,11 @@ The execution of `dvc checkout` does the following: - Scans the `outs` field values in DVC-files to compare with the outputs currently in the workspace. Scanning is limited to the given - `targets` (if any). + `targets` (if any). See also the `--recursive` option below. - Missing data files or directories, or those with checksums that don't match - any DVC-file, are restored from the cache. If the `--relink` option is used, - all outputs in the workspace are recreated. The file linking strategy used - (`reflink`, `hardlink`, `symlink`, or `copy`) depends on the OS, and on the - configured value for `cache.type`. (See `dvc config cache`.) + any DVC-file, are restored from the cache. See also options `--force`, + `--with-deps`, and `--relink`. By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to @@ -73,12 +71,6 @@ be pulled from remote storage using `dvc pull`. ## Options -- `-d`, `--with-deps` - determine files to update by tracking dependencies to - the target DVC-files (stages). This option only has effect when one or more - `targets` are specified. By traversing all stage dependencies, DVC searches - backward from the target stages in the corresponding pipelines. 
This means DVC - will not checkout files referenced in later stages than the `targets`. - - `-R`, `--recursive` - `targets` is expected to contain at least one directory path for this option to have effect. Determines the files to checkout by searching each target directory and its subdirectories for DVC-files to @@ -89,12 +81,18 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) -- `--relink` - ensures the link types of all the data files in the workspace are - consistent with the project's - [`cache.type`](/doc/command-reference/config#cache). This is achieved by - recreating **all outputs** referenced in current DVC-files - (regardless of whether the checksums match a DVC-file). This means overwriting - the file links or copies from cache to workspace. +- `-d`, `--with-deps` - determine files to update by tracking dependencies to + the target DVC-files (stages). This option only has effect when one or more + `targets` are specified. By traversing all stage dependencies, DVC searches + backward from the target stages in the corresponding pipelines. This means DVC + will not checkout files referenced in later stages than the `targets`. + +- `--relink` - ensures the file linking strategy (`reflink`, `hardlink`, + `symlink`, or `copy`) for all data files in the workspace is consistent with + the project's [`cache.type`](/doc/command-reference/config#cache). This is + achieved by recreating **all outputs** referenced in current + DVC-files (regardless of whether the checksums match a DVC-file). This means + overwriting the file links or copies from cache to workspace. - `-h`, `--help` - shows the help message and exit. 
From b86d5affb585a00f76a6c2c5ca62f7acec8e6752 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 16:18:45 -0600 Subject: [PATCH 14/42] cmd ref: simplify checkout description intro per https://github.com/iterative/dvc.org/pull/864#discussion_r362923144 --- static/docs/command-reference/checkout.md | 24 +++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index e13ba23c5e..1f9bd8df46 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -16,19 +16,19 @@ positional arguments: ## Description -[DVC-files](/doc/user-guide/dvc-file-format) in a project specify -which data files or directories from the cache should be in use. We -call these files outputs, and their checksums are saved in the -`outs` fields inside DVC-files to achieve this. - -When using Git, different DVC-files versioned in separate -[revisions](https://git-scm.com/book/en/v2/Git-Internals-Git-References) -probably specify different data files from the cache. When switching to those -versions (with Git commands such as `git checkout`), the current DVC-files will -no longer match with all of the data in the workspace. - The `dvc checkout` command synchronizes the workspace data to match with the -current DVC-files, using a mechanism described below. +current [DVC-files](/doc/user-guide/dvc-file-format) in the +project. DVC knows which data files (a.k.a. outputs) +to use because their checksums are saved in the `outs` fields inside the +DVC-files. + +This is useful when the project is a DVC repository, since +DVC-files versioned in different +[revisions](https://git-scm.com/book/en/v2/Git-Internals-Git-References) will +specify different outputs. 
When switching to those versions (with Git commands +such as `git checkout`), the current DVC-files will no longer match with all of +the data in the workspace, and so `dvc checkout` will be needed at +that point. 💡 For convenience, a Git hook is available to automate running `dvc checkout` after `git checkout`. Use `dvc install` to install it. From f9395b40b7e15d42c5849e6442feeafe23ccb91e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 21:36:22 -0600 Subject: [PATCH 15/42] cmd ref: further simplify checkout desc. intro and remove "output" term repetition per https://github.com/iterative/dvc.org/pull/864#discussion_r362993186 and https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338313569 --- static/docs/command-reference/checkout.md | 36 ++++++++++------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 1f9bd8df46..077639c199 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -16,32 +16,28 @@ positional arguments: ## Description -The `dvc checkout` command synchronizes the workspace data to match with the -current [DVC-files](/doc/user-guide/dvc-file-format) in the -project. DVC knows which data files (a.k.a. outputs) -to use because their checksums are saved in the `outs` fields inside the -DVC-files. +[DVC-files](/doc/user-guide/dvc-file-format) are essentially placeholders that +point to the actual data files or a directories under DVC control. This command +synchronizes the workspace data with the versions specified in the current +DVC-files. DVC knows which data files (outputs) to use because +their checksums are saved in the `outs` fields inside the DVC-files. -This is useful when the project is a DVC repository, since -DVC-files versioned in different -[revisions](https://git-scm.com/book/en/v2/Git-Internals-Git-References) will -specify different outputs. 
When switching to those versions (with Git commands -such as `git checkout`), the current DVC-files will no longer match with all of -the data in the workspace, and so `dvc checkout` will be needed at -that point. +`dvc checkout` is useful when using Git in the project, after +`git clone`, `git checkout`, or any other repository operations that change the +currently present DVC-files. 💡 For convenience, a Git hook is available to automate running `dvc checkout` after `git checkout`. Use `dvc install` to install it. The execution of `dvc checkout` does the following: -- Scans the `outs` field values in DVC-files to compare with the outputs - currently in the workspace. Scanning is limited to the given - `targets` (if any). See also the `--recursive` option below. +- Scans the `outs` field values in DVC-files to compare with the data files or a + directories currently in the workspace. Scanning is limited to + the given `targets` (if any). See also the `--recursive` option below. - Missing data files or directories, or those with checksums that don't match - any DVC-file, are restored from the cache. See also options `--force`, - `--with-deps`, and `--relink`. + any DVC-file, are restored from the cache. See also options + `--force`, `--with-deps`, and `--relink`. By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to @@ -90,9 +86,9 @@ be pulled from remote storage using `dvc pull`. - `--relink` - ensures the file linking strategy (`reflink`, `hardlink`, `symlink`, or `copy`) for all data files in the workspace is consistent with the project's [`cache.type`](/doc/command-reference/config#cache). This is - achieved by recreating **all outputs** referenced in current - DVC-files (regardless of whether the checksums match a DVC-file). This means - overwriting the file links or copies from cache to workspace. 
+ achieved by recreating **all data files or a directories** referenced in + current DVC-files (regardless of whether the checksums match a DVC-file). This + means overwriting the file links or copies from cache to workspace. - `-h`, `--help` - shows the help message and exit. From 85eec5ffa7e6880f9e0675bc5fcae5d57ee9f4ec Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 21:41:13 -0600 Subject: [PATCH 16/42] cmd ref: move `--with-deps` to the first bullet in checkout desc. per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338312734 --- static/docs/command-reference/checkout.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 077639c199..57d67f2f58 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -33,11 +33,12 @@ The execution of `dvc checkout` does the following: - Scans the `outs` field values in DVC-files to compare with the data files or a directories currently in the workspace. Scanning is limited to - the given `targets` (if any). See also the `--recursive` option below. + the given `targets` (if any). See also options `--with-deps` and `--recursive` + below. - Missing data files or directories, or those with checksums that don't match - any DVC-file, are restored from the cache. See also options - `--force`, `--with-deps`, and `--relink`. + any DVC-file, are restored from the cache. See options `--force` + and `--relink`. By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to @@ -67,6 +68,12 @@ be pulled from remote storage using `dvc pull`. ## Options +- `-d`, `--with-deps` - determine files to update by tracking dependencies to + the target DVC-files (stages). This option only has effect when one or more + `targets` are specified. 
By traversing all stage dependencies, DVC searches + backward from the target stages in the corresponding pipelines. This means DVC + will not checkout files referenced in later stages than the `targets`. + - `-R`, `--recursive` - `targets` is expected to contain at least one directory path for this option to have effect. Determines the files to checkout by searching each target directory and its subdirectories for DVC-files to @@ -77,12 +84,6 @@ be pulled from remote storage using `dvc pull`. remove files that don't match those DVC-file references or are missing from cache. (They are not "committed", in DVC terms.) -- `-d`, `--with-deps` - determine files to update by tracking dependencies to - the target DVC-files (stages). This option only has effect when one or more - `targets` are specified. By traversing all stage dependencies, DVC searches - backward from the target stages in the corresponding pipelines. This means DVC - will not checkout files referenced in later stages than the `targets`. - - `--relink` - ensures the file linking strategy (`reflink`, `hardlink`, `symlink`, or `copy`) for all data files in the workspace is consistent with the project's [`cache.type`](/doc/command-reference/config#cache). This is From 10d1eb63a53023a3396890de9947f1e255abf6bb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 21:47:19 -0600 Subject: [PATCH 17/42] cmd ref: simplify bullets in checkout desc. per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338315176 --- static/docs/command-reference/checkout.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 57d67f2f58..2fa6a16a0f 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -31,14 +31,13 @@ after `git checkout`. Use `dvc install` to install it. 
The execution of `dvc checkout` does the following: -- Scans the `outs` field values in DVC-files to compare with the data files or a - directories currently in the workspace. Scanning is limited to - the given `targets` (if any). See also options `--with-deps` and `--recursive` - below. - -- Missing data files or directories, or those with checksums that don't match - any DVC-file, are restored from the cache. See options `--force` - and `--relink`. +- Scans the DVC-files to compare vs. the data files or directories currently in + the workspace. Scanning is limited to the given `targets` (if + any). See also options `--with-deps` and `--recursive` below. + +- Missing data files or directories, or those that don't match with any + DVC-file, are restored from the cache. See options `--force` and + `--relink`. By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to From 7392e5c6af712ce597904f012c33ccfb9635b9eb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 21:52:20 -0600 Subject: [PATCH 18/42] cmd ref: remove "checksum" term from checkout --relink option desc. per https://github.com/iterative/dvc.org/pull/864#discussion_r360175115 and https://github.com/iterative/dvc.org/pull/864#discussion_r363014159 --- static/docs/command-reference/checkout.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 2fa6a16a0f..4e8263dfb9 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -87,7 +87,7 @@ be pulled from remote storage using `dvc pull`. `symlink`, or `copy`) for all data files in the workspace is consistent with the project's [`cache.type`](/doc/command-reference/config#cache). 
This is achieved by recreating **all data files or a directories** referenced in - current DVC-files (regardless of whether the checksums match a DVC-file). This + current DVC-files (regardless of whether they match a current DVC-file). This means overwriting the file links or copies from cache to workspace. - `-h`, `--help` - shows the help message and exit. From 1813f942e47aecfd5c2009fd8347956cfbeb2975 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 3 Jan 2020 21:59:55 -0600 Subject: [PATCH 19/42] term: replace "recreate" by "restore" for checkout --relink option, et al. per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338315859 --- static/docs/command-reference/checkout.md | 2 +- static/docs/command-reference/config.md | 2 +- static/docs/understanding-dvc/related-technologies.md | 8 ++++---- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 4e8263dfb9..70fcb9e27d 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -86,7 +86,7 @@ be pulled from remote storage using `dvc pull`. - `--relink` - ensures the file linking strategy (`reflink`, `hardlink`, `symlink`, or `copy`) for all data files in the workspace is consistent with the project's [`cache.type`](/doc/command-reference/config#cache). This is - achieved by recreating **all data files or a directories** referenced in + achieved by restoring **all data files or a directories** referenced in current DVC-files (regardless of whether they match a current DVC-file). This means overwriting the file links or copies from cache to workspace. diff --git a/static/docs/command-reference/config.md b/static/docs/command-reference/config.md index 92289a71cc..8753002f62 100644 --- a/static/docs/command-reference/config.md +++ b/static/docs/command-reference/config.md @@ -139,7 +139,7 @@ for more details.) 
This section contains the following options: [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) for a full explanation of each one. - To apply changes to this option in the workspace, by recreating file + To apply changes to this option in the workspace, by restoring all file links/copies from cache, please use `dvc checkout --relink`. See [checkout options](/doc/command-reference/checkout#options) for more details. diff --git a/static/docs/understanding-dvc/related-technologies.md b/static/docs/understanding-dvc/related-technologies.md index 2dd2cb7c57..bf5dffa5bf 100644 --- a/static/docs/understanding-dvc/related-technologies.md +++ b/static/docs/understanding-dvc/related-technologies.md @@ -100,11 +100,11 @@ http://studio.ml/ - Git-annex is a datafile-centric system whereas DVC is focused on providing a workflow for machine learning and reproducible experiments. When a DVC or Git-annex repository is cloned via `git clone`, data files won't be copied to - the local machine as file contents are stored in separate + the local machine, as file contents are stored in separate [remotes](/doc/command-reference/remote). With DVC, - [DVC-files](/doc/user-guide/dvc-file-format) (that provide the reproducible - workflow) are always included in the Git repository and hence can be recreated - locally with minimal effort. + [DVC-files](/doc/user-guide/dvc-file-format), which provide the reproducible + workflow, are always included in the Git repository. Hence, they can be + executed locally with minimal effort. - DVC is not fundamentally bound to Git, and users have the option of changing the repository format. 
From ada1e84a4910a91c0a8ed3a4bbbbbd56e9467c2b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 4 Jan 2020 15:23:15 -0600 Subject: [PATCH 20/42] engine: improve comments around the engine load time, reformat sidebar.json --- server.js | 13 +++++++++---- src/utils/sidebar.js | 25 +++++++++++++++---------- static/docs/sidebar.json | 3 +-- 3 files changed, 25 insertions(+), 16 deletions(-) diff --git a/server.js b/server.js index 4f85090804..25aac30b58 100644 --- a/server.js +++ b/server.js @@ -1,9 +1,13 @@ /* eslint-env node */ -// This file doesn't go through babel or webpack transformation. Make sure the -// syntax and sources this file requires are compatible with the current Node.js -// version you are running. (See https://github.com/zeit/next.js/issues/1245 for -// discussions on universal Webpack vs universal Babel.) +/* + * Custom server (with custom routes) See + * https://nextjs.org/docs/advanced-features/custom-server + * + * NOTE: This file doesn't go through babel or webpack. Make sure the syntax and + * sources this file requires are compatible with the current node version you + * are running. + */ const { createServer } = require('http') const { parse } = require('url') @@ -111,6 +115,7 @@ app.prepare().then(() => { res.statusCode = 404 } + // Custom route for all docs ("engine" based on /pages/doc.js page) app.render(req, res, '/doc', query) } } else { diff --git a/src/utils/sidebar.js b/src/utils/sidebar.js index d58130148f..071538f924 100644 --- a/src/utils/sidebar.js +++ b/src/utils/sidebar.js @@ -1,11 +1,7 @@ /* eslint-env node */ - -const startCase = require('lodash.startcase') -const sidebar = require('../../static/docs/sidebar.json') - /* - We will use this helper to normalize sidebar structure and create - all of the resurces we need to prevent future recalculations. + These helpers normalize sidebar structure and create all the resources needed. + This prevents future recalculations.
Target structure example: @@ -22,11 +18,18 @@ const sidebar = require('../../static/docs/sidebar.json') } */ +const startCase = require('lodash.startcase') + +/* Base sidebar structure */ +const sidebar = require('../../static/docs/sidebar.json') + const PATH_ROOT = '/doc/' const FILE_ROOT = '/static/docs/' const FILE_EXTENSION = '.md' -// Inner helpers +/* + * Private functions + */ function findItem(data, targetPath) { if (data.length) { @@ -73,7 +76,7 @@ function validateRawItem({ slug, source, children }) { } } -// Normalization +/* Normalization */ function normalizeItem({ item, parentPath, resultRef, prevRef }) { validateRawItem(item) @@ -142,13 +145,15 @@ function normalizeSidebar({ return currentResult } +/* + * Exports + */ + const normalizedSidebar = normalizeSidebar({ data: sidebar, parentPath: '' }) -// Exports - function getItemByPath(path) { const normalizedPath = path.replace(/\/$/, '') const isRoot = normalizedPath === PATH_ROOT.slice(0, -1) diff --git a/static/docs/sidebar.json b/static/docs/sidebar.json index c92333fbb3..68c3c4d148 100644 --- a/static/docs/sidebar.json +++ b/static/docs/sidebar.json @@ -30,7 +30,6 @@ ] }, { - "label": "Install", "slug": "install", "source": "install/index.md", "children": [ @@ -356,8 +355,8 @@ ] }, { - "label": "Understanding DVC", "slug": "understanding-dvc", + "label": "Understanding DVC", "source": false, "children": [ "collaboration-issues", From 02deb57700d9cc0f3e7d897577d92e15f2fcce1c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 4 Jan 2020 15:41:17 -0600 Subject: [PATCH 21/42] HOTFIX: bad path in previous merge --- src/utils/sidebar.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/utils/sidebar.js b/src/utils/sidebar.js index 071538f924..6c1cbd2f75 100644 --- a/src/utils/sidebar.js +++ b/src/utils/sidebar.js @@ -21,7 +21,7 @@ const startCase = require('lodash.startcase') /* Base sidebar structure */ -const sidebar = require('../../static/docs/sidebar.json') +const sidebar = 
require('../../public/static/docs/sidebar.json') const PATH_ROOT = '/doc/' const FILE_ROOT = '/static/docs/' From 109e4515429aa31f596866b87253126fe7ae3928 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 4 Jan 2020 15:58:21 -0600 Subject: [PATCH 22/42] engine: make sidebar.js comment more informative per https://github.com/iterative/dvc.org/pull/891#pullrequestreview-338371474 --- src/utils/sidebar.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/utils/sidebar.js b/src/utils/sidebar.js index 6c1cbd2f75..340890b529 100644 --- a/src/utils/sidebar.js +++ b/src/utils/sidebar.js @@ -20,7 +20,7 @@ const startCase = require('lodash.startcase') -/* Base sidebar structure */ +// Base to build the target struct described above const sidebar = require('../../public/static/docs/sidebar.json') const PATH_ROOT = '/doc/' From f981979c18b9854026d0ffefc0a93938d3e07217 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 4 Jan 2020 18:26:30 -0600 Subject: [PATCH 23/42] make all sidebar.json indices optional + bunch of comments and some refactoring --- package.json | 2 +- public/static/docs/sidebar.json | 2 +- .../docs/user-guide/contributing/index.md | 35 ----------- server.js | 1 + src/utils/sidebar.js | 61 ++++++++++--------- 5 files changed, 35 insertions(+), 66 deletions(-) delete mode 100644 public/static/docs/user-guide/contributing/index.md diff --git a/package.json b/package.json index 426f1a9649..475db187e6 100644 --- a/package.json +++ b/package.json @@ -5,7 +5,7 @@ "main": "index.js", "scripts": { "dev": "node server.js", - "dev:debug": "node --inspect server.js", + "dev:debug": "node --inspect-brk server.js", "build": "next build", "test": "jest", "start": "NODE_ENV=production node server.js", diff --git a/public/static/docs/sidebar.json b/public/static/docs/sidebar.json index 68c3c4d148..c253134e83 100644 --- a/public/static/docs/sidebar.json +++ b/public/static/docs/sidebar.json @@ -139,7 +139,7 @@ { "label": "Contributing", "slug": 
"contributing", - "source": "contributing/index.md", + "source": false, "children": [ { "label": "DVC Core Project", diff --git a/public/static/docs/user-guide/contributing/index.md b/public/static/docs/user-guide/contributing/index.md deleted file mode 100644 index 14b24fe0b6..0000000000 --- a/public/static/docs/user-guide/contributing/index.md +++ /dev/null @@ -1,35 +0,0 @@ -# Contributing - -## Contributing to DVC - -We welcome [contributions](/doc/user-guide/contributing/core) to -[DVC](https://github.com/iterative/dvc) by the community. - -- [How to report a problem](/doc/user-guide/contributing/core#how-to-report-a-problem) - -- [Submitting changes](/doc/user-guide/contributing/core#submitting-changes) - -- [Development environment](/doc/user-guide/contributing/core#development-environment) - -- [Running tests](/doc/user-guide/contributing/core#running-tests) - -- [Testing remotes](/doc/user-guide/contributing/core#testing-remotes) - -- [Code style guidelines (for Python)](/doc/user-guide/contributing/core#code-style-guidelines-for-python) - -- [Commit message format guidelines](/doc/user-guide/contributing/core#commit-message-format-guidelines) - -## Contributing Docs - -We welcome any [contributions](/doc/user-guide/contributing/docs) to our -documentation repository, [dvc.org](https://github.com/iterative/dvc.org). -Contribution can be an update to the documentation or (rare) updating or fixing -the JS engine that we use to run the website. 
- -- [Structure of the project](/doc/user-guide/contributing/docs#structure-of-the-project) - -- [Submitting changes](/doc/user-guide/contributing/docs#submitting-changes) - -- [Development environment](/doc/user-guide/contributing/docs#development-environment) - -- [Doc style guidelines and tips (for JavaScript and Markdown)](/doc/user-guide/contributing/docs#doc-style-guidelines-and-tips-for-java-script-and-markdown) diff --git a/server.js b/server.js index 8055c2ee8a..7609b42135 100644 --- a/server.js +++ b/server.js @@ -118,6 +118,7 @@ app.prepare().then(() => { // Force 404 response for any inexistent /doc item. if (!getItemByPath(pathname)) { res.statusCode = 404 + // NOTE: Assumes the route below will render a 404 page. } // Custom route for all docs diff --git a/src/utils/sidebar.js b/src/utils/sidebar.js index 340890b529..97515f7038 100644 --- a/src/utils/sidebar.js +++ b/src/utils/sidebar.js @@ -27,9 +27,19 @@ const PATH_ROOT = '/doc/' const FILE_ROOT = '/static/docs/' const FILE_EXTENSION = '.md' -/* - * Private functions - */ +function validateRawItem({ slug, source, children }) { + const isSourceDisabled = source === false + + if (!slug) { + throw Error("'slug' field is required in objects in sidebar.json") + } + + if (isSourceDisabled && (!children || !children.length)) { + throw Error( + "If you set 'source' to false, you had to add at least one child" + ) + } +} function findItem(data, targetPath) { if (data.length) { @@ -48,36 +58,16 @@ function findItem(data, targetPath) { } } -function findChildWithSource(item) { - return item.source ? 
item : findChildWithSource(item.children[0]) -} - -function findPrevItemWithSource(data, item) { - if (item.source) { - return item - } else if (item.prev) { - const prevItem = findItem(data, item.prev) +function findPrevItemWithSource(data, ref) { + if (ref && ref.source) { + return ref + } else if (ref && ref.prev) { + const prevItem = findItem(data, ref.prev) return findPrevItemWithSource(data, prevItem) } } -function validateRawItem({ slug, source, children }) { - const isSourceDisabled = source === false - - if (!slug) { - throw Error("'slug' field is required in objects in sidebar.json") - } - - if (isSourceDisabled && (!children || !children.length)) { - throw Error( - "If you set 'source' to false, you had to add at least one child" - ) - } -} - -/* Normalization */ - function normalizeItem({ item, parentPath, resultRef, prevRef }) { validateRawItem(item) @@ -145,15 +135,25 @@ function normalizeSidebar({ return currentResult } +function findChildWithSource(item) { + return item.source ? item : findChildWithSource(item.children[0]) +} + /* * Exports */ +// Runs at module load time const normalizedSidebar = normalizeSidebar({ data: sidebar, parentPath: '' }) +/** + * Finds `path` in sidebar struct + * @param {*} path + * @uses `normalizedSidebar` + */ function getItemByPath(path) { const normalizedPath = path.replace(/\/$/, '') const isRoot = normalizedPath === PATH_ROOT.slice(0, -1) @@ -161,7 +161,10 @@ function getItemByPath(path) { ? 
normalizedSidebar[0] : findItem(normalizedSidebar, normalizedPath) - return item && findChildWithSource(item) + if (!item) return false + + // TODO: Refactor this recursive fn into a loop inside `getItemByPath` + return findChildWithSource(item) } function getParentsListFromPath(path) { From 1a8bbdac80169d469ffbe9c40e46abc1b2453903 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 15:44:42 -0600 Subject: [PATCH 24/42] refactor: item vs rawItem var name in sidebar helper per https://github.com/iterative/dvc.org/pull/891#pullrequestreview-338378612 --- src/utils/sidebar.js | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/src/utils/sidebar.js b/src/utils/sidebar.js index 97515f7038..d3fdc0758c 100644 --- a/src/utils/sidebar.js +++ b/src/utils/sidebar.js @@ -58,22 +58,23 @@ function findItem(data, targetPath) { } } -function findPrevItemWithSource(data, ref) { - if (ref && ref.source) { - return ref - } else if (ref && ref.prev) { - const prevItem = findItem(data, ref.prev) +// Recursive +function findPrevItemWithSource(data, item) { + if (item && item.source) { + return item + } else if (item && item.prev) { + const prevItem = findItem(data, item.prev) return findPrevItemWithSource(data, prevItem) } } -function normalizeItem({ item, parentPath, resultRef, prevRef }) { - validateRawItem(item) +function normalizeItem({ rawItem, parentPath, resultRef, prevRef }) { + validateRawItem(rawItem) - const { label, slug, source, tutorials } = item + const { label, slug, source, tutorials } = rawItem - // If prev item doesn't have source we need to recirsively search for it + // If prev item doesn't have source we need to search for it const prevItemWithSource = prevRef && findPrevItemWithSource(resultRef, prevRef) @@ -92,6 +93,7 @@ function normalizeItem({ item, parentPath, resultRef, prevRef }) { } } +// Recursive function normalizeSidebar({ data, parentPath, @@ -104,9 +106,9 @@ function normalizeSidebar({ 
data.forEach(rawItem => { const isShortcut = typeof rawItem === 'string' - const item = isShortcut ? { slug: rawItem } : rawItem + rawItem = isShortcut ? { slug: rawItem } : rawItem const normalizedItem = normalizeItem({ - item, + rawItem, parentPath, resultRef, prevRef @@ -116,10 +118,10 @@ function normalizeSidebar({ prevRef.next = normalizedItem.path } - if (item.children) { + if (rawItem.children) { normalizedItem.children = normalizeSidebar({ - data: item.children, - parentPath: `${parentPath}${item.slug}/`, + data: rawItem.children, + parentPath: `${parentPath}${rawItem.slug}/`, parentResultRef: resultRef, startingPrevRef: normalizedItem }) @@ -135,6 +137,7 @@ function normalizeSidebar({ return currentResult } +// Recursive function findChildWithSource(item) { return item.source ? item : findChildWithSource(item.children[0]) } @@ -163,7 +166,6 @@ function getItemByPath(path) { if (!item) return false - // TODO: Refactor this recursive fn into a loop inside `getItemByPath` return findChildWithSource(item) } From 6e0f9f6986822cb92437a8b5a58f9f2526d99275 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 17:52:08 -0600 Subject: [PATCH 25/42] server: remove some unnecessary code comments --- server.js | 9 +++------ src/utils/sidebar.js | 11 ----------- 2 files changed, 3 insertions(+), 17 deletions(-) diff --git a/server.js b/server.js index 7609b42135..711dbc5e90 100644 --- a/server.js +++ b/server.js @@ -25,7 +25,7 @@ app.prepare().then(() => { const { pathname, query } = parsedUrl /* - * Special URL redirects. + * HTTP redirects * NOTE: The order of the redirects is important. */ if ( @@ -111,14 +111,12 @@ app.prepare().then(() => { res.end() } else if (/^\/doc(\/.*)?$/.test(pathname)) { /* - * Special Docs Engine handler - * Based on /pages/doc.js page. + * Docs Engine handler */ - // Force 404 response for any inexistent /doc item. + // Force 404 response code for any inexistent /doc item. 
if (!getItemByPath(pathname)) { res.statusCode = 404 - // NOTE: Assumes the route below will render a 404 page. } // Custom route for all docs @@ -128,7 +126,6 @@ app.prepare().then(() => { handle(req, res, parsedUrl) } }).listen(port, err => { - // Invokes `createServer` server. if (err) throw err console.info(`> Ready on localhost:${port}`) }) diff --git a/src/utils/sidebar.js b/src/utils/sidebar.js index d3fdc0758c..0135d87975 100644 --- a/src/utils/sidebar.js +++ b/src/utils/sidebar.js @@ -19,8 +19,6 @@ */ const startCase = require('lodash.startcase') - -// Base to build the target struct described above const sidebar = require('../../public/static/docs/sidebar.json') const PATH_ROOT = '/doc/' @@ -58,7 +56,6 @@ function findItem(data, targetPath) { } } -// Recursive function findPrevItemWithSource(data, item) { if (item && item.source) { return item @@ -93,7 +90,6 @@ function normalizeItem({ rawItem, parentPath, resultRef, prevRef }) { } } -// Recursive function normalizeSidebar({ data, parentPath, @@ -137,7 +133,6 @@ function normalizeSidebar({ return currentResult } -// Recursive function findChildWithSource(item) { return item.source ? 
item : findChildWithSource(item.children[0]) } @@ -146,17 +141,11 @@ function findChildWithSource(item) { * Exports */ -// Runs at module load time const normalizedSidebar = normalizeSidebar({ data: sidebar, parentPath: '' }) -/** - * Finds `path` in sidebar struct - * @param {*} path - * @uses `normalizedSidebar` - */ function getItemByPath(path) { const normalizedPath = path.replace(/\/$/, '') const isRoot = normalizedPath === PATH_ROOT.slice(0, -1) From d71036c66df1975d9f9fe8a7cbb39296d3f28a01 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 19:04:41 -0600 Subject: [PATCH 26/42] util: update debugging node script name per https://github.com/iterative/dvc.org/pull/891#pullrequestreview-338434652 --- package.json | 2 +- pages/doc.js | 2 +- public/static/docs/user-guide/contributing/docs.md | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/package.json b/package.json index 475db187e6..6eb4265682 100644 --- a/package.json +++ b/package.json @@ -5,7 +5,7 @@ "main": "index.js", "scripts": { "dev": "node server.js", - "dev:debug": "node --inspect-brk server.js", + "debug": "node --inspect-brk server.js", "build": "next build", "test": "jest", "start": "NODE_ENV=production node server.js", diff --git a/pages/doc.js b/pages/doc.js index 340ddd1f91..d21133ece2 100644 --- a/pages/doc.js +++ b/pages/doc.js @@ -60,7 +60,7 @@ export default function Documentation({ item, headings, markdown, errorCode }) { apiKey: '755929839e113a981f481601c4f52082', indexName: 'dvc', inputSelector: '#doc-search', - debug: false // Set debug to true if you want to inspect the dropdown + debug: false // Set to `true` if you want to inspect the dropdown }) } } catch (ReferenceError) { diff --git a/public/static/docs/user-guide/contributing/docs.md b/public/static/docs/user-guide/contributing/docs.md index ce1c4fd9b5..ca72868420 100644 --- a/public/static/docs/user-guide/contributing/docs.md +++ b/public/static/docs/user-guide/contributing/docs.md @@ -89,8 
+89,8 @@ documentation files automatically. ### Debugging -The `yarn dev:debug` script runs the local development server with Node's -[`--inspect` option](https://nodejs.org/en/docs/guides/debugging-getting-started/#command-line-options) +The `yarn debug` script runs the local development server with `node`'s +[`--inspect-brk` option](https://nodejs.org/en/docs/guides/debugging-getting-started/#command-line-options) in order for debuggers to connect to it (on the default port, 9229). > For example, use this launch configuration in **Visual Studio Code**: @@ -101,7 +101,7 @@ in order for debuggers to connect to it (on the default port, 9229). > "request": "launch", > "name": "Launch via Yarn", > "runtimeExecutable": "yarn", -> "runtimeArgs": ["dev:debug"], +> "runtimeArgs": ["debug"], > "port": 9229 > } > ``` From 01934dc9902ef96915404d6bdf3836c284b94af4 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 20:20:15 -0600 Subject: [PATCH 27/42] typo per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338379504 --- static/docs/command-reference/checkout.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index 70fcb9e27d..f55e754927 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -17,7 +17,7 @@ positional arguments: ## Description [DVC-files](/doc/user-guide/dvc-file-format) are essentially placeholders that -point to the actual data files or a directories under DVC control. This command +point to the actual data files or directories under DVC control. This command synchronizes the workspace data with the versions specified in the current DVC-files. DVC knows which data files (outputs) to use because their checksums are saved in the `outs` fields inside the DVC-files. 
From 1f305819a5079feab8dca6c285ea1490fcfc5d98 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 20:25:42 -0600 Subject: [PATCH 28/42] cmd ref: reword last sentence in checkout --relink option desc. per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338379578 --- static/docs/command-reference/checkout.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index f55e754927..ce3ad7160c 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -87,8 +87,8 @@ be pulled from remote storage using `dvc pull`. `symlink`, or `copy`) for all data files in the workspace is consistent with the project's [`cache.type`](/doc/command-reference/config#cache). This is achieved by restoring **all data files or a directories** referenced in - current DVC-files (regardless of whether they match a current DVC-file). This - means overwriting the file links or copies from cache to workspace. + current DVC-files (regardless of whether they match a current DVC-file). Note + that this overwrites the data files or directories in the workspace. - `-h`, `--help` - shows the help message and exit. 
From 82bff276344e74310a83943dc2bf505aae46883f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 20:39:34 -0600 Subject: [PATCH 29/42] user-guide: simplify note about checkout --relink in large-dataset-optimization per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338379594 --- static/docs/user-guide/large-dataset-optimization.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/static/docs/user-guide/large-dataset-optimization.md b/static/docs/user-guide/large-dataset-optimization.md index acf8fc7e9e..b88a94a96f 100644 --- a/static/docs/user-guide/large-dataset-optimization.md +++ b/static/docs/user-guide/large-dataset-optimization.md @@ -120,12 +120,9 @@ file link types. Please refer to the [Update a Tracked File](/doc/user-guide/updating-tracked-files) on how to manage tracked files under these cache configurations. -### Re-linking data in the workspace - -To re-create the file links in the workspace, for example after changing the -`cache.type` option for a project, please use -`dvc checkout --relink`. See -[checkout options](/doc/command-reference/checkout#options) for more details. +To make sure that the data files in the workspace are consistent with the +project's `cache.type` option, you may use `dvc checkout --relink`. +See `dvc checkout` for more information. 
--- From 95a8804c9900ea5de6a91709db424a1bfce82191 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 20:49:31 -0600 Subject: [PATCH 30/42] cmd ref: try to use "data files and directories" always in checkout per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338379632 --- static/docs/command-reference/checkout.md | 37 ++++++++++++----------- 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/static/docs/command-reference/checkout.md b/static/docs/command-reference/checkout.md index ce3ad7160c..4198376ab9 100644 --- a/static/docs/command-reference/checkout.md +++ b/static/docs/command-reference/checkout.md @@ -19,12 +19,12 @@ positional arguments: [DVC-files](/doc/user-guide/dvc-file-format) are essentially placeholders that point to the actual data files or directories under DVC control. This command synchronizes the workspace data with the versions specified in the current -DVC-files. DVC knows which data files (outputs) to use because -their checksums are saved in the `outs` fields inside the DVC-files. +DVC-files. DVC knows which data (outputs) to use because their +checksums are saved in the `outs` fields inside the DVC-files. -`dvc checkout` is useful when using Git in the project, after -`git clone`, `git checkout`, or any other repository operations that change the -currently present DVC-files. +`dvc checkout` is useful, for example, when using Git in the +project, after `git clone`, `git checkout`, or any other repository +operation that changes the currently present DVC-files. 💡 For convenience, a Git hook is available to automate running `dvc checkout` after `git checkout`. Use `dvc install` to install it. @@ -84,11 +84,11 @@ be pulled from remote storage using `dvc pull`. cache. (They are not "committed", in DVC terms.) 
- `--relink` - ensures the file linking strategy (`reflink`, `hardlink`, - `symlink`, or `copy`) for all data files in the workspace is consistent with - the project's [`cache.type`](/doc/command-reference/config#cache). This is + `symlink`, or `copy`) for all data in the workspace is consistent with the + project's [`cache.type`](/doc/command-reference/config#cache). This is achieved by restoring **all data files or a directories** referenced in current DVC-files (regardless of whether they match a current DVC-file). Note - that this overwrites the data files or directories in the workspace. + that this overwrites the data in the workspace. - `-h`, `--help` - shows the help message and exit. @@ -206,18 +206,21 @@ do `dvc fetch` + `dvc checkout`. ## Automating `dvc checkout` -We have the data files (managed by DVC) lined up with the other files (managed -by Git). This required us to remember to run `dvc checkout`, and of course we -won't always remember to do so. Wouldn't it be nice to automate this? +We want the data files or directories (managed by DVC) to match with the other +files (managed by Git e.g. source code). This requires us to remember running +`dvc checkout` when needed, and of course we won't always remember to do so. +Wouldn't it be nice to automate this? -Let's run this command: +Let's try this: ```dvc $ dvc install ``` -This installs Git hooks to automate running `dvc checkout` (or `dvc status`) -when needed. Then we can checkout the master branch again: +`dvc install` installs Git hooks to automate common operations, including +running `dvc checkout` when needed. + +We can then checkout the master branch again: ```dvc $ git checkout bigrams @@ -229,6 +232,6 @@ $ md5 model.pkl MD5 (model.pkl) = 3863d0e317dee0a55c4e59d2ec0eef33 ``` -Previously this took two steps, `git checkout` followed by `dvc checkout`. We -can now skip the second one, which is automatically executed for us. The -workspace is automatically synchronized accordingly. 
+Previously this took two commands, `git checkout` followed by `dvc checkout`. We +can now skip the second one, which is automatically run for us. The workspace is +automatically synchronized accordingly. From 48dc37cc8e0b6882689e0cbcab291ad6c940483b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 23:30:58 -0600 Subject: [PATCH 31/42] cmd ref: various wording updates to checkout per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338468763 and https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338470052 and https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338470171 and https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338470278 --- .../static/docs/command-reference/checkout.md | 21 +++++++++---------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/public/static/docs/command-reference/checkout.md b/public/static/docs/command-reference/checkout.md index 4198376ab9..f1eafde244 100644 --- a/public/static/docs/command-reference/checkout.md +++ b/public/static/docs/command-reference/checkout.md @@ -19,25 +19,25 @@ positional arguments: [DVC-files](/doc/user-guide/dvc-file-format) are essentially placeholders that point to the actual data files or directories under DVC control. This command synchronizes the workspace data with the versions specified in the current -DVC-files. DVC knows which data (outputs) to use because their -checksums are saved in the `outs` fields inside the DVC-files. +DVC-files. `dvc checkout` is useful, for example, when using Git in the -project, after `git clone`, `git checkout`, or any other repository -operation that changes the currently present DVC-files. +project, after `git clone`, `git checkout`, or any other operation +that changes the DVC-files in the workspace. 💡 For convenience, a Git hook is available to automate running `dvc checkout` after `git checkout`. Use `dvc install` to install it. 
The execution of `dvc checkout` does the following: -- Scans the DVC-files to compare vs. the data files or directories currently in - the workspace. Scanning is limited to the given `targets` (if - any). See also options `--with-deps` and `--recursive` below. +- Scans the DVC-files to compare against the data files or directories in the + workspace. Scanning is limited to the given `targets` (if any). + See also options `--with-deps` and `--recursive` below. - Missing data files or directories, or those that don't match with any - DVC-file, are restored from the cache. See options `--force` and - `--relink`. + DVC-file, are restored from the cache. DVC knows which data + (outputs) to use because their checksums are saved in the `outs` + fields inside the DVC-files. See options `--force` and `--relink`. By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to @@ -87,8 +87,7 @@ be pulled from remote storage using `dvc pull`. `symlink`, or `copy`) for all data in the workspace is consistent with the project's [`cache.type`](/doc/command-reference/config#cache). This is achieved by restoring **all data files or a directories** referenced in - current DVC-files (regardless of whether they match a current DVC-file). Note - that this overwrites the data in the workspace. + current DVC-files (regardless of whether they match a current DVC-file). - `-h`, `--help` - shows the help message and exit. 
From 4f2470e6e1be5a822f05926e3adac68285c3a96b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 5 Jan 2020 23:36:43 -0600 Subject: [PATCH 32/42] cmd ref: another small wording update for checkout per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338470475 comments --- public/static/docs/command-reference/checkout.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/public/static/docs/command-reference/checkout.md b/public/static/docs/command-reference/checkout.md index f1eafde244..36e34c8df5 100644 --- a/public/static/docs/command-reference/checkout.md +++ b/public/static/docs/command-reference/checkout.md @@ -16,8 +16,8 @@ positional arguments: ## Description -[DVC-files](/doc/user-guide/dvc-file-format) are essentially placeholders that -point to the actual data files or directories under DVC control. This command +[DVC-files](/doc/user-guide/dvc-file-format) are placeholders that point to +specific version of data files or directories under DVC control. This command synchronizes the workspace data with the versions specified in the current DVC-files. From a09dcc17fbd4ef27dfcb5b07fbf028b3f94f9e69 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 6 Jan 2020 00:10:43 -0600 Subject: [PATCH 33/42] tests: update "Returns first child with source..." 
test per https://github.com/iterative/dvc.org/pull/891#pullrequestreview-338378539 --- .gitignore | 3 --- src/utils/sidebar.test.js | 27 ++++++++++++++++++++------- 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/.gitignore b/.gitignore index abda566daa..914928c067 100644 --- a/.gitignore +++ b/.gitignore @@ -17,9 +17,6 @@ lib-cov # Coverage directory used by tools like istanbul coverage -# nyc test coverage -.nyc_output - # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) .grunt diff --git a/src/utils/sidebar.test.js b/src/utils/sidebar.test.js index e66f692f31..c8f977de3e 100644 --- a/src/utils/sidebar.test.js +++ b/src/utils/sidebar.test.js @@ -336,20 +336,31 @@ describe('SidebarMenu/helper', () => { expect(getItemByPath('/doc')).toEqual(result) }) - it('Returns first child with source for sourceless parents', () => { + // eslint-disable-next-line max-len + it('Returns first child with source for all parents with source:false', () => { const rawData = [ { - slug: 'item-name', + slug: 'item', source: false, children: [ - { slug: 'nested-item', source: false, children: ['subnested-item'] } + { + slug: 'nested', + source: false, + children: [ + { + slug: 'subnested', + source: false, + children: ['leaf-item'] + } + ] + } ] } ] const result = { - label: 'Subnested Item', - path: '/doc/item-name/nested-item/subnested-item', - source: '/static/docs/item-name/nested-item/subnested-item.md', + label: 'Leaf Item', + path: '/doc/item/nested/subnested/leaf-item', + source: '/static/docs/item/nested/subnested/leaf-item.md', tutorials: {}, prev: undefined, next: undefined @@ -358,7 +369,9 @@ describe('SidebarMenu/helper', () => { jest.doMock('../../public/static/docs/sidebar.json', () => rawData) const { getItemByPath } = require('./sidebar') - expect(getItemByPath('/doc/item-name')).toEqual(result) + expect(getItemByPath('/doc/item')).toEqual(result) + expect(getItemByPath('/doc/item/nested')).toEqual(result) + 
expect(getItemByPath('/doc/item/nested/subnested')).toEqual(result) }) }) From 53ca886243ecba5f0bf40a3797f459264c529def Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Mon, 6 Jan 2020 19:10:23 +0200 Subject: [PATCH 34/42] remote: use `-d` in s3 example User complained that local one has `-d` but s3 doesn't. --- public/static/docs/command-reference/remote/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/public/static/docs/command-reference/remote/index.md b/public/static/docs/command-reference/remote/index.md index 9856561927..df57354276 100644 --- a/public/static/docs/command-reference/remote/index.md +++ b/public/static/docs/command-reference/remote/index.md @@ -101,7 +101,7 @@ remote = myremote > [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html). ```dvc -$ dvc remote add mynewremote s3://mybucket/myproject +$ dvc remote add -d mynewremote s3://mybucket/myproject $ dvc remote modify mynewremote region us-east-2 ``` From f61ea583bc4dc0ae8e5c2af4f11cbc6128c09ae3 Mon Sep 17 00:00:00 2001 From: Ivan Shcheklein Date: Mon, 6 Jan 2020 09:25:33 -0800 Subject: [PATCH 35/42] fix #901: fix link to github repo for edit button --- pages/doc.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pages/doc.js b/pages/doc.js index 340ddd1f91..61e91753d8 100644 --- a/pages/doc.js +++ b/pages/doc.js @@ -81,7 +81,7 @@ export default function Documentation({ item, headings, markdown, errorCode }) { return () => Router.events.off('routeChangeComplete', handleRouteChange) }, []) - const githubLink = `https://github.com/iterative/dvc.org/blob/master${source}` + const githubLink = `https://github.com/iterative/dvc.org/blob/master/public/${source}` return ( From 5b36fa8a7a1877760298c4cd9554e01850bc865a Mon Sep 17 00:00:00 2001 From: Ivan Shcheklein Date: Mon, 6 Jan 2020 09:31:05 -0800 Subject: [PATCH 36/42] remove extra / in the edit github path --- pages/doc.js | 2 +- 1 file changed, 1 insertion(+), 1 
deletion(-) diff --git a/pages/doc.js b/pages/doc.js index 61e91753d8..e5062fd806 100644 --- a/pages/doc.js +++ b/pages/doc.js @@ -81,7 +81,7 @@ export default function Documentation({ item, headings, markdown, errorCode }) { return () => Router.events.off('routeChangeComplete', handleRouteChange) }, []) - const githubLink = `https://github.com/iterative/dvc.org/blob/master/public/${source}` + const githubLink = `https://github.com/iterative/dvc.org/blob/master/public${source}` return ( From 0d8f749fae095dcacb0009f6c6950cba5cc66cf4 Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Mon, 16 Dec 2019 13:34:11 +0200 Subject: [PATCH 37/42] docs: locking-related updates Related to #860 --- public/static/docs/command-reference/repro.md | 32 +++++++++++++++++++ .../user-guide/dvc-files-and-directories.md | 6 ++++ 2 files changed, 38 insertions(+) diff --git a/public/static/docs/command-reference/repro.md b/public/static/docs/command-reference/repro.md index a990c2b76c..729645abbd 100644 --- a/public/static/docs/command-reference/repro.md +++ b/public/static/docs/command-reference/repro.md @@ -45,6 +45,38 @@ files, intermediate or final results. It saves all the data files, intermediate or final results into the DVC cache (unless `--no-commit` option is specified), and updates stage files with the new checksum information. +### Parallel stage execution + +Currently, `dvc repro` is not able to parallelize stage execution automatically. +If you need to do this, you can launch `dvc repro` multiple times manually. For +example, let's say a pipeline graph looks something like this: + +``` +$ dvc pipeline show --ascii result.py ++--------+ +--------+ +| A1.dvc | | B1.dvc | ++--------+ +--------+ + * * + * * + * * ++--------+ +--------+ +| A2.dvc | | B2.dvc | ++--------+ +--------+ + * * + ** ** + * * + +------------+ + | result.dvc | + +------------+ +``` + +This pipeline consists of two parallel branches (`A` and `B`), and the final +"result" stage, where the branches merge. 
To reproduce both branches at the same +time, you could run `dvc repro A2.dvc` and `dvc repro B2.dvc` at the same time +(e.g. in separate terminals). After both finish successfully, you can then run +`dvc repro result.dvc`: DVC will know that both branches are already up-to-date +and only execute the final stage. + ## Options - `-f`, `--force` - reproduce a pipeline, regenerating its results, even if no diff --git a/public/static/docs/user-guide/dvc-files-and-directories.md b/public/static/docs/user-guide/dvc-files-and-directories.md index 272426ace6..e971e23df8 100644 --- a/public/static/docs/user-guide/dvc-files-and-directories.md +++ b/public/static/docs/user-guide/dvc-files-and-directories.md @@ -41,6 +41,12 @@ operation: - `.dvc/lock`: Lock file for the entire DVC project +- `.dvc/tmp`: Directory for miscellaneous temporary files + +- `.dvc/tmp/rwlock`: JSON file that contains read and write locks for specific + dependencies and outputs, to allow safely running multiple DVC commands in + parallel. 
+ ## Structure of cache directory There are two ways in which the data is stored in cache: As a From 99c0a338699901e80970349f9327e252ccf26448 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 6 Jan 2020 16:42:14 -0600 Subject: [PATCH 38/42] cmd ref: update intro and implementation details per https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338869346 and https://github.com/iterative/dvc.org/pull/864#pullrequestreview-338870491 --- .../static/docs/command-reference/checkout.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/public/static/docs/command-reference/checkout.md b/public/static/docs/command-reference/checkout.md index 36e34c8df5..829f1b1602 100644 --- a/public/static/docs/command-reference/checkout.md +++ b/public/static/docs/command-reference/checkout.md @@ -16,10 +16,9 @@ positional arguments: ## Description -[DVC-files](/doc/user-guide/dvc-file-format) are placeholders that point to -specific version of data files or directories under DVC control. This command -synchronizes the workspace data with the versions specified in the current -DVC-files. +[DVC-files](/doc/user-guide/dvc-file-format) act as pointers to specific version +of data files or directories under DVC control. This command synchronizes the +workspace data with the versions specified in the current DVC-files. `dvc checkout` is useful, for example, when using Git in the project, after `git clone`, `git checkout`, or any other operation @@ -31,13 +30,14 @@ after `git checkout`. Use `dvc install` to install it. The execution of `dvc checkout` does the following: - Scans the DVC-files to compare against the data files or directories in the - workspace. Scanning is limited to the given `targets` (if any). - See also options `--with-deps` and `--recursive` below. + workspace. DVC knows which data (outputs) match + because their checksums are saved in the `outs` fields inside the DVC-files. 
+ Scanning is limited to the given `targets` (if any). See also options + `--with-deps` and `--recursive` below. - Missing data files or directories, or those that don't match with any - DVC-file, are restored from the cache. DVC knows which data - (outputs) to use because their checksums are saved in the `outs` - fields inside the DVC-files. See options `--force` and `--relink`. + DVC-file, are restored from the cache. See options `--force` and + `--relink`. By default, this command tries not to copy files between the cache and the workspace, using reflinks instead, when supported by the file system. (Refer to From 8f6b2172743b3b062f688f5f70588d8666c61442 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 7 Jan 2020 12:24:22 -0600 Subject: [PATCH 39/42] refactor: don't modify fn args in Array.reduce callback (styles.js) per https://github.com/iterative/dvc.org/pull/891#discussion_r363580968 --- src/styles.js | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/src/styles.js b/src/styles.js index ef89ddebf8..89018dcb58 100644 --- a/src/styles.js +++ b/src/styles.js @@ -8,25 +8,25 @@ export const global = ` font-weight: 400; text-rendering: optimizeLegibility !important; } - + @-moz-document url-prefix() { body { font-weight: lighter !important; } } - + body { padding: 0px; font-family: BrandonGrotesque, Tahoma, Arial; font-weight: normal; -webkit-font-smoothing: antialiased; line-height: 1.5; - + // IE flex min-height fix https://stackoverflow.com/a/40491316 display: flex; flex-direction: column; } - + *:focus { outline: 0; } @@ -51,13 +51,14 @@ export const sizes = { sizes.phablet = Math.floor((sizes.tablet + sizes.phone) / 2) -export const media = Object.keys(sizes).reduce((accumulator, label) => { - accumulator[label] = (...args) => css` - @media (max-width: ${sizes[label]}px) { - ${css(...args)}; - } - ` - return accumulator +export const media = Object.keys(sizes).reduce((acc, cur) => { + return Object.assign(acc, { 
+ [cur]: (...args) => css` + @media (max-width: ${sizes[cur]}px) { + ${css(...args)}; + } + ` + }) }, {}) export const container = css` From 7c57f1c6433e80bbb1e44d2dcb09d4faa827fb09 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 7 Jan 2020 12:30:19 -0600 Subject: [PATCH 40/42] refactor: don't modify fn args in `isInsideCodeBlock` (Markdown.js) per https://github.com/iterative/dvc.org/pull/891#pullrequestreview-338653640 --- src/Documentation/Markdown/Markdown.js | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/Documentation/Markdown/Markdown.js b/src/Documentation/Markdown/Markdown.js index 7a708cd1be..4f9619c6e2 100644 --- a/src/Documentation/Markdown/Markdown.js +++ b/src/Documentation/Markdown/Markdown.js @@ -156,9 +156,9 @@ export default class Markdown extends React.PureComponent { } isInsideCodeBlock = elem => { - for (; elem && elem !== document; elem = elem.parentNode) { - if (elem.tagName === 'PRE') return true - if (elem.tagName === 'ARTICLE') return false + for (let el = elem; el && el !== document; el = el.parentNode) { + if (el.tagName === 'PRE') return true + if (el.tagName === 'ARTICLE') return false } return false } From 530efd9c610873e774e2b43eea7a75a4d65af522 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 7 Jan 2020 12:34:57 -0600 Subject: [PATCH 41/42] `cur`->`s` in Array.reduce callback (styles.js) --- src/styles.js | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/styles.js b/src/styles.js index 89018dcb58..d4500df152 100644 --- a/src/styles.js +++ b/src/styles.js @@ -51,10 +51,10 @@ export const sizes = { sizes.phablet = Math.floor((sizes.tablet + sizes.phone) / 2) -export const media = Object.keys(sizes).reduce((acc, cur) => { +export const media = Object.keys(sizes).reduce((acc, s) => { return Object.assign(acc, { - [cur]: (...args) => css` - @media (max-width: ${sizes[cur]}px) { + [s]: (...args) => css` + @media (max-width: ${sizes[s]}px) { ${css(...args)}; } ` From 
c6657e5c506cef5401e79b322d68c86f1f30ac82 Mon Sep 17 00:00:00 2001 From: Ruslan Kuprieiev Date: Tue, 7 Jan 2020 22:40:43 +0200 Subject: [PATCH 42/42] remote: explicitly mention that we are setting default remote https://github.com/iterative/dvc.org/pull/902#discussion_r363390627 --- public/static/docs/command-reference/remote/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/public/static/docs/command-reference/remote/index.md b/public/static/docs/command-reference/remote/index.md index df57354276..7e018cd6e7 100644 --- a/public/static/docs/command-reference/remote/index.md +++ b/public/static/docs/command-reference/remote/index.md @@ -95,7 +95,7 @@ url = /path/to/remote remote = myremote ``` -## Example: Add Amazon S3 remote and modify its region +## Example: Add a default Amazon S3 remote and modify its region > 💡 Before adding an S3 remote, be sure to > [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).