diff --git a/content/docs/command-reference/fetch.md b/content/docs/command-reference/fetch.md
index d430600f18..b1d2bdc4de 100644
--- a/content/docs/command-reference/fetch.md
+++ b/content/docs/command-reference/fetch.md
@@ -70,8 +70,9 @@ specific one is given with `--remote`.
   [remote storage](/doc/command-reference/remote) to fetch from (see
   `dvc remote list`).

-- `--run-cache` - downloads all available history of stage runs from the remote
-  repository.
+- `--run-cache` - downloads all available history of
+  [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
+  the remote repository. See the same option in `dvc push`.

 - `-d`, `--with-deps` - determines files to download by tracking dependencies to
   the `targets`. If none are provided, this option is ignored. By traversing all
diff --git a/content/docs/command-reference/pull.md b/content/docs/command-reference/pull.md
index 013f511d87..eb47e16993 100644
--- a/content/docs/command-reference/pull.md
+++ b/content/docs/command-reference/pull.md
@@ -110,9 +110,10 @@ used to see what files `dvc pull` would download.
   [remote storage](/doc/command-reference/remote) to pull from (see
   `dvc remote list`).

-- `--run-cache` - downloads all available history of stage runs from the remote
-  repository (to the cache only, like `dvc fetch --run-cache`). Note that
-  `dvc repro ` is necessary to checkout these files (into the
+- `--run-cache` - downloads all available history of
+  [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
+  the remote repository (to the cache only, like `dvc fetch --run-cache`). Note
+  that `dvc repro ` is necessary to checkout these files (into the
   workspace) and update `dvc.lock`.

 - `-j `, `--jobs ` - parallelism level for DVC to download data
diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md
index cc925f7ec4..89a36dbb4a 100644
--- a/content/docs/command-reference/push.md
+++ b/content/docs/command-reference/push.md
@@ -88,8 +88,9 @@ in the cache (compared to the default remote.) It can be used to see what files
   [remote storage](/doc/command-reference/remote) to push to (see
   `dvc remote list`).

-- `--run-cache` - uploads all available history of stage runs to the remote
-  repository.
+- `--run-cache` - uploads all available history of
+  [stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to
+  the remote repository.

 - `-j `, `--jobs ` - parallelism level for DVC to upload data to
   remote storage. The default value is `4 * cpu_count()`. For SSH remotes, the
diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md
index 08844a1999..30e41612a8 100644
--- a/content/docs/command-reference/repro.md
+++ b/content/docs/command-reference/repro.md
@@ -153,8 +153,11 @@ up-to-date and only execute the final stage.
   present in the DVC project. Specifying `targets` has no effects with this
   option, as all possible targets are already included.

-- `--no-run-cache` - execute stage commands even if they have already been run
-  with the same dependencies/outputs/etc. before.
+- `--no-run-cache` - execute stage command(s) even if they have already been run
+  with the same dependencies and outputs (see the
+  [details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful
+  for example if the stage command(s) are non-deterministic
+  ([not recommended](/doc/command-reference/run#avoiding-unexpected-behavior)).

 - `--force-downstream` - in cases like `... -> A (changed) -> B -> C` it will
   reproduce `A` first and then `B`, even if `B` was previously executed with the
@@ -178,10 +181,8 @@ up-to-date and only execute the final stage.

 - `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs
   involved in the stages being reproduced, if they are found in the
-  [default](/doc/command-reference/remote/default) remote storage. Note that it
-  checks the local run-cache too (available history of stage runs).
-
-  > Has no effect if combined with `--no-run-cache`.
+  [default remote storage](/doc/command-reference/remote/default). Note that it
+  tries the local run-cache first (unless `--no-run-cache` is also used).

 - `-h`, `--help` - prints the usage/help message, and exit.

diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index aab7be0502..c6862d8850 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -240,9 +240,10 @@ $ dvc run -n second_stage './another_script.sh $MYENVVAR'
 - `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without asking
   for confirmation.

-- `--no-run-cache` - execute the stage `command` even if it has already been run
-  with the same dependencies/outputs/etc. before. Useful for example if the
-  command's code is non-deterministic
+- `--no-run-cache` - execute the stage command(s) even if they have already been
+  run with the same dependencies and outputs (see the
+  [details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful
+  for example if the stage command(s) are non-deterministic
   ([not recommended](#avoiding-unexpected-behavior)).

 - `--no-commit` - do not store the outputs of this execution in the cache
diff --git a/content/docs/user-guide/basic-concepts/run-cache.md b/content/docs/user-guide/basic-concepts/run-cache.md
new file mode 100644
index 0000000000..0eab76c538
--- /dev/null
+++ b/content/docs/user-guide/basic-concepts/run-cache.md
@@ -0,0 +1,11 @@
+---
+name: 'Run-cache'
+match: ['run-cache']
+---
+
+The DVC run-cache is a log of stages that have been run in the project.
It's
+comprised of `dvc.lock` file backups, identified as combinations of
+dependencies, commands, and outputs that correspond to each other. `dvc repro`
+and `dvc run` populate and reutilize the run-cache. See
+[Run-cache](/doc/user-guide/project-structure/internal-files#run-cache) for more
+details.
diff --git a/content/docs/user-guide/project-structure/internal-files.md b/content/docs/user-guide/project-structure/internal-files.md
index 675bdb1f96..2d1f25eec1 100644
--- a/content/docs/user-guide/project-structure/internal-files.md
+++ b/content/docs/user-guide/project-structure/internal-files.md
@@ -15,18 +15,22 @@ operation.
   (credentials, private locations, etc). The local config file can be edited by
   hand or with the command `dvc config --local`.

-- `.dvc/cache`: The cache directory will store your data in a
-  special [structure](#structure-of-the-cache-directory). The data files and
-  directories in the workspace will only contain links to the data
-  files in the cache. (Refer to
+- `.dvc/cache`: Default location of the cache directory. The cache
+  stores the project data in a special
+  [structure](#structure-of-the-cache-directory). The data files and directories
+  in the workspace will only contain links to the data files in the
+  cache (refer to
   [Large Dataset Optimization](/doc/user-guide/large-dataset-optimization). See
-  `dvc config cache` for related configuration options.
+  `dvc config cache` for related configuration options, including changing its
+  location.

   > Note that DVC includes the cache directory in `.gitignore` during
   > initialization. No data tracked by DVC should ever be pushed to the Git
   > repository, only the DVC files that are needed to download or
   > reproduce that data.

+- `.dvc/cache/runs`: Default location of the [run-cache](#run-cache).
+
 - `.dvc/plots`: Directory for
   [plot templates](/doc/command-reference/plots#plot-templates)

@@ -120,3 +124,30 @@ $ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir
 ```

 That's how DVC knows that the other two cached files belong in the directory.
+
+### Run-cache
+
+`dvc repro` and `dvc run` by default populate and reutilize a log of stages that
+have been run in the project. It is found in the `runs/` directory inside the
+cache (or [remote storage](/doc/command-reference/remote)).
+
+Runs are identified as combinations of dependencies, commands, and
+outputs that correspond to each other. These combinations are
+hashed into special values that make up the file paths inside the run-cache dir.
+
+```dvc
+$ tree .dvc/cache/runs
+.dvc/cache/runs
+└── 86
+    └── 8632e1555283d6e23ec808c9ee1fadc30630c888d5c08695333609ef341508bf
+        └── e98a34c44fa6b564ef211e76fb3b265bc67f19e5de2e255217d3900d8f...
+```
+
+The files themselves are backups of the `dvc.lock` file that resulted from that
+run.
+
+> Note that the run's outputs are stored and retrieved from the
+> regular cache.
+
+💡 `dvc push` and `dvc pull` (and `dvc fetch`) can upload and download the
+run-cache to remote storage for sharing and/or as a backup.
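
Review note: the directory layout described in the new Run-cache section can be explored with a short script. This is a minimal sketch and not part of DVC. The helper name `list_run_cache_entries` is hypothetical, and the code assumes only what the `tree` output in the diff shows: a two-character prefix directory, then a directory named by a hash that starts with that prefix, then a file (the `dvc.lock` backup) named by another hash. The default `.dvc/cache/runs` location is taken from the diff; anything that does not match the assumed layout is skipped.

```python
from pathlib import Path


def list_run_cache_entries(runs_dir):
    """Return (prefix, run_key, backup_name) tuples found under runs_dir.

    Assumed layout (from the `tree` output in the diff):
        <runs_dir>/<2-char prefix>/<hash starting with prefix>/<backup file>
    This is an illustration only; it does not use any DVC API.
    """
    root = Path(runs_dir)
    entries = []
    if not root.is_dir():
        # No run-cache at this location (for example, nothing has been run yet).
        return entries
    for prefix_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for key_dir in sorted(p for p in prefix_dir.iterdir() if p.is_dir()):
            # In the example tree, the prefix is the first two characters of the hash.
            if not key_dir.name.startswith(prefix_dir.name):
                continue
            for backup in sorted(p for p in key_dir.iterdir() if p.is_file()):
                entries.append((prefix_dir.name, key_dir.name, backup.name))
    return entries


if __name__ == "__main__":
    # Default run-cache location according to the diff above.
    for prefix, run_key, backup_name in list_run_cache_entries(".dvc/cache/runs"):
        print(f"{prefix}/{run_key}/{backup_name}")
```

Run from the root of a DVC project, the script prints one path per stored run, mirroring the `tree` listing. Each listed file would be a `dvc.lock` backup, while the outputs themselves live in the regular cache, as the note in the diff states.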