Merge pull request #2137 from iterative/run-cache
run-cache: basic docs
jorgeorpinel authored Feb 4, 2021
2 parents 90a4ca7 + d2f8ebb commit b2149f1
Showing 7 changed files with 68 additions and 21 deletions.
5 changes: 3 additions & 2 deletions content/docs/command-reference/fetch.md
@@ -70,8 +70,9 @@ specific one is given with `--remote`.
[remote storage](/doc/command-reference/remote) to fetch from (see
`dvc remote list`).

- `--run-cache` - downloads all available history of stage runs from the remote
repository.
- `--run-cache` - downloads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
the remote repository. See the same option in `dvc push`.

- `-d`, `--with-deps` - determines files to download by tracking dependencies to
the `targets`. If none are provided, this option is ignored. By traversing all
7 changes: 4 additions & 3 deletions content/docs/command-reference/pull.md
@@ -110,9 +110,10 @@ used to see what files `dvc pull` would download.
[remote storage](/doc/command-reference/remote) to pull from (see
`dvc remote list`).

- `--run-cache` - downloads all available history of stage runs from the remote
repository (to the cache only, like `dvc fetch --run-cache`). Note that
`dvc repro <stage_name>` is necessary to checkout these files (into the
- `--run-cache` - downloads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) from
the remote repository (to the cache only, like `dvc fetch --run-cache`). Note
that `dvc repro <stage_name>` is necessary to checkout these files (into the
workspace) and update `dvc.lock`.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
5 changes: 3 additions & 2 deletions content/docs/command-reference/push.md
@@ -88,8 +88,9 @@ in the cache (compared to the default remote.) It can be used to see what files
[remote storage](/doc/command-reference/remote) to push to (see
`dvc remote list`).

- `--run-cache` - uploads all available history of stage runs to the remote
repository.
- `--run-cache` - uploads all available history of
[stage runs](/doc/user-guide/project-structure/internal-files#run-cache) to
the remote repository.

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to upload data to
remote storage. The default value is `4 * cpu_count()`. For SSH remotes, the
13 changes: 7 additions & 6 deletions content/docs/command-reference/repro.md
@@ -153,8 +153,11 @@ up-to-date and only execute the final stage.
present in the DVC project. Specifying `targets` has no effects with this
option, as all possible targets are already included.

- `--no-run-cache` - execute stage commands even if they have already been run
with the same dependencies/outputs/etc. before.
- `--no-run-cache` - execute stage command(s) even if they have already been run
with the same dependencies and outputs (see the
[details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful
for example if the stage command(s) are non-deterministic
([not recommended](/doc/command-reference/run#avoiding-unexpected-behavior)).

- `--force-downstream` - in cases like `... -> A (changed) -> B -> C` it will
reproduce `A` first and then `B`, even if `B` was previously executed with the
@@ -178,10 +181,8 @@ up-to-date and only execute the final stage.

- `--pull` - [pulls](/doc/command-reference/pull) dependencies and outputs
involved in the stages being reproduced, if they are found in the
[default](/doc/command-reference/remote/default) remote storage. Note that it
checks the local run-cache too (available history of stage runs).

> Has no effect if combined with `--no-run-cache`.
[default remote storage](/doc/command-reference/remote/default). Note that it
tries the local run-cache first (unless `--no-run-cache` is also used).

- `-h`, `--help` - prints the usage/help message, and exit.

7 changes: 4 additions & 3 deletions content/docs/command-reference/run.md
@@ -240,9 +240,10 @@ $ dvc run -n second_stage './another_script.sh $MYENVVAR'
- `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without
asking for confirmation.

- `--no-run-cache` - execute the stage `command` even if it has already been run
with the same dependencies/outputs/etc. before. Useful for example if the
command's code is non-deterministic
- `--no-run-cache` - execute the stage command(s) even if they have already been
run with the same dependencies and outputs (see the
[details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful
for example if the stage command(s) are non-deterministic
([not recommended](#avoiding-unexpected-behavior)).

- `--no-commit` - do not store the outputs of this execution in the cache
11 changes: 11 additions & 0 deletions content/docs/user-guide/basic-concepts/run-cache.md
@@ -0,0 +1,11 @@
---
name: 'Run-cache'
match: ['run-cache']
---

The DVC run-cache is a log of stages that have been run in the project. It
consists of `dvc.lock` file backups, identified as combinations of
dependencies, commands, and outputs that correspond to each other. `dvc repro`
and `dvc run` populate and reuse the run-cache. See
[Run-cache](/doc/user-guide/project-structure/internal-files#run-cache) for more
details.
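
As a rough mental model of the reuse behavior described above, the sketch below
treats the run-cache as a lookup table keyed by a stage's command and dependency
hashes. It is a toy illustration, not DVC's implementation: the `RunCache`
class, the `reproduce` method, the `no_run_cache` parameter, and the `train`
function are all hypothetical names, and the choice to log a run even when the
lookup is bypassed is an assumption.

```python
class RunCache:
    """Toy in-memory stand-in for the run-cache (not DVC's actual code)."""

    def __init__(self):
        # Maps (command, dependency hashes) -> recorded output hashes.
        self._runs = {}

    def reproduce(self, cmd, deps, execute, no_run_cache=False):
        """Return (outs, executed) for a stage run.

        deps maps dependency paths to content hashes; execute is a
        callable that runs the stage and returns its output hashes.
        """
        key = (cmd, tuple(sorted(deps.items())))
        if not no_run_cache and key in self._runs:
            # Same command and dependencies seen before: reuse the record.
            return self._runs[key], False
        outs = execute()
        # Assumption: every execution is logged, even with no_run_cache.
        self._runs[key] = outs
        return outs, True


def train():
    # Hypothetical stage command; returns made-up output hashes.
    return {"model.pkl": "hash-of-model"}


cache = RunCache()
deps = {"train.py": "abc", "data.csv": "def"}
first = cache.reproduce("python train.py", deps, train)
second = cache.reproduce("python train.py", deps, train)
forced = cache.reproduce("python train.py", deps, train, no_run_cache=True)
print(first[1], second[1], forced[1])
```

The second call skips execution because the same command and dependencies were
already recorded, while the third call executes again because the lookup is
bypassed, which mirrors what `--no-run-cache` is documented to do.
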
41 changes: 36 additions & 5 deletions content/docs/user-guide/project-structure/internal-files.md
@@ -15,18 +15,22 @@ operation.
(credentials, private locations, etc). The local config file can be edited by
hand or with the command `dvc config --local`.

- `.dvc/cache`: The <abbr>cache</abbr> directory will store your data in a
special [structure](#structure-of-the-cache-directory). The data files and
directories in the <abbr>workspace</abbr> will only contain links to the data
files in the cache. (Refer to
- `.dvc/cache`: Default location of the <abbr>cache</abbr> directory. The cache
stores the project data in a special
[structure](#structure-of-the-cache-directory). The data files and directories
in the <abbr>workspace</abbr> will only contain links to the data files in the
cache (refer to
[Large Dataset Optimization](/doc/user-guide/large-dataset-optimization)). See
`dvc config cache` for related configuration options.
`dvc config cache` for related configuration options, including changing its
location.

> Note that DVC includes the cache directory in `.gitignore` during
> initialization. No data tracked by DVC should ever be pushed to the Git
> repository, only the <abbr>DVC files</abbr> that are needed to download or
> reproduce that data.
- `.dvc/cache/runs`: Default location of the [run-cache](#run-cache).

- `.dvc/plots`: Directory for
[plot templates](/doc/command-reference/plots#plot-templates)

@@ -120,3 +124,30 @@ $ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir
```

That's how DVC knows that the other two cached files belong in the directory.

### Run-cache

By default, `dvc repro` and `dvc run` populate and reuse a log of stages that
have been run in the project. It is found in the `runs/` directory inside the
cache (or [remote storage](/doc/command-reference/remote)).

Runs are identified as combinations of <abbr>dependencies</abbr>, commands, and
<abbr>outputs</abbr> that correspond to each other. These combinations are
hashed into special values that make up the file paths inside the run-cache dir.

```dvc
$ tree .dvc/cache/runs
.dvc/cache/runs
└── 86
    └── 8632e1555283d6e23ec808c9ee1fadc30630c888d5c08695333609ef341508bf
        └── e98a34c44fa6b564ef211e76fb3b265bc67f19e5de2e255217d3900d8f...
```

The files themselves are backups of the `dvc.lock` file that resulted from that
run.
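
To make the path layout more concrete, here is a minimal sketch of how such a
nested path could be derived from a run. It assumes SHA-256 over a sorted JSON
dump, and it assumes that the directory name comes from the command plus
dependencies while the file name also covers the outputs. Neither assumption is
confirmed by this page, and `run_cache_path` and `_digest` are hypothetical
helpers rather than DVC functions.

```python
import hashlib
import json
from pathlib import PurePosixPath


def _digest(obj):
    # Hash a JSON-serializable object deterministically (sorted keys).
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def run_cache_path(cmd, deps, outs, root=".dvc/cache/runs"):
    """Return a hypothetical run-cache path for one stage run.

    deps and outs map file paths to content hashes.
    """
    # Assumption: the directory key covers the command and dependencies.
    key = _digest({"cmd": cmd, "deps": deps})
    # Assumption: the file name additionally covers the outputs.
    value = _digest({"cmd": cmd, "deps": deps, "outs": outs})
    # The first two hex characters of the key form the top-level directory.
    return PurePosixPath(root) / key[:2] / key / value


path = run_cache_path(
    cmd="python train.py",
    deps={"train.py": "aaa", "data.csv": "bbb"},
    outs={"model.pkl": "ccc"},
)
print(path)
```

Under these assumptions, two runs with the same command and dependencies but
different outputs would share a directory and differ only in the file name.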

> Note that the run's <abbr>outputs</abbr> are stored in and retrieved from the
> regular cache.

💡 `dvc push` and `dvc pull` (and `dvc fetch`) can upload and download the
run-cache to and from remote storage (with the `--run-cache` option) for sharing
and/or as a backup.
