From 0324fa2c730be273b08195dcef964ed5ff4d4db3 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Wed, 22 Jan 2020 14:13:17 -0600
Subject: [PATCH] cache: improve note about .dir files and add in appropriate docs

rel: iterative/dvc/issues/3182
---
 public/static/docs/command-reference/add.md   |  8 ++---
 public/static/docs/command-reference/diff.md  |  5 +++
 .../user-guide/dvc-files-and-directories.md   |  5 +--
 public/static/docs/user-guide/dvcignore.md    | 31 ++++++++++++-------
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/public/static/docs/command-reference/add.md b/public/static/docs/command-reference/add.md
index 0e992c116d..b84e9b73bd 100644
--- a/public/static/docs/command-reference/add.md
+++ b/public/static/docs/command-reference/add.md
@@ -211,10 +211,10 @@ outs:
 wdir: .
 ```
 
-> The cache file with `.dir` extension is a special text file that records the
-> mapping of files in the `pics/` directory. (Refer to
-> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
-> for an example.)
+> The cache file with `.dir` extension is a special text file that contains the
+> mapping of files in the `pics/` directory (as a JSON array), along with their
+> checksums. (Refer to
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)
 
 This allows us to treat the entire directory structure as one unit (a
 dependency or an output) with DVC commands. For example, it lets you pass the
diff --git a/public/static/docs/command-reference/diff.md b/public/static/docs/command-reference/diff.md
index 724b6c59b8..daa29ca201 100644
--- a/public/static/docs/command-reference/diff.md
+++ b/public/static/docs/command-reference/diff.md
@@ -168,6 +168,11 @@ diff for 'data/features'
 0 files deleted, size was increased by 2.9 MB
 ```
 
+> The cache file with `.dir` extension is a special text file that contains the
+> mapping of files in the `data/features/` directory (as a JSON array), along
+> with their checksums. (Refer to
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)
+
 ## Example: Confirming that a target has not changed
 
 Let's use our example repo once again, that has several
diff --git a/public/static/docs/user-guide/dvc-files-and-directories.md b/public/static/docs/user-guide/dvc-files-and-directories.md
index e971e23df8..68a9049191 100644
--- a/public/static/docs/user-guide/dvc-files-and-directories.md
+++ b/public/static/docs/user-guide/dvc-files-and-directories.md
@@ -104,8 +104,9 @@ $ tree .dvc/cache
 
 The cache file with `.dir` extension is a special text file that contains the
 mapping of files in the `data/` directory (as a JSON array), along with their
-checksums. The other two cache files are the files inside `data/`. A typical
-`.dir` cache file looks like this:
+checksums. The other two cache files are the files inside `data/`.
+
+A typical `.dir` cache file looks like this:
 
 ```dvc
 $ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir
diff --git a/public/static/docs/user-guide/dvcignore.md b/public/static/docs/user-guide/dvcignore.md
index c215804dc7..0528ca2d33 100644
--- a/public/static/docs/user-guide/dvcignore.md
+++ b/public/static/docs/user-guide/dvcignore.md
@@ -23,9 +23,9 @@ similar to `.gitignore` in Git.
 
 ## Remarks
 
-Ignored files will not be saved in cache, they will be non-existent for DVC.
-It's worth to remember that, especially when ignoring files inside DVC-handled
-directories.
+Ignored files will not be saved in cache, they will be non-existent
+for DVC. It's worth to remember that, especially when ignoring files inside
+DVC-handled directories.
 
 **It is crucial to understand, that DVC might remove ignored files upon
 `dvc run` or `dvc repro`. If they are not produced by a
@@ -44,9 +44,9 @@ it raises an error. Ignoring files inside such directory should be handled from
 
 The same as for [`.gitignore`](https://git-scm.com/docs/gitignore).
 
-## Example: Modification of ignored data
+## Example: Ignoring specific files
 
-Let's see what happens when we modify ignored file.
+Let's see what happens when we add a file to `.dvcignore`.
 
 ```dvc
 $ mkdir data
@@ -60,8 +60,8 @@ $ tree .
     └── data2
 ```
 
-We created the `data/` directory. Let's ignore part of the `data` and add it
-under DVC control.
+We created the `data/` directory with two files. Let's ignore one of them, and
+add track the directory with DVC.
 
 ```dvc
 $ echo data/data1 >> .dvcignore
@@ -70,6 +70,7 @@ $ cat .dvcignore
 data/data1
 
 $ dvc add data
+
 $ tree .dvc/cache
 
 .dvc/cache
@@ -79,10 +80,16 @@ $ tree .dvc/cache
     └── c3d3797971f12c7f5e1d106dd5cee2
 ```
 
-As we can see, `data1` has been ignored. Cache contains only one file entry (for
-`data2`) and one dir entry (`data`).
+Only the checksums of a directory (`data/`) and one files have been
+cached. This means that `dvc add` ignored one of the files
+(`data1`).
+
+> The cache file with `.dir` extension is a special text file that contains the
+> mapping of files in the `data/` directory (as a JSON array), along with their
+> checksums. (Refer to
+> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)
 
-Now, let's modify `data1` and see if it affects `dvc status`.
+Now, let's modify file `data1` and see if it affects `dvc status`.
 
 ```dvc
 $ dvc status
@@ -95,8 +102,8 @@ $ dvc status
 Data and pipelines are up to date.
 ```
 
-Same modification applied to not ignored file will make `dvc status` inform
-about change:
+`dvc status` also ignores `data1`. The same modification on a tracked file will
+produce a different output:
 
 ```dvc
 $ echo "123" >> data/data2