cache: improve note about .dir files and add in appropriate docs
jorgeorpinel committed Jan 22, 2020
1 parent f4900e8 commit 0324fa2
Showing 4 changed files with 31 additions and 18 deletions.
8 changes: 4 additions & 4 deletions public/static/docs/command-reference/add.md
@@ -211,10 +211,10 @@ outs:
wdir: .
```

> The cache file with `.dir` extension is a special text file that records the
> mapping of files in the `pics/` directory. (Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory)
> for an example.)
> The cache file with `.dir` extension is a special text file that contains the
> mapping of files in the `pics/` directory (as a JSON array), along with their
> checksums. (Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)

This allows us to treat the entire directory structure as one unit (a dependency
or an <abbr>output</abbr>) with DVC commands. For example, it lets you pass the
5 changes: 5 additions & 0 deletions public/static/docs/command-reference/diff.md
@@ -168,6 +168,11 @@ diff for 'data/features'
0 files deleted, size was increased by 2.9 MB
```

> The cache file with `.dir` extension is a special text file that contains the
> mapping of files in the `data/features/` directory (as a JSON array), along
> with their checksums. (Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)

## Example: Confirming that a target has not changed

Let's use our example repo once again, that has several
5 changes: 3 additions & 2 deletions public/static/docs/user-guide/dvc-files-and-directories.md
@@ -104,8 +104,9 @@ $ tree .dvc/cache

The cache file with `.dir` extension is a special text file that contains the
mapping of files in the `data/` directory (as a JSON array), along with their
checksums. The other two cache files are the files inside `data/`. A typical
`.dir` cache file looks like this:
checksums. The other two cache files are the files inside `data/`.
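
As a rough illustration of the mapping described above, here is a minimal Python
sketch that parses the contents of a `.dir` file into a dictionary. The key
names `md5` and `relpath`, the helper `parse_dir_entries`, and the sample
checksums are assumptions made for illustration; they are not shown in this
diff.

```python
import json


def parse_dir_entries(text):
    """Turn the JSON array from a `.dir` cache file into {relative path: checksum}.

    Assumes each array element is an object with "relpath" and "md5" keys.
    """
    entries = json.loads(text)
    return {entry["relpath"]: entry["md5"] for entry in entries}


# Hypothetical `.dir` contents; the checksums are placeholders, not real hashes.
sample = '[{"md5": "aaa111", "relpath": "data1"}, {"md5": "bbb222", "relpath": "data2"}]'

mapping = parse_dir_entries(sample)
for relpath, checksum in sorted(mapping.items()):
    print(relpath, checksum)
```

In practice the text would be read from the `.dir` file in `.dvc/cache` rather
than from an inline string.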

A typical `.dir` cache file looks like this:

```dvc
$ cat .dvc/cache/19/6a322c107c2572335158503c64bfba.dir
31 changes: 19 additions & 12 deletions public/static/docs/user-guide/dvcignore.md
@@ -23,9 +23,9 @@ similar to `.gitignore` in Git.

## Remarks

Ignored files will not be saved in cache, they will be non-existent for DVC.
It's worth to remember that, especially when ignoring files inside DVC-handled
directories.
Ignored files will not be saved in <abbr>cache</abbr>; they will be non-existent
for DVC. This is worth remembering, especially when ignoring files inside
DVC-handled directories.

**It is crucial to understand, that DVC might remove ignored files upon
`dvc run` or `dvc repro`. If they are not produced by a
@@ -44,9 +44,9 @@ it raises an error. Ignoring files inside such directory should be handled from

The same as for [`.gitignore`](https://git-scm.com/docs/gitignore).

## Example: Modification of ignored data
## Example: Ignoring specific files

Let's see what happens when we modify ignored file.
Let's see what happens when we add a file to `.dvcignore`.

```dvc
$ mkdir data
@@ -60,8 +60,8 @@ $ tree .
└── data2
```

We created the `data/` directory. Let's ignore part of the `data` and add it
under DVC control.
We created the `data/` directory with two files. Let's ignore one of them, and
track the directory with DVC.

```dvc
$ echo data/data1 >> .dvcignore
@@ -70,6 +70,7 @@ $ cat .dvcignore
data/data1
$ dvc add data
$ tree .dvc/cache
.dvc/cache
@@ -79,10 +80,16 @@ $ tree .dvc/cache
└── c3d3797971f12c7f5e1d106dd5cee2
```

As we can see, `data1` has been ignored. Cache contains only one file entry (for
`data2`) and one dir entry (`data`).
Only the checksums of a directory (`data/`) and one file have been
<abbr>cached</abbr>. This means that `dvc add` ignored one of the files
(`data1`).

> The cache file with `.dir` extension is a special text file that contains the
> mapping of files in the `data/` directory (as a JSON array), along with their
> checksums. (Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).)

Now, let's modify `data1` and see if it affects `dvc status`.
Now, let's modify file `data1` and see if it affects `dvc status`.

```dvc
$ dvc status
@@ -95,8 +102,8 @@ $ dvc status
Data and pipelines are up to date.
```

Same modification applied to not ignored file will make `dvc status` inform
about change:
`dvc status` also ignores `data1`. The same modification on a tracked file will
produce a different output:

```dvc
$ echo "123" >> data/data2