Docs for new command dvc check-ignore #1629

Merged Aug 7, 2020 (21 commits)

Commits
702172e
Docs for new command `dvc check-ignore`
karajan1001 Jul 26, 2020
214131d
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
20cbcf5
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
ead5f4f
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
3b5e1a0
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
2fe85d3
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
98add0a
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
feaa074
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
545b263
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
34fdb5f
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Jul 29, 2020
44277dc
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 6, 2020
1288407
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 6, 2020
d99e05c
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 6, 2020
3ae42a8
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 6, 2020
c61c42a
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 6, 2020
f10806b
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 6, 2020
6d3930d
Update content/docs/command-reference/check-ignore.md
jorgeorpinel Aug 7, 2020
a9a5242
Merge branch 'master' into fix_1628
jorgeorpinel Aug 7, 2020
1c32b8d
cmd: complete check-ignore ref.
jorgeorpinel Aug 7, 2020
524b01e
guide: anoter link from dvcignore guide to check-ignore cmd ref.
jorgeorpinel Aug 7, 2020
c759585
cmd: update check-ignore targets arg desc
jorgeorpinel Aug 7, 2020
1 change: 1 addition & 0 deletions config/prismjs/dvc-commands.js
@@ -48,6 +48,7 @@ module.exports = [
'config',
'commit',
'checkout',
'check-ignore',
'cache dir',
'cache',
'add'
90 changes: 90 additions & 0 deletions content/docs/command-reference/check-ignore.md
@@ -0,0 +1,90 @@
# check-ignore

Check whether any given files or directories are excluded from DVC due to the
patterns found in [`.dvcignore`](/doc/user-guide/dvcignore).

## Synopsis

```usage
usage: dvc check-ignore [-h] [-q | -v] [-d] [-n]
                        targets [targets ...]

positional arguments:
  targets        File or directory paths to check (wildcards supported)
```

## Description

This helper command checks whether the given `targets` are ignored by DVC
according to the [`.dvcignore` file](/doc/user-guide/dvcignore) (if any). Any
targets that are ignored are printed back.

> Note that your shell may support path wildcards such as `dir/file*` and these
> can be fed as `targets` to `dvc check-ignore`, as shown in the
> [examples](#examples).

Member: Same. It's fine to keep it here, but that's what I would expect anyway, since multiple targets are supported. It's a bit inconsistent, since we don't make notes like this in other commands.

Contributor (@jorgeorpinel, Aug 7, 2020): Oh, I see you found my note. Yeah, any other commands where this note would be especially relevant? add/commit/remove? status/repro? fetch/pull/push? metrics/plots? (un)freeze?

Contributor (@jorgeorpinel, Aug 7, 2020): I guess it's best to just remove it too...

Contributor: Also removed in #1673

## Options

- `-d`, `--details` - show the exclude pattern together with each target path.

- `-n`, `--non-matching` - show the target paths which don’t match any pattern.
  Only usable when `--details` is also employed.

- `-h`, `--help` - prints the usage/help message, and exits.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - displays detailed tracing information.

## Examples

First, let's create a `.dvcignore` file with some patterns in it, and some files
to check against it.

```dvc
$ printf 'file*\n!file2\n' >> .dvcignore
$ cat .dvcignore
file*
!file2
$ touch file1 file2 other
$ ls
file1 file2 other
```

Then, let's use `dvc check-ignore` to see which of these files would be excluded
given our `.dvcignore` file:

```dvc
$ dvc check-ignore file1
file1
$ dvc check-ignore file1 file2
file1
file2
$ dvc check-ignore other
# There's no command output, meaning `other` is not excluded.
$ dvc check-ignore file*
file1
file2
```

If the `--details` option is used, a series of lines are printed using this
format: `<path/to/.dvcignore>:<line_num>:<pattern> | <target_path>`

```dvc
$ dvc check-ignore -d file1 file2
.dvcignore:1:file* file1
.dvcignore:2:!file2 file2
$ dvc check-ignore -d other
$ dvc check-ignore -d file*
.dvcignore:1:file* file1
.dvcignore:2:!file2 file2
```
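
The `--details` lines above report, for each target, the pattern that applies to it. As a rough illustration only (this is not DVC's implementation), the Python sketch below assumes the `.gitignore` convention that the last matching pattern decides the outcome, and it handles only plain file names with shell-style wildcards. The function name `last_matching_pattern` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def last_matching_pattern(patterns: List[str], target: str) -> Optional[Tuple[int, str]]:
    """Return (line_num, pattern) for the last pattern matching `target`, or None.

    Simplification: real .gitignore-style matching has more rules
    (directories, `**`, anchoring, escapes) that are not modeled here.
    """
    found = None
    for line_num, pattern in enumerate(patterns, start=1):
        # A leading "!" negates a pattern; strip it before matching the name.
        glob = pattern[1:] if pattern.startswith("!") else pattern
        if fnmatchcase(target, glob):
            found = (line_num, pattern)  # later lines override earlier ones
    return found


patterns = ["file*", "!file2"]  # the two lines of the example .dvcignore
for target in ["file1", "file2", "other"]:
    print(target, last_matching_pattern(patterns, target))
# file1 (1, 'file*')
# file2 (2, '!file2')
# other None
```

The line numbers and patterns line up with the `--details` output shown above; `other` matches nothing, which corresponds to the empty output for that target.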

With the `--non-matching` option, non-matching `targets` will also be included
in the list. All fields in each line, except for `<target_path>`, will be empty.

```dvc
$ dvc check-ignore -d -n other
:: other
```
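
If this output needs to be consumed by a script, a small parser may help. The sketch below is an assumption-laden example rather than an official interface: it assumes each line is `<source>:<line_num>:<pattern>`, followed by whitespace and the target path, as in the examples above. It does not handle paths that contain `:` or whitespace. The names `IgnoreDetail` and `parse_detail_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class IgnoreDetail(NamedTuple):
    source: str               # path of the .dvcignore file ("" if nothing matched)
    line_num: Optional[int]   # line number of the pattern (None if nothing matched)
    pattern: str              # the matching pattern ("" if nothing matched)
    target: str               # the path that was checked


def parse_detail_line(line: str) -> IgnoreDetail:
    """Parse one line of `dvc check-ignore --details` output (assumed layout)."""
    # The target is taken to be the last whitespace-separated token.
    head, target = line.rstrip("\n").rsplit(None, 1)
    # The remaining part is "<source>:<line_num>:<pattern>"; split on the
    # first two colons only, so a pattern containing ":" stays intact.
    source, line_num, pattern = head.split(":", 2)
    return IgnoreDetail(source, int(line_num) if line_num else None, pattern, target)


print(parse_detail_line(".dvcignore:1:file*\tfile1"))
print(parse_detail_line("::\tother"))
```

A non-matching line such as `:: other` yields empty `source` and `pattern` fields and `line_num` set to `None`.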
4 changes: 4 additions & 0 deletions content/docs/sidebar.json
@@ -163,6 +163,10 @@
}
]
},
{
"label": "check-ignore",
"slug": "check-ignore"
},
{
"label": "checkout",
"slug": "checkout"
2 changes: 1 addition & 1 deletion content/docs/user-guide/contributing/docs.md
@@ -200,7 +200,7 @@ We also use "emoji" symbols sparingly for visibility on certain notes. Mainly:
- 📖 For notes that link to other related documentation
- ⚠️ Warnings about possible problems related to DVC usage (similar to **Note!**
  and "Note that..." notes)
- 💡 Useful tips related to external tools/integrations
- 💡 Useful tips about related or external tools and integrations

> Some other emojis currently in use here and there: ⚡✅🙏🐛⭐❗ (among
> others).
93 changes: 51 additions & 42 deletions content/docs/user-guide/dvcignore.md
@@ -8,16 +8,17 @@ project. For example, when working in a <abbr>workspace</abbr> directory with a
large number of data files, you might encounter extended execution time for
operations as simple as `dvc status`. In other cases you might want to omit files
or folders unrelated to the project (like `.DS_Store` on macOS). To address
these scenarios, DVC supports optional `.dvcignore` files. `.dvcignore` works
similar to `.gitignore` in Git.
these scenarios, DVC supports optional `.dvcignore` files.

`.dvcignore` is similar to `.gitignore` in Git, and can be tested with our
helper command `dvc check-ignore`.

## How does it work?

- You need to create the `.dvcignore` file. It can be placed in the root of the
project or inside any subdirectory (see also [remarks](#Remarks) below).
- Populate it with [patterns](https://git-scm.com/docs/gitignore) that you would
like to ignore. You can find useful templates
[here](https://github.com/github/gitignore).
- You need to create a `.dvcignore` file. These can be placed in the root of the
project, or in any subdirectory (see the [remarks](#Remarks) below).
- Populate it with [.gitignore patterns](https://git-scm.com/docs/gitignore).
You can find useful templates [here](https://github.com/github/gitignore).
- Each line should contain only one pattern.
- During execution of commands that traverse directories, DVC will ignore
matching paths.
@@ -28,87 +29,95 @@ Ignored files will not be saved in <abbr>cache</abbr>; they will be non-existent
for DVC. This is worth remembering, especially when ignoring files inside
DVC-handled directories.

**It is crucial to understand, that DVC might remove ignored files upon
`dvc run` or `dvc repro`. If they are not produced by a
[pipeline](/doc/command-reference/dag) [stage](/doc/command-reference/run), they
can be deleted permanently.**
⚠️ Important! Note that `dvc run` and `dvc repro` might remove ignored files. If
they are not produced by a pipeline [stage](/doc/command-reference/run), they
can be lost permanently.

Keep in mind that when you add `.dvcignore` patterns that affect an existing
<abbr>output</abbr>, its status will change and DVC will behave as if the
affected files were deleted.

Keep in mind, that when you add to `.dvcignore` entries that affect one of the
existing <abbr>outputs</abbr>, its status will change and DVC will behave as if
that affected files were deleted.
💡 Note that you can use the `dvc check-ignore` command to check whether given
files or directories are ignored by the patterns in a `.dvcignore` file.

If DVC finds a `.dvcignore` file inside a dependency or output directory, it
raises an error. Ignoring files inside such directories should be handled from a
`.dvcignore` in higher levels of the project tree.

## Syntax

The same as for [`.gitignore`](https://git-scm.com/docs/gitignore).
Member: I think that was useful. Also, we make quite a few changes not directly related to the command, @jorgeorpinel :)

Contributor (@jorgeorpinel, Aug 7, 2020): Yes, sorry. But I had to read the whole thing to find the best places to link to the command, and did copy edits along the way.

This section was repetitive, since the format (and templates) is already linked to from the bullets above. It was also an entire H2 section for one line, which looked funny once rendered.

Should I restore it and move all the syntax/format and templates info here (removing it from the bullet under "How does it work?")?

## Examples

Let's see what happens when we add a file to `.dvcignore`.
Let's see what happens when we add a file to `.dvcignore`:

```dvc
$ mkdir data
$ echo data1 >> data/data1
$ echo data2 >> data/data2
$ tree .

$ echo 1 > data/data1
$ echo 2 > data/data2
$ tree
.
└── data
    ├── data1
    └── data2
```

We created the `data/` directory with two files. Let's ignore one of them, and
track the directory with DVC.
We created the `data/` directory with two data files. Let's ignore one of them,
and double check that it's being ignored by DVC:

```dvc
$ echo data/data1 >> .dvcignore
$ cat .dvcignore

data/data1
$ dvc check-ignore data/*
data/data1
```

$ dvc add data
> Refer to `dvc check-ignore` for more details on that command.

$ tree .dvc/cache
## Example: Skip specific files when adding directories

Let's now track the directory with `dvc add`, and see what happens in the
<abbr>cache</abbr>:

```dvc
$ dvc add data
...
$ tree .dvc/cache
.dvc/cache
├── 54
│   └── 40cb5e4c57ab54af68127492334a23.dir
└── ed
    └── c3d3797971f12c7f5e1d106dd5cee2
├── 26
│   └── ab0db90d72e28ad0ba1e22ee510510
└── ad
    └── 8b0ddcf133a6e5833002ce28f97c5a.dir
$ md5 data/*
b026324c6904b2a9cb4b88d6d61c81d1 data/data1
26ab0db90d72e28ad0ba1e22ee510510 data/data2
```

Only the hash values of a directory (`data/`) and one file have been
<abbr>cached</abbr>. This means that `dvc add` ignored one of the files
(`data1`).
Only the cache entries of the `data/` directory itself and one file have been
stored. Checking the hash value of the data files manually, we can see that
`data2` was cached. This means that `dvc add` did ignore `data1`.

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory)
> for more info.
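
The manual hash check can also be sketched in Python. This example assumes what the listing above suggests: a file's cache entry is named after the MD5 hash of its contents, with the first two hex characters used as a subdirectory. The helper names `md5_hex` and `cache_entry` are hypothetical, and directory entries (the `.dir` files) are not covered.

```python
import hashlib
from pathlib import PurePosixPath


def md5_hex(content: bytes) -> str:
    """Return the MD5 hex digest of the given file contents."""
    return hashlib.md5(content).hexdigest()


def cache_entry(content: bytes, cache_dir: str = ".dvc/cache") -> PurePosixPath:
    """Build the cache path a file with these contents would map to.

    Assumed layout (inferred from the `tree .dvc/cache` listing):
    <cache_dir>/<first 2 hex chars>/<remaining 30 hex chars>
    """
    digest = md5_hex(content)
    return PurePosixPath(cache_dir) / digest[:2] / digest[2:]


# `echo 2 > data/data2` writes the bytes "2\n"
print(cache_entry(b"2\n"))
# .dvc/cache/26/ab0db90d72e28ad0ba1e22ee510510
```

The resulting path matches the only file entry in the cache listing, which is consistent with `data2` being cached and `data1` being ignored.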

## Example: Ignore file state changes

Now, let's modify file `data1` and see if it affects `dvc status`.

```dvc
$ dvc status

Data and pipelines are up to date.

$ echo "123" >> data/data1
$ echo "2345" >> data/data1
$ dvc status

Data and pipelines are up to date.
```

`dvc status` also ignores `data1`. The same modification on a tracked file will
produce a different output:
`dvc status` ignores `data1`. Modifications on a tracked file produce a
different output:

```dvc
$ echo "123" >> data/data2
$ echo "345" >> data/data2
$ dvc status

data.dvc:
changed outs:
modified: data