guide: reorder how-tos and copy edits; expand on adding existing dependencies to a stage #1914

Merged
merged 27 commits into from
Nov 27, 2020
Changes from 6 commits
Commits
83a3157
guide: reorder how-tos and some copy edits
jorgeorpinel Nov 9, 2020
82812e2
cases: remove unnecessary file
jorgeorpinel Nov 10, 2020
d82b83a
adding dependencies to a stage
imhardikj Nov 11, 2020
17e40d2
How-to title change
imhardikj Nov 11, 2020
20e932c
How-to reference update in commit/run
imhardikj Nov 11, 2020
77871e6
Update content/docs/sidebar.json
jorgeorpinel Nov 12, 2020
f578ddd
Update content/docs/user-guide/how-to/add-deps-or-outs-to-a-stage.md
jorgeorpinel Nov 12, 2020
4abbee5
minor updates
imhardikj Nov 12, 2020
cec2090
minor updates
imhardikj Nov 13, 2020
e005c2d
run/commit Updates
imhardikj Nov 18, 2020
7f70085
removing extra information from run/commit
imhardikj Nov 19, 2020
2968e79
guide: update how-to add deps/outs to stage title
jorgeorpinel Nov 20, 2020
28b581b
guide: remove label since how-to add des/outs slug is the same
jorgeorpinel Nov 20, 2020
964571e
Updates to run and commit
imhardikj Nov 20, 2020
78df111
How-to simplification
imhardikj Nov 21, 2020
41d86db
How-to description update
imhardikj Nov 22, 2020
dabb188
how-to: generalize adding depts/outs to existing stages
jorgeorpinel Nov 25, 2020
f77e100
how-to: rearrange and rename them, copy edits and simplfications
jorgeorpinel Nov 25, 2020
a3bed40
Merge branch 'master' into how-to
jorgeorpinel Nov 25, 2020
5bb6255
how-to: Un-track -> Stop Tracking
jorgeorpinel Nov 25, 2020
2fbcc9b
how-to: remove already unprotects + clarify about gc
jorgeorpinel Nov 25, 2020
b741d7c
how-to: clarify conditional note about existing out files (added to s…
jorgeorpinel Nov 25, 2020
590789e
how-to: simplify add deps/outs to stages and add SEO fields
jorgeorpinel Nov 25, 2020
b6a384d
how-to: rename Stop tracking data -> Reverse mistakes
jorgeorpinel Nov 27, 2020
9ebe6d0
cases: generalize add deps/outs to stages
jorgeorpinel Nov 27, 2020
210de90
Merge branch 'master' into how-to
jorgeorpinel Nov 27, 2020
c1d3ad9
how-to: rename Common Mistakes -> Stop Tracking Data again
jorgeorpinel Nov 27, 2020
8 changes: 4 additions & 4 deletions content/docs/command-reference/commit.md
@@ -42,10 +42,10 @@ scenarios are further detailed below.

  - In cases where we have previously executed a stage (either by writing
  `dvc.yaml` manually and using `dvc repro`, or with `dvc run`), but later
- notice that some of the output files or directories it creates, which are
- already in the <abbr>workspace</abbr>, are missing from `dvc.yaml` (`outs`
- field). We can
- [add missing outputs to an existing stage](/docs/user-guide/how-to/add-output-to-stage)
+ notice that some of the existing dependencies, or output files/directories it
+ creates, which are already in the <abbr>workspace</abbr>, are missing from
+ `dvc.yaml` (`deps` and `outs` field respectively). We can
+ [add missing dependencies/outputs to an existing stage](/docs/user-guide/how-to/add-deps-or-outs-to-a-stage)
  without having to execute it again. Use `dvc commit` to update the `dvc.lock`
  file and save outputs to the cache.

7 changes: 4 additions & 3 deletions content/docs/command-reference/run.md
@@ -107,9 +107,10 @@ Relevant notes:
  defined as outputs every time its executed by DVC.

  - In some situations we have executed a stage and later notice that some of the
- output files or directories it creates, which are already in the workspace,
- are missing from `dvc.yaml` (`outs` field). We can
- [add missing outputs to an existing stage](/docs/user-guide/how-to/add-output-to-stage)
+ existing dependencies, or output files/directories it creates, which are
+ already in the workspace, are missing from `dvc.yaml` (`deps` and `outs` field
+ respectively). We can
+ [add missing dependencies/outputs to an existing stage](/docs/user-guide/how-to/add-deps-or-outs-to-a-stage)
  without having to execute it again.

  - Renaming dependencies or outputs requires a
5 changes: 4 additions & 1 deletion content/docs/sidebar.json
@@ -100,8 +100,11 @@
  "slug": "how-to",
  "source": false,
  "children": [
- "add-output-to-stage",
  "undo-adding-data",
+ {
+   "label": "Add Tracked Data to a Stage",
+   "slug": "add-deps-or-outs-to-a-stage"
+ },
  "update-tracked-files"
  ]
  },

Contributor: Done.

56 changes: 56 additions & 0 deletions content/docs/user-guide/how-to/add-deps-or-outs-to-a-stage.md
@@ -0,0 +1,56 @@
# Add Dependencies or Outputs to a Stage

There are situations where we have executed a stage (either by writing
`dvc.yaml` manually and using `dvc repro`, or with `dvc run`), but later notice
that some of the build requirements are missing from `dvc.yaml`:

- Files or directories in the <abbr>workspace</abbr> that are dependencies of
  the stage are missing from the `deps` field.

- Output files or directories that the stage creates, which are already in the
  workspace, are missing from the `outs` field.

Follow the steps below to add existing files/directories as
<abbr>dependencies</abbr> or <abbr>outputs</abbr> to a stage without
re-executing it, which can be expensive/time-consuming and is
unnecessary.

We start with an example stage, `prepare`, which has a single dependency and
output. To add a missing dependency (`data.csv`) and output (`data/validate`) to
this stage, we can edit `dvc.yaml` like this:

```git
 stages:
   prepare:
     cmd: python src/prepare.py
     deps:
+      - data.csv
       - src/prepare.py
     outs:
       - data/train
+      - data/validate
```

> Note that you can also use `dvc run` with the `-f` and `--no-exec` options to
> add another dependency/output to the stage:
>
> ```dvc
> $ dvc run -f --no-exec \
> -n prepare \
> -d data.csv \
> -d src/prepare.py \
> -o data/train \
> -o data/validate \
> python src/prepare.py
> ```
>
> `-f` overwrites the stage in `dvc.yaml`, while `--no-exec` updates the stage
> without executing it.

Finally, we need to run `dvc commit` to save the newly specified output(s) to
the <abbr>cache</abbr> (and to update the hash values of `deps` and `outs` in
`dvc.lock`):

```dvc
$ dvc commit
```
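
The `dvc.yaml` edit shown above can also be scripted. The following is only a minimal sketch, assuming `dvc.yaml` has already been parsed into a plain dictionary (for example with a YAML library, which is left out here). The helper name `add_to_stage` is hypothetical and is not part of DVC:

```python
from copy import deepcopy


def add_to_stage(pipeline, stage, deps=(), outs=()):
    """Return a copy of `pipeline` with extra deps/outs on one stage.

    `pipeline` mirrors the parsed contents of dvc.yaml (a dict with a
    "stages" key). Hypothetical helper for illustration, not a DVC API.
    """
    updated = deepcopy(pipeline)
    entry = updated["stages"][stage]  # raises KeyError for an unknown stage
    for field, paths in (("deps", deps), ("outs", outs)):
        current = entry.setdefault(field, [])
        for path in paths:
            if path not in current:  # skip entries that are already listed
                current.append(path)
    return updated


# The `prepare` stage from the example, before the edit.
pipeline = {
    "stages": {
        "prepare": {
            "cmd": "python src/prepare.py",
            "deps": ["src/prepare.py"],
            "outs": ["data/train"],
        }
    }
}

updated = add_to_stage(
    pipeline, "prepare", deps=["data.csv"], outs=["data/validate"]
)
print(updated["stages"]["prepare"]["deps"])
print(updated["stages"]["prepare"]["outs"])
```

The stage's `cmd` is left untouched, so nothing is executed. After writing the result back to `dvc.yaml`, `dvc commit` is still needed as described above.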
46 changes: 0 additions & 46 deletions content/docs/user-guide/how-to/add-output-to-stage.md

This file was deleted.

26 changes: 9 additions & 17 deletions content/docs/user-guide/how-to/undo-adding-data.md
@@ -3,41 +3,33 @@
There are situations where you want to stop tracking data added previously.
Follow the steps listed here to undo `dvc add`.

Let's first add a data file into an example <abbr>project</abbr> using
`dvc add`, which creates a `.dvc` file to track the data:
Let's first add a data file into an example <abbr>project</abbr>, which creates
a `.dvc` file to track the data:

```dvc
$ dvc add data.csv
$ ls
data.csv data.csv.dvc
```

> Note, if you are using `symlink` or `hardlink` as
> [link type](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
> for DVC <abbr>cache</abbr>, you will have to unprotect the tracked file first
> (see `dvc unprotect`):
>
> ```dvc
> $ dvc unprotect data.csv
> ```
> Note, if you're using `symlink` or `hardlink` as the project's
> [link type](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache),
> you'll have to unprotect the tracked file first (see `dvc unprotect`).

Now let's reverse `dvc add` by removing the corresponding `.dvc` file and
`.gitignore` entry using `dvc remove`:
Now let's reverse that with `dvc remove`. This removes the `.dvc` file (and
corresponding `.gitignore` entry). The data file is now no longer being tracked
after this:

```dvc
$ dvc remove data.csv.dvc
```

Data file `data.csv` is now no longer being tracked by DVC.

```dvc
$ git status
Untracked files:
data.csv
```

You can run `dvc gc` with the `-w` option to remove the data that isn't
referenced in the current workspace from the cache:
referenced in the current workspace from the <abbr>cache</abbr>:

```dvc
$ dvc gc -w
2 changes: 1 addition & 1 deletion content/docs/user-guide/how-to/update-tracked-files.md
@@ -1,4 +1,4 @@
- # Updating Tracked Files
+ # Update Tracked Files

  Due to the way DVC handles linking between the data files between the
  <abbr>cache</abbr> and their counterparts in the <abbr>workspace</abbr> (refer