iterative · dberenbaum · Jun 13, 2023 · Jun 1, 2023 · Jun 2, 2023 · Jun 6, 2023
diff --git a/content/docs/user-guide/pipelines/running-pipelines.md b/content/docs/user-guide/pipelines/running-pipelines.md
@@ -86,8 +86,7 @@ DVC will skip that stage:
 Stage 'prepare' didn't change, skipping
 ```
 
-DVC will also recover the outputs from previous runs using the
-[run cache](/doc/user-guide/pipelines/run-cache):
+DVC will also recover the outputs from previous runs using the [run cache].
 
 ```
 Stage 'prepare' is cached - skipping run, checking out outputs
@@ -108,6 +107,165 @@ stages:
     always_changed: true
 ```
 
+## Pull Missing Data
+
+`--pull` downloads missing dependencies, as well as the cached outputs of
+previous runs saved in the [run cache], so you don't need to pull all of your
+project's data before running the pipeline. `--allow-missing` skips stages
+whose only change is missing data. Combine `--pull` and `--allow-missing` to
+run a pipeline while pulling only the data that the changed stages actually
+need.
+
+Given the pipeline used in
+[example-get-started-experiments](https://github.com/iterative/example-get-started-experiments):
+
+```cli
+$ dvc dag
+    +--------------------+
+    | data/pool_data.dvc |
+    +--------------------+
+               *
+               *
+               *
+        +------------+
+        | data_split |
+        +------------+
+         **        **
+       **            **
+      *                **
++-------+                *
+| train |              **
++-------+            **
+         **        **
+           **    **
+             *  *
+         +----------+
+         | evaluate |
+         +----------+
+```
+
+If we are on a machine where all the data is missing:
+
+```cli
+$ dvc status
+Not in cache:
+  (use "dvc fetch <file>..." to download files)
+        models/model.pkl
+        data/pool_data/
+        data/test_data/
+        data/train_data/
+```
+
+We can modify the `evaluate` stage (for example, by changing the code to add a
+new evaluation method) and DVC will pull only the data needed to run that
+stage (`models/model.pkl` and `data/test_data/`) while skipping the rest of
+the stages:
+
+```cli
+$ dvc exp run --pull --allow-missing
+'data/pool_data.dvc' didn't change, skipping
+Stage 'data_split' didn't change, skipping
+Stage 'train' didn't change, skipping
+Running stage 'evaluate':
+...
+```
+
+## Verify Pipeline Status
+
+In scenarios like CI jobs, you may want to check that the pipeline is up to date
+without pulling or running anything. Use `dvc exp run --allow-missing --dry`:
+
+<details>
+
+### Clean example
+
+In the example below, data is missing because nothing has been pulled, but
+otherwise the pipeline is up to date.
+
+```cli
+$ dvc status
+data_split:
+        changed deps:
+                deleted:            data/pool_data
+        changed outs:
+                not in cache:       data/test_data
+                not in cache:       data/train_data
+train:
+        changed deps:
+                deleted:            data/train_data
+        changed outs:
+                not in cache:       models/model.pkl
+evaluate:
+        changed deps:
+                deleted:            data/test_data
+                deleted:            models/model.pkl
+data/pool_data.dvc:
+        changed outs:
+                not in cache:       data/pool_data
+```
+
+</details>
+
+```cli
+$ dvc exp run --allow-missing --dry
+Reproducing experiment 'agley-nuke'
+'data/pool_data.dvc' didn't change, skipping
+Stage 'data_split' didn't change, skipping
+Stage 'train' didn't change, skipping
+Stage 'evaluate' didn't change, skipping
+```
+
+If anything is not up to date, the command will fail:
+
+<details>
+
+### Dirty example
+
+In the example below, the `data_split` parameter in `params.yaml` was modified,
+so the pipeline is not up to date.
+
+```cli
+$ dvc status
+data_split:
+        changed deps:
+                deleted:            data/pool_data
+                params.yaml:
+                        modified:           data_split
+        changed outs:
+                not in cache:       data/test_data
+                not in cache:       data/train_data
+train:
+        changed deps:
+                deleted:            data/train_data
+        changed outs:
+                not in cache:       models/model.pkl
+evaluate:
+        changed deps:
+                deleted:            data/test_data
+                deleted:            models/model.pkl
+data/pool_data.dvc:
+        changed outs:
+                not in cache:       data/pool_data
+```
+
+</details>
+
+```cli
+$ dvc exp run --allow-missing --dry
+Reproducing experiment 'dozen-jogs'
+'data/pool_data.dvc' didn't change, skipping
+ERROR: failed to reproduce 'data_split': [Errno 2] No such file or directory: '/private/tmp/example-get-started-experiments/data/pool_data'
+```
+
+You can also check that all data exists in the remote. The command below
+succeeds (exits with code `0`) if all data is found in the remote. Otherwise,
+it fails (exits with code `1`).
+
+```cli
+$ dvc data status --not-in-remote --json | grep -v not_in_remote
+true
+```
+
 ## Debugging Stages
 
 If you are using advanced features to interpolate values for your pipeline, like
@@ -132,3 +290,4 @@ stage train: {'model': {'batch_size': 512, 'latent_dim': 8,
 
 [templating]: /doc/user-guide/project-structure/pipelines-files#templating
 [hydra composition]: /docs/user-guide/experiment-management/hydra-composition
+[run cache]: /doc/user-guide/pipelines/run-cache