From c737e48a3726e7d00a69b399358cadc8f7d7bfdf Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Mon, 16 Dec 2019 13:34:11 +0200
Subject: [PATCH] document locking in `dvc run/repro`

Fixes #860
---
 public/static/docs/command-reference/repro.md | 38 +++++++++++++++++++
 public/static/docs/command-reference/run.md   |  8 ++++
 2 files changed, 46 insertions(+)

diff --git a/public/static/docs/command-reference/repro.md b/public/static/docs/command-reference/repro.md
index a990c2b76c2..9b6baa2035f 100644
--- a/public/static/docs/command-reference/repro.md
+++ b/public/static/docs/command-reference/repro.md
@@ -45,6 +45,44 @@ files, intermediate or final results.
 It saves all the data files, intermediate or final results into the DVC cache
 (unless `--no-commit` option is specified), and updates stage files with the new
 checksum information.

+### Running other dvc commands in parallel
+
+See
+[Running other dvc commands in parallel](/doc/command-reference/run#running-other-dvc-commands-in-parallel).
+
+### Parallel stage execution
+
+Currently `dvc repro` is not able to parallelize execution by itself (see
+[iterative/dvc#755](https://github.com/iterative/dvc/issues/755)), so if you
+need to do that you could launch multiple `dvc repro`s yourself. For example,
+say your DAG looks something like:
+
+```
+$ dvc pipeline show --ascii result.dvc
++--------+        +--------+
+| A1.dvc |        | B1.dvc |
++--------+        +--------+
+     *                 *
+     *                 *
+     *                 *
++--------+        +--------+
+| A2.dvc |        | B2.dvc |
++--------+        +--------+
+      *               *
+       **           **
+         *         *
+       +------------+
+       | result.dvc |
+       +------------+
+```
+
+so it consists of two pipeline branches (pipeline `A` and pipeline `B`) and the
+final `result` stage. To reproduce both branches at the same time, you could run
+`dvc repro A2.dvc` and `dvc repro B2.dvc` simultaneously (e.g. by running them
+in separate terminals). After both are done running, you could then run
+`dvc repro result.dvc`, which will see that both branches are already up to date
+and will only run the final stage.
+
 ## Options

 - `-f`, `--force` - reproduce a pipeline, regenerating its results, even if no
diff --git a/public/static/docs/command-reference/run.md b/public/static/docs/command-reference/run.md
index 7c953794059..1191e61aa3d 100644
--- a/public/static/docs/command-reference/run.md
+++ b/public/static/docs/command-reference/run.md
@@ -52,6 +52,14 @@ captures data and caches relevant data artifacts along the way. See
 [this example](/doc/get-started/example-pipeline) to learn more and try creating
 a pipeline.

+### Running other dvc commands in parallel
+
+When running your command, DVC will remove the project lock (`.dvc/lock` file),
+so that you will be able to run other DVC commands (e.g. `dvc run`,
+`dvc import`, `dvc repro`, etc.) in parallel. However, it uses per-path
+read-write locking instead, to guarantee that no two DVC instances write to the
+same path or write to a path that another instance is reading from.
+
 ### Avoiding unexpected behavior

 We don't want to tell you how to write your code! However, please be aware that