From 89a982de555ececf45e1d2f9dc33c5e111f032ca Mon Sep 17 00:00:00 2001
From: Ruslan Kuprieiev
Date: Mon, 16 Dec 2019 13:34:11 +0200
Subject: [PATCH] document locking in `dvc run/repro`

Fixes #860
---
 static/docs/command-reference/repro.md | 38 ++++++++++++++++++++++++++
 static/docs/command-reference/run.md   |  8 ++++++
 2 files changed, 46 insertions(+)

diff --git a/static/docs/command-reference/repro.md b/static/docs/command-reference/repro.md
index a990c2b76c2..9b6baa2035f 100644
--- a/static/docs/command-reference/repro.md
+++ b/static/docs/command-reference/repro.md
@@ -45,6 +45,44 @@ files, intermediate or final results.
 It saves all the data files, intermediate or final results into the DVC cache
 (unless `--no-commit` option is specified), and updates stage files with the new
 checksum information.
+### Running other dvc commands in parallel
+
+See
+[Running other dvc commands in parallel](/doc/command-reference/run#running-other-dvc-commands-in-parallel).
+
+### Parallel stage execution
+
+`dvc repro` cannot currently parallelize stage execution by itself (see
+[iterative/dvc#755](https://github.com/iterative/dvc/issues/755)). If you need
+parallelism, you can launch multiple `dvc repro` processes yourself. For
+example, suppose your pipeline DAG looks like this:
+
+```
+$ dvc pipeline show --ascii result.dvc
++--------+          +--------+
+| A1.dvc |          | B1.dvc |
++--------+          +--------+
+     *                  *
+     *                  *
+     *                  *
++--------+          +--------+
+| A2.dvc |          | B2.dvc |
++--------+          +--------+
+          *        *
+           **    **
+             *  *
+        +------------+
+        | result.dvc |
+        +------------+
+```
+
+It consists of two independent branches (pipelines `A` and `B`) and a final
+`result` stage that depends on both. To reproduce both branches in parallel,
+start `dvc repro A2.dvc` and `dvc repro B2.dvc` simultaneously (e.g. in
+separate terminals). Once both have finished, run `dvc repro result.dvc`: it
+will find that both branches are already up to date and will only run the
+final stage.
+
 ## Options
 
 - `-f`, `--force` - reproduce a pipeline, regenerating its results, even if no
diff --git a/static/docs/command-reference/run.md b/static/docs/command-reference/run.md
index 7c953794059..a399975dd68 100644
--- a/static/docs/command-reference/run.md
+++ b/static/docs/command-reference/run.md
@@ -52,6 +52,14 @@ captures data and caches relevant data artifacts along the way. See
 [this example](/doc/get-started/example-pipeline) to learn more and try creating
 a pipeline.
 
+### Running other dvc commands in parallel
+
+While your command is running, DVC releases the project lock (the `.dvc/lock`
+file), so you can run other DVC commands in parallel. Instead, DVC uses
+per-path read-write locks, which guarantee that no two DVC instances write to
+the same path and that no instance writes to a path that another instance is
+reading from.
+
 ### Avoiding unexpected behavior
 
 We don't want to tell you how to write your code! However, please be aware that