guide: explain usage of multiple dvc.yaml files #2494
Comments
This comment was marked as outdated.
A mention of the "--all-pipelines" argument to |
How does one reference parameters when having multiple |
@JulianoLagana here is a very brief and small example that I tested:
To reference a param file in a different directory, try an explicit syntax for param files:
within a stage, or globally per |
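The snippet that accompanied this comment did not survive here. As a rough sketch, and not necessarily what was originally posted, one form DVC accepts is to name the params file explicitly inside a stage's `params` list. The layout (a `pipeline1/dvc.yaml` with `params.yaml` one directory up) and the stage, script, and parameter names are all hypothetical:

```yaml
# pipeline1/dvc.yaml (hypothetical layout: params.yaml lives one level up)
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    params:
      # Name the params file explicitly, then list the keys to track from it.
      - ../params.yaml:
          - train.epochs
          - train.lr
```

Paths here are resolved relative to the stage's working directory, which defaults to the directory containing this `dvc.yaml`.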
I created a directory tree like the one mentioned above. How can I run only one of the pipelines? Will dvc repro --all-pipelines run them all? Let's say I want to run only pipeline1/dvc.yaml |
One way to run it is to do:
Another way to do this:
or
|
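The commands from this reply are missing here, but the issue description lists the forms `dvc exp run pipeline1/dvc.yaml` and `cd pipeline1; dvc exp run`, noting that they work for `dvc repro` as well. A minimal sketch of a pipeline file that could be targeted this way, with the invocations as comments (stage, script, and data file names are hypothetical):

```yaml
# pipeline1/dvc.yaml (hypothetical stage, script, and file names)
#
# To run only this pipeline, per the issue description:
#   dvc repro pipeline1/dvc.yaml
#   cd pipeline1; dvc repro
# The same two forms are given there for `dvc exp run`.
stages:
  prepare:
    cmd: python prepare.py
    deps:
      - prepare.py
      - data/raw.csv
    outs:
      - data/prepared.csv
```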
@shcheklein Thank you for your answer. I tried
before, but it did not work for some reason. I think this might be because I moved dvc.yaml from its original location in the root directory. Let's say I currently have one dvc.yaml along with a dvc.lock file in the root directory of my repo, ~/repo, and I want to move the files to ~/repo/pipeline1. Do I need to move the dvc.lock file as well? How should I make this transition? Also, I had already finished training while dvc.yaml was at ~/repo/dvc.yaml, and I do not want to retrain. I just want to relocate the files for future training and to combine multiple models in the same repo |
Yes, if it's a heavy pipeline and you don't want to run it again. If you need to change
Moving files is fine. One thing you would need to check, and potentially change or move, is the paths to dependencies, outputs, etc. You might need to update them or move some additional files. It really depends on the |
When editing paths in dvc.yaml and dvc.lock, are the paths relative to the root directory of the repo, to the location of the dvc.yaml file, or to where I execute the dvc repro command? For example, I have my output in
and I currently have my dvc file in
Before, I had the output as follows:
Should the new output path be:
so that it is relative to the new dvc file location? Right now, when I try to run dvc status dvc/pipeline1/dvc.yaml, it reports that the files are deleted because it is looking for them inside the ~/repo/dvc directory while they are one level up |
I think they are relative to the
Looks like it should be Optional, and only if it's needed - there are a few ways to manipulate this. Use |
I really appreciate your help, @shcheklein. Thank you so much. Yes, I made a mistake: it's two levels up. Is it possible to set "wdir" globally in dvc.yaml? Also, what about the paths in the dvc.lock file? Do I need to manually modify them as well if I do not run the pipeline? And when modifying the dvc.lock file, is the wdir variable recognized in this file? I noticed the paths in the dvc.lock file are the old ones. |
No, not at the moment :(
You can run |
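For the `wdir` question above, a minimal sketch of setting it on a single stage (per the reply, it cannot be set globally at the moment). This assumes the `dvc.yaml` sits in `dvc/pipeline1/` and the outputs live two levels up, as described in the thread; the stage, script, and output names are hypothetical:

```yaml
# dvc/pipeline1/dvc.yaml (hypothetical stage, script, and output names)
stages:
  train:
    # wdir is resolved relative to this dvc.yaml file; here it points two
    # levels up, at the repository root.
    wdir: ../..
    # cmd, deps, and outs below are resolved relative to wdir.
    cmd: python train.py
    deps:
      - train.py
    outs:
      - output/model.pkl
```

Without `wdir`, the same paths would be resolved relative to the directory that contains this `dvc.yaml`.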
Adding back as a p1 since it relates to general monorepo usage, which we are seeing is increasingly common |
Another topic to cover here is how to view experiment results when there are multiple pipelines or projects. From a recent email response:
|
In #1641, it was added that multiple `dvc.yaml` files are supported. I think it would be good to give extra information on how this works and even encourage it where relevant.
Specifically one or more of the following:
- `dvc.yaml` files can be in any subdirectory or nested subdirectory in the project structure and DVC will find them
- … `dvc.yaml` files are still respected
- … `dvc.yaml` file will have its own `dvc.lock` file in the same directory
- … `dvc.yaml` file into multiple files is encouraged where there are clear logical groupings between stages. It avoids confusion, improves readability and shortens commands by avoiding long paths preceding every filename
Other Details
(Added by @shcheklein)
- … `--all-pipelines` or `--recursive` to find and run all pipelines
- … `dvc.yaml` can be run with `dvc exp run pipeline1/dvc.yaml` or `cd pipeline1; dvc exp run` (works for `dvc repro` as well)
- … `params.yaml` that will be used as a default params file for a particular pipeline
Example
An artificial example. We should modify it a bit to be more realistic when we write docs:
Example
To reference a param file in a different directory, try an explicit syntax for param files:
within a stage, or globally per `dvc.yaml`.
Tasks