configurable default params.yaml (or templating entire pipelines) #7939
Yeah, that makes sense, thanks. Is it then possible to specify the params path at the level of the entire pipeline? I could then write a simple loop in a shell script to go through all the different params files and call dvc repro on each one of them. It would still be nicer to do that explicitly in the dvc.yaml, though.
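The shell-loop workaround described above could be sketched roughly like this. This is only an illustration under my own assumptions (the file names and the copy-then-repro approach are invented here; dvc repro itself has no flag for pointing at a different params file, so the sketch overwrites the default params.yaml before each run):

```python
# Hypothetical sketch of the loop described above: copy each candidate
# params file over the default params.yaml, then run `dvc repro`.
# With dry_run=True it only returns the commands it would execute.
import shutil
import subprocess

def repro_per_params(params_files, dry_run=True):
    commands = []
    for path in params_files:
        commands.append(f"cp {path} params.yaml && dvc repro")
        if not dry_run:
            shutil.copy(path, "params.yaml")  # overwrite the default
            subprocess.run(["dvc", "repro"], check=True)
    return commands

# Example (dry run, so nothing is actually executed):
print(repro_per_params(["model_a/params.yaml", "model_b/params.yaml"]))
```

Note the obvious drawback the comment hints at: the copy step mutates the working tree, so each run's params.yaml has to be restored or committed carefully.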
No, not at the moment. Doing this via templating the way you have it set up now is probably still the best way to accomplish it.
Is that something I could help with, perhaps (in case this is a feature you'd like to include)? I am not very familiar with the inner workings of dvc at this level of detail, but this (a configurable default params file) does not sound particularly complicated, and it would definitely help me a lot, so I'd love to help implement it.
Actually, I still don't quite get how the params resolution works. I understand that first, dvc looks for the default params.yaml, but the templating resolution still confuses me.
@tibor-mach This answer might help you understand the differences between params and templating resolution: #7316 (comment)
I think configuring a default params file could be a good, simple feature to add. The default path is defined here: Line 38 in af649af
And, I hope, it is the single source of truth. If you would like to make it configurable, you would first need to add a new config option (see https://github.com/iterative/dvc/blob/af649af46276b662b4fa03fd6ab63c36521f28aa/dvc/config.py). The way I would do it is by updating the schema defined there.
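To make the suggestion above concrete, here is a minimal sketch of how a configurable default might be consumed, assuming a hypothetical core.params_file option. The option name, dict shape, and function are all illustrative, not DVC's actual config API:

```python
# Illustrative only: fall back to the hard-coded default when the
# hypothetical "params_file" option is not set in the config.
DEFAULT_PARAMS_FILE = "params.yaml"  # the current single source of truth

def resolve_params_file(config: dict) -> str:
    # `config` mimics a parsed DVC config; the key name is made up here.
    return config.get("core", {}).get("params_file", DEFAULT_PARAMS_FILE)
```

With no option set this returns params.yaml, so existing projects would keep today's behaviour.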
I see, that setup is a bit counterintuitive to me, but I guess I understand the behaviour better now :-)
Cool, seems simple enough. I'll have a look at it, thanks!
@daavoo Just one more thing... How is this going to work with ...?
I hacked something like this together using hydra.

Let's assume you need to do some multi-objective search using hydra, where you might have a set number of trials and a parameter space, but the search space is large enough that naming each data file, model, metric, and plot manually becomes burdensome. Let's say 1000 trials across 5 hyper-parameters that could be categorical, ints, floats, ranges, distributions, etc. It shouldn't be too difficult to add syntax to support this kind of reproducible search.
In this way, you can define an arbitrary search with tracked inputs and outputs, but not have to name the outputs explicitly. The flexibility of the full hydra launcher syntax (supporting distributed queues, multi-objective search, minute joblib configuration, etc.) is far preferable to the current limitations. You could test a set of model configurations across several reproducible sets of samples without maintaining giant files by hand; outputs would instead land at results/<hash>.json or results/<hash>/, depending on the presence of a path suffix.
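The hash-named outputs mentioned above could work roughly as follows. This is a sketch under my own assumptions: the hash algorithm, the truncation length, and the results/ prefix are all invented for illustration, not part of any existing dvc or hydra feature:

```python
# Sketch: derive a stable output path from a trial's parameter values,
# so each of the 1000 trials gets a name without manual bookkeeping.
import hashlib
import json

def trial_output_path(params: dict, suffix: str = ".json") -> str:
    # Serialize the params deterministically so equal configs hash equally.
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(blob).hexdigest()[:12]  # short, stable id
    # With a file suffix this names a file (results/<hash>.json); with
    # suffix="/" it would name a directory (results/<hash>/).
    return f"results/{digest}{suffix}"
```

The key property is that re-running the same trial maps to the same path, which is what makes the search reproducible without explicit output names.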
Hi, I have a setup where I use a single pipeline (with several stages) for training multiple models which are almost the same, but use different training data and parameters.

I currently have a copy of the dvc.yaml pipeline in a folder with the respective params.yaml file used for each model. This works (I then always run dvc repro -P), but I have to copy the pipeline file, which makes versioning difficult. The only part that is not (since it cannot be, AFAIK) templated is the default params file.

I would love to have a dvc.yaml file in the root folder of my project which can be run with several different params.yaml files from several locations. Kind of like foreach ... do, but on the level of the entire pipeline.

Also, I believe I have to explicitly add the path to the params file under the params keyword when I am running the stage from a different working directory... Not sure if that is a bug or a feature :-) Thanks a lot!

P.S.: I tried a similar setup with templating all the stages, but there are limitations in the way templating and foreach work right now, and I also feel this would be a more elegant way to do it. The pipelines and the overall architecture are the same; what differs are the training data and (some) parameters, so having an option like "for each params file in the list, reproduce a separate instance of the pipeline" would make a lot of sense to me (it would then make sense to have separate lock files as well).
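The per-model layout described above might look roughly like this. This is a hypothetical reconstruction (the original inline example did not survive extraction); the stage name, script name, and paths are invented for illustration:

```yaml
# model_a/dvc.yaml -- one copy per model, next to its own params file
stages:
  train:
    cmd: python train.py --config model_a/params.yaml
    deps:
      - data/model_a/
    params:
      - model_a/params.yaml:   # trailing colon: track the whole file
    outs:
      - models/model_a.pkl
```

Running dvc repro -P then reproduces every dvc.yaml it finds, but each copy must be kept in sync by hand, which is the versioning pain described above.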