exp run: Support composing and dumping Hydra config. #8093
@dberenbaum please take a look |
Nice work @daavoo! I really like how it achieves two important product goals:
I came across two big questions when trying it:
More minor questions:
Interesting side effect: |
I thought about merging but found that it would be confusing when combined with Perhaps it would be better to clearly establish that hydra |
It should not, because of the same reason |
Did you find any particular issue? I copy-pasted the structure of the link to

    [hydra]
    output_file = params.yaml
    config_dir = multiconf

    # dvc.yaml
    stages:
      train:
        cmd: python train.py
        deps:
          - train.py
        params:
          - params.yaml:

    # train.py
    import dvc.api

    def my_app() -> None:
        print(dvc.api.params_show())

    if __name__ == "__main__":
        my_app()

    $ dvc exp run
    Running stage 'train':
    > python train.py
    {'server': {'site': {'fb': {'domain': 'facebook.com'}, 'google': {'domain': 'google.com'}}, 'host': 'localhost', 'port': 443}}

    $ dvc exp run -S 'server/site=[google,amazon]'
    Running stage 'train':
    > python train.py
    {'server': {'site': {'google': {'domain': 'google.com'}, 'amazon': {'domain': 'amazon.com'}}, 'host': 'localhost', 'port': 443}}
My main motivation for going with
Is there something specific that seemed obscure to you? I am not sure how much it comes from What do you think about having a special key for
Not sure I like the idea of entangling the hydra compose and dump with the DVC defaults parameters file. |
No strong reasons. I think the main reason was that I didn't know how it could be configured without requiring a new section or a change in the semantics of the existing sections. Wanted to add it in a way that could reuse the existing semantics. |
Another issue with configuring at the |
Some ideas for enabling this at the
If

    stages:
      train:
        cmd: python train.py ${train_config}
        params:
          - hydra

Q) How to configure? Something like the following might look appealing but would completely change the semantics / internal logic of

    stages:
      train:
        cmd: python train.py ${train_config}
        params:
          - hydra:
              output_file: params.yaml
              config_dir: conf

There were old discussions about this. But basically, the hydra compose and dump could be implemented/understood as a stage callback:

    stages:
      train:
        cmd: python train.py ${train_config}
        callbacks:
          - hydra:
              output_file: params.yaml
              config_dir: conf
        params:
          - train_config
Pros and cons of the Pros
Cons
IMO the most likely and useful way to integrate Hydra and DVC is to have a massive Hydra config dir that gets reused throughout the project. For existing Hydra users, they are more likely to have a single Hydra config dir, and it's less work to integrate Hydra if it doesn't have to be configured at each stage in the pipeline. For that scenario, configuring at the project level in |
For future reference, I set up https://github.com/dberenbaum/complex_config_example when I was testing this out 😅 (edit: main branch uses current dvc functionality, hydra branch uses this PR) |
Latest update:
|
I'm worried this will confuse more people than it helps. Once set, it seems hard to explain how it works and clunky to have to use We are only supporting a single Hydra output file, so why not use the default params file? If people don't want the path |
I don't have a specific use case in mind. I guess I was trying to cover most scenarios. |
Hi and thank you for working on this! As a user of both Hydra and DVC I thought it might be helpful to contribute a realistic use case. It may already be handled well by the current implementation, I don’t know. Hydra’s great for modular projects where interchangeable modules may have heterogeneous configuration options. For example, I may want to use either a random forest or a multi-layer perceptron as my model architecture. Its two config files may look like:
I’d like to now run a sequence of experiments.
The final experiment should be immediately recovered from cache but experiment 2 may have overwritten the value of Is this case correctly handled by your parameter storage? |
@d-miketa I think so, with proper setup. I am making some assumptions here based on what you shared (like the use of

Here is the setup:

`conf` directory

    $ tree conf
    conf
    ├── config.yaml
    └── model
        ├── mlp.yaml
        └── rf.yaml
    $ cat conf/config.yaml
    defaults:
      - model: rf
    $ cat conf/model/mlp.yaml
    _target_: models.MLP
    n_hidden: 128
    $ cat conf/model/rf.yaml
    _target_: models.RandomForest
    tree_depth: 3

`models/__init__.py`

    from dataclasses import dataclass

    @dataclass
    class MLP:
        n_hidden: int

    @dataclass
    class RandomForest:
        tree_depth: int

`train.py`

    import dvc.api
    from hydra.utils import instantiate

    def my_app() -> None:
        # Build the model object from the "model" section of the composed params.
        model = instantiate(dvc.api.params_show()["model"])
        print(model)

    if __name__ == "__main__":
        my_app()

`dvc.yaml`

    stages:
      train:
        cmd: python train.py
        params:
          - params.yaml:

So, with that, I run:

    $ dvc exp run -S model.tree_depth=4
    ...
    > python train.py
    RandomForest(tree_depth=4)
    ...

    $ dvc exp run -S model=mlp -S model.n_hidden=192
    ...
    > python train.py
    MLP(n_hidden=192)
    ...

    $ dvc exp run -S model.tree_depth=5
    ...
    > python train.py
    RandomForest(tree_depth=5)
    ...

And:

    $ dvc exp show
    ──────────────────────────────────────────────────────────────────────────────────────────────
     Experiment                Created    model._target_        model.tree_depth   model.n_hidden
    ──────────────────────────────────────────────────────────────────────────────────────────────
     workspace                 -          models.RandomForest   5                  -
     master                    11:47 AM   models.RandomForest   3                  -
     ├── ea452af [exp-bf727]   11:48 AM   models.RandomForest   5                  -
     ├── bb97bdf [exp-c8cde]   11:48 AM   models.MLP            -                  192
     └── 730ecec [exp-5b346]   11:48 AM   models.RandomForest   4                  -
    ──────────────────────────────────────────────────────────────────────────────────────────────
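
One way to see why the third run does not inherit `n_hidden` from the second: the params are rebuilt from the config directory plus the overrides on every run, not merged into whatever the previous run left behind. The sketch below illustrates that idea with the standard library only. It is not Hydra's implementation; `CONFIG_GROUPS`, `DEFAULTS`, `compose_params`, and the tiny `key=value` parser are hypothetical stand-ins that handle only the override forms used in this thread.

```python
import copy

# Hypothetical in-memory stand-in for the conf/ directory shown in this thread.
CONFIG_GROUPS = {
    "model": {
        "rf": {"_target_": "models.RandomForest", "tree_depth": 3},
        "mlp": {"_target_": "models.MLP", "n_hidden": 128},
    }
}
# Mirrors the `defaults` list in conf/config.yaml.
DEFAULTS = {"model": "rf"}


def _parse_value(raw):
    """Turn '4' into 4; leave anything else as a string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def compose_params(overrides):
    """Build the params dict from scratch for a single run.

    Group selections (e.g. 'model=mlp') are applied before dotted value
    overrides (e.g. 'model.n_hidden=192'), whatever order they arrive in.
    """
    choices = dict(DEFAULTS)
    values = []
    for item in overrides:
        key, _, raw = item.partition("=")
        if key in CONFIG_GROUPS:
            choices[key] = raw
        else:
            values.append((key.split("."), _parse_value(raw)))

    # Start from the selected group files only; nothing from earlier runs.
    params = {
        group: copy.deepcopy(CONFIG_GROUPS[group][choice])
        for group, choice in choices.items()
    }
    for path, value in values:
        node = params
        for part in path[:-1]:
            node = node[part]
        node[path[-1]] = value
    return params


run_1 = compose_params(["model.tree_depth=4"])
run_2 = compose_params(["model=mlp", "model.n_hidden=192"])
run_3 = compose_params(["model.tree_depth=5"])
print(run_3)
```

Because each call starts from the config groups and not from the previous result, `run_3` contains no `n_hidden` key, which is consistent with the `dvc exp show` table in this comment.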
Sorry to come back to this so late, but I'm still slightly confused.
This could lead to situations like setting only
|
I think this one option has the cleanest UX. Added |
@daavoo Ahhh that’s amazing! So good to see the integration’s naturally taking care of this problem - you clearly made some sound architectural choices. 😁 |
@iterative/dvc Could someone review please? |
@daavoo, I have rebased it onto main and added skips for the tests that do not work on 3.11.
dvc/utils/hydra.py (outdated)

    overrides: List of `Hydra Override`_ patterns.

    .. _Hydra Override:
        https://hydra.cc/docs/next/advanced/override_grammar/basic/
This link is dead?
The feature will be used depending on whether `config.hydra.enabled` is True or False. Uses `hydra.initialize_config_dir` and `hydra.compose` (from https://hydra.cc/docs/advanced/compose_api/) to build the config and dump it to `params.yaml`. The content of the output file will be overwritten. `config.hydra.config_dir` and `config.hydra.config_name` can be used to customize the values passed to `hydra.initialize_config_dir` and `hydra.compose`. Can be combined with `--set-param` overrides. Closes #8082
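
The compose-and-dump step described above can be sketched roughly as follows. This is a minimal illustration assuming `hydra-core` and `omegaconf` are installed; the function name, its parameters, and the lazy imports are illustrative choices and may not match the code in `dvc/utils/hydra.py`.

```python
from pathlib import Path


def compose_and_dump(output_file, config_dir, config_name, overrides):
    """Compose a Hydra config and overwrite `output_file` with it as YAML."""
    # Imported here so the sketch can be loaded without Hydra installed.
    from hydra import compose, initialize_config_dir
    from omegaconf import OmegaConf

    # initialize_config_dir expects an absolute path.
    abs_config_dir = str(Path(config_dir).resolve())
    with initialize_config_dir(config_dir=abs_config_dir):
        cfg = compose(config_name=config_name, overrides=list(overrides))

    # write_text replaces any previous content; nothing is merged.
    Path(output_file).write_text(OmegaConf.to_yaml(cfg))
```

A call such as `compose_and_dump("params.yaml", "conf", "config", ["model=mlp"])` would then correspond to running with `--set-param model=mlp`, under the assumption that the overrides are passed through to `hydra.compose` unchanged. Recent Hydra releases may emit a warning when `version_base` is not passed to `initialize_config_dir`; it is omitted here for brevity.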
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.