Hydra Configuration for Pytorch Lightning #2639
Conversation
Hello @anthonytec2! Thanks for updating this PR.
Comment last updated at 2020-07-21 05:27:04 UTC
@anthonytec2:
Can you squash all the commits? It will make for a cleaner starting point for this review phase.
Big questions for the PL team:
A. Parts of this example belong in PL core in my opinion, specifically the generic configuration classes.
B. The registration of the configs with the Hydra ConfigStore, as well as the example itself, depends on Hydra 1.0.0rc2 or newer.
1. We need to enforce that we are running against a supported version somehow. From my perspective, the correct way to achieve that is via a pip dependency.
1.1. If this is not viable, will an extra dependency work? `pip install pytorch-lightning[hydra]`
1.2. If that is not viable, will a runtime check plus a warning on old versions work? (See the sketch below.)
2. Will the PL team accept the generic configs into the core and maintain them moving forward?
I know those are big decisions.
Please take your time evaluating this PR and consider the usefulness versus the cost of maintaining those config classes and adding a more explicit dependency on Hydra.
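A minimal sketch of both options (the extra name, version floor, and check location are illustrative assumptions, not something decided in this PR):

```python
# setup.py sketch: expose Hydra as an optional extra, installable via
#   pip install pytorch-lightning[hydra]
# extras_require={"hydra": ["hydra-core>=1.0.0rc2"]}

# Runtime fallback (option 1.2): warn when an unsupported Hydra version is installed.
import warnings

try:
    import hydra
    from packaging.version import Version

    if Version(hydra.__version__) < Version("1.0.0rc2"):
        warnings.warn(
            f"The Hydra examples require hydra-core>=1.0.0rc2; found {hydra.__version__}."
        )
except ImportError:
    hydra = None  # Hydra not installed: the Hydra-based examples are simply unavailable.
```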
@@ -0,0 +1,21 @@
## Hydra Pytorch Lightning Example
Needs a pass for capitalization and typos.
pl_examples/hydra_examples/README.md
Outdated

All of the above hyperparameters are configured in the config.yaml file, which contains the top-level configuration for all of these groups. This file includes a defaults list that specifies the default configuration for each Hydra group. Beyond this configuration file, all of the defined parameters can be overridden via the command line.

Additionally, for type safety we highlight in our file `user_config.py` an example of extending the `PLConfig` data class with a user configuration. Hence, we get the benefits of type safety for our entire config.yaml. For further examples of this, [check out the Structured Configs tutorial](https://hydra.cc/docs/next/tutorials/structured_config/intro).
There are no more examples for this there, but it is recommended that people using Hydra read both the Basic tutorial and the Structured Configs tutorial.
# @package _group_

functions:
  print:
    target: pl_examples.hydra_examples.user_config.MyPrintingCallback
  message:
    target: pl_examples.hydra_examples.user_config.MessageCallback
    params:
      iter_num: 12

callbacks_list:
  - ${callbacks.functions.print}
  - ${callbacks.functions.message}
Since we are not using composition here it will be simpler to just inline the callbacks:
# @package _group_
functions:
  print:
    target: pl_examples.hydra_examples.user_config.MyPrintingCallback
  message:
    target: pl_examples.hydra_examples.user_config.MessageCallback
    params:
      iter_num: 12
callbacks_list:
  - ${callbacks.functions.print}
  - ${callbacks.functions.message}

Suggested change:

# @package _group_
callbacks:
  - target: pl_examples.hydra_examples.user_config.MyPrintingCallback
  - target: pl_examples.hydra_examples.user_config.MessageCallback
    params:
      iter_num: 12
We may not want to use composition here, but we should make things more explicit to allow the user to use composition upon this quite easily...
OmegaConf replaces lists completely during merge:
c1 = OmegaConf.create({"list": [1,2,3]})
c2 = OmegaConf.create({"list": [4,5,6]})
c3 = OmegaConf.merge(c1,c2)
assert c3 == {"list": [4,5,6]}
One way to compose list elements is what this looked like before:
e1 = {"elements": {"e1": {"foo":"bar"}}}
e2 = {"elements": {"e2": {"zoo":"var"}}}
l1 = {"list": ["${elements.e1}", "${elements.e2}"]}
cfg = OmegaConf.merge(l1, e1, e2)
print(cfg.pretty(resolve=True))
outputs:
list:
- foo: bar
- zoo: var
elements:
  e1:
    foo: bar
  e2:
    zoo: var
This lets you reuse e1 and e2 in different lists without repeating them.
This is a bit advanced, and for the sake of this example I think we can just keep things simple.
This usage pattern is a good candidate for a feature I am thinking about: hiding a config node. In this case we want to hide the elements node (but keep it there to support the runtime interpolation from the list). This feature is planned for the next major version of OmegaConf.
trainer = Trainer(
    **cfg.trainer,
    logger=instantiate(cfg.logger),
    profiler=instantiate(cfg.profiler),
    checkpoint_callback=instantiate(cfg.checkpoint),
    early_stop_callback=instantiate(cfg.early_stopping),
    callbacks=callbacks,
)
PL Team, any thoughts about simplifying this?
Is it possible to create a standard config-driven initialization method (or factory) for the trainer?
How do you feel about:
trainer = Trainer.from_cfg(cfg)
@anthonytec2 :
I think all the PL specific configs should be under the same node:
pl:
  trainer:
    ...
  logger:
    ...
  profiler:
    ...
  ...
This would allow easier initialization of PL (without having to worry about the user adding their own things to the top-level config):
trainer = Trainer.from_cfg(cfg.pl)
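A rough sketch of what such a factory could do, written here as a free function; the `from_cfg` name and the nested `pl` node are the proposal under discussion, not an existing PL API:

```python
from hydra.utils import instantiate
from omegaconf import DictConfig
from pytorch_lightning import Trainer


def trainer_from_cfg(cfg: DictConfig) -> Trainer:
    """Build a Trainer from a `pl`-style node holding trainer/logger/profiler/... sub-configs."""
    callbacks = [instantiate(c) for c in cfg.callbacks.callbacks] if cfg.get("callbacks") else []
    return Trainer(
        **cfg.trainer,
        logger=instantiate(cfg.logger),
        profiler=instantiate(cfg.profiler),
        checkpoint_callback=instantiate(cfg.checkpoint),
        early_stop_callback=instantiate(cfg.early_stopping),
        callbacks=callbacks,
    )


# With the namespaced layout above: trainer = trainer_from_cfg(cfg.pl)
```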
Actually I like this proposal, but I feel we should change the name to `Trainer.from_hydra_config`, since otherwise it may not be clear what kind of config to expect here.
I thought the same. I agree with Justus
The config object is just an object with some fields, there is nothing Hydra about it.
num_tpu_cores: Optional[int] = None


cs.store(group="trainer", name="trainer", node=LightningTrainerConf)
Please namespace all of the PL configs: `pl/trainer`.
@omry Can you please explain what the benefits of this are? Won't it just make the configuration deeper? As a result, when the user wants to override trainer parameters from the command line, they will have to prefix each one with `pl`, like `pl.trainer.gpus=[2,3]`, etc. Asking because in NeMo (where we independently started to merge PT Lightning with Hydra like two weeks ago ;)) we just yesterday did the opposite and got rid of `pl`...
@tkornuta-nvidia, yes:
You can think of the configuration object Hydra is composing as a shared object that is potentially the home for more than one user.
As an example, the configuration is already hosting both Hydra itself under the hydra node (and the hydra/ config groups) and the user config.
Different frameworks that are installed can make their configs accessible via Hydra (via a Config Searchpath plugin and/or by storing configs in the ConfigStore like in the example above).
To avoid collisions between themselves and with the end user, I am encouraging framework owners to place their configs in some kind of namespace.
Imagine the following scenario:
A user wants to use NeMo and Fairseq at the same time assuming they both use Hydra.
If both conveniently place their model configs in the config group `model` and in the config node `model`, the user can now easily end up configuring the NeMo model with a configuration intended for a Fairseq model.
Namespacing solves this. Think of it as the equivalent of Java packages or Python modules, but for configuration.
The cost you mention of the command line becoming longer is real, but I think it's still the right choice.
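As a concrete sketch, the registration from this example could move under a `pl/` namespace (the import path of the dataclass is assumed here; the group name follows the suggestion above):

```python
from hydra.core.config_store import ConfigStore
# Module path assumed for illustration; the dataclass comes from this PR's trainer_conf.py.
from pl_examples.hydra_examples.trainer_conf import LightningTrainerConf

cs = ConfigStore.instance()
# Stored under `pl/trainer` instead of the top-level `trainer` group, so it cannot
# collide with another framework's (or the user's) `trainer` group.
cs.store(group="pl/trainer", name="trainer", node=LightningTrainerConf)

# Command-line overrides then become e.g. `pl.trainer.gpus=[2,3]`.
```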
@@ -0,0 +1,108 @@
from dataclasses import dataclass
I think the dataclasses for optimizer and scheduler should probably be a part of PL. It makes more sense than every user re-defining Adam, AdamW, CosineConf, etc.
Speaking of which, we should be consistent: either consistently add the Conf suffix or consistently leave it off.
1.) IMO you're right, this is not specific to any research area, so this should be part of Lightning if we decide to move major things from this PR to Lightning core.
2.) We should definitely add the suffix to make clear that it actually is a config.
I'd like to add that in my present use cases, although I don't necessarily want to use the structured configs paradigm for everything at the moment as I'm kind of enjoying the simplicity of a terse yaml hierarchy, I really like the idea of supporting `trainer_conf.py`, `optimizer.py`, and `scheduler.py` via the structured configs. It feels very similar in spirit to maintaining `Trainer`'s static method `add_argparse_args()`. On that note, maybe I missed this in the earlier discussion, but what are the actual cons to including `optimizer_conf.py` and `scheduler_conf.py` in PTL? Perhaps aggregating them into a singular `hydra_conf.py` in core so there's only one file to maintain?
@romesc-CMU:
As a user you can still use yaml config files, and they will be validated automatically against the dataclasses as long as their name and config group match.
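A small sketch of that workflow; the config class, group, and file names here are illustrative, not taken from the PR:

```python
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class AdamConf:
    # Typed schema mirroring the main arguments of torch.optim.Adam.
    lr: float = 1e-3
    weight_decay: float = 0.0


cs = ConfigStore.instance()
cs.store(group="optimizer", name="adam", node=AdamConf)

# A plain yaml file at conf/optimizer/adam.yaml is merged onto this schema because
# its group and name match, so a typo such as `lr: fast` fails type validation
# instead of silently propagating.
```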
I'm very happy this is moving forward, thank you @anthonytec2 for all the hard work! One thing I'm still wondering about (after testing the latest commit) is whether we can also support an alternative to passing only [...]. When passing configs like this, it is somewhat incompatible with the current [...], as opposed to [...]. This was one of the recommendations following the removal of passing an [...]. With #2519, I believe we can also offer this option.
Actually I like this PR a lot. We should be clear on the interface, and we should figure out where to put those configs to make them accessible from the package. @omry how long until the Hydra release? Is there already an ETA? A) I agree that some parts of this belong in Lightning core, but we should carefully sort them out from this PR. However, this is just my personal opinion, so we should decide this together @PyTorchLightning/core-contributors
I released Hydra 1.0.0rc2 yesterday. No deadline for Hydra 1.0.0 but it's getting close (I think within a few weeks).
I realized that users can choose multiple extras yesterday:
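For reference, pip accepts several extras in a single install, e.g. `pip install "pytorch-lightning[hydra,extra]"` (the extra names here are only illustrative).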
Sounds good.
One limitation I want to call out to avoid surprises:
@omry regarding extras: I've used them a lot, and in my experience users don't know about them at all. In most cases users will just install the plain package, and when they hit an import error they will install the missing package afterwards.
I think if we go there it will have to be documented clearly.
Thank you for putting all these examples together! I do feel like these configs should live in the main Hydra repo and not the PL repo, for these reasons: [...]
So, my requested changes here are to: [...]
Please apply the requested changes.
I like this option to install even some from extras :]
@yukw777, thanks for digging. I would like to understand this better. Can you jump into the Hydra chat?
@omry Actually I had a serious problem when I was implementing a prototype of a structured config for the PL trainer in NeMo, in particular for `gpus`. I did [...], and @anthonytec2 did a better job with [...].
The reason is that I am working on releasing Hydra and I did not have the time to work on this.
You can do something like [...]
Let's have a hydra thread in PL slack...
@williamFalcon What is the status of this PR? What is missing? What are the showstoppers? And when can it be merged? Asking as I am one of the people working on NVIDIA NeMo, and recently we have started using a) PTL as the NeMo backend for training and b) Hydra for configuration management. So as you might guess, we have faced several issues that this PR is actually trying to solve... ;)
Yup. The main blocker is that we still need to figure out where these configs will be hosted. Since we don't want to couple PL to any config system, we have a few options: [...]
Basically, we want to make sure the Hydra team is also involved in maintaining these configs, since it can shift a lot of work to us, which we have already seen with the loggers. But 100% agree that I'd like to get this solved ASAP.
Thanks for the reply. Most of the structured configs introduced by this PR are in fact associated with pure PyTorch. Tiny problem: PyTorch does not depend on Hydra. Moreover, Hydra is by definition "framework-agnostic", so IMO they should not be a part of Hydra. Option number two (Bolts) doesn't sound good to me just by looking at package dependencies: PTL Bolts depends on PTL, not the opposite. So if you do not want them to be part of the PTL project, then I guess option 3 is the way: they should be moved to a separate repository. The question is who should host it and where. If you don't want to host it under the PTL organization, then the natural choice to me is the pytorch organization. But I can also arrange this and start hosting it under the NVIDIA open-source organization. @omry ? @anthonytec2 ? Let's get that ball rolling...
@tkornuta-nvidia thanks for the great suggestion. Yeah, I think hosting it at NVIDIA would be great. That way both frameworks can contribute to it as needed! @PyTorchLightning/core-contributors thoughts?
@tkornuta-nvidia, @williamFalcon @anthonytec2 I'd be happy to help maintain it if we host elsewhere. Sort of funny that the project I'm working on which uses the PTL+Hydra combo is also part-time at NVIDIA (the Seattle robotics group). If it's not too much trouble, it would definitely be great to get on the PTL Slack and discuss in the #configs-hydra channel 😄
Just want to chip in: the neovim project did something similar, creating a separate repo under the same org that hosts configurations for a specific piece of functionality, on a best-effort basis. https://github.com/neovim/nvim-lsp
callbacks = [instantiate(c) for c in cfg.callbacks.callbacks] if cfg.callbacks else []

trainer = Trainer(
    **cfg.trainer,
    logger=instantiate(cfg.logger),
    profiler=instantiate(cfg.profiler),
    checkpoint_callback=instantiate(cfg.checkpoint),
    early_stop_callback=instantiate(cfg.early_stopping),
    callbacks=callbacks,
How about separating this into a TrainerConf class? We point target at this class so that it does these lines; then all users have to do is select TrainerConf, and when they call instantiate() on it, they get the real Trainer.
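A sketch of that indirection using the ObjectConf pattern already shown for the callbacks; the factory name, config group, and module path below are hypothetical:

```python
from hydra.core.config_store import ConfigStore
from hydra.utils import instantiate

cs = ConfigStore.instance()
# An ObjectConf-style node whose target is a trainer factory (hypothetical path).
cs.store(
    group="trainer_conf",
    name="default",
    node={"target": "pl_examples.hydra_examples.user_config.build_trainer", "params": {}},
)

# In the app the wiring above collapses to a single call; build_trainer(cfg) would
# contain roughly the lines quoted in this diff and return the real Trainer:
# trainer = instantiate(cfg.trainer_conf, cfg)
```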
@anthonytec2 should we keep this PR open? Or is everything already in the NVIDIA repo?
Hi, in the end the project was moved to the PyTorch Ecosystem: [...] and it is still work in progress. Once done, we will clean up this PR and remove the PT-related structured configs. Hopefully then we will merge.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
@tkornuta-nvidia You write "Once done, we will clean up this PR and remove the PT-related structured configs. Hopefully then we will merge." I am curious whether this PR is still in progress? I am interested in good PL and Hydra support.
Hey @turian, since this is a pretty broad goal that reaches beyond configuring just Lightning classes, we've moved a lot of the work to https://github.com/pytorch/hydra-torch. We are also working on the analogous https://github.com/romesco/hydra-lightning. If you want to start testing by using it in your project, you can use a git dependency until the PyPI release! We would love the feedback. Both projects follow the same patterns and are being generated via an automatic tool we've been developing. My goal is to provide a minimal package that projects using Hydra with Lightning can simply import to have all necessary configs available. It doesn't try to do anything beyond that. The beauty of this format is that end-user code will have something like:

from hydra_configs.torch.utils.data import DataLoaderConf
from hydra_configs.pytorch_lightning.trainer import TrainerConf

My timeline for initial releases of these config packages (minimal critical-path classes for torch, torchvision, and lightning) is before the end of January. If you'd like to jump in and help, I could definitely use it and we could probably get it finished even sooner =].
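For instance, a downstream project could register the packaged schema the same way the hand-written one is registered in this PR (usage sketch; only the import path is taken from the quote above):

```python
from hydra.core.config_store import ConfigStore
from hydra_configs.pytorch_lightning.trainer import TrainerConf

cs = ConfigStore.instance()
cs.store(group="trainer", name="trainer", node=TrainerConf)
```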
@romesco thank you so much for the pointers. I actually just started migrating a new pl project to hydra, following the conventions of @yukw777 in https://github.com/yukw777/leela-zero-pytorch I will check out the nascent projects. I am curious if you participate on the zulip chat or similar, so we can discuss a little more. I am thrilled to see this |
What does this PR do?
This merge request is the template for some initial discussion about using Hydra==1.0.0rc2 with Pytorch Lightning. @omry and I have been working on an example that uses the best features of Hydra to configure Pytorch Lightning. We want to understand the right place to put something like this MR. We have one file, `trainer_conf.py`, which we think would be a good addition to the main repository if it could be maintained. This file consists of the base trainer configuration used for Pytorch Lightning; users can then extend this base configuration with their own settings.

We highlight one style of configuring Pytorch Lightning with Hydra: structured configs. Structured configs are mostly data classes that define the types and variables for a given argument group, which gives argument parsing type safety. This MR enables configuring all of the argparse settings defined for Pytorch Lightning's Trainer class and shows an example of a user configuration that extends a base configuration. We also highlight Hydra's ability to instantiate objects by creating ObjectConfs, which is really useful for changing out your optimizer, scheduler, model, or dataset easily. One other feature we show is that callbacks can still be defined and used with PL via the ObjectConf structure.
For example, a user can change the dataset in this MR with a single Hydra command-line override, as sketched below.
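A hypothetical example of such an override (the group and option names are illustrative, not the actual ones in this MR) is `python pl_template.py data=other_dataset`, which swaps the whole dataset config group from the command line.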
Should a user make an error in the configuration, the structured configs catch it with a validation error before training starts.
We are experiencing error #2519, which limits TensorBoard support at the moment.
Fixes #2322
Fixes #807
Before submitting