
remote execution with hydra #219

Closed
elinep opened this issue Oct 14, 2020 · 26 comments
@elinep

elinep commented Oct 14, 2020

Hello trains team,

I'm trying to use trains and hydra.
hydra is nice for managing configuration and helps the user build one from the command line, with auto-completion and preset features.

Nevertheless I'm struggling with remote execution. As explained here, hydra changes the working dir, which defeats the trains script information detector.
At runtime, hydra creates a directory to compose the configuration and sets the working dir inside that untracked directory.
If we try to remotely execute such a task, trains-agent complains that the working dir does not exist.

I came up with two unsatisfactory partial solutions, demonstrated here.

First, the defective script:
The trains task is created inside hydra's main; working_dir and entrypoint are wrong.

First fix attempt:
The trains task is created before entering hydra's main function. The script info is indeed correct, but I'd like to modify the task name and project according to the application config, and I did not find a way to set the project name once the task is created.
Anyway, I don't find this solution elegant.

Second fix attempt:
The task is created inside hydra's context and I try to update the working_dir and entrypoint. I tested this solution in my closed-source application. It looks very fragile, as the script info is filled by a trains thread and I didn't find a way to properly synchronize with it, although this minimal example seems to work.

Any suggestions?

@bmartinn
Member

bmartinn commented Oct 14, 2020

This is awesome stuff @elinep !
I think connecting hydra with trains will be greatly appreciated by the community :)

  1. Regarding the initial working directory
  • We can introduce a decorator to add above hydra.main and have this decorator function hold the correct working dir.
    Then make sure Task.init retrieves it correctly, from the stored value.
    Basically replacing the "Path.cwd" here with the correct path.

  • Another option is to see if the original path is stored inside hydra, and retrieve it automatically from hydra

  2. I would also suggest we automatically store the configuration files on the Task itself.
    I think we basically need to copy config.yaml as is, and overrides.yaml as Args. I'm not actually sure about the functionality of hydra.yaml, but we could store that as well.
    My goal is to be able to edit these files in the UI, then launch remotely via trains-agent and have config.yaml automatically replaced, so from hydra's perspective it is as if it were the original one.

What do you think?

Edit:
I think that based on your second fix attempt we can fix the trains get_working_dir to test whether we are running within hydra and, if so, use the hydra working directory instead of cwd(). BTW, is cwd() always a subfolder of the original working directory?

@elinep
Author

elinep commented Oct 15, 2020

Glad to read you're willing to support hydra ;)

  1. I think trains relying on hydra.utils.get_original_cwd() to detect the path is really good.

  2. This is the snippet I use to connect model config

import tempfile

from omegaconf import OmegaConf
from trains import Task

def update_config(config: OmegaConf):
    """
    Serialize and sync config with trains
    :param config: the config to sync
    :return: the config, possibly overridden by the server
    """
    # expected config_global format
    schema = OmegaConf.structured(config._metadata.ref_type)

    # serialize config
    # For config logging we use yaml format (Trains: Artifacts ->  Model configuration)
    # save config in a temp yaml file
    config_global_file = tempfile.NamedTemporaryFile("w+t")
    config_global_file.write(OmegaConf.to_yaml(config))
    config_global_file.flush()
    config_global_file_name = config_global_file.name

    # sync with server if a task has been created
    current_task = Task.current_task()
    if current_task:
        # send yaml to trains server
        config_global_file_name = Task.current_task().connect_configuration(config_global_file_name)

        # for visualization (Trains: Hyperparameters)
        Task.current_task().connect(generate_trains_hyperparameter_dict(config))

    config_back_ = OmegaConf.load(config_global_file_name)
    config_back = OmegaConf.merge(schema, config_back_)

    return config_back
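The serialize-then-reload round trip above can be sketched with the stdlib alone. This is a stand-in for illustration only: json replaces yaml, `roundtrip` is a hypothetical name, and no trains server is involved (connect_configuration is what may substitute a server-provided path when running remotely).

```python
import json
import tempfile

def roundtrip(config: dict) -> dict:
    """Write the config to a temp file and load it back, mirroring the
    yaml round trip in update_config (json stands in for yaml here)."""
    with tempfile.NamedTemporaryFile("w+t", suffix=".json", delete=False) as f:
        json.dump(config, f)
        path = f.name
    # When executed remotely, connect_configuration would hand back a
    # server-provided path; locally it is typically the same file.
    with open(path) as f:
        return json.load(f)

print(roundtrip({"lr": 0.003, "batch_size": 32}))
# {'lr': 0.003, 'batch_size': 32}
```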

And here is a snippet to generate a trains hyperparameter view of the yaml file. The purpose is to benefit from the hyperparameter diff provided by trains when comparing experiments.

import sys

from omegaconf import MISSING, DictConfig, ListConfig, OmegaConf

def generate_trains_hyperparameter_dict(config: OmegaConf):
    res_dict = {
        "!WARNING!": """Cloning and editing these hyperparameters will have no effect.
The true hyperparameters are stored in Artifact->Model Configuration.
This view has been generated to ease comparison with other experiments.""",
        "CLI": " ".join(sys.argv),
    }
    return flatten_config(config, prefix="yaml", res_dict=res_dict)

def flatten_config(config: OmegaConf, prefix="", res_dict=None):
    if res_dict is None:
        # avoid sharing state across calls via a mutable default argument
        res_dict = {}
    for key in config:
        # in OmegaConf, `key in config` is False when the value is missing ("???")
        if key not in config:
            value = MISSING
        else:
            value = config[key]
        if prefix:
            next_prefix = prefix + "." + key
        else:
            next_prefix = key
        if isinstance(value, DictConfig):
            flatten_config(value, next_prefix, res_dict)
        elif isinstance(value, ListConfig):
            # TODO: if this is a list of complex object, this might not be a good representation
            res_dict[next_prefix] = str(value)
        else:
            res_dict[next_prefix] = str(value)
    return res_dict
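For illustration, the same flattening idea on plain nested dicts, with no OmegaConf dependency; `flatten_dict` is a hypothetical stand-in mirroring flatten_config above:

```python
def flatten_dict(d, prefix="", out=None):
    """Flatten nested dicts into dotted keys, mirroring flatten_config."""
    if out is None:  # avoid the shared mutable default pitfall
        out = {}
    for key, value in d.items():
        next_prefix = "{}.{}".format(prefix, key) if prefix else str(key)
        if isinstance(value, dict):
            flatten_dict(value, next_prefix, out)  # recurse into sub-configs
        else:
            out[next_prefix] = str(value)  # leaves are stringified
    return out

print(flatten_dict({"db": {"timeout": 42, "host": "localhost"}}, prefix="yaml"))
# {'yaml.db.timeout': '42', 'yaml.db.host': 'localhost'}
```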

Maybe a first step is for trains to support the OmegaConf configuration (the object hydra generates from the composition of all the yaml files) and let the user deal with hydra. This way we can link hyperparameters as usual:

omegaconf_dict = task.connect(omegaconf_dict)

As always thank you for your great support !

@bmartinn
Member

  1. Commit is on its way :)

  2. What would be the object to store, is it the "OmegaConf"?
    Is the singleton "ConfigStore" a schema only (i.e. no need to store values)?
    Based on your update_config implementation, it seems it will support configuration overriding from the UI (when the Task is launched with trains-agent), did I miss anything?

I would imagine generate_trains_hyperparameter_dict will be the hydra "overrides", which are argparse based, no?

Assuming (1) is done, and update_config just needs the right trigger (actually I don't know what it is; when is the "OmegaConf" fully updated?), could we just leave out the "overrides" (assuming we allow users to literally edit the yaml files)?

As always thank you for your great support !

😊 Thank you!

@omry

omry commented Oct 16, 2020

Lots of questions.
It would be great if people from trains jumped into the Hydra chat to ask questions and discuss possible solutions there.

@elinep
Author

elinep commented Oct 16, 2020

@bmartinn

  1. Commit is on its way :)

👍
Do not hesitate to have me test it before publishing it ;)

  2. What would be the object to store, is it the "OmegaConf"?
    Is the singleton "ConfigStore" a schema only (i.e. no need to store values)?

Indeed, I think we only have to store the Omegaconf.

ConfigStore is just a tool to describe configurations from python instead of yaml files alongside the application. It can store values, but hydra merges all information from the yaml files, the ConfigStore and the CLI to generate an OmegaConf config for the application.

I don't have much experience with OmegaConf. It might be a complex object for you to handle. It might be simpler and safer to rely on its yaml serialization/deserialization to connect it with trains, as I did in the minimal example.

Based on your update_config implementation, it seems that it will support configuration overriding from the UI (when Task is launched with trains-agent), did I miss anything?

Yes, it works (besides the working_dir detection issue). When remotely executed, the config is provided by trains based on the model configuration. Therefore we can edit the model configuration in the UI to tweak the task, although it is not as powerful as running from the command line, since hydra can't help building the conf (autocompletion, preset suggestion, etc.).
Anyway, it is still possible to build a config locally from the CLI and trigger remote execution with task.execute_remotely().

I would imagine generate_trains_hyperparameter_dict will be the hydra "overrides" , which are argparser based, no?

Yes it is argparse based.

I considered two options to save the config:

  1. Use trains argparse autoconnect and let the remote execution re-generate the config with hydra.
    I don't like this option for two reasons:
  • as already said, the merged OmegaConf depends on the CLI arguments, yaml files and ConfigStore(). I can imagine multiple reasons for the remote execution to fail. For example, if the user didn't commit all the relevant yaml files, the app would run locally but fail remotely.
  • CLI arguments do not really reflect the experiment configuration, as they hold only partial information. Therefore it is hard to compare experiments, or to get a clear idea of the configuration of a given experiment from the UI.
    Nevertheless, argparse autoconnect probably works fine with hydra.
  2. Disable trains argparse, and "manually" save the final config as the trains model configuration.
    I like this option for two reasons:
  • the configuration is frozen after hydra has merged it. In my opinion this is better for reproducibility and remote execution.
  • I can "hack" the UI hyperparameters view to hold information that will appear in the diff when comparing two experiments. In the minimal examples it is basically a copy of the model configuration.

Assuming (1) is done, and update_config just needs the right trigger (actually I don't know what it is, when is the "Omegaconf" fully updated ?), could we just leave out the "overrides" (assuming we allow users to literally edit the yaml files)?

I think you can let the user decide whether to autoconnect via the auto_connect_arg_parser argument of Task.init(). Argparse override should work fine.

When we enter the main function (the one decorated with hydra), the config should be complete. Although in my current application, I edit it inside main() before calling task.connect_configuration().

@bmartinn
Member

Hi @elinep
After a discussion with @omry, here are my thoughts:

  • Since hydra uses argparse for its overrides command line feature, as long as you import trains in the same python file as your main app, trains will log the argparse arguments, and later will allow you to use the overrides to change the configuration. That's actually quite cool :)
  • The downside: it doesn't look very neat; in the web UI under Args you will have something like:
overrides   ['schema/db=postgresql', 'db=postgresql_prod', 'db.timeout=42']
  • Storing the OmegaConf must be one-way only, i.e. read-only, and all changes made in the UI are ignored when executing via trains-agent, since we want to keep hydra's dynamic configuration abilities
  • It would be cool to somehow store all the possible configuration on the Task itself, making sure users always know what they can configure (a.k.a. override) without needing to go into the code.

Points to think about

  • In order for Task.init to store the OmegaConf, it must be called inside the app, is this limitation acceptable?
  • If hydra was imported and Task.init is called while we are not inside the app, do we want to output a warning?
  • Is there a way to store the full help on the Task?
  • Should we store all the configuration files (as read-only?!) for reference, or is the OmegaConf object enough?

What do you think?

BTW:
Maybe after we have this scenario working, we could write a Trains launcher for hydra?

@iakremnev

I love this proposal. I would also like to add a suggestion from my own experience.
A neat feature of hydra is hierarchical configs. It is explained here. I want to be able to run my training scripts in the following fashion:

$ python train.py +optimizer=adam optimizer.lr=0.003

so that later I would be able to run the experiment with trains-agent simply by selecting a different optimizer type in the trains-server GUI.
Is it possible to make such nested configuration tweaking in the trains UI?

@omry

omry commented Oct 22, 2020

Since hydra uses argparse for its overrides command line feature, as long as you import trains in the same python file as your main app, trains will log the argparse arguments, and later will allow you to use the overrides to change the configuration. That's actually quite cool :)

That's an implementation detail; argparse is really nasty and causes almost too many problems even with the limited use Hydra makes of it.
I have no concrete plans to ditch it for now, but I might in the future.

Another thing worth considering is that argparse contains parameters for Hydra itself, some of which are not easy to translate. For example, --config-dir points to a local directory that contains additional configs, and --multirun completely changes the behavior of Hydra, making it run the function multiple times with different arguments, etc.
I am not sure all of these translate correctly just by hooking into argparse.

@EgShes
Copy link

EgShes commented Oct 22, 2020

@elinep you can force hydra not to change the working directory (the same override can also be passed on the command line as hydra.run.dir=.):

hydra:
    run:
        dir: .

@elinep
Author

elinep commented Oct 23, 2020

@EgShes
thanks, this is the quick fix I needed.

@bmartinn
First, here is another script demonstrating how I currently use hydra and trains. Nothing new, I just put everything together.
You can find the corresponding experiment on the trains demo server here.

  • Since hydra uses argparse for its overrides command line feature, as long as you import trains in the same python file as your main app, trains will log the argparse arguments, and later will allow you to use the overrides to change the configuration. That's actually quite cool :)
  • The downside: it doesn't look very neat; in the web UI under Args you will have something like:
overrides   ['schema/db=postgresql', 'db=postgresql_prod', 'db.timeout=42']

Indeed, not very practical, that's why I ended up doing things like in the script I linked above.

  • Storing the OmegaConf must be one-way only, i.e. read-only, and all changes made in the UI are ignored when executing via trains-agent, since we want to keep hydra's dynamic configuration abilities

I'm not sure I'm following you on this one. Maybe it is a question of taste:

  • OmegaConf stored as read-only and argparse autoconnect enabled: the user can indeed run any configuration from the UI by specifying new arguments. I think most of the time this will be cumbersome, as we enter arguments blindly, without hydra helping us configure our run.
  • OmegaConf stored via task.connect (or equivalent) and argparse autoconnect disabled or ignored, as demonstrated in the script: the user can edit the configuration in the UI. For minor changes (e.g. batch_size), I think this is easier, as we have a global view of the whole config. For a totally different configuration (e.g. another model with different options) we are blind again, and the user must figure out a valid config themselves.

In my opinion, currently the most convenient way to run a new experiment remotely is to run it locally and use task.execute_remotely() to enqueue it.

  • It would be cool to somehow store all the possible configuration on the Task itself, making sure users always know what they can configure (a.k.a. override) without needing to go into the code.

Indeed that would be awesome! I guess it is a lot of work.

  • In order for Task.init to store the OmegaConf, it must be called inside the app, is this limitation acceptable?

One app, one task, sounds right.

  • If hydra was imported and Task.init is called while we are not inside the app, do we want to output a warning?

Yes, probably.

  • Is there a way to store the full help on the Task?

You mean the custom app config help defined by the user? If I'm correct, there is no such thing in hydra yet.

  • Should we store all the configuration files (as read-only?!) for reference, or is the OmegaConf object enough?

I think the final OmegaConf object is enough, but I might lack experience with hydra.

Maybe after we have this scenario working, we could write a Trains launcher for hydra?

Indeed, it could save the user from writing some boilerplate code such as:

from dataclasses import dataclass

import hydra
from trains import Task

@dataclass
class AppConfig:
    task_project: str = "a_task_project"
    task_name: str = "a_task_name"
    trains_queue: str = ""

....

@hydra.main(config_name="config")
def main(config: AppConfig):
    task = Task.init(config.task_project, config.task_name)
    if config.trains_queue:
        task.execute_remotely(queue_name=config.trains_queue, exit_process=True)

@bmartinn
Member

I think we all agree that even though the argparse hook kind of already works, we can do better than that :)
I also think it is beneficial to store the entire OmegaConf object for logging purposes, but using overrides is a lot more powerful when running remotely (with the OmegaConf giving some sort of context).

@elinep I love the new reference implementation script (the !WARNING! is a great touch). Would connecting the config be enough? I have the feeling logging the entire OmegaConf would make more sense, no?

@omry as far as I understand there is no callback to register before the app is launched, so my thinking is to monkey-patch run_job.

Basically my suggestion is for trains to monkey-patch run_job.
This will allow us to get the list of overrides and change them when executed by trains-agent; it will also allow us to store the OmegaConf as read-only (not sure how to state it, other than in the section name itself)

The easiest way to implement it would be to make sure you call Task.init before the main hydra app, but as previously mentioned that is a reasonable assumption.

What do you guys think?

@omry

omry commented Oct 24, 2020

@bmartinn, there is no callback YET.
A callback mechanism is planned, I think for either 1.1 or, more likely, 1.2.

@omry

omry commented Oct 24, 2020

Basically my suggestion is for trains to monkey-patch run_job.
This will allow us to get the list of overrides and change them when executed by trains-agent; it will also allow us to store the OmegaConf as read-only (not sure how to state it, other than in the section name itself)

I am going to close my eyes and pretend it didn't happen.
I think a better approach would be to look at integrating as a Launcher plugin.

@bmartinn
Member

I am going to close my eyes and pretend it didn't happen.

😄

I think a better approach would be to look at integrating as a Launcher plugin.

Sure, I agree. Given these constraints, is that possible?

  • User running their hydra app as is (as regular python code)
  • Overrides & OmegaConf are logged (let's assume Task.init logs them directly, and you can control the location of the Task.init call; for example, it must be called inside the hydra app)
  • Given the location of the Task.init call is the same, and we are executed by trains-agent, the Task.init call now has to somehow pass a new set of overrides to hydra, and log the newly generated OmegaConf (the one produced as a by-product of the new overrides)

@omry what do you think?

@omry

omry commented Oct 24, 2020

User running their hydra app as is (as regular python code)

Yes.

Overrides & OmegaConf are logged (let's assume Task.init logs them directly, and you can control the location of the
Task.init call; for example, it must be called inside the hydra app)

It could also be called inside the Launcher.
At the moment, Launchers are used only in multirun (--multirun), but this is something I am considering changing in the future.
So, there are two relevant modes:

python foo.py 

The Launcher is not involved in Hydra 1.0. Since there are no callbacks yet, users will have to call Task.init.

python foo.py -m [with optionally some sweeps]

The Launcher is on the execution path before the user's function.
You have the opportunity to call init and submit the code for remote execution by trains-agent.
Once you submit for execution by the trains-agent, you are in control of manipulating and resubmitting if you want.

You can check the existing Launcher plugins in the Hydra repo for inspiration.
I recommend that you base your plugin on the example Launcher plugin, though.

@bmartinn
Member

bmartinn commented Oct 26, 2020

@elinep, @iakremnev, @EgShes, what do you think?

clearml-bot pushed a commit that referenced this issue Oct 30, 2020
@elinep
Author

elinep commented Oct 30, 2020

Hi @bmartinn
I tried your commit (49b578b) on the demo script and it doesn't seem to work:

TRAINS Task: created new task id=7c908d1281e04fd1bcb0c780feedfe4a
2020-10-30 16:11:49,665 - trains.Task - WARNING - Failed auto-detecting task repository: Script file [PosixPath('/home/peline/Code/trains_issue_hydra_remote/outputs/2020-10-30/16-11-49/script_hydra.py'), PosixPath('/home/peline/Code/trains_issue_hydra_remote/outputs/2020-10-30/16-11-49/script_hydra.py')] could not be found

@bmartinn
Member

Thank you for testing @elinep!
How do I reproduce this error?
(I tested the script you linked, inside a git repository and as standalone; neither produced any warning)

@elinep
Author

elinep commented Oct 31, 2020

I did nothing special. The first time I ran the script I had no error, but the script info on the results web page was blank.
But now I always get the path warning. I have no idea why; maybe the first time the script terminated too early for the message to be printed?
When you ran the script, did you check that the script information (script path, working directory) reported on the web page is correct?

Here is how I tested the patch:

git clone https://github.com/elinep/trains_issue_hydra_remote.git
cd trains_issue_hydra_remote
python3.8 -m venv venv
source venv/bin/activate
pip install hydra-core==1.0.3
pip install git+https://github.com/allegroai/trains.git@49b578b979be762ef6523ddcccbe73d30d836889
python script_hydra.py

@bmartinn
Member

bmartinn commented Nov 1, 2020

Thanks @elinep!
It took me some time, but I was able to reproduce. Apparently the issue is running with hydra using a relative path
(i.e. python script_hydra.py vs python ~/trains_issue_hydra_remote/script_hydra.py)

@elinep
Author

elinep commented Nov 1, 2020

@bmartinn

I confirm it works now.

clearml-bot pushed a commit that referenced this issue Nov 7, 2020
@bmartinn
Member

bmartinn commented Nov 7, 2020

Hi @elinep
Initial hydra configuration support has been added, please feel free to test it :)

pip install git+https://github.com/allegroai/trains.git

Here is how it works.

  1. When Task.init(...) is called inside the Hydra app, it will output a warning and store the entire OmegaConf as a read-only section.
  2. When Task.init(...) is called before the Hydra app, it will store the overrides in a dedicated Hyper-Parameter section (named Hydra) as well as the full read-only OmegaConf configuration.

When executing remotely (only pattern 2 supports overriding), you can add overrides in the Hyper-Parameter section, and they should work as expected. You can also set Hydra/_allow_omegaconf_edit_ to True in the UI; then you can fully edit the OmegaConf section and it will override the entire OmegaConf at runtime.

What do you think?

@elinep
Author

elinep commented Nov 9, 2020

Hi @bmartinn,

I tried your commit. It is a good first step.

Here are my thoughts:

  1. Today I let the user set the task name and project name through the hydra config. For that purpose, I initialize the task inside the Hydra app (pattern 1).
# hydra config
from dataclasses import dataclass

import hydra
from trains import Task

@dataclass
class AppConfig:
    task_name: str = "a_task_name"
    task_project: str = "a_task_project"

@hydra.main(config_name="config")
def main(config: AppConfig):
    Task.init(config.task_project, config.task_name)

if __name__ == "__main__":
    main()

Unfortunately, as you wrote it, this is incompatible with UI config editing and remote execution.
2. I'm not sure I understand why pattern 1 imposes a read-only configuration. Could you elaborate?
3. In pattern 2, setting Hydra/_allow_omegaconf_edit_ to True feels a bit hacky. Plus, nothing prevents the user from editing the OmegaConf in the UI when it is set to False, which could be misleading. I assume trains-server needs to be updated to guide the user.

@bmartinn
Member

bmartinn commented Nov 10, 2020

Hi @elinep,
A few implementation details:
When Task.init is called from within the Hydra application, it is too late for it to change the overrides (they were already processed and used inside the hydra infrastructure). This is why it works in read-only mode. That said, we could definitely support pattern 2 (this is similar to argparse hooking, where Task.init can be called after the argparse but you can still override the parameters).

Regarding (3), I'm open to suggestions :)
The rationale was to give you full OmegaConf editing capabilities on the one hand, while recognizing on the other that in most cases you will want to use the "overrides" mechanism. Somehow we need to pass the knowledge of whether we are overriding the OmegaConf or not to the runtime implementation; this is the goal of the Hydra/_allow_omegaconf_edit_ argument.
WDYT?

EDIT:
Regardless of the above, I think I'll start working on pattern 2; as you mentioned, pattern 1 limits usability, specifically using the hydra configuration in the Task.init call.

@bmartinn
Member

Hi @elinep
Good news: step 2 is implemented as well.
You can now have the Task.init call inside the hydra app 🚀
Feel free to test:

pip install git+https://github.com/allegroai/trains.git

@jkhenning
Member

Hi @elinep,

Closing this as the support has been part of ClearML SDK for a long time. Please re-open if there are any issues.
