
[user application] which deepspeed flags are required if any #616

Closed
stas00 opened this issue Dec 23, 2020 · 8 comments

Comments

stas00 (Collaborator) commented Dec 23, 2020

In the context of integrating DeepSpeed into transformers, I have a question for you regarding the deepspeed cl args.

The paradigm you propose is:

deepspeed myprog myargs --deepspeed --deepspeed_config ds_config.js

but as we were discussing elsewhere, there are too many cl args in ML apps, and we were considering whether it might make sense to collapse --deepspeed and --deepspeed_config into a single cl arg for the transformers trainer, and then reconstruct it into 2 cl args before it goes into deepspeed.initialize.

context: huggingface/transformers#9211 (comment)

Do you have a strong feeling that we should keep your proposed convention of 2 user-facing cl args, or do you feel that it's fine to collapse the two? I may just be unaware of something important, so I wanted to run this idea by you.

And secondly, do you by chance have any brilliant ideas on how to name a one-to-rule-them-all cl arg so that it reads nicely and is functionally unambiguous to users? I think --deepspeed_config_file ds_config.js isn't clear enough since it doesn't say that deepspeed is activated, but I could be wrong.

One of the devs also suggests that --deepspeed_config doesn't make it obvious that a file argument is expected.

I wonder if --deepspeed ds_config.js would do the trick - it's actually less obvious that it expects a file argument, but it's unambiguous about activating deepspeed.
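
For illustration, roughly what I have in mind - a minimal sketch of the collapse-and-reconstruct idea (flag names are hypothetical, nothing is settled yet):

import argparse

parser = argparse.ArgumentParser()
# a single user-facing flag: its presence enables deepspeed and supplies the config file
parser.add_argument("--deepspeed", type=str, default=None, metavar="DS_CONFIG",
                    help="enable DeepSpeed and read its config from this json file")
args = parser.parse_args(["--deepspeed", "ds_config.js"])

if args.deepspeed is not None:
    # reconstruct the 2 cl args deepspeed's own parsing would normally produce
    args.deepspeed_config = args.deepspeed
    args.deepspeed = True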

I totally understand that you may not have a strong opinion or want to spend any time on this since it's just our peculiar desire for succinctness and clarity, but if you do have suggestions I'm all ears.

Thank you.

tjruwase (Contributor) commented Dec 23, 2020

Dropping --deepspeed is not a problem for DeepSpeed at all. As you can see here, we added this flag to save users who want to add a deepspeed code path to their existing code the burden of writing the associated parsing logic. It appears our good "intentions" have created a fair amount of confusion. So to clarify, the --deepspeed flag is not a requirement of the DeepSpeed engine. Please let me know if this does not work.
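
For reference, a rough sketch of how that opt-in parsing helper is typically wired into an existing script (assuming deepspeed.add_config_arguments is the helper in question):

import argparse
import deepspeed

parser = argparse.ArgumentParser(description="my app")
# ... the application's own arguments ...
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed / --deepspeed_config
args = parser.parse_args()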

stas00 (Collaborator, Author) commented Dec 23, 2020

That's perfect. Thank you so much for clarifying this, @tjruwase - we will go with one cl arg then.

stas00 closed this as completed Dec 23, 2020
tjruwase (Contributor) commented Dec 23, 2020

Sorry, I was not quite done with my answer. On the issue of handling the config file argument, how about the following? I think this might also address your other question about mismatches between the cl-arg and json clones of the batch-size-related args.

The first observation about this proposal is that --deepspeed_config is also not a requirement of DeepSpeed! Strange as it may sound, that is the reality. Note, though, that a few asserts might fire with this approach because it is not well tested.

The second observation requires digging into some internals to see that the deepspeed engine configuration is encapsulated in the DeepSpeedConfig object. Although DeepSpeedConfig is commonly instantiated with the json file from --deepspeed_config, it can also be instantiated with an appropriately constructed Python dict via the arg named param_dict. When DeepSpeedConfig is instantiated with param_dict, the json_file argument is ignored, as seen here. Client code can have DeepSpeedConfig instantiated with a dict by passing the dict via the config_params arg of deepspeed.initialize(). For concreteness, below is a snippet that instantiates a DeepSpeedConfig object with a dict and without a json config file.

>>> from deepspeed.runtime.config import DeepSpeedConfig
>>> d={'train_batch_size':1, 'gradient_accumulation_steps':1}
>>> ds_config=DeepSpeedConfig(json_file=None, mpu=None, param_dict=d)
>>> ds_config.print('test')
[2020-12-23 20:25:41,062] [INFO] [config.py:644:print] test:
[2020-12-23 20:25:41,062] [INFO] [config.py:648:print]   activation_checkpointing_config  <deepspeed.runtime.activation_checkpointing.config.DeepSpeedActivationCheckpointingConfig object at 0x7f915af04c18>
[2020-12-23 20:25:41,062] [INFO] [config.py:648:print]   allreduce_always_fp32 ........ False
[2020-12-23 20:25:41,062] [INFO] [config.py:648:print]   amp_enabled .................. False

Given these two observations, while the HF trainer needs to provide a way for users to configure the deepspeed engine, it does not need to use either --deepspeed or --deepspeed_config. Rather, the HF trainer can freely choose the most convenient method for users to specify the deepspeed configuration, and then convert those specifications into an appropriately constructed Python dict to pass to deepspeed.initialize(). In terms of appropriateness, the dict should simply mirror the regular config json structure. An advantage of this approach is that the HF trainer can configure the batch-size-related parameters (train_batch_size, gradient_accumulation_steps, train_micro_batch_size_per_gpu) of the deepspeed engine as it wants, e.g. based on the cl arg values, to avoid mismatch issues.
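
For concreteness, a rough, untested sketch of what the client side could look like (the model/optimizer setup below is placeholder code, not an HF trainer API):

import torch
import deepspeed

model = torch.nn.Linear(10, 10)                   # placeholder model
optimizer = torch.optim.Adam(model.parameters())  # placeholder optimizer

# dict mirroring the structure of a regular ds_config.json
ds_config = {
    'train_batch_size': 32,
    'gradient_accumulation_steps': 1,
}

# no --deepspeed / --deepspeed_config flags involved; the config comes in as a dict
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=None,                 # assuming args can be omitted when config_params is given
    model=model,
    optimizer=optimizer,
    config_params=ds_config,
)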

Apologies that these possibilities are not currently documented, but your use case is a motivation to prioritize documenting them.

tjruwase reopened this Dec 23, 2020
stas00 (Collaborator, Author) commented Dec 23, 2020

OK, as posted here: #610 (comment), ideally we want to be able to use the config file as a base config, and then augment what's not there and perhaps override what's already defined there.

So the ideal logic of configuration would be:

  1. load the config file, if one is defined in args.deepspeed_config
  2. use config_params, if any, to override/set parts of what was in the config file (so here we could easily override or set the batch size arguments dynamically)

There are probably more nuances that I'm not aware of, so I'm only thinking about our immediate need.
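
If deepspeed doesn't end up doing that merge itself, the trainer side could do it, roughly like this (a sketch; the values below stand in for whatever comes from the trainer's own cl args):

import json

deepspeed_config_file = 'ds_config.json'  # hypothetical path taken from args.deepspeed_config
train_batch_size = 32                     # hypothetical values taken from the trainer's cl args
gradient_accumulation_steps = 1

# 1. load the base config file
with open(deepspeed_config_file) as f:
    ds_config = json.load(f)

# 2. override/set parts of it dynamically (e.g. the batch size arguments)
ds_config.update({
    'train_batch_size': train_batch_size,
    'gradient_accumulation_steps': gradient_accumulation_steps,
})

# then pass ds_config as config_params to deepspeed.initialize()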

stas00 (Collaborator, Author) commented Dec 24, 2020

So reading your updated comment at #616 (comment), do you propose that:

  1. the HF trainer

    1. loads the json config file on its own,
    2. converts it to a dict,
    3. adjusts/adds what it needs to, and
    4. passes it as the config_params dict,
  2. or would deepspeed support what I proposed in #616 (comment), so that the HF trainer can provide both the config file and a config_params dict to override/augment the configuration?

I think we would prefer to stick to the deepspeed convention of the config file, since it'll make things easier for users who may be using your project elsewhere, and we don't really want to create another huge set of cl args.

Everything else is loud and clear. Thank you!

g-karthik commented Dec 24, 2020

Hey @tjruwase @stas00, I use the --deepspeed, --deepspeed_mpi and --deepspeed_config flags when launching my multi-node training jobs with mpirun in a Kubernetes-based infrastructure setup. I do not use the deepspeed launcher. Seeing as you're referring to the removal of the --deepspeed flag, I presume you're also considering removing the --deepspeed_mpi flag.

I'd appreciate it if we could have a discussion about breaking changes and loss of functionality in differing infrastructure setups as a result of flag removals. IMO, breaking changes to deepspeed for the sake of simplicity in transformers are okay only up to the point where they cause loss of functionality for some users.

stas00 (Collaborator, Author) commented Dec 24, 2020

@g-karthik, I think you may have misunderstood the intention/scope of this thread.

I was only asking for advice on which flags would be best to use in HuggingFace transformers, which will soon have DeepSpeed integrated.

I don't think there are any plans on removing support for any existing flags in the DeepSpeed project itself.

Looking at the title of this issue, I can see how you could have interpreted it that way - I have rephrased it, so I hope it won't raise any alarms for others now.

stas00 changed the title from "collapsing --deepspeed and --deepspeed_config into a single cl arg (in the host)" to "[user application] which deepspeed flags are required if any" on Dec 24, 2020
tjruwase (Contributor) commented

@g-karthik Thanks for eliciting this very important clarification regarding the discussion. Just to confirm, there are no breaking changes.
