[user application] which deepspeed flags are required if any #616
Comments
Dropping |
That's perfect. Thank you so much for clarifying this, @tjruwase - we will go with one cl arg then.
Sorry, I was not quite done with my answer. On the issue of handling config file argument, how about the following? Also I think this might address your other question about mismatches of cl and json clones of batch size related args. The first observation to this proposal is that The second observation requires digging into some internals to see that deepspeed engine configuration is encapsulated in the DeepSpeedConfig object. Although DeepSpeedConfig is commonly instantiated with the json file from
With these two observations, it means that while HF trainer needs to provide a way for users to configure the deepspeed engine, it does not need to use either Apologies that these possibilities are not currently documented, but your use case is a motivation to prioritize this documentation.
OK, as posted here: #610 (comment) ideally we want to be able to use the config file as a base config and then to be able to augment what's not there and perhaps override what's already defined there. So the ideal logic of configuration would be:
There are probably more nuances that I'm not aware of, so I'm only thinking about our immediate need.
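To make the "base config, then augment and override" idea concrete, here is a minimal sketch in plain Python. It is only an illustration of the merge logic, not how DeepSpeed or the HF trainer actually does it: `merge_config` is a hypothetical helper, and the inline JSON string stands in for the contents of a config file.

```python
import json


def merge_config(base, overrides):
    """Return a new dict: `base` values, augmented and overridden by `overrides`.

    Hypothetical helper for illustration; nested dicts are merged key by key,
    everything else in `overrides` simply replaces the base value.
    """
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the contents of the user's config file (the base config).
base = json.loads('{"train_batch_size": 16, "fp16": {"enabled": false}}')

# Values the trainer wants to add or override, e.g. derived from its own cl args.
overrides = {"fp16": {"enabled": True}, "gradient_accumulation_steps": 2}

final_config = merge_config(base, overrides)
print(final_config)
```

The base dict is left untouched, so the trainer could still compare the two and warn about mismatches between the file and its own args.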
So reading your updated comment at #616 (comment) do you propose that:
I think we would prefer to stick to the deepspeed convention of the config file since it'll make it easier for users who may be using your project elsewhere and we don't really want to create another huge set of cl args. Everything else is loud and clear. Thank you!
Hey @tjruwase @stas00 I use the I'd appreciate it if we could have discussions on breaking changes and loss of functionality in differing infrastructural setups as a result of flag removals. IMO, breaking changes to |
@g-karthik, I think you may have misunderstood the intention/scope of this thread. I was only asking for advice on which flags would be best to use in HuggingFace I don't think there are any plans on removing support for any existing flags in the DeepSpeed project itself. Looking at the title of this Issue, I can see how you could have interpreted this - I have rephrased it so I hope now it won't raise any alarms for others.
@g-karthik Thanks for eliciting this very important clarification regarding the discussion. Just to confirm, there are no breaking changes.
In the context of integrating DeepSpeed into transformers, I have a question for you wrt the deepspeed cl args
The paradigm you propose is:
but as we were discussing elsewhere there are too many cl args in ML apps, and we were considering if it might make sense to collapse --deepspeed and --deepspeed_config into a single cl arg for the transformers trainer and then re-construct it into 2 cl args before it goes into deepspeed.initialize.
context: huggingface/transformers#9211 (comment)
Do you have a strong feeling that we should keep your proposed convention of 2 cl args facing users, or do you feel that it's fine to collapse the two? I may be just unaware of something important so I wanted to run this idea by you.
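For illustration, collapsing the two flags could look roughly like the sketch below. It assumes a single user-facing flag that takes the config file path, and then rebuilds the two-flag form as an args namespace. The function names and the file name `ds_config.json` are hypothetical, and the actual hand-off to deepspeed.initialize is left out.

```python
import argparse
from types import SimpleNamespace


def parse_trainer_args(argv):
    """Parse the single user-facing flag (hypothetical trainer-side parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--deepspeed",
        type=str,
        default=None,
        metavar="CONFIG_FILE",
        help="Enable deepspeed and use this json config file",
    )
    return parser.parse_args(argv)


def expand_deepspeed_args(args):
    """Rebuild the two-flag form from the single flag.

    Passing a path means "deepspeed is on, and this is its config file";
    omitting the flag means deepspeed is off.
    """
    config_path = args.deepspeed
    return SimpleNamespace(
        deepspeed=config_path is not None,
        deepspeed_config=config_path,
    )


user_args = parse_trainer_args(["--deepspeed", "ds_config.json"])
ds_args = expand_deepspeed_args(user_args)
print(ds_args.deepspeed, ds_args.deepspeed_config)
```

With this shape the user only ever sees one flag, while the code downstream still gets the two attributes it expects.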
And secondly, do you by chance have any brilliant ideas on how to name a one-to-rule-them-all cl arg so that it reads nicely and is functionally unambiguous to users? I think --deepspeed_config_file ds_config.js isn't clear enough since it doesn't say deepspeed is activated, but I could be wrong.
One of the devs also suggests that --deepspeed_config doesn't make it obvious that a file argument is expected.
I wonder if --deepspeed ds_config.js would do the trick - it's actually less obvious that it expects a file argument, but it's unambiguous about it activating deepspeed.
I totally understand that you may not have a strong opinion or want to spend any time on this since it's just our peculiar desire for succinctness and clarity, but if you do have suggestions I'm all ears.
Thank you.