Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer #25807
Conversation
cc @younesbelkada for BNB related stuff 🙏
This looks good, thanks, I left one question!
cc @SunMarc as well
Thanks, I left some suggestions to make it more concise!
Thanks @stevhliu for the suggestions!
Thanks a lot!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for adding!
Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer (huggingface#25807)

* Modify single-GPU efficient training doc with now-available adamw_bnb_8bit optimizer
* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>
What does this PR do?
The documentation for efficient single-GPU training previously stated that the `adamw_bnb_8bit` optimizer could only be integrated via a third-party implementation. However, it is now available in `Trainer` directly, as a result of this issue and the corresponding PR.

I think it's valuable to keep the 8-bit Adam entry in the documentation, as it's a significant improvement over Adafactor. I also think it's valuable to keep the sample integration of a third-party optimizer implementation for reference purposes. I have adjusted the documentation accordingly.
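For context, switching to the natively supported optimizer is just a flag on `TrainingArguments`. Below is a minimal sketch (not copied from the doc), assuming `bitsandbytes` is installed and the other argument values are placeholders:

```python
from transformers import TrainingArguments

# Selecting the 8-bit AdamW optimizer is now a one-line change; the
# "adamw_bnb_8bit" value requires the bitsandbytes package to be installed.
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,
    optim="adamw_bnb_8bit",
)
# training_args is then passed to Trainer as usual.
```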
I was able to validate myself that both approaches, using `Trainer` directly with the `optim` flag and doing the third-party integration, still appear to work when fine-tuning small LLMs on a single GPU.
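For reference, the third-party route I tested roughly follows the pattern kept in the doc: instantiate a `bitsandbytes` 8-bit optimizer yourself and pass it to `Trainer` via the `optimizers` argument. This is only a sketch, assuming `model`, `train_dataset`, and `training_args` are already defined elsewhere:

```python
import bitsandbytes as bnb
from torch import nn
from transformers import Trainer
from transformers.trainer_pt_utils import get_parameter_names

# Apply weight decay to everything except bias and LayerNorm parameters,
# mirroring Trainer's default parameter grouping.
decay_parameters = [
    name
    for name in get_parameter_names(model, [nn.LayerNorm])  # model assumed to exist
    if "bias" not in name
]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": training_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]

adam_bnb_optim = bnb.optim.Adam8bit(
    optimizer_grouped_parameters,
    betas=(training_args.adam_beta1, training_args.adam_beta2),
    eps=training_args.adam_epsilon,
    lr=training_args.learning_rate,
)

# optimizers takes (optimizer, lr_scheduler); None lets Trainer build the scheduler.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    optimizers=(adam_bnb_optim, None),
)
trainer.train()
```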
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@stevhliu and @MKhalusova