Conversation
I merged master, so the conflict is resolved. This operator is similar to #16715, but instead of updating a single tensor it updates multiple tensors simultaneously, so it exposes more parallelism.
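For illustration, here is a minimal NumPy sketch of the multi-tensor idea: instead of issuing one update per parameter, a single call receives lists of weights and gradients and updates them together, which is what exposes the extra parallelism. The function names and the plain SGD-style rule below are hypothetical placeholders, not the operator added in this PR.

```python
import numpy as np

def single_tensor_update(weight, grad, lr):
    # One call (and typically one kernel launch) per parameter tensor.
    weight -= lr * grad

def multi_tensor_update(weights, grads, lr):
    # Hypothetical fused variant: one call updates every tensor in the list,
    # so a backend is free to batch the work into fewer, larger kernels.
    for w, g in zip(weights, grads):
        w -= lr * g

# Usage: many small parameters updated through a single fused call.
weights = [np.random.rand(128, 64) for _ in range(10)]
grads = [np.random.rand(128, 64) for _ in range(10)]
multi_tensor_update(weights, grads, lr=0.01)
```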
Thanks for the clarification. Is it necessary to expose it as a separate
Ping @MoisesHer: Is the MultiLAMB optimizer a generalization of the LAMB optimizer? If so, why do we keep the LAMB optimizer? If not, please add documentation or a reference to the docstring. Thank you!
Sorry, since we had different numbers of states I was not sure how to reuse the previous optimizer. After Haibin's suggestion, we now have the same number of states.
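As context for the "same number of states" point: LAMB, like Adam, keeps two auxiliary states (first and second moment) per weight. Below is a rough sketch of why a matching state layout lets the multi-tensor path reuse the existing optimizer; the class and method names are illustrative assumptions modeled on MXNet's Optimizer API, not the exact code in this PR.

```python
import numpy as np

class LambLikeOptimizer:
    """Illustrative only: both the single-tensor and multi-tensor update
    paths can share the same per-weight state layout (mean, var)."""

    def create_state(self, index, weight):
        # Two states per weight, matching the existing single-tensor LAMB optimizer.
        mean = np.zeros_like(weight)  # first moment (m)
        var = np.zeros_like(weight)   # second moment (v)
        return mean, var

    def create_state_multi(self, indices, weights):
        # Because the state count is identical, the multi-tensor path can simply
        # collect lists of (mean, var) and hand them to a fused update in one call.
        return [self.create_state(i, w) for i, w in zip(indices, weights)]
```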
Could you update the code based on #17002?
In general this looks good to me! Two more minor comments.
I'm not sure why CI keeps pending. Could you sync with the mxnet master branch and trigger CI again?
Thank you @MoisesHer for addressing all the review comments.
Multi-tensor LAMB Optimizer (in development / debugging)
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes