
Multi-tensor LAMB #16893

Merged · 29 commits merged into apache:master on Jan 16, 2020

Conversation

@MoisesHer (Contributor) commented Nov 23, 2019

Multi-tensor LAMB Optimizer (in development / debugging)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete
  • All changes have test coverage: a test of the multi-LAMB optimizer is added as tests/python/unittest/test_optimizer:test_multilamb
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

  • Added an MXNet operator (CPU & GPU): _multi_lamb_update (and a mixed-precision version, _multi_mp_lamb_update). Given the weights and gradients of multiple tensors, it updates all of them in parallel
  • The Python LAMB optimizer was updated to launch the multi-tensor version when MXNET_OPTIMIZER_AGGREGATION_SIZE > 1 (see the usage sketch after this list)
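
For illustration, here is a minimal usage sketch (not part of this PR; the model, hyperparameter values, and aggregation size below are assumptions) of how the multi-tensor path would be enabled via the environment variable:

```python
# Hypothetical usage sketch, not code from this PR.
# Setting MXNET_OPTIMIZER_AGGREGATION_SIZE > 1 before creating the optimizer
# is what this PR describes as enabling the multi-tensor LAMB path.
import os
os.environ["MXNET_OPTIMIZER_AGGREGATION_SIZE"] = "4"  # > 1 enables multi-tensor updates

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(16)
net.initialize()

# 'lamb' is the optimizer registered by #16715; the hyperparameters here are
# standard LAMB settings, not values prescribed by this PR.
trainer = gluon.Trainer(net.collect_params(), 'lamb',
                        {'learning_rate': 1e-3, 'beta1': 0.9, 'beta2': 0.999})
```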

@MoisesHer changed the title from "Add multi-tensor lamb Op" to "Multi-tensor LAMB" on Nov 23, 2019
@leezu (Contributor) commented Nov 27, 2019

What's the relation with #16715? #16715 recently added 'LAMB' optimizer to python/mxnet/optimizer/optimizer.py. Your code is currently in conflict. Please resolve the conflict by merging or rebasing on master.

@MoisesHer (Contributor, Author) commented Nov 28, 2019

> What's the relation with #16715? #16715 recently added 'LAMB' optimizer to python/mxnet/optimizer/optimizer.py. Your code is currently in conflict. Please resolve the conflict by merging or rebasing on master.

I merged master, so the conflict is resolved. This operator is similar to #16715, but instead of updating a single tensor, it updates multiple tensors simultaneously. Thus, it exposes more parallelism.

@leezu (Contributor) commented Nov 28, 2019

Thanks for the clarification. Is it necessary to expose it as a separate multiLamb optimizer? Can it be integrated with #16715? Why do we need the less parallel implementation? Or are there other differences? Sorry, I didn't read through the code yet.

Review comments (now outdated/resolved) were left on: src/operator/contrib/multi_lamb-inl.h, src/operator/contrib/multi_lamb.cu, python/mxnet/ndarray/contrib.py, python/mxnet/optimizer/optimizer.py
@leezu (Contributor) left a comment:


Ping @MoisesHer Is the MultiLAMB optimizer a generalization of LAMB optimizer? If so, why do we keep the LAMB optimizer? If not, please add documentation or a reference to the docstring. Thank you!

@MoisesHer (Contributor, Author) replied:

> Ping @MoisesHer Is the MultiLAMB optimizer a generalization of LAMB optimizer? If so, why do we keep the LAMB optimizer? If not, please add documentation or a reference to the docstring. Thank you!

Sorry, since we had different numbers of states, I was not sure how to reuse the previous optimizer. After Haibin's suggestion, we now have the same number of states.
I have merged both optimizers: if the aggregation number is <= 1, or the gradients/weights are a single NDArray, the single-tensor implementation is used; otherwise, the multi-tensor implementation is used.
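
To make that dispatch rule concrete, here is a small hypothetical illustration (the function name and return strings are mine, not code from the PR):

```python
from mxnet import nd

def lamb_dispatch(aggregate_num, weights, grads):
    """Toy illustration (hypothetical) of the dispatch rule described above."""
    # Single-tensor path: aggregation disabled, or a single NDArray was passed.
    if aggregate_num <= 1 or isinstance(weights, nd.NDArray):
        return "single-tensor LAMB update"
    # Multi-tensor path: lists of weights/gradients are updated in one fused call.
    return "multi-tensor LAMB update"

print(lamb_dispatch(1, nd.zeros((2, 2)), nd.zeros((2, 2))))              # single-tensor
print(lamb_dispatch(4, [nd.zeros((2, 2))] * 3, [nd.zeros((2, 2))] * 3))  # multi-tensor
```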

@eric-haibin-lin (Member) commented:

Could you update the code based on #17002?

@eric-haibin-lin (Member) left a comment:


In general looks good to me! Two more minor comments

Review comments (now outdated/resolved) were left on: python/mxnet/optimizer/optimizer.py, src/operator/contrib/multi_lamb.cc
@eric-haibin-lin (Member) left a comment:


I'm not sure why CI is pending all the time. Could you sync with mxnet master and trigger the CI again?

@eric-haibin-lin merged commit 6b9a1da into apache:master on Jan 16, 2020
@eric-haibin-lin (Member) commented:

Thank you @MoisesHer for addressing all the review comments
