This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Optimizer state sharding - Not landing as-is, early feedback #584

Closed
wants to merge 1 commit into from

Conversation

blefaudeux
Contributor

Summary:
Bringing in fairscale to provide an optional state-sharded optimizer in Classy, which should help in situations bounded by memory pressure.
No new communication backend is introduced; this uses vanilla torch.distributed.
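
For reviewers unfamiliar with the fairscale side, here is a minimal sketch of what wrapping the optimizer looks like. This is not the actual Classy Vision integration; the `OSS` import path and constructor arguments reflect fairscale at the time of this diff and are assumptions as far as the final integration goes.

```python
# Minimal sketch (not the Classy Vision integration): wrap a plain torch
# optimizer with fairscale's OSS so that each rank only materializes the
# optimizer state for its own shard of the parameters.
# Assumes torch.distributed has already been initialized.
import torch
from fairscale.optim.oss import OSS  # assumed import path

def build_sharded_optimizer(model: torch.nn.Module) -> OSS:
    return OSS(
        params=model.parameters(),
        optim=torch.optim.SGD,  # underlying per-shard optimizer
        lr=0.1,                 # forwarded to torch.optim.SGD
        momentum=0.9,
    )
```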

Differential Revision: D22518768

@facebook-github-bot added the CLA Signed and fb-exported labels on Jul 31, 2020
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22518768

8 similar comments

@blefaudeux
Contributor Author

blefaudeux commented Aug 20, 2020

needs facebookresearch/fairscale#46

Summary:
Pull Request resolved: #584

Bringing in fairscale to provide an optional state-sharded optimizer in Classy, which should help in situations bounded by memory pressure.
No new communication backend is introduced; this uses vanilla torch.distributed.

See ZeRO for more context: https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/
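
To make the "vanilla torch.distributed" point concrete, here is a toy sketch of the ZeRO-1 mechanism; it is not the fairscale implementation, and the round-robin parameter-to-rank assignment is only for illustration.

```python
# Toy sketch of ZeRO-style optimizer state sharding with plain
# torch.distributed, for illustration only (not the fairscale code).
import torch.distributed as dist

def sharded_step(params, local_optimizer, world_size):
    # local_optimizer was built only over the parameters owned by this rank
    # (here: params[i] with i % world_size == rank), so momentum/Adam state
    # only exists for that shard.
    local_optimizer.step()

    # Every parameter is then broadcast from the rank that owns and just
    # updated it, so all replicas end the step with identical weights.
    # These many small broadcasts are the suspected cause of the speed gap
    # listed in the TODOs below.
    for i, p in enumerate(params):
        dist.broadcast(p.data, src=i % world_size)
```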

KNOWN TODOs:
[-] huge memory discrepancy between the two runs
[-] huge speed discrepancy
-> these probably come from the many small broadcasts; they will be improved on the fairscale side and are not related to this diff (T71319397 and facebookresearch/fairscale#42)

[x] final accuracy in the same ballpark but very different behaviours; could be some settings not properly passed down, an issue with LARC, or the parameter scheduling
-> this was due to the LR not being properly adjusted; fixed since

[x] sync with min-xu-ai to use a proper gradient dispatch in the end; not landing anything before that
-> done by min-xu-ai on the fairscale side; needs benchmarking, but should not be related to this diff (hopefully no interface consequences)

Differential Revision: D22518768

fbshipit-source-id: ea79e3561580e21030123dca299f5c935ee971f4
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22518768

@facebook-github-bot
Contributor

This pull request has been merged in ac2993d.
