Updating memory optimization overview #1916

SalmanMohammadi · 2024-10-29T13:27:06Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Updating the memory optimization tutorial with CPUOffloadOptimizer.

                                      .,'
                                   .'`.'
                                  .' .'
                      _.ood0Pp._ ,'  `.~ .q?00doo._
                  .od00Pd0000Pdb._. . _:db?000b?000bo.
                .?000Pd0000PP?000PdbMb?000P??000b?0000b.
              .d0000Pd0000P'  `?0Pd000b?0'  `?000b?0000b.
             .d0000Pd0000?'     `?d000b?'     `?00b?0000b.
             d00000Pd0000Pd0000Pd00000b?00000b?0000b?0000b
             ?00000b?0000b?0000b?b    dd00000Pd0000Pd0000P
             `?0000b?0000b?0000b?0b  dPd00000Pd0000Pd000P'
              `?0000b?0000b?0000b?0bd0Pd0000Pd0000Pd000P'
                `?000b?00bo.   `?P'  `?P'   .od0Pd000P'
                  `~?00b?000bo._  .db.  _.od000Pd0P~'
                      `~?0b?0b?000b?0Pd0Pd000PdP~'

pytorch-bot · 2024-10-29T13:27:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1916

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8cafa6f with merge base 1bbd749:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

joecummings

I am the King of All Nits, bow before me

docs/source/tutorials/memory_optimizations.rst

joecummings · 2024-10-29T13:52:02Z

+
+* The CPU optimizer step is often the bottleneck when optimizer CPU offload is used. To minimize the slowdown, it is recommended to (1) use full ``bf16`` training so that parameters, gradients, and optimizer states are in ``bf16``; and (2) give GPU more work per optimizer step (e.g. larger batch size with activation checkpointing, gradient accumulation).
+* Gradient accumulation should always be set to 1 when ``offload_gradients=True``, as gradients are cleared on GPU every backward pass.
+* This optimizer works by keeping a copy of parameters and pre-allocating gradient memory on CPU. Therefore, expect your RAM usage to increase by 4x model size.


Is it exactly 4x model size? Or roughly?

According to the ao docs, yes:

To minimize the amount of CPU<->GPU data transfer, we keep a copy of parameters and pre-allocate gradients memory on CPU. Therefore, expect your RAM usage to increase by 2x model size + optimizer state (which is 2x model size for Adam).

And since we always use Adam, it is 4x.
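
To make the 4x figure concrete, here is a rough sketch of the arithmetic from the quoted docs: one CPU copy of the parameters, one pre-allocated gradient buffer, and two Adam state tensors (first and second moments), each the size of the model. The function name, the 8B parameter count, and the assumption that every tensor uses the same 2-byte ``bf16`` element size are illustrative and not taken from the PR. Real usage will differ somewhat (allocator overhead, other optimizers), so treat the result as an estimate rather than an exact number.

```python
def estimate_cpu_offload_ram(num_params: int, bytes_per_element: int = 2) -> dict:
    """Estimate extra CPU RAM (in bytes) for optimizer CPU offload with Adam.

    Assumes parameters, gradients, and optimizer states all share one dtype
    (2 bytes per element for bf16). This is a hypothetical helper, not part
    of torchtune or torchao.
    """
    model_bytes = num_params * bytes_per_element
    breakdown = {
        "param_copy": model_bytes,      # CPU copy of the parameters (1x)
        "grad_buffer": model_bytes,     # pre-allocated gradient memory (1x)
        "adam_state": 2 * model_bytes,  # first and second moments (2x)
    }
    breakdown["total"] = sum(breakdown.values())  # 4x model size
    return breakdown


if __name__ == "__main__":
    # Illustrative 8B-parameter model trained in full bf16.
    estimate = estimate_cpu_offload_ram(8_000_000_000)
    for name, size in estimate.items():
        print(f"{name}: {size / 1e9:.1f} GB")
```

With these assumptions the total is exactly four times the model size by construction; whether a real run lands on 4x depends on the optimizer and dtypes actually used.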

SalmanMohammadi added 3 commits October 9, 2024 00:30

init

cef956b

adding 2 docs

408e49c

conflictos

b1fa388

facebook-github-bot added the CLA Signed label Oct 29, 2024

SalmanMohammadi changed the title ~~Upadting~~ Updating memory optimization overview Oct 29, 2024

SalmanMohammadi added 3 commits October 29, 2024 13:29

linteroonie

77f734c

conflicteroonies

10cdc48

conflicteroonies

0002eb7

joecummings approved these changes Oct 29, 2024

nitaroonie

8cafa6f

SalmanMohammadi merged commit 0cc1b1f into pytorch:main Oct 29, 2024
16 checks passed

SalmanMohammadi deleted the cpu_offload_docs branch October 29, 2024 14:29

SalmanMohammadi mentioned this pull request Oct 30, 2024

v0.4.0 release tracker #1747

Closed

34 tasks
