
LARC clipping+documentation #6

Merged · 1 commit · Jul 3, 2018

Conversation

raulpuric (Contributor)

  • Proper implementation of LARC clipping
  • Documentation of LARC class
  • Modification of FP16_Optimizer to absorb the optimizer instance that's being wrapped instead of creating a new optimizer instance of the same class (a rough sketch of this change follows the list below).
optimizer: Pytorch optimizer to wrap and modify learning rate for.
trust_coefficient: Trust coefficient for calculating the lr. See https://arxiv.org/abs/1708.03888
clip: Decides between clipping or scaling mode of LARC. If `clip=True` the learning rate is set to `min(optimizer_lr, local_lr)` for each parameter. If `clip=False` the learning rate is set to `local_lr*optimizer_lr`.
eps: epsilon kludge to help with numerical stability while calculating adaotive_lr
@brettkoonce (Contributor) · Jul 3, 2018

minor sp: adaotive_lr --> adaptive_lr
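
To make the `clip` and `eps` parameters above concrete, here is a hedged sketch of the per-parameter rate computation they describe. The function name and the use of full-tensor norms are illustrative assumptions, not the exact apex implementation; only the behaviour stated in the docstring (`min(optimizer_lr, local_lr)` when `clip=True`, `local_lr*optimizer_lr` otherwise, `eps` for numerical stability) is taken from the text above.

```python
import torch

def larc_local_lr(p, group_lr, trust_coefficient=0.02, clip=True, eps=1e-8):
    """Illustrative per-parameter LARC learning rate (not apex source)."""
    param_norm = torch.norm(p.data)
    grad_norm = torch.norm(p.grad.data)
    if param_norm == 0 or grad_norm == 0:
        # degenerate norms: keep the optimizer's own learning rate
        return group_lr
    # adaptive ("local") lr from the LARS/LARC paper, eps added for stability
    local_lr = (trust_coefficient * param_norm / (grad_norm + eps)).item()
    if clip:
        # clipping mode: lr = min(optimizer_lr, local_lr)
        return min(group_lr, local_lr)
    # scaling mode: lr = local_lr * optimizer_lr
    return local_lr * group_lr
```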

@@ -4,11 +4,45 @@
from torch.nn.parameter import Parameter

class LARC(object):
def __init__(self, optimizer, trust_coefficient=0.02, epsilon=1e-8):
"""
:class:`LARC` is a pytorch implementation of both the scaling and clipping varients of LARC,
@brettkoonce (Contributor) · Jul 3, 2018

minor sp: varients --> variants
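
For readers skimming the diff, a short usage sketch of the constructor shown above. The model and SGD settings are arbitrary, the import path is an assumption about where the class lives in apex, and the step()/zero_grad() delegation to the wrapped optimizer is assumed from the wrapper design rather than shown in this hunk.

```python
import torch
from apex.parallel.LARC import LARC  # assumed import path

model = torch.nn.Linear(10, 2)
base_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Wrap the existing optimizer; LARC adjusts each param group's lr per step.
optimizer = LARC(base_opt, trust_coefficient=0.02)  # the diff also shows an epsilon kwarg

loss = model(torch.randn(8, 10)).sum()
loss.backward()
optimizer.step()       # assumed to delegate to base_opt after rescaling lrs
optimizer.zero_grad()
```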

@mcarilli (Contributor) commented on Jul 3, 2018

My rationale for creating a new instance of the passed optimizer's class within FP16_Optimizer's constructor was that if the passed optimizer had been used earlier, it might have created some momentum or other ancillary buffers in FP16. I would have to trace through the optimizer's param_groups and cast all these ancillary buffers to FP32 as well. This is doable (it's similar to what torch.Optimizer.load_state_dict does) but seemed more brittle than FP32-ifying the param_groups then using them to create a fresh optimizer instance.

Your proposed change (using the passed optimizer directly) does impose the additional restriction that the passed optimizer has not been used beforehand/does not contain any ancillary buffers (aside from its owned parameters) that might be FP16. All my documentation and examples work this way, although the requirement is not yet stated explicitly, so I suppose it's fine to accept.
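
For context, a rough sketch of the alternative described in the first paragraph: walking the wrapped optimizer's state and casting FP16 ancillary buffers (momentum, etc.) to FP32. This is not apex code and glosses over the details that make the approach brittle; it only illustrates the kind of traversal being discussed.

```python
import torch

def cast_state_buffers_to_fp32(optimizer):
    # Promote any FP16 tensors in the per-parameter state (e.g. momentum
    # buffers) to FP32, loosely analogous to the casting that
    # torch.optim.Optimizer.load_state_dict performs when restoring state.
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value) and value.dtype == torch.float16:
                state[key] = value.float()
```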
