In some settings you might want to run multiple steps before doing an update (aka gradient accumulation), e.g. to reduce the variance of the stochastic gradient without increasing memory requirements.
Perhaps this could be implemented with a unified API, but we'd need to think carefully about it.
One option might be to change the API from `update(state: TensorTree, batch: TensorTree)` to `update(state: TensorTree, batch: TensorTree | AccumulateBatch[TensorTree])`, where `AccumulateBatch` is just a NamedTuple so we can differentiate it from a plain TensorTree, and then e.g.:
if not isinstance(batch, AccumulateBatch):
    batch = accumulate(batch)  # convert batch to an AccumulateBatch of length 1

vals = []
aux = []
for b in batch:
    v, a = log_posterior(state.params, b)
    vals.append(v)
    aux.append(a)
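For concreteness, here is a minimal sketch of what the wrapper and helper might look like (the names `AccumulateBatch`, `accumulate`, and the single `sub_batches` field are illustrative assumptions, not an existing API):

```python
from typing import Any, NamedTuple, Tuple

TensorTree = Any  # stand-in for the library's TensorTree alias (assumption)


class AccumulateBatch(NamedTuple):
    """Marks a collection of sub-batches to be accumulated over inside update()."""
    sub_batches: Tuple[TensorTree, ...]

    def __iter__(self):
        # Iterate over the wrapped sub-batches (not the NamedTuple fields),
        # so `for b in batch` in the snippet above works as written.
        return iter(self.sub_batches)


def accumulate(*batches: TensorTree) -> AccumulateBatch:
    """Wrap one or more batches so update() treats them as one accumulated step."""
    return AccumulateBatch(sub_batches=tuple(batches))
```

With this, `accumulate(batch)` above yields an `AccumulateBatch` of length 1, and `vals` / `aux` could then be averaged or stacked before the parameter update, depending on the method.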
Some discussion on gradient accumulation here:
https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/19?u=alband
https://wandb.ai/wandb_fc/tips/reports/How-To-Implement-Gradient-Accumulation-in-PyTorch--VmlldzoyMjMwOTk5
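For reference, the pattern those threads describe boils down to something like the following in plain PyTorch (a hedged sketch; the toy model, data, and `accum_steps` value are assumptions for illustration):

```python
import torch
from torch import nn

# Toy setup purely for illustration; none of these names come from the issue.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
data_loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]
accum_steps = 4  # number of sub-batches accumulated per optimizer step

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(data_loader):
    # Scale the loss so the accumulated gradient is an average over sub-batches.
    loss = loss_fn(model(inputs), targets) / accum_steps
    loss.backward()  # gradients accumulate in param.grad across backward() calls
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

The same idea would live inside update() here: accumulate over the sub-batches of an AccumulateBatch and only then apply the step.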