Commit
Fix zero stage2 cpu_offload when some model trainable parameters are skipped in training (#861)

* Fix zero stage2 cpu_offload when some model trainable parameters are skipped in training, as in #707

  When some trainable model parameters are skipped in training, their backward hooks in self.create_reduce_and_remove_grad_hooks() do not run, so they have no norm_for_param_grads.

* Trim space

* Trim space

Co-authored-by: Olatunji Ruwase <[email protected]>
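
The failure mode described in the commit message can be illustrated with a minimal, library-free sketch. It is not the actual patch: the function name `total_grad_norm`, the integer parameter ids, and the plain-float norms are all hypothetical, and it assumes that `norm_for_param_grads` behaves like a mapping from parameter id to a gradient norm that is filled in only when a parameter's backward hook runs. Under that assumption, a parameter skipped in training has no entry, and an unguarded lookup would fail.

```python
import math

def total_grad_norm(param_ids, norm_for_param_grads):
    """Combine per-parameter gradient norms into one L2 norm.

    Hypothetical helper: parameters without a recorded norm are ignored
    instead of raising KeyError.
    """
    total = 0.0
    for pid in param_ids:
        # A parameter skipped in training never ran its backward hook,
        # so (by assumption) it has no entry in the mapping.
        if pid in norm_for_param_grads:
            total += norm_for_param_grads[pid] ** 2
    return math.sqrt(total)

# Hypothetical data: parameter 1 is trainable but was skipped in training.
norms = {0: 3.0, 2: 4.0}
result = total_grad_norm([0, 1, 2], norms)
```

Skipping the missing entry treats the unused parameter as contributing nothing to the total norm, which matches the fact that it received no gradient in that step.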