[Mixed Precision] Apply Gradient Clipping on Mixed Precision #2746

Open · DonghakPark opened this issue Oct 7, 2024 · 2 comments

@DonghakPark (Member)

Currently, mixed precision training is implemented in NNTrainer, but gradient clipping that takes the loss scale into account has not been implemented yet.

PyTorch's AMP example implements it as follows (gradients are unscaled before clipping), and NNTrainer needs an equivalent implementation.

import torch
from torch.amp import GradScaler, autocast  # use torch.cuda.amp on older PyTorch versions

scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        with autocast(device_type='cuda', dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()

        # Unscales the gradients of optimizer's assigned params in-place
        scaler.unscale_(optimizer)

        # Since the gradients of optimizer's assigned params are unscaled, clips as usual:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # optimizer's gradients are already unscaled, so scaler.step does not unscale them,
        # although it still skips optimizer.step() if the gradients contain infs or NaNs.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
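For reference, the core of what scaler.unscale_(), clip_grad_norm_(), and the inf/NaN check inside scaler.step() do can be written as a few loops over a flat gradient buffer. The sketch below uses C++ (NNTrainer's implementation language) with illustrative names such as unscale_gradients and clip_grad_norm; it is not NNTrainer or PyTorch code, only the order of operations the example above relies on.

#include <cmath>
#include <vector>

// Divide every gradient by the loss scale; report whether all values stayed finite.
// (Counterpart of scaler.unscale_(optimizer) in the example above.)
bool unscale_gradients(std::vector<float> &grads, float loss_scale) {
  bool finite = true;
  for (float &g : grads) {
    g /= loss_scale;
    if (!std::isfinite(g))
      finite = false;
  }
  return finite;
}

// Scale gradients down so their global L2 norm does not exceed max_norm.
// (Counterpart of torch.nn.utils.clip_grad_norm_.)
void clip_grad_norm(std::vector<float> &grads, float max_norm) {
  float sq_sum = 0.0f;
  for (float g : grads)
    sq_sum += g * g;
  const float total_norm = std::sqrt(sq_sum);
  if (total_norm > max_norm) {
    const float coef = max_norm / (total_norm + 1e-6f);
    for (float &g : grads)
      g *= coef;
  }
}

// Per-iteration sequence: unscale first, clip only when the gradients are finite,
// otherwise skip the optimizer step (as scaler.step does).
bool unscale_clip_or_skip(std::vector<float> &grads, float loss_scale, float max_norm) {
  if (!unscale_gradients(grads, loss_scale))
    return false; // non-finite gradients: skip the update and let the scaler back off
  clip_grad_norm(grads, max_norm);
  return true; // safe to apply the optimizer step
}

The ordering matters: clipping before unscaling would compare the still-scaled gradient norm against max_norm, effectively multiplying the clipping threshold by the loss scale.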
@taos-ci (Collaborator) commented Oct 7, 2024

:octocat: cibot: Thank you for posting issue #2746. The person in charge will reply soon.

@DonghakPark (Member, Author)

Training Sequence

  1. Make an FP16 copy of the (FP32 master) weights
  2. Forward propagate using FP16 weights and activations
  3. Multiply the resulting loss by the scale factor
  4. Backward propagate using FP16 weights, activations, and gradients
  5. Multiply the weight gradients by 1/scale_factor (see the loss-scale sketch after this list)
  6. Optional processing (gradient clipping, weight decay)
  7. Update the master copy of the weights in FP32
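The scale factor used in steps 3 and 5 is usually adjusted dynamically, which is what scaler.update() does in the PyTorch example above. Below is a minimal sketch of such a policy; the constants mirror GradScaler's documented defaults, and the struct itself is illustrative only, not NNTrainer API.

// Dynamic loss-scale policy: shrink the scale after an overflow, grow it again
// after a run of stable iterations.
struct LossScaler {
  float scale = 65536.0f;      // initial loss scale (2^16)
  float growth_factor = 2.0f;  // applied after growth_interval good steps
  float backoff_factor = 0.5f; // applied after an inf/NaN overflow
  int growth_interval = 2000;  // good steps required before growing
  int good_steps = 0;

  // Call once per iteration, after the unscaled gradients have been checked.
  void update(bool found_inf_or_nan) {
    if (found_inf_or_nan) {
      scale *= backoff_factor; // overflow: the step was skipped, reduce the scale
      good_steps = 0;
    } else if (++good_steps >= growth_interval) {
      scale *= growth_factor;  // stable for a while: try a larger scale
      good_steps = 0;
    }
  }
};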
