improve gradient_clip doc #4959

Merged (4 commits into PaddlePaddle:develop, Jul 5, 2022)
Conversation

Contributor @betterpig commented Jun 23, 2022:

improve gradient_clip doc

@paddle-bot-old commented:

Thanks for contributing to the PaddlePaddle docs! The documentation preview is building; once the Docs-New job finishes, you can preview it at: http://preview-pr-4959.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
For more on the preview tool, see: [Beta] PaddlePaddle Docs Preview Tool

paddle-bot-old bot commented Jun 23, 2022:

✅ This PR's description meets the template requirements!
Please wait for other CI results.

for t in range(100):
    # Randomly sample a minibatch of indices without replacement
    idx = np.random.choice(total_data, batch_size, replace=False)
    x = paddle.to_tensor(x_data[idx, :])
    y = paddle.to_tensor(y_data[idx, :])
Collaborator:

Let's call y "label" and y_pred "pred".

Contributor Author:

done


train()

Part of the training log without gradient clipping is shown below. Both the loss and the gradients keep growing; by step 4 they have reached positive infinity and become NaN.
Collaborator:

This could be more detailed: under this network design, which parameters' gradients come out very large, and why do they keep growing?

Contributor Author:

done

@zhwesky2010 zhwesky2010 self-requested a review June 29, 2022 06:44
@betterpig betterpig closed this Jun 30, 2022
@betterpig betterpig reopened this Jun 30, 2022
@betterpig betterpig closed this Jun 30, 2022
@betterpig betterpig reopened this Jun 30, 2022
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Jun 30, 2022
@PaddlePaddle PaddlePaddle unlocked this conversation Jun 30, 2022
.. math::
    loss = \frac{1}{n} \sum_{i=1}^n(y_i-y_i')^2

- After obtaining the loss, run one backward pass to adjust the weights and biases. To update the network parameters, first compute the gradient of the loss function with respect to each parameter, then use a gradient update algorithm to take one gradient-descent step and reduce the loss, as in the update formula sketched below, where alpha is the learning rate.
Collaborator:

The formatting here is wrong.

Contributor Author:

done
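
The update formula the bullet refers to is not shown in this excerpt. As a hedged sketch of the standard gradient-descent step it describes (assuming plain SGD; :math:`w` is a parameter and :math:`\alpha` the learning rate):

.. math::
    w \leftarrow w - \alpha \frac{\partial loss}{\partial w}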

I. Clipping by Value Range
--------------------------
.. math::
    O^k = f(W O^{k-1} + b)
Collaborator:

O needs an explanation.

Contributor Author:

done

After computing the network's predictions, a loss function defined by the gap between the target values and the predictions is computed with a method such as mean squared error.

.. math::
    loss = \frac{1}{n} \sum_{i=1}^n(y_i-y_i')^2
Collaborator:

yi and yi' also need an explanation.

Contributor Author:

done

.. math::
    loss = \frac{1}{n} \sum_{i=1}^n(y_i-y_i')^2

- After obtaining the loss, run one backward pass to adjust the weights and biases. To update the network parameters, first compute the gradient of the loss function with respect to each parameter, then use a gradient update algorithm,
Collaborator:

It would be best to explain the concept of a gradient, and point out which part of the formula it corresponds to.

Contributor Author:

Added the corresponding part. As the most fundamental concept in deep learning, I feel the gradient is something users need to master on their own.

- Model parameters and loss values become NaN

If "gradient explosion" occurs, the network jumps straight past the optimum during training, so gradient clipping is needed to keep the network from overshooting the optimum while it learns. Paddle provides three gradient clipping methods: clipping by value range, clipping by L2 norm, and clipping by global L2 norm. Clipping by value range is simple, but a suitable threshold is hard to choose. Clipping by L2 norm and clipping by global L2 norm both use a threshold to bound the L2 norm of the gradient vector; the former clips only specific gradients, while the latter clips all gradients in the optimizer (a minimal API sketch appears after this review thread).
Collaborator:

The preview shows multiple extra spaces here.

Contributor Author:

done
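
For reference, a minimal sketch of how the three methods map onto Paddle's API (the thresholds are illustrative, not values from this doc):

.. code-block:: python

    import paddle

    # Clip each gradient element into [min, max]
    clip_value = paddle.nn.ClipGradByValue(min=-1.0, max=1.0)
    # Rescale any single gradient tensor whose L2 norm exceeds clip_norm
    clip_norm = paddle.nn.ClipGradByNorm(clip_norm=1.0)
    # Rescale all gradients jointly when their global L2 norm exceeds clip_norm
    clip_global = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)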


Clipping by value range: constrain each parameter's gradient to a range, and clip any value that falls outside it.

Usage: create an instance of the :ref:`paddle.nn.ClipGradByValue <cn_api_fluid_clip_ClipGradByValue>` class and pass it to the optimizer; the optimizer clips the gradients before updating the parameters.

**1. Clip all parameters (default)**
- **Clip all parameters (default)**

By default, the gradients of all parameters in the optimizer are clipped:
Collaborator:

Explain the clipping range?

Contributor Author:

done
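
A hedged usage sketch of the default all-parameter case described above (layer sizes and learning rate are illustrative):

.. code-block:: python

    import paddle

    # Every gradient element is clamped into [-1, 1] before the update
    clip = paddle.nn.ClipGradByValue(min=-1.0, max=1.0)

    linear = paddle.nn.Linear(10, 10)
    sgd = paddle.optimizer.SGD(learning_rate=0.1,
                               parameters=linear.parameters(),
                               grad_clip=clip)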

II. Clipping by L2 Norm
--------------------
2. Clipping by L2 Norm
###################

Clipping by L2 norm: treating the gradient as a multidimensional Tensor, compute its L2 norm; if it exceeds the maximum value, scale the gradient down proportionally, otherwise leave it unchanged.
Collaborator:

Explain X & clip_norm?

Contributor Author:

done
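
As a hedged sketch of the rule described above (:math:`X` is a single gradient tensor and :math:`clip\_norm` the threshold):

.. math::
    Out =
    \begin{cases}
    X, & norm(X) \leq clip\_norm \\
    \frac{clip\_norm \cdot X}{norm(X)}, & norm(X) > clip\_norm
    \end{cases}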

@@ -114,7 +146,7 @@ Paddle provides three gradient clipping methods:

where :math:`norm(X)` denotes the L2 norm of :math:`X`
Collaborator:

Explain X, global_norm, and clip_norm?

Contributor Author:

done
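
A hedged sketch of the global variant (assuming :math:`X_i` ranges over all gradient tensors the optimizer clips):

.. math::
    global\_norm = \sqrt{\sum_{i} norm(X_i)^2}

When :math:`global\_norm > clip\_norm`, every gradient tensor is rescaled as :math:`X_i \leftarrow \frac{clip\_norm \cdot X_i}{global\_norm}`; otherwise all gradients are left unchanged.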


Partial parameter clipping is configured the same way as above, via the parameter's :ref:`paddle.ParamAttr <cn_api_fluid_ParamAttr>`: its ``need_clip`` attribute defaults to True, meaning the parameter is clipped; setting it to False disables clipping for that parameter. See the sample code above.

As the description above shows, clipping by value range may change the direction of the gradient vector. For example, with a threshold of 1.0, an original gradient vector of [0.8, 89.0] becomes [0.8, 1.0] after clipping, a large change in direction. For the two L2-norm-based clipping methods with a threshold of 1.0, the clipped gradient vector
Collaborator:

With a threshold of 1.0, the clipped gradient vector is []? Is it empty?

Contributor Author:

Added the data.

As the description above shows, clipping by value range may change the direction of the gradient vector. For example, with a threshold of 1.0, an original gradient vector of [0.8, 89.0] becomes [0.8, 1.0] after clipping, a large change in direction. For the two L2-norm-based clipping methods with a threshold of 1.0, the clipped gradient vector is approximately [0.009, 1.0]. This preserves the direction of the original gradient vector, but because component 2 is much larger, component 1 ends up close to 0. In practice, if you run into gradient explosion during training, try the different clipping methods and compare their results on the validation set.
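
To make the two numbers above concrete, a small hedged check in plain NumPy (not Paddle's implementation; variable names are mine):

.. code-block:: python

    import numpy as np

    g = np.array([0.8, 89.0])
    threshold = 1.0

    # Clipping by value range: clamp each component into [-1, 1]
    by_value = np.clip(g, -threshold, threshold)   # [0.8, 1.0]

    # Clipping by L2 norm: rescale proportionally if the norm exceeds the threshold
    norm = np.linalg.norm(g)                       # ~89.0036
    by_norm = g * threshold / norm if norm > threshold else g
    print(by_value)  # [0.8 1. ]
    print(by_norm)   # ~[0.00899, 0.99996], i.e. roughly [0.009, 1.0]

Note that by_norm stays parallel to g, while by_value does not.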

III. Examples
Collaborator:

This section has formatting problems; please fix them.

Contributor Author:

done

Collaborator @TCChenlong left a comment:

LGTM

@TCChenlong TCChenlong merged commit 8b5d5b4 into PaddlePaddle:develop Jul 5, 2022
II. How to Use Gradient Clipping in Paddle
---------------------------

1. Clipping by Value Range
Collaborator:

Change this to "2.1 Clipping by Value Range".

Contributor Author:

done

@@ -38,8 +69,8 @@ Paddle provides three gradient clipping methods:

linear = paddle.nn.Linear(10, 10, bias_attr=paddle.ParamAttr(need_clip=False))

II. Clipping by L2 Norm
--------------------
2. Clipping by L2 Norm
Collaborator:

Change this to "2.2 Clipping by L2 Norm".

Contributor Author:

done


III. Clipping by Global L2 Norm
--------------------
3. Clipping by Global L2 Norm
Collaborator:

Change this to "2.3 Clipping by Global L2 Norm".

Contributor Author:

done
