improve gradient_clip doc #4959
Conversation
Thanks for contributing to the PaddlePaddle docs! The documentation preview is building; once the Docs-New job finishes you can preview it at: http://preview-pr-4959.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
✅ This PR's description meets the template requirements!
for t in range(100):
    idx = np.random.choice(total_data, batch_size, replace=False)
    x = paddle.to_tensor(x_data[idx, :])
    y = paddle.to_tensor(y_data[idx, :])
Call y "label" and y_pred "pred"?
done
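For reference, a minimal sketch of how the loop would read after the suggested rename (hypothetical reconstruction; model stands for whatever network the example builds):

for t in range(100):
    idx = np.random.choice(total_data, batch_size, replace=False)
    x = paddle.to_tensor(x_data[idx, :])
    label = paddle.to_tensor(y_data[idx, :])  # was: y
    pred = model(x)                           # was: y_pred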
train()
Part of the log without gradient clipping is shown below. You can see that both the loss and the gradients keep growing, reaching positive infinity at step 4 and turning into NaN.
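Not part of the quoted diff, but one way to see which parameters' gradients blow up (the question raised below) is to log per-parameter gradient norms after each backward pass; model and loss here are placeholders for the example's network and loss value:

loss.backward()
for name, param in model.named_parameters():
    if param.grad is not None:  # paddle.nn.Layer exposes parameter gradients as Tensors
        print(name, float(paddle.linalg.norm(param.grad)))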
This could be more detailed: under this network design, which parameters end up with very large gradients, and why do they keep growing?
done
loss = \frac{1}{n} \sum_{i=1}^n(y_i-y_i')^2
- After obtaining the loss, run one backward pass to adjust the weights and biases. To update the network parameters, first compute the gradient of the loss function with respect to the parameters, then use a gradient-update algorithm to take one gradient-descent step and reduce the loss, as in the formula below, where alpha is the learning rate.
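The update rule the sentence refers to, written out (reconstructed from the surrounding description; :math:`w` stands for any trainable parameter):

.. math::
    w \leftarrow w - \alpha \frac{\partial loss}{\partial w}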
The formatting here is wrong.
done
Part 1. Clipping by value range
--------------------
.. math::
    O^k = f(W O^{k-1} + b)
O should be explained.
done
After the network's predictions are computed, a loss defined by the gap between the target values and the predictions is calculated, for example with mean squared error.

.. math::
    loss = \frac{1}{n} \sum_{i=1}^n(y_i-y_i')^2
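As a quick numeric check of the formula (illustrative only; y holds the targets :math:`y_i` and y_pred the predictions :math:`y_i'`):

import numpy as np

y = np.array([1.0, 2.0, 3.0])       # targets y_i
y_pred = np.array([1.5, 1.5, 2.0])  # predictions y_i'
mse = np.mean((y - y_pred) ** 2)    # (0.25 + 0.25 + 1.0) / 3 = 0.5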
y_i and y_i' should also be explained.
done
.. math::
    loss = \frac{1}{n} \sum_{i=1}^n(y_i-y_i')^2

- After obtaining the loss, run one backward pass to adjust the weights and biases. To update the network parameters, first compute the gradient of the loss function with respect to the parameters, then use a gradient-update algorithm to take one gradient-descent step.
It would be best to explain the concept of a gradient, and which part of the formula it corresponds to.
Added the corresponding part. As for the gradient itself, it is the most fundamental concept in deep learning, so I feel users should already have it down.
- Model parameters and the loss become NaN

If "gradient explosion" occurs, the network will jump straight past the optimum during learning, so gradient clipping is needed to keep it from overshooting. Paddle provides three gradient clipping methods: clipping by value range, clipping by L2 norm, and clipping by global L2 norm. Clipping by value range is simple, but it is hard to pick a suitable threshold. Clipping by L2 norm and clipping by global L2 norm both use a threshold to bound the L2 norm of the gradient vector; the former clips only specific gradients, while the latter clips all gradients in the optimizer.
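The three methods map onto the classes below (a minimal sketch; the thresholds are arbitrary example values):

import paddle

# clip each gradient element into [-1.0, 1.0]
clip_value = paddle.nn.ClipGradByValue(min=-1.0, max=1.0)
# rescale any single gradient tensor whose L2 norm exceeds 1.0
clip_by_norm = paddle.nn.ClipGradByNorm(clip_norm=1.0)
# rescale all gradients together if their global L2 norm exceeds 1.0
clip_by_global_norm = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)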
The preview shows multiple extra spaces here.
done
Clipping by value range: constrain each parameter gradient to a range, and clip any values that fall outside it.

Usage: create an instance of the :ref:`paddle.nn.ClipGradByValue <cn_api_fluid_clip_ClipGradByValue>` class and pass it to the optimizer; the optimizer will clip the gradients before updating the parameters.
**1. Clip all parameters (default)**
- **Clip all parameters (default)**

By default, the gradients of all parameters in the optimizer are clipped:
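A minimal sketch of that default usage, following the pattern in the Paddle API docs (layer sizes and learning rate are arbitrary):

import paddle

linear = paddle.nn.Linear(10, 10)
clip = paddle.nn.ClipGradByValue(min=-1.0, max=1.0)
# the optimizer applies the clip to every parameter's gradient before the update
sgd = paddle.optimizer.SGD(learning_rate=0.1,
                           parameters=linear.parameters(),
                           grad_clip=clip)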
Could you state the clipping range?
done
Part 2. Clipping by L2 norm
--------------------
2. Clipping by L2 norm
###################

Clipping by L2 norm: treat the gradient as a multidimensional Tensor and compute its L2 norm; if it exceeds the maximum value, scale the gradient down proportionally, otherwise leave it unclipped.
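Written out, the rule being described is (as in the ClipGradByNorm docs, where :math:`X` is the gradient tensor and :math:`clip\_norm` the threshold):

.. math::
    Out =
    \begin{cases}
    X, & norm(X) \leq clip\_norm \\
    \frac{clip\_norm \cdot X}{norm(X)}, & norm(X) > clip\_norm
    \end{cases}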
Explain X & clip_norm?
done
@@ -114,7 +146,7 @@ Paddle provides three gradient clipping methods:

where :math:`norm(X)` denotes the L2 norm of :math:`X`
Explain X, global_norm, and clip_norm?
done
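For reference, the global-norm variant the question refers to computes (per the ClipGradByGlobalNorm docs, with :math:`X_i` the gradient tensors in the list):

.. math::
    global\_norm = \sqrt{\sum_{i=1}^{n} norm(X_i)^2}

.. math::
    Out_i = \frac{clip\_norm \cdot X_i}{\max(global\_norm, clip\_norm)}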
Clipping only part of the parameters is configured the same way as above, via the parameter's :ref:`paddle.ParamAttr <cn_api_fluid_ParamAttr>`; its ``need_clip`` defaults to True, meaning the gradient is clipped, and setting it to False disables clipping. See the example code above.

As the introduction above shows, clipping by value range can change the direction of the gradient vector. For example, with a threshold of 1.0 and an original gradient vector of [0.8, 89.0], the clipped gradient becomes [0.8, 1.0], a drastic change of direction. For the two L2-norm-based methods with a threshold of 1.0, the clipped gradient vector
With a threshold of 1.0, the clipped gradient vector is [] ?
Is it empty?
Data added.
As the introduction above shows, clipping by value range can change the direction of the gradient vector. For example, with a threshold of 1.0 and an original gradient vector of [0.8, 89.0], the clipped gradient becomes [0.8, 1.0], a drastic change of direction. For the two L2-norm-based methods with a threshold of 1.0, the clipped gradient vector
becomes approximately [0.009, 1.0]. The direction of the original gradient vector is preserved, but because component 2 is so large, component 1 ends up close to 0. In practice, if gradient explosion occurs during training, try the different clipping methods and compare their effect on the validation set.
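A quick numeric check of that comparison (illustrative numpy sketch):

import numpy as np

g = np.array([0.8, 89.0])
threshold = 1.0

# clipping by value range: direction changes drastically
by_value = np.clip(g, -threshold, threshold)                 # [0.8, 1.0]

# clipping by L2 norm: direction preserved, magnitude bounded
by_norm = g * threshold / max(np.linalg.norm(g), threshold)  # ~[0.009, 1.0]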
Part 3. Examples
This section has formatting problems; please fix them.
done
LGTM
Part 2. How to use gradient clipping in Paddle
---------------------------

1. Clipping by value range
Change this to "2.1 Clipping by value range".
done
@@ -38,8 +69,8 @@ Paddle provides three gradient clipping methods:

linear = paddle.nn.Linear(10, 10, bias_attr=paddle.ParamAttr(need_clip=False))

Part 2. Clipping by L2 norm
--------------------
2. Clipping by L2 norm
Change this to "2.2 Clipping by L2 norm".
done
Part 3. Clipping by global L2 norm
--------------------
3. Clipping by global L2 norm
Change this to "2.3 Clipping by global L2 norm".
done