[WeeklyReport] lishuai-97 2024.02.25~2024.03.08 (#144)
Co-authored-by: Sonder <[email protected]>
1 parent f26b413, commit 5442dfd
Showing 1 changed file with 30 additions and 0 deletions.
WeeklyReports/Hackathon_6th/24_lishuai-97/[WeeklyReports]2024.02.25~2024.03.08.md
@@ -0,0 +1,30 @@
### Name
Li Shuai (李帅)

### Internship Project
Training stability of large models, and fast convergence of efficient, low-cost small models

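### This Week's Work

1. **XXX gradient clipping algorithm**

* Read papers on gradient clipping to understand its background, along with the principles and implementations of classic algorithms such as Gradient Clipping, Adaptive Gradient Clipping, LAMB, and Clippy.
* Studied the principle and implementation of the XXX gradient clipping algorithm, and implemented both its element-wise and global-level variants on top of the AdamW optimizer.
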
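2. **Exploring large-model training stability**

* Read papers on large-model training stability and surveyed the sources of instability from the perspectives of the optimizer, model architecture, and scale.
* Trained the open-source 345M-parameter GPT-2 model with Megatron-LM, starting at small scale in order to observe loss spikes during training.

3. **Questions and answers** None
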
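The distinction between element-wise and global-level clipping mentioned in item 1 can be sketched as follows. The report does not name the XXX algorithm, so this is not an implementation of it: the sketch uses ordinary value clipping and global-norm clipping as stand-ins, plain Python lists stand in for tensors, and the function names and sample gradients are hypothetical.

```python
import math

def clip_elementwise(grads, clip_value):
    """Clamp every gradient entry independently to [-clip_value, clip_value]."""
    return [[max(-clip_value, min(clip_value, g)) for g in param] for param in grads]

def clip_global_norm(grads, max_norm, eps=1e-6):
    """Rescale all gradients by one shared factor so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for param in grads for g in param))
    # A single scale for every parameter keeps the update direction unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [[g * scale for g in param] for param in grads], total_norm

# Two hypothetical parameter tensors, flattened (illustrative values only).
grads = [[3.0, -4.0], [0.0, 12.0]]
elementwise = clip_elementwise(grads, clip_value=1.0)
global_clipped, norm_before = clip_global_norm(grads, max_norm=1.0)
```

Element-wise clipping treats each entry on its own and can therefore change the direction of the update, whereas global-level clipping shrinks every entry by the same factor and only changes the step length. In an AdamW setting, either function would be applied to the gradients before the optimizer step.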
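### Next Week's Work

1. Continue refining the reproduction of loss spikes on small open-source models, then explore and validate possible solutions.
2. Validate the improved optimization algorithm on a toy example, then analyze and summarize the results.
3. Read papers on sign-based optimizers to provide theoretical support for the improved optimization algorithm.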
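Reproducing loss spikes requires a working definition of what counts as a spike, which the report does not give. A minimal sketch under one assumed definition: a step is flagged when its loss exceeds the mean of the preceding window by several standard deviations. The function name, the window size, the threshold, and the sample loss curve are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev

def find_loss_spikes(losses, window=5, num_std=3.0, min_std=1e-3):
    """Return step indices whose loss exceeds the trailing mean by num_std trailing standard deviations."""
    history = deque(maxlen=window)  # most recent non-spike losses
    spikes = []
    for step, loss in enumerate(losses):
        if len(history) == window:
            mu = mean(history)
            # Floor the deviation so a perfectly flat window does not flag tiny noise.
            sigma = max(pstdev(history), min_std)
            if loss > mu + num_std * sigma:
                spikes.append(step)
                continue  # keep the spike out of the baseline statistics
        history.append(loss)
    return spikes

# Made-up loss curve with one obvious spike at step 6.
losses = [4.0, 3.8, 3.6, 3.5, 3.4, 3.3, 7.9, 3.3, 3.2, 3.1]
spikes = find_loss_spikes(losses)
```

On this made-up curve the detector flags step 6 only. Excluding flagged steps from the trailing window is a design choice that keeps one spike from inflating the baseline and masking the next one.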
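
### Mentor's Comments
Li Shuai is now familiar with recent papers on gradient clipping, update clipping, and optimizers, and can begin reproducing the conclusions of the latest papers. Progress is on track.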