[WeeklyReport] lishuai-97 2024.03.09~2024.03.22 (#163)
Co-authored-by: Guoxia Wang <[email protected]>
WeeklyReports/Hackathon_6th/24_lishuai-97/[WeeklyReports]2024.03.09~2024.03.22.md
### Name

李帅

### Internship Project

Large-model training stability and fast convergence of efficient, low-cost small models

### Work This Week
1. **Learning distributed training frameworks**

   * Studied the papers on Megatron-LM and Megatron-DeepSpeed to understand the principles behind their distributed training, memory optimization, and related techniques.
   * Became familiar with the code structure of the Megatron-LM and Megatron-DeepSpeed training frameworks, including their implementations of the optimizer, distributed training, and gradient clipping.
2. **XXX gradient clipping algorithm**

   * Implemented the global-level and tensor-wise variants of the XXX gradient clipping algorithm and integrated them into Megatron-LM.
   * Ran preliminary experiments on open-source models and public datasets to validate the tensor-wise and global-level strategies of the XXX gradient clipping algorithm.
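Since the algorithm's details are withheld here (the "XXX" name is a placeholder in the original), the following is a minimal pure-Python sketch of the two clipping granularities being compared in their conventional form: global-level clipping derives one scale factor from the norm across all gradient tensors, while tensor-wise clipping rescales each tensor by its own norm. Function names and the toy gradients are illustrative assumptions, not the report's actual implementation.

```python
import math

def global_norm(grads):
    # Total L2 norm over all gradient tensors (flat lists stand in for tensors).
    return math.sqrt(sum(x * x for g in grads for x in g))

def clip_global(grads, max_norm):
    # Global-level: one scale factor computed from the norm across all tensors.
    scale = min(1.0, max_norm / (global_norm(grads) + 1e-6))
    return [[x * scale for x in g] for g in grads]

def clip_tensor_wise(grads, max_norm):
    # Tensor-wise: each tensor is rescaled independently by its own L2 norm.
    out = []
    for g in grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, max_norm / (norm + 1e-6))
        out.append([x * scale for x in g])
    return out

grads = [[3.0, 4.0], [0.1, 0.2]]          # two toy "tensors"
clipped = clip_global(grads, max_norm=1.0)  # all entries share one scale
```

Note the behavioral difference: with `max_norm=1.0`, global clipping shrinks the small second tensor along with the large first one, whereas tensor-wise clipping leaves the second tensor (norm ≈ 0.22) untouched.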
3. **Exploring large-model training stability**

   * Read papers on large-model training stability, surveying the sources of training instability from the perspectives of the optimizer, model architecture, and model scale.
   * Trained the open-source 345M-parameter GPT-2 model with Megatron-LM and reproduced the loss-spike phenomenon on a small-scale dataset.
   * Performed a preliminary reproduction of a competitor's proposed optimization strategy on the 345M-parameter GPT-2 model with Megatron-LM to observe the effect of that strategy.
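The loss-spike phenomenon reproduced above can be flagged automatically with a simple heuristic: mark any step whose loss jumps well above the recent moving average. The sketch below is an illustrative assumption, not the monitoring actually used in these experiments; the `window` and `threshold` values are arbitrary.

```python
def detect_spikes(losses, window=10, threshold=2.0):
    # Flag steps where the loss exceeds `threshold` times the mean of the
    # previous `window` steps. Illustrative heuristic only.
    spikes = []
    for i in range(window, len(losses)):
        recent_mean = sum(losses[i - window:i]) / window
        if losses[i] > threshold * recent_mean:
            spikes.append(i)
    return spikes

# Smoothly decaying synthetic loss curve with an injected spike at step 30.
losses = [2.0 * 0.97 ** t for t in range(50)]
losses[30] = 8.0
print(detect_spikes(losses))  # → [30]
```

A real run would monitor the training loss stream online; a threshold relative to a moving average is robust to the overall downward trend of the curve, which a fixed absolute threshold would not be.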
4. **Questions and Answers**

   None
### Work Next Week

1. Further refine the reproduction of the competitor's strategy on small-scale open-source models, align it with our method, and compare the strengths and weaknesses of the two strategies.
2. Continue the experimental validation of the XXX gradient clipping algorithm, and analyze and summarize the results comprehensively.
3. Continue reading papers on gradient clipping, stable large-model training, and optimizers to provide theoretical support for our optimization algorithm.
### Mentor Comments