Enable the combination of sequence length warmup and RoPE #285

conglongli · 2023-11-09T20:58:54Z

Enable the combination of sequence length warmup (SLW, one of the curriculum learning techniques released by the DeepSpeed team: https://www.deepspeed.ai/tutorials/curriculum-learning/ and https://www.deepspeed.ai/tutorials/data-efficiency/) and RoPE (rotary positional embedding).

Verified that sequence length warmup remains effective when combined with RoPE:

Blue line: baseline with RoPE and without SLW. GPT-3 1.3B pretraining on 30B tokens, with learning rate = 1.0e-3 (5x OpenAI's original recipe) and batch size = 2048 (4x OpenAI's original recipe) to simulate unstable training.
Red line: with RoPE and with SLW (cl_min = 64, cl_step = 3000). GPT-3 1.3B pretraining on 30B tokens, with the same learning rate and batch size as the blue line. Compared to the blue line, this run avoided the larger loss spike and achieved a better validation loss at the end of training.
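
For readers unfamiliar with how the two techniques interact, here is a minimal sketch (not the code changed in this PR). It assumes a linear warmup from cl_min to a full sequence length over cl_step steps, and a RoPE angle table that is built once for the full length and sliced to the current warmup length. The names `slw_seqlen`, `rope_angles`, `apply_rope`, and `rope_for_step`, the full length of 2048, and the rounding to a multiple of 8 are all illustrative assumptions.

```python
import math

def slw_seqlen(step, cl_min=64, cl_max=2048, cl_step=3000, multiple=8):
    """Linearly grow the sequence length from cl_min to cl_max over cl_step steps.

    cl_max=2048 and multiple=8 are assumptions made for this sketch.
    """
    if step >= cl_step:
        return cl_max
    length = int(cl_min + (cl_max - cl_min) * step / cl_step)
    length -= length % multiple          # round down to a multiple of `multiple`
    return max(cl_min, min(length, cl_max))

def rope_angles(max_len, dim, base=10000.0):
    """angles[p][i] = p * base**(-2i/dim) for each position p and pair index i."""
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [[p * f for f in inv_freq] for p in range(max_len)]

def apply_rope(x, angles):
    """Rotate each pair (x[2i], x[2i+1]) of every position by its angle."""
    assert len(x) == len(angles), "RoPE table must match the current sequence length"
    out = []
    for vec, ang in zip(x, angles):
        rotated = []
        for i, theta in enumerate(ang):
            a, b = vec[2 * i], vec[2 * i + 1]
            c, s = math.cos(theta), math.sin(theta)
            rotated += [a * c - b * s, a * s + b * c]
        out.append(rotated)
    return out

MAX_LEN, DIM = 2048, 8                   # hypothetical sizes for the sketch
TABLE = rope_angles(MAX_LEN, DIM)        # built once for the full length

def rope_for_step(x_full, step):
    """Truncate the input to the warmup length and slice the RoPE table to match."""
    cur = slw_seqlen(step)
    return apply_rope(x_full[:cur], TABLE[:cur])
```

The point of the sketch is the slicing in `rope_for_step`: when SLW shortens the input, the positional table has to be shortened to the same length, otherwise the shapes no longer agree.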

Enable the combination of sequence length warmup and RoPE

ff450bd

conglongli requested review from jeffra, tjruwase, ShadenSmith, awan-10, eltonzheng, minjiaz, RezaYazdaniAminabadi, duli2012, mrwyattii, arashb, xiaoxiawu-microsoft and GuanhuaWang as code owners November 9, 2023 20:58

conglongli requested review from tohtana and removed request for arashb, ShadenSmith, jeffra, duli2012, tjruwase, awan-10, GuanhuaWang, mrwyattii, eltonzheng, minjiaz, RezaYazdaniAminabadi and xiaoxiawu-microsoft November 9, 2023 20:59

tohtana approved these changes Nov 9, 2023

conglongli merged commit 4c0bc7f into main Nov 9, 2023
1 check passed

conglongli deleted the slw_for_rope branch November 9, 2023 21:46
