Describe the solution you would like:
I would like to be able to configure a different layer-dropout rate for each layer.
Describe the alternatives you have considered:
Currently, layer dropout is implemented in fairseq2 as a single scalar probability applied to all layers (see here).
We could follow an implementation similar to this PR in torchtune to support linear, exponential, or step schedules for increasing the dropout rate across layers.
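For illustration, here is a minimal sketch of what such a schedule could look like. The helper `layer_dropout_probs` and its `schedule` values are hypothetical and not part of fairseq2's or torchtune's API; it only shows one plausible way to map a maximum dropout probability onto per-layer values.

```python
# Hypothetical sketch (not fairseq2 API): compute a per-layer dropout
# probability from a schedule, increasing with depth.
import math
from typing import List


def layer_dropout_probs(
    num_layers: int, max_p: float, schedule: str = "linear"
) -> List[float]:
    """Return a dropout probability for each layer in the stack."""
    probs = []
    for i in range(num_layers):
        # Fraction of depth: 0.0 for the first layer, 1.0 for the last.
        frac = i / max(num_layers - 1, 1)
        if schedule == "uniform":
            p = max_p
        elif schedule == "linear":
            p = max_p * frac
        elif schedule == "exp":
            # Exponential ramp, normalized so the last layer gets max_p.
            p = max_p * (math.exp(frac) - 1.0) / (math.e - 1.0)
        elif schedule == "step":
            # Apply dropout only to the second half of the stack.
            p = max_p if frac >= 0.5 else 0.0
        else:
            raise ValueError(f"unknown schedule: {schedule}")
        probs.append(p)
    return probs


# Example: 8 layers, maximum dropout 0.2, linear ramp.
print(layer_dropout_probs(8, 0.2, "linear"))
```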
Additional Context:
This will enable implementing:
- Progressive Layer Dropping, which claims to improve training accuracy and speed when the dropout rate increases linearly across layers
- LayerSkip, which claims to improve the accuracy of early-exit layers when the dropout rate increases linearly or exponentially across layers