Describe the solution you would like:
I would like to be able to configure a different layer-dropout rate for each layer.
Describe the alternatives you have considered:
Currently, layer dropout is implemented in fairseq2 as a single scalar probability applied to all layers (see here).
We could follow an implementation similar to this PR in torchtune to support linear, exponential, or step schedules for increasing the dropout rate across layers.
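For illustration, here is a minimal sketch of what such a schedule could look like. The helper `layer_dropout_probs` and its `schedule` values are hypothetical and not part of fairseq2's or torchtune's API; it only shows one plausible way to map a maximum dropout probability onto per-layer values.

```python
# Hypothetical sketch (not fairseq2 API): compute a per-layer dropout
# probability from a schedule, increasing with depth.
import math
from typing import List


def layer_dropout_probs(
    num_layers: int, max_p: float, schedule: str = "linear"
) -> List[float]:
    """Return a dropout probability for each layer in the stack."""
    probs = []
    for i in range(num_layers):
        # Fraction of depth: 0.0 for the first layer, 1.0 for the last.
        frac = i / max(num_layers - 1, 1)
        if schedule == "uniform":
            p = max_p
        elif schedule == "linear":
            p = max_p * frac
        elif schedule == "exp":
            # Exponential ramp, normalized so the last layer gets max_p.
            p = max_p * (math.exp(frac) - 1.0) / (math.e - 1.0)
        elif schedule == "step":
            # Apply dropout only to the second half of the stack.
            p = max_p if frac >= 0.5 else 0.0
        else:
            raise ValueError(f"unknown schedule: {schedule}")
        probs.append(p)
    return probs


# Example: 8 layers, maximum dropout 0.2, linear ramp.
print(layer_dropout_probs(8, 0.2, "linear"))
```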
Additional Context:
This will enable implementing:
- Progressive Layer Dropping, which claims to improve training accuracy and speed when the dropout rate increases linearly across layers
- LayerSkip, which claims to improve the accuracy of early-exit layers when the dropout rate increases linearly or exponentially across layers