I am planning to contribute a series of FSMT models to the model hub. The models have been trained for a paper that is currently under review.
Before working on a PR I wanted to ask for some advice:
normalize_before
The new models have been trained with Fairseq's option normalize_before=True, while the existing FSMT implementation uses normalize_before=False. I understand that copy-pasting model code is preferred to extending the configuration. This would mean that a near-duplicate module fsmt_prenorm needs to be created. Is this correct?
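For readers unfamiliar with the flag, here is a minimal sketch of the difference it makes, assuming the usual pre-norm/post-norm semantics: with normalize_before=False the layer norm is applied after the residual addition, and with normalize_before=True it is applied to the sublayer input instead. The helper names (layer_norm, residual_block, toy_sublayer) are hypothetical and exist only for illustration; they are not taken from Fairseq or from the FSMT code, and the layer norm here has no learned scale or bias.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    normalize_before: bool,
) -> Vector:
    """Apply one residual sublayer in pre-norm or post-norm arrangement."""
    residual = x
    if normalize_before:
        # Pre-norm: normalize the input of the sublayer.
        x = layer_norm(x)
    x = sublayer(x)
    x = [r + s for r, s in zip(residual, x)]
    if not normalize_before:
        # Post-norm: normalize the sum of residual and sublayer output.
        x = layer_norm(x)
    return x


def toy_sublayer(x: Vector) -> Vector:
    """Stand-in for self-attention or the feed-forward network (assumption)."""
    return [2.0 * v for v in x]


post_norm = residual_block([1.0, 2.0, 3.0], toy_sublayer, normalize_before=False)
pre_norm = residual_block([1.0, 2.0, 3.0], toy_sublayer, normalize_before=True)
```

The two arrangements give different outputs for the same weights, which is why a checkpoint trained with one setting cannot simply be loaded into an implementation that hard-codes the other.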
Adequate base branch
The FSMT module is currently being refactored (#11218). Do you recommend that I start from the master branch or from the PR's feature branch, which is nearly completed?
@patil-suraj I am still very motivated to work on the pull request :) Just let me know if you need more information to answer my question.
In case you're interested, the paper describing our models is now public (https://openreview.net/forum?id=RvO9DqoWI9V). I believe the models could be of value to others in the community.