preload_from_files for PT engine #1292
We should also make the order, or rather the preference, for variable loading consistent with TF. In TF, we first go through [...] Now, you don't have any such logic in PT. This effectively means that some variables will get loaded multiple times, and that the values from whichever source is loaded last are the ones that stay. So it is the opposite order. I think you then need to do the normal loading first, and then iterate over [...] I'm not sure whether I'm missing something else here.
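The ordering concern above can be illustrated with a small sketch that uses plain dicts as stand-ins for checkpoints. This is not RETURNN code: the function `load_with_preference` and all parameter names are hypothetical. It assumes that the entry listed first should take precedence, and gets that behaviour by doing the normal load first and then applying the preload sources in reversed order, so the highest-priority value is written last.

```python
from typing import Dict, List


def load_with_preference(
    model_params: Dict[str, float],
    main_checkpoint: Dict[str, float],
    preload_sources: List[Dict[str, float]],
) -> Dict[str, float]:
    """Hypothetical helper: earlier preload sources win over later ones,
    and any preload source wins over the main checkpoint (an assumption)."""
    result = dict(model_params)

    # Step 1: normal loading from the main checkpoint.
    for name, value in main_checkpoint.items():
        if name in result:
            result[name] = value

    # Step 2: apply preload sources in reversed order, so that the first
    # listed source is written last and therefore takes precedence.
    for source in reversed(preload_sources):
        for name, value in source.items():
            if name in result:
                result[name] = value

    return result


# Made-up parameter names and values, for illustration only.
params = {"enc.w": 0.0, "dec.w": 0.0, "lm.w": 0.0}
main = {"enc.w": 1.0, "dec.w": 1.0}
preload = [{"enc.w": 2.0}, {"enc.w": 3.0, "lm.w": 3.0}]

loaded = load_with_preference(params, main, preload)
print(loaded)
```

Without the `reversed(...)`, the second preload source would overwrite `enc.w` last, which is the "opposite order" problem described in the comment.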
I reversed the order for the keys in [...]
Looks ok to me now, despite my last comment. I hope I did not miss anything. As said, I think consistency with the TF logic is important.
@patrick-wilken or @JackTemaki or someone should also review.
By the way, what defines the parameter names in the frontend? That's what [...]
No, you would simply put it into a submodule, where you probably have the model anyway. E.g. if you have trained an LM, using the module [...]:

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            ...
            self.ext_lm = TransformerLm(...)

In that example, the prefix is simply [...]
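How a submodule attribute name turns into a parameter-name prefix can be sketched without PyTorch, again using plain dicts in place of state dicts. In PyTorch, the `state_dict()` keys of a submodule are prefixed with its attribute name followed by a dot. The helper `add_prefix` and all parameter names below are hypothetical and only serve the illustration.

```python
from typing import Dict


def add_prefix(checkpoint: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Hypothetical helper: map checkpoint keys to the names they would
    have inside a parent model that holds the module as a submodule."""
    return {prefix + name: value for name, value in checkpoint.items()}


# Stand-in for the checkpoint of a separately trained LM (made-up keys).
lm_checkpoint = {"embed.weight": 0.1, "out.bias": 0.2}

# Stand-in for the parameter names of the parent model, where the LM is
# stored under the attribute ext_lm (as in the class sketched above).
model_keys = {"encoder.weight", "ext_lm.embed.weight", "ext_lm.out.bias"}

mapped = add_prefix(lm_checkpoint, "ext_lm.")

# Only keys that exist in the parent model can be loaded; everything else
# in the model has to come from another source or from initialization.
loadable = {k: v for k, v in mapped.items() if k in model_keys}
missing = sorted(model_keys - set(loadable))

print(loadable)
print(missing)
```

Under these assumptions, the LM parameters land in the `ext_lm.` namespace of the parent model, while `encoder.weight` is left for the normal loading or initialization path.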
As discussed in #1120, preload_from_files or something equivalent should be added for the PT engine as well. This is certainly not complete, but could be helpful as a starting point. It works as a proof-of-concept to load a wav2vec 2.0 checkpoint.

What do you think in general?