
preload_from_files for PT engine #1292

Merged: 10 commits from torch_preload into master, Apr 3, 2023

Conversation

@vieting (Contributor) commented Mar 28, 2023

As discussed in #1120, preload_from_files or something equivalent should be added for the PT engine as well. This is certainly not complete, but could be helpful as a starting point. It works as a proof-of-concept to load a wav2vec 2.0 checkpoint.

What do you think in general?
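
For reference, preload_from_files in a RETURNN config is a dict of entries, each pointing to an external checkpoint and, optionally, a parameter-name prefix. A minimal sketch of such an entry, with placeholder paths and assuming the option names from the TF side (filename, prefix, init_for_train) carry over unchanged:

preload_from_files = {
    "wav2vec": {
        "filename": "/path/to/wav2vec2_checkpoint",  # placeholder path
        "prefix": "encoder.",  # model params under this name prefix are loaded from the checkpoint
        "init_for_train": True,  # only used to initialize training
    },
}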

(Several earlier comments between @albertz and @vieting were marked as resolved and are hidden.)

@vieting vieting marked this pull request as ready for review March 29, 2023 16:08
@vieting vieting requested review from a team and albertz as code owners March 29, 2023 16:08
returnn/torch/engine.py: review comment (outdated, resolved)
@albertz (Member) commented Mar 30, 2023

We should also make the order, or actually the preference, for variable loading consistent with TF. In TF, we first go through preload_from_files, with its entries sorted by key name, and only afterwards do the normal loading. However, every variable we load is marked (see set_as_custom_init), which makes sure that any subsequent loading will not load it again. So the values from wherever a variable is loaded first are the ones that stay.

Now, you don't have any such logic in PT. This effectively means that some variables will get loaded multiple times, and the values from wherever a variable is loaded last are the ones that stay, i.e. the opposite order. I think you then need to do the normal loading first, and afterwards iterate over reversed(sorted(preload_from_files.items())).

I'm not sure if I'm missing something else here.
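
To illustrate the ordering described above, here is a rough sketch of how the PT side could do it, assuming a plain torch.nn.Module and a checkpoint dict with a "model" entry (the function name and checkpoint layout are assumptions, not the actual engine code):

import torch


def load_model_params(model, checkpoint_filename, preload_from_files):
    """Normal checkpoint loading first, then preload_from_files (sketch only)."""
    if checkpoint_filename:
        checkpoint = torch.load(checkpoint_filename, map_location="cpu")
        model.load_state_dict(checkpoint["model"])  # "model" key is an assumption
    # A later load_state_dict call overwrites earlier values, so iterating the
    # preload entries in reversed sorted order reproduces the TF preference
    # where the first matching source (sorted by key) wins.
    model_param_names = {name for name, _ in model.named_parameters()}
    for _key, opts in reversed(sorted(preload_from_files.items())):
        preload = torch.load(opts["filename"], map_location="cpu")
        prefix = opts.get("prefix", "")
        selected = {
            prefix + name: value
            for name, value in preload["model"].items()
            if prefix + name in model_param_names
        }
        model.load_state_dict(selected, strict=False)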

@vieting (Contributor, Author) commented Mar 30, 2023

I reversed the order for the keys in preload_from_files. The normal loading from existing checkpoints for epoch > 1 should stay first, so it will be overwritten by preload_from_files in case we are not in training, right?

@albertz (Member) left a review comment:

Looks OK to me now, apart from my last comment. I hope I did not miss anything. As said, I think consistency with the TF logic is important.

@patrick-wilken or @JackTemaki or someone should also review.

returnn/torch/engine.py: review comment (outdated, resolved)
@patrick-wilken (Contributor) commented Mar 30, 2023

By the way, what defines the parameter names in the frontend? That's what rf.Module.named_parameters() seems to do, right? And it uses the attribute names of the modules. So to add the prefix we are talking about here, you would rename the Module attribute? 😕

@albertz (Member) commented Mar 30, 2023

So to add the prefix we are talking about here you would rename the Module attribute?

No, you would simply put it into a submodule, where you probably have the model anyway. E.g. if you have trained an LM using the module TransformerLm, you would now create TransformerLm as a submodule of your main model, like:

class Model(nn.Module):
  def __init__(self):
    super().__init__()
    ...
    self.ext_lm = TransformerLm(...)  # externally trained LM as a submodule

In that example, the prefix is simply "ext_lm.".
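
A matching preload_from_files entry for that submodule would then look roughly like this (the filename is a placeholder):

preload_from_files = {
    "lm": {
        "filename": "/path/to/trained_lm_checkpoint",
        "prefix": "ext_lm.",  # maps the LM checkpoint params onto self.ext_lm
    },
}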

returnn/torch/engine.py: review comment (outdated, resolved)
@albertz albertz merged commit 1732a0b into master Apr 3, 2023
@albertz albertz deleted the torch_preload branch April 3, 2023 13:41