How to mix RF code with PT code #1287
Comments
There was also the suggestion by @braddockcg to automatically wrap PT code/modules (see #1120 (comment), #1120 (comment)) in a way that automatically adjusts the signatures to convert between our …
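As a rough sketch of what such an automatic wrapping could mean for plain functions (all names here are hypothetical, and it assumes the wrapped function keeps the input dims):

```python
import functools
import torch
from returnn.tensor import Tensor


def wrap_pt_func(pt_func):
    """Hypothetical wrapper: adapts a pure PT function (torch.Tensor -> torch.Tensor)
    so that it can be called with RF Tensors. Only a sketch of the suggested idea."""

    @functools.wraps(pt_func)
    def wrapped(x: Tensor, *args, **kwargs) -> Tensor:
        raw = pt_func(x.raw_tensor, *args, **kwargs)
        # Assumes the PT function does not change the shape, so we can reuse the dims.
        out = Tensor(pt_func.__name__ + "_out", dims=x.dims, dtype=x.dtype)
        out.raw_tensor = raw
        return out

    return wrapped
```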
I think you mixed up the two cases, in the …
I think with that you can use the Frontend to define torch Modules that you can then arbitrarily extend with custom PyTorch code. The other direction doesn't sound so useful to me. Well, it would be needed to support torch Models if …
Or as an alternative: …
In "Example using PT module inside a RF module", the submodule is a PT module, so it gets a normal
You mean using a RF module inside a PT module? Why is this not relevant? Maybe it's less often used, but for example, when you are testing some existing external PT modules for some experiments, and now just want to switch out their self-attention implementation with ours, with our custom relative positional encoding, or something like that, then you can do that. In any case, it's basically just the same logic, just reversed. It should be quite straightforward. Maybe a small question is what dim tags to use for the …
One other set of functions on modules is weight normalization (`torch.nn.utils.weight_norm`) and similar. Looking at the PT code, it does `setattr(module, self.name, self.compute_weight(module))` in a forward pre-hook.

So, our converted PT module, could it support additional …? The PT code is anyway a bit problematic, as this … We could also implement our … We could also say, this is just not supported. But then it would be good if we can detect such usages and not just silently ignore them? Similar to weight-norm is also weight-dropout, or basically any transformations on the weights. Maybe custom Python descriptors can be useful?
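To make the descriptor idea a bit more concrete, here is a rough sketch of a weight transformation via a plain Python descriptor, only illustrating the Python mechanism on a pure PT module, not any existing RF or PT API (`WeightNormDescriptor` and `MyLinear` are made-up names):

```python
import torch


class WeightNormDescriptor:
    """Computes the effective weight from (v, g) on attribute access,
    similar in effect to what torch.nn.utils.weight_norm achieves via its hook."""

    def __init__(self, name: str):
        self.name = name

    def __get__(self, module, objtype=None):
        if module is None:
            return self
        v = getattr(module, self.name + "_v")  # shape (out, in)
        g = getattr(module, self.name + "_g")  # shape (out, 1)
        return g * v / v.norm(dim=1, keepdim=True)


class MyLinear(torch.nn.Module):
    weight = WeightNormDescriptor("weight")  # replaces the plain weight attribute

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight_v = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.weight_g = torch.nn.Parameter(torch.ones(out_features, 1))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # self.weight is recomputed from (weight_v, weight_g) on every access.
        return torch.nn.functional.linear(x, self.weight, self.bias)
```

The open question above would then be whether such a class-level descriptor survives whatever automatic RF/PT module conversion we end up with.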
Makes mixing RF code with pure PT code easier. #1287
I pushed now a simple initial implementation. I also relaxed …
For reference, see #1120, #1120 (comment).
RF = `import returnn.frontend as rf`.
RF code = Code which uses functions/classes from `rf` (`rf.Module`, `rf.matmul` etc).
PT code = pure PyTorch code, just using `torch`.

It is of high priority that mixing pure PT code with RF code is easy. In both ways, e.g. when having some pure PT code/module, it should be simple to embed some RF code/module in it, and vice versa, i.e. when having some RF code/module, it should be simple to embed some PT code/module in it.

I distinguish a bit between just code (function calls) and modules (`rf.Module` or `torch.nn.Module`).

I think just function calls are probably already straight-forward. RF functions get `Tensor` and `Dim` as arguments and return `Tensor` and maybe `Dim` again. You get the raw `torch.Tensor` by accessing `raw_tensor`. You can also easily create a `Tensor` and `Dim` on-the-fly. So both ways should be simple.

Example using PT inside RF code:
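Something like this (only a sketch; the exact `Tensor` construction is an assumption, the relevant point is going through `raw_tensor`):

```python
import torch
from returnn.tensor import Tensor


def rf_func_with_pt_inside(x: Tensor) -> Tensor:
    """RF function which internally uses pure PT code."""
    raw: torch.Tensor = x.raw_tensor  # get the raw torch.Tensor
    raw = torch.nn.functional.relu(raw) * 2.0  # any pure PT code here
    # Wrap the result in an RF Tensor again; dims are unchanged (elementwise op).
    out = Tensor("rf_func_with_pt_inside_out", dims=x.dims, dtype=x.dtype)
    out.raw_tensor = raw
    return out
```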
Example using RF inside PT code:
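And the other direction, creating `Tensor` and `Dim` on-the-fly around a raw `torch.Tensor` (again only a sketch; the exact `Dim`/`Tensor` constructor arguments are assumptions, and static dims are used just for simplicity):

```python
import torch
import returnn.frontend as rf
from returnn.tensor import Tensor, Dim


def pt_func_with_rf_inside(raw: torch.Tensor) -> torch.Tensor:
    """Pure PT function, raw shape [batch, time, feature], which internally uses RF code."""
    batch_dim = Dim(raw.shape[0], name="batch")
    time_dim = Dim(raw.shape[1], name="time")
    feat_dim = Dim(raw.shape[2], name="feature")
    x = Tensor("x", dims=[batch_dim, time_dim, feat_dim], dtype="float32")
    x.raw_tensor = raw
    y = rf.relu(x)  # any RF code here
    return y.raw_tensor
```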
For modules, it is a bit unclear. `rf.Module` is similar to `torch.nn.Module`, but they don't share any common base class. `rf.Parameter` also is different from `torch.nn.Parameter`. We maybe could have some automatic `rf_module_to_pt_module` and vice versa?

Example using PT module inside a RF module:
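E.g. something like this (illustrative sketch; how the `torch.nn.Parameter`s of the PT submodule would show up as `rf.Parameter`s of the outer module, e.g. via some hypothetical `pt_module_to_rf_module`, is exactly the open question):

```python
import torch
import returnn.frontend as rf
from returnn.tensor import Tensor, Dim


class RFModuleWithPTSubmodule(rf.Module):
    """RF module containing a pure PT submodule."""

    def __init__(self, in_dim: Dim, out_dim: Dim):
        super().__init__()
        self.in_dim = in_dim
        self.out_dim = out_dim
        # Pure PT submodule. Some automatic wrapping (pt_module_to_rf_module?)
        # would be needed so that its torch.nn.Parameters become rf.Parameters.
        self.pt_linear = torch.nn.Linear(in_dim.dimension, out_dim.dimension)

    def __call__(self, x: Tensor) -> Tensor:
        raw = self.pt_linear(x.raw_tensor)  # assumes in_dim is the last axis of x
        out = Tensor("out", dims=list(x.dims[:-1]) + [self.out_dim], dtype=x.dtype)
        out.raw_tensor = raw
        return out
```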
Example using RF module inside a PT module:
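And the reverse (same caveats; `rf_module_to_pt_module` would be the hypothetical converter from above, so that the `rf.Parameter`s of the RF submodule get registered as `torch.nn.Parameter`s of the outer module; the `rf.Linear` usage is an assumption):

```python
import torch
import returnn.frontend as rf
from returnn.tensor import Tensor, Dim


class PTModuleWithRFSubmodule(torch.nn.Module):
    """Pure PT module containing an RF submodule."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.in_dim = Dim(in_features, name="in")
        self.out_dim = Dim(out_features, name="out")
        # RF submodule. Some automatic wrapping (rf_module_to_pt_module?) would be
        # needed so that its rf.Parameters are registered as torch.nn.Parameters here.
        self.rf_linear = rf.Linear(self.in_dim, self.out_dim)

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        # raw: [batch, time, in_features]; static dims just for this sketch
        batch_dim = Dim(raw.shape[0], name="batch")
        time_dim = Dim(raw.shape[1], name="time")
        x = Tensor("x", dims=[batch_dim, time_dim, self.in_dim], dtype="float32")
        x.raw_tensor = raw
        y = self.rf_linear(x)
        return y.raw_tensor
```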