We want to be able to apply generic operations to parameters, such as weight norm, weight dropout, or L2 loss (see #59), in a unified and straightforward way.
When some modules hide their parameters inside the RETURNN layer (e.g. Linear), any such logic can become quite counter-intuitive, complicated, and potentially even buggy. I expect that this becomes much easier once we can see all parameters directly in the code (see e.g. the code behind torch.nn.utils.weight_norm, which is quite simple, but would be tricky if parameters were hidden in RETURNN layers).
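To illustrate how little code such generic parameter logic needs once parameters are directly accessible, here is a minimal sketch of the idea behind torch.nn.utils.weight_norm (simplified to a single scalar g instead of a per-output-dim norm; this is not the actual PyTorch implementation):

```python
import torch


def simple_weight_norm(module: torch.nn.Module, name: str = "weight"):
    """Reparametrize module.<name> as g * v / ||v||, recomputed before each forward."""
    weight = getattr(module, name)
    # Replace the original parameter by direction v and magnitude g.
    del module._parameters[name]
    module.register_parameter(name + "_v", torch.nn.Parameter(weight.detach().clone()))
    module.register_parameter(name + "_g", torch.nn.Parameter(weight.detach().norm().clone()))

    def recompute(mod, inputs):
        v = getattr(mod, name + "_v")
        g = getattr(mod, name + "_g")
        setattr(mod, name, v * (g / v.norm()))  # plain tensor, rebuilt from the two params

    module.register_forward_pre_hook(recompute)
    recompute(module, None)  # set the initial weight
    return module
```

This only works so easily because the weight is an ordinary, visible parameter of the module; with the parameter hidden inside a RETURNN layer, the same logic would have to reach into the layer's internals.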
There are actually not many such modules:
Linear
Conv
TransposedConv
BatchNorm
RelativePositionalEncoding
We also need to have a functional variant of the RecLayer (rwth-i6/returnn#817).
That's all. And they are all very simple to reimplement using pure functional modules, e.g. dot etc.
Specifically:
Linear: Use dot (see the sketch after this list)
Conv: Use the functional variant of ConvLayer
TransposedConv: Use the functional variant of TransposedConvLayer
BatchNorm: reimplement, maybe even more efficiently by wrapping the fused TF ops more directly
RelativePositionalEncoding: reimplement anyway, see the discussion in Transformer Modules #55
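As a rough illustration of the Linear case (plain TensorFlow here, not returnn-common code; the class and initialization are just assumptions for the sketch), the idea is that the module only owns explicit variables and the forward pass is a single functional op:

```python
import tensorflow as tf


class SimpleLinear(tf.Module):
    """Linear layer where the parameters are plain, visible tf.Variables."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = tf.Variable(tf.random.normal([in_dim, out_dim], stddev=in_dim ** -0.5), name="weight")
        self.bias = tf.Variable(tf.zeros([out_dim]), name="bias")

    def __call__(self, x: tf.Tensor) -> tf.Tensor:
        # The forward pass is just the functional part ("dot" in the proposal above).
        return tf.matmul(x, self.weight) + self.bias
```

Conv, TransposedConv and BatchNorm would follow the same pattern, just with different functional ops.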
So then the only module which directly corresponds to a tf.Variable is the Variable module (or maybe rename it to Parameter, to be more consistent with PyTorch). We can also easily implement functions like parameters() and named_parameters() for modules, and then follow very similar, simple logic for things like weight norm etc. as in PyTorch.
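A rough sketch of how such a generic parameters() / named_parameters() could look (the Module / Parameter classes here are assumptions following the proposal, not existing returnn-common API):

```python
import tensorflow as tf


class Parameter:
    """Thin wrapper around a single tf.Variable -- the only place variables live."""

    def __init__(self, shape):
        self.var = tf.Variable(tf.zeros(shape))


class Module:
    def named_parameters(self, prefix=""):
        """Yield (name, Parameter) pairs by recursing over attributes."""
        for name, value in vars(self).items():
            sub = f"{prefix}.{name}" if prefix else name
            if isinstance(value, Parameter):
                yield sub, value
            elif isinstance(value, Module):
                yield from value.named_parameters(prefix=sub)

    def parameters(self):
        for _name, param in self.named_parameters():
            yield param
```

Generic things like an L2 loss then become a one-liner over parameters(), e.g. sum(tf.reduce_sum(p.var ** 2) for p in model.parameters()), and weight norm or weight dropout can follow the same simple pattern as in PyTorch.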