Currently we call `rf.random`, which always allocates a new tensor, and then we call `Backend.set_parameter_initial_value`, which copies the values over to the param.
Instead, many eager-based frameworks have in-place ops for many things, including generating random values. That is how random init is usually done in PyTorch, e.g. by calling `torch.nn.init.uniform_`.
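For comparison, the usual PyTorch pattern (standard `torch` API; shapes and bounds here are just for illustration):

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.empty(512, 256))
# Fills the existing parameter tensor in place; no new tensor is allocated and copied.
nn.init.uniform_(w, a=-0.1, b=0.1)
```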
We should also do this for our param init when such ops are available.
This means we must change our `ParamInit` API, e.g. by adding an `out` argument, like:
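A rough sketch of what that could look like. The class and argument names below are simplified / assumed for illustration and do not reflect the exact current `ParamInit` and `rf.random` signatures; the new part is the optional `out`:

```python
from typing import Optional, Sequence
import returnn.frontend as rf
from returnn.tensor import Tensor, Dim


class ParamInit:
    """Param init base class (simplified). New: optional ``out`` to fill an existing tensor in place."""

    def __call__(self, dims: Sequence[Dim], dtype: str, *, out: Optional[Tensor] = None) -> Tensor:
        raise NotImplementedError


class ParamInitRandomUniform(ParamInit):
    """Uniform random init (illustrative name), forwarding ``out`` to rf.random."""

    def __init__(self, minval: float = -0.1, maxval: float = 0.1):
        self.minval = minval
        self.maxval = maxval

    def __call__(self, dims: Sequence[Dim], dtype: str, *, out: Optional[Tensor] = None) -> Tensor:
        # ``out`` is the proposed new argument. If the param tensor is passed here,
        # an eager backend can fill it with an in-place op instead of allocating
        # a new tensor and copying it over in Backend.set_parameter_initial_value.
        return rf.random(
            distribution="uniform",
            dims=dims,
            dtype=dtype,
            minval=self.minval,
            maxval=self.maxval,
            out=out,
        )
```

With that, the param-init code path could pass the parameter's own tensor as `out` and skip the extra copy on backends that support in-place ops.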
For reference on the RETURNN frontend (RF): #1120
We also have to extend `rf.random` in a similar way, by adding `out`.
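To illustrate what a backend could then do with the extended `rf.random` (a toy helper, not the actual `Backend` method names or signatures):

```python
from typing import Optional, Sequence
import torch


def _random_uniform_torch(shape: Sequence[int], minval: float, maxval: float,
                          out: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Toy backend-side helper for rf.random(..., distribution="uniform", out=...)."""
    if out is not None:
        # Fill the given tensor (e.g. the param) in place; no extra allocation, no copy.
        return out.uniform_(minval, maxval)
    # No ``out``: fall back to the old behavior of allocating a new tensor.
    return torch.empty(*shape).uniform_(minval, maxval)
```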