MaskedComputationLayer is violating the principle that the user should not need to think about rec automatic optimization (#769)
@Zettelkasten @mmz33 @robin-p-schmitt maybe you have some ideas, or comments?
Note that I already realized this as a problem some time ago, but it came up again now when thinking about how to specify the output spatial dim. When not being optimized (e.g. when simply executed step by step inside the loop), the behavior is clear.
The new dim T' would somehow have a reference to T. We can then see the unmasking as a sparse-to-dense operation. However, this is still ambiguous: in sparse-to-dense, we would set 0 for all other frames, but this is rarely what we want here. Specifically, to reproduce the output of the non-optimized loop, we would instead repeat the previous output frame whenever the mask is False, just as `UnmaskLayer` does. So we want automatic unmasking from T' to T whenever such tensors are combined somewhere, e.g. via `CombineLayer`.
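To make the ambiguity concrete, here is a small numpy sketch (plain numpy, not RETURNN code; the values are made up) contrasting the two semantics for a single sequence:

```python
import numpy as np

# One sequence, T = 5 frames; the masked computation ran on T' = 3 frames.
mask = np.array([True, False, True, True, False])  # [T], T' = sum(mask) = 3
x_masked = np.array([[1.0], [2.0], [3.0]])         # [T',D], D = 1

# Sparse-to-dense semantics: fill all non-masked frames with 0.
dense_zeros = np.zeros((len(mask), 1))
dense_zeros[mask] = x_masked
print(dense_zeros.ravel())  # [1. 0. 2. 3. 0.]

# UnmaskLayer semantics: repeat the previous masked output,
# which reproduces what the non-optimized rec loop computes.
idx = np.maximum(np.cumsum(mask) - 1, 0)  # [T], index into x_masked
dense_prev = x_masked[idx]
print(dense_prev.ravel())   # [1. 1. 2. 3. 3.]
```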
We could store this masking information on the dim tag itself. This could later be extended to also represent sparse data. The mask itself would be a boolean tensor of shape [B,T]. Or, as mentioned, instead of a mask, we could also store the indices, i.e. the positions within T.
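A rough sketch of how that could look, purely hypothetical (the class and the attribute names `masked_base`, `mask`, `indices` are not existing RETURNN API):

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass(eq=False)  # identity-based equality, like dim tags in general
class DimTag:
    """Hypothetical sketch, not the real RETURNN Dim/DimensionTag class."""
    description: str
    # Set when this dim (T') is a masked subset of another dim (T):
    masked_base: Optional["DimTag"] = None
    mask: Optional[np.ndarray] = None     # [B,T] bool
    indices: Optional[np.ndarray] = None  # alternatively [B,T'] int, positions within T

time_dim = DimTag("time:T")
masked_time_dim = DimTag(
    "masked-time:T'",
    masked_base=time_dim,
    mask=np.array([[True, False, True, True, False]]),  # B=1, T=5, T'=3
)
```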
See RETURNN principles.
When defining `MaskedComputationLayer` inside a rec loop and not thinking about rec automatic optimization, the behavior is clear: it copies the prev output (and prev state) whenever the mask is False, otherwise it uses the current output. In this line of thought, there should be no need for `UnmaskLayer`.

However, the current situation is: when it is optimized out of the loop, a tensor of shape [B,D] inside does not just become [T,B,D], where T is the number of rec iterations, but instead becomes [T',B,D], where T' = sum(mask). Some follow-up operation at some point requires the [T,B,D] shape (maybe because it combines the tensor with other tensors from the rec loop), and thus the user explicitly must use the `UnmaskLayer` at some point after the `MaskedComputationLayer`.

This violates the RETURNN principle that the user should not need to think about rec automatic optimization.
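For illustration, here is a schematic net dict fragment of this situation. The mask condition and most layer names are made up for this sketch, but `masked_computation`, `unmask`, `reduce`, `compare`, `linear` and `combine` are existing layer classes:

```python
net_dict = {
    "loop": {
        "class": "rec", "from": "data",
        "unit": {
            # Some per-frame boolean condition, shape [B] inside the loop
            # (the actual condition here is made up):
            "frame_sum": {"class": "reduce", "mode": "sum", "axis": "F", "from": "data:source"},
            "emit": {"class": "compare", "kind": "greater", "from": "frame_sum", "value": 0.0},
            # Inside the loop: output is [B,10] in every frame, repeating the
            # prev output whenever "emit" is False.
            # Optimized out of the loop: output becomes [T',B,10].
            "masked": {
                "class": "masked_computation", "mask": "emit", "from": "data:source",
                "unit": {"class": "linear", "activation": "tanh", "n_out": 10},
            },
            # Only needed because of the optimization:
            # maps [T',B,10] back to [T,B,10] by repeating prev frames.
            "unmasked": {"class": "unmask", "from": "masked", "mask": "emit"},
            "proj": {"class": "linear", "activation": None, "n_out": 10, "from": "data:source"},
            # Combining with a tensor over T requires the unmasked variant:
            "output": {"class": "combine", "kind": "add", "from": ["unmasked", "proj"]},
        },
    },
}
```

Inside the loop, `unmasked` is effectively a no-op copy; it only exists to satisfy the optimized-out case, which is exactly the problem described above.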
What are possible solutions?
I don't really have a good idea so far. Whatever the solution is, it should be opaque to the user.
Maybe the new dim tag for T' could have some special flag marking it as a masked subset of T, and whenever it is going to be combined with the other dim tag, it would automatically do the unmasking.
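A rough sketch of that idea, again purely hypothetical and building on the `DimTag` sketch above (a real implementation would of course operate on RETURNN layers/`Data`, not numpy):

```python
import numpy as np

def combine_add(x, x_dims, y, y_dims):
    """Hypothetical: elementwise add that automatically unmasks first.
    E.g. x: [T',D] with x_dims = (masked_time_dim, feat_dim), y: [T,D]."""
    x_dims = list(x_dims)
    for i, dim in enumerate(x_dims):
        if dim.masked_base is not None and dim.masked_base in y_dims:
            # T' is a masked subset of T: expand by repeating the previous
            # masked frame, exactly what UnmaskLayer does explicitly today.
            idx = np.maximum(np.cumsum(dim.mask[0]) - 1, 0)  # B=1 for brevity
            x = np.take(x, idx, axis=i)
            x_dims[i] = dim.masked_base
    return x + y

feat_dim = DimTag("feature:D")
x = np.array([[1.0], [2.0], [3.0]])  # [T'=3, D=1]
y = np.zeros((5, 1))                 # [T=5,  D=1]
print(combine_add(x, (masked_time_dim, feat_dim), y, (time_dim, feat_dim)).ravel())
# -> [1. 1. 2. 3. 3.]
```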