
MaskedComputationLayer is violating the principle that the user should not need to think about rec automatic optimization #769

Closed
albertz opened this issue Nov 25, 2021 · 4 comments · Fixed by #976

Comments

albertz (Member) commented Nov 25, 2021

See RETURNN principles.

When defining a MaskedComputationLayer inside a rec loop, without thinking about rec automatic optimization, the behavior is clear: whenever the mask is False, it copies the previous output (and previous state); otherwise it uses the current output.
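In plain TF, the per-step semantics would be something like this sketch (`masked_step` and `unit` are hypothetical names; the real layer would additionally avoid computing `new_out` at all for masked-out frames when optimized):

```python
import tensorflow as tf

def masked_step(x_t, mask_t, prev_out, prev_state, unit):
    """One step of the rec loop, a sketch of the intended semantics.
    x_t: [B,D_in], mask_t: [B] bool, prev_out: [B,D], prev_state: [B,D]."""
    new_out, new_state = unit(x_t, prev_state)  # the wrapped computation
    # Where the mask is False, keep the previous output and state instead.
    out_t = tf.where(mask_t[:, None], new_out, prev_out)
    state_t = tf.where(mask_t[:, None], new_state, prev_state)
    return out_t, state_t
```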

In this line of thought, there should be no need for UnmaskLayer.

However, the current situation is: when the layer is optimized out of the loop, a tensor of shape [B,D] inside does not just become [T,B,D], where T is the number of rec iterations, but instead becomes [T',B,D] with T' = sum(mask). If some follow-up operation requires the [T,B,D] shape (maybe because it combines the tensor with other tensors from the rec loop), the user must explicitly add an UnmaskLayer at some point after the MaskedComputationLayer.
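For illustration, a sketch of what the user currently has to write (the mask computation and the surrounding layers are made up here; masked_computation and unmask are the actual layer class names):

```python
network = {
    "output": {"class": "rec", "from": "data", "unit": {
        # Some per-frame boolean decision, shape [B] (this particular mask is made up):
        "gate": {"class": "reduce", "mode": "mean", "from": "data:source", "axes": "F"},
        "mask": {"class": "compare", "kind": "greater", "from": "gate", "value": 0.0},
        # Only updated in frames where the mask is True:
        "masked": {
            "class": "masked_computation", "mask": "mask", "from": "data:source",
            "unit": {"class": "rec", "unit": "nativelstm2", "n_out": 512},
        },
        # Inside the loop this is [B,512]; optimized out of the loop it is [T',B,512].
        # Only because of the optimization, an explicit unmask back to [T,B,512] is needed:
        "unmasked": {"class": "unmask", "from": "masked", "mask": "mask"},
        "output": {"class": "copy", "from": "unmasked"},
    }},
}
```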

This violates the RETURNN principle that the user should not need to think about rec automatic optimization.


What are possible solutions?

I don't really have a good idea so far. Whatever the solution is, it should be opaque to the user.

Maybe the new dim tag for T' could carry some special flag marking it as a masked subset of T, and whenever it is going to be combined with the original dim tag, the unmasking would happen automatically.

albertz (Member Author) commented Nov 25, 2021

@Zettelkasten @mmz33 @robin-p-schmitt maybe you have some ideas, or comments?

albertz (Member Author) commented Nov 25, 2021

Note that I already realized this problem some time ago, but it came up again now when thinking about how to specify the output spatial dim (out_spatial_dim, #597) and its exact behavior.

When the layer is not optimized out (e.g. simply with optimize_move_layers_out=False) and you then accumulate the output of the MaskedComputationLayer, you get shape [T,B,D], and never actually the shape [T',B,D].

Should out_spatial_dim then in all cases refer to T', even when T' never actually exists? So basically, out_spatial_dim would be ignored inside the loop, and only be used when the layer is optimized out of the loop?
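For illustration, this is roughly how it might be passed (a sketch following the #597 proposal; SpatialDim is the existing API, while the out_spatial_dim option here is the proposed part):

```python
from returnn.tf.util.data import SpatialDim

masked_time_dim = SpatialDim("masked-time")  # T' = sum(mask)

# Inside the rec unit:
masked_layer = {
    "class": "masked_computation", "from": "data:source", "mask": "mask",
    "out_spatial_dim": masked_time_dim,  # refers to T'; would only matter once optimized out
    "unit": {"class": "linear", "activation": "relu", "n_out": 256},
}
```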

albertz (Member Author) commented Mar 4, 2022

The new dim T' would somehow have a reference to T. derived_from_tag is the obvious candidate, but we need more; specifically, we need the mask, or the indices for unmasking.

It's a bit like when e.g. ConvLayer does downsampling via striding: then T' = T/2 (for stride 2). And this could be formulated as a mask as well.
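E.g. (plain TF sketch):

```python
import tensorflow as tf

time = 10
stride = 2
# The frames a stride-2 downsampling keeps, expressed as a mask over the original T:
mask = tf.equal(tf.range(time) % stride, 0)  # [T] bool: True, False, True, ...
t_prime = tf.reduce_sum(tf.cast(mask, tf.int32))  # T' = ceil(T / stride) = 5
```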

Now we can see the unmasking as a sparse-to-dense operation.

However, this is still ambiguous. In a generic sparse-to-dense operation, we would set 0 for all other frames, but that is rarely what we want here. Specifically, to reproduce the output of a MaskedComputationLayer, we need a very specific behavior, namely that the previous masked frame is copied.

So, when we want automatic unmasking from T' to T whenever such tensors are combined somewhere, e.g. via Data.get_common_data and then Data.copy_compatible_to, we don't just need the mask but also the kind of unmasking, i.e. how to fill the other frames.
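For the MaskedComputationLayer case, this "copy the previous masked frame" unmasking is basically a cumsum + gather, e.g. (plain TF sketch; `unmask_left` is a made-up name, and the handling of frames before the first masked frame is simplified here):

```python
import tensorflow as tf

def unmask_left(packed, mask):
    """packed: [B,T',D] (outputs of the masked frames only), mask: [B,T] bool.
    Returns [B,T,D]: each frame gets the output of the last masked frame at or
    before it. Frames before the first masked frame just repeat the first one
    here; properly they would need the initial output instead."""
    # For each position in T, the index into T' of the most recent masked frame:
    idxs = tf.cumsum(tf.cast(mask, tf.int32), axis=1) - 1  # [B,T], values in [-1, T'-1]
    idxs = tf.maximum(idxs, 0)
    return tf.gather(packed, idxs, batch_dims=1)  # [B,T,D]
```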

albertz (Member Author) commented Mar 4, 2022

We could use the derived_from_op mechanism, via a special op kind "mask", and store the mask in the attribs, along with something like unmask_type="left", to reflect that we want this specific behavior when unmasking.

This could later be extended to be able to represent sparse data via unmask_type="fill" and unmask_fill_value.

The mask itself would be a Data instance. In our case, it would have shape [B,T] with dtype bool. The shape is probably arbitrary in general, although it should contain the original dim T.

Or, as mentioned, instead of a mask, we could also store the indices, i.e. a Data of shape [B,T'] with values pointing into T.
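Putting this together, a sketch with the existing Dim.Op / derived_from_op mechanism (the "mask" op kind, unmask_type and these attribs are all hypothetical; only the mechanism itself exists):

```python
from returnn.tf.util.data import Data, Dim, SpatialDim, batch_dim

time_dim = SpatialDim("time")                # T
masked_time_dim = SpatialDim("masked-time")  # T'
# The mask as discussed above, shape [B,T] with dtype bool:
mask_data = Data("mask", dim_tags=[batch_dim, time_dim], dtype="bool")

# Hypothetical: record how T' was derived from T, incl. everything needed to unmask.
masked_time_dim.derived_from_op = Dim.Op(
    kind="mask", inputs=[time_dim],
    attribs={"mask": mask_data, "unmask_type": "left"})
```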
