
Fix MaskedComputationLayer for RETURNN principles #976

Merged: albertz merged 14 commits into master on Mar 8, 2022

Conversation

albertz (Member) commented Mar 4, 2022

Fix #769. Specifically, it fixes the problem that the behavior when the layer is optimized out of a rec loop is inconsistent and opaque to the user.

The masked_from option is still an exception, though.

@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch 6 times, most recently from 9f8a4b2 to 6723531 on March 4, 2022 23:56
@albertz: this comment was marked as outdated.

@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from 6723531 to fca3def on March 5, 2022 09:38
@albertz: this comment was marked as resolved.

@albertz: this comment was marked as outdated.

albertz (Member, Author) commented Mar 8, 2022

What would actually be the expected behavior for this inside a rec layer:

x_masked = masked_comp(x, mask=mask)
y = lstm(x_masked)

Well, the expected behavior is mostly clear for the case when it is inside a rec loop and not optimized.
And this should be the true behavior in all cases, per RETURNN principles, because the user should not need to think about the automatic optimization (#769).

Currently, however, when this is optimized, the result would be wrong (or would not match this expected behavior), because lstm would operate on the masked output and not the unmasked one. (@robin-p-schmitt btw, do you maybe have this case?)

It would also still be wrong with all the new changes here, because the automatic unmasking as implemented here happens only in copy_compatible_to.
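For concreteness, a rough net-dict sketch of that situation (a sketch only; the layer names, dims and the inner unit here are made up and not from this PR):

    "loop": {"class": "rec", "from": "encoder", "unit": {
        # "x" and "output_emit" would be defined elsewhere in this unit
        "x_masked": {"class": "masked_computation", "mask": "prev:output_emit", "from": "x",
            "unit": {"class": "linear", "activation": "tanh", "n_out": 256}},
        # recurrent follow-up layer operating directly on the masked_computation output
        "lstm": {"class": "rec", "unit": "nativelstm2", "from": "x_masked", "n_out": 512},
        # ...
    }},

Inside the loop, "lstm" sees one value per step (the previous masked_computation output is effectively held while the mask is False). When the layers are optimized out of the loop, "lstm" would instead operate on the packed (masked) sequence, which is exactly the inconsistency discussed here.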

The question is: what follow-up operations would we actually usually use?

In the case of a transducer, it is common to have a joint network on top. We should carefully check current setups. I think current setups might actually expect the current behavior, and not the "expected" behavior.
Edit: Yes, this is unfortunately the case. E.g. this config:

"lm_masked": {"class": "masked_computation",
    "mask": "prev:output_emit",
    "from": "prev_out_non_blank",  # in decoding
    "masked_from": "base:lm_input" if task == "train" else None,  # enables optimization if used
    ...
    }}},
"readout_in": {"class": "linear", "from": ["am", "lm_masked"], "activation": None, "n_out": 1000, "L2": l2, "dropout": 0.2,
  "out_type": {"batch_dim_axis": 2 if task == "train" else 0, "shape": (None, None, 1000) if task == "train" else (1000,),
    "time_dim_axis": 0 if task == "train" else None}}, # (T, U+1, B, 1000
"readout": {"class": "reduce_out", "mode": "max", "num_pieces": 2, "from": "readout_in"},
...

So here we really expect that we do not make the two axes T and U compatible, and instead keep both.
Also note, though, that we need masked_from here; without masked_from, this would not work, because this setup does not have any given fixed alignment. That is the point of this setup: it then calculates the full sum over all alignments, i.e. the standard RNN-T loss.

Edit: masked_from is really a special case. When it is set, it basically implies that we can optimize the layer out, and that we specifically want that. So maybe this can trigger the special current behavior. When it is not set, but the layer is still moved out, I think it should not trigger the current behavior.

@robin-p-schmitt: this comment was marked as off-topic.

@albertz: this comment was marked as off-topic.

@robin-p-schmitt: this comment was marked as off-topic.

albertz (Member, Author) commented Mar 8, 2022

The implication from my previous post is basically:

  • If masked_from is set: do the current logic. I think the documentation needs to be improved.
  • Else, if inside the rec loop: just the canonical behavior, as we do now.
  • Else, if inside the rec layer but optimized out: unmask right away. This is a change from before, but it is required for any recurrent follow-up layers to behave correctly.
  • Else, if outside a rec layer (really outside, not just optimized out): keep the current behavior, i.e. we want the output to stay masked.

It also means this PR changes a bit: we do not need such automatic broadcasting logic in copy_compatible_to.
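Just to spell out that case distinction, a small Python sketch (not the actual implementation; all attribute names and the unmask helper here are hypothetical):

    def masked_computation_output(layer):
        # Pseudo-code for the intended MaskedComputationLayer output behavior.
        if layer.masked_from is not None:
            # Special case: the user explicitly wants the optimized/packed form.
            return layer.masked_output
        if layer.inside_rec_loop:
            # Canonical per-step behavior, as before.
            return layer.step_output
        if layer.inside_rec_layer:  # inside a rec layer, but optimized out of the loop
            # New: unmask right away, so recurrent follow-up layers behave as inside the loop.
            return unmask(layer.masked_output, mask=layer.mask)
        # Really outside any rec layer: keep the masked (packed) output.
        return layer.masked_output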

@albertz: this comment was marked as off-topic.

@robin-p-schmitt: this comment was marked as off-topic.

@albertz: this comment was marked as outdated.

@robin-p-schmitt: this comment was marked as resolved.

albertz (Member, Author) commented Mar 8, 2022

So I actually never have this case:

x_masked = masked_comp(x, mask=mask)
y = lstm(x_masked)

I always have something like:

x_masked = masked_comp(x, mask=mask)
x_unmasked = unmask(x_masked, mask=mask)
y = lstm(x_unmasked)

In this case, there shouldn't be any unexpected behavior, right?

So, you always have unmask right after the masked_computation? Then it should be correct. I was speaking about the case where there is anything else after masked_computation.
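For reference, that pattern would look roughly like this inside the rec unit (a sketch; layer names and dims are made up, but "unmask" is the existing UnmaskLayer, which repeats the last computed value for the skipped frames):

    "x_masked": {"class": "masked_computation", "mask": "prev:output_emit", "from": "x",
        "unit": {"class": "linear", "activation": "tanh", "n_out": 256}},
    # explicit unmasking: every step gets a value again, whether optimized out or not
    "x_unmasked": {"class": "unmask", "from": "x_masked", "mask": "prev:output_emit"},
    "lstm": {"class": "rec", "unit": "nativelstm2", "from": "x_unmasked", "n_out": 512},

With the explicit unmask in between, the behavior is the same whether the layers stay inside the loop or are optimized out, which is why this pattern is unaffected by the issue above.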

@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from cd37938 to 206577b on March 8, 2022 16:41
@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from 206577b to 79846ef on March 8, 2022 19:22
@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from 6848570 to c492582 on March 8, 2022 21:41
@albertz albertz marked this pull request as ready for review March 8, 2022 21:44
@albertz albertz requested a review from a team as a code owner March 8, 2022 21:44
@albertz albertz changed the title from "Automatic unmasking after MaskedComputationLayer" to "Fix MaskedComputationLayer for RETURNN principles" on Mar 8, 2022
@albertz albertz merged commit ce19760 into master Mar 8, 2022
@albertz albertz deleted the albert-masked-comp-simpler-769 branch March 8, 2022 21:59