
Fix MaskedComputationLayer for RETURNN principles #976

Merged: albertz merged 14 commits into master on Mar 8, 2022

Conversation

albertz (Member) commented Mar 4, 2022

Fix #769. Specifically, it fixes the problem that the behavior when the layer is optimized out of a rec loop is inconsistent and opaque to the user.

The masked_from option is still an exception, though.

@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch 6 times, most recently from 9f8a4b2 to 6723531 on March 4, 2022 23:56
@albertz: this comment was marked as outdated.

@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from 6723531 to fca3def on March 5, 2022 09:38
@albertz: this comment was marked as resolved.

@albertz: this comment was marked as outdated.

albertz (Member, Author) commented Mar 8, 2022

What would actually be the expected behavior for this inside a rec layer:

x_masked = masked_comp(x, mask=mask)
y = lstm(x_masked)

Well, the expected behavior is mostly clear for the case when it is inside a rec loop and not optimized.
And this should be the true behavior in all cases, per RETURNN principles, because the user should not need to think about the automatic optimization (#769).

Currently, however, when this is optimized, the result would be wrong (or would not match this expected behavior), because lstm would operate on the masked output and not the unmasked one. (@robin-p-schmitt btw, do you maybe have this case?)

It would also still be wrong with all the new changes here, because the automatic unmasking as implemented here happens only in copy_compatible_to.
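For concreteness, a rough net-dict sketch of that situation (a sketch only; the layer names, dims and the inner unit here are made up and not from this PR):

    "loop": {"class": "rec", "from": "encoder", "unit": {
        # "x" and "output_emit" would be defined elsewhere in this unit
        "x_masked": {"class": "masked_computation", "mask": "prev:output_emit", "from": "x",
            "unit": {"class": "linear", "activation": "tanh", "n_out": 256}},
        # recurrent follow-up layer operating directly on the masked_computation output
        "lstm": {"class": "rec", "unit": "nativelstm2", "from": "x_masked", "n_out": 512},
        # ...
    }},

Inside the loop, "lstm" sees one value per step (the previous masked_computation output is effectively held while the mask is False). When the layers are optimized out of the loop, "lstm" would instead operate on the packed (masked) sequence, which is exactly the inconsistency discussed here.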

The question is: what follow-up operations would we actually usually use?

In the case of a transducer, it is common to have a joint network on top. We should carefully check current setups. I think current setups might actually expect the current behavior, and not the "expected" behavior.
Edit: Yes, this is unfortunately the case. E.g. this config:

"lm_masked": {"class": "masked_computation",
    "mask": "prev:output_emit",
    "from": "prev_out_non_blank",  # in decoding
    "masked_from": "base:lm_input" if task == "train" else None,  # enables optimization if used
    ...
    }}},
"readout_in": {"class": "linear", "from": ["am", "lm_masked"], "activation": None, "n_out": 1000, "L2": l2, "dropout": 0.2,
  "out_type": {"batch_dim_axis": 2 if task == "train" else 0, "shape": (None, None, 1000) if task == "train" else (1000,),
    "time_dim_axis": 0 if task == "train" else None}}, # (T, U+1, B, 1000
"readout": {"class": "reduce_out", "mode": "max", "num_pieces": 2, "from": "readout_in"},
...

So here we really expect that we do not make the two axes T and U compatible, and instead keep both.
Also note, though, that we need masked_from here; without masked_from, this would not work, because this setup does not have any given fixed alignment. That is the point of this setup: it then calculates the full sum over all alignments, i.e. the standard RNN-T loss.

Edit: masked_from is really a special case. When it is set, it basically implies that we can optimize the layer out, and that we specifically want that. So maybe this can trigger the special current behavior. When it is not set, but the layer is still moved out, I think it should not trigger the current behavior.

@robin-p-schmitt: this comment was marked as off-topic.

@albertz: this comment was marked as off-topic.

@robin-p-schmitt: this comment was marked as off-topic.

albertz (Member, Author) commented Mar 8, 2022

The implication from my previous post is basically:

  • If masked_from is set: do the current logic. I think the documentation needs to be improved.
  • Else, if inside the rec loop: just the canonical behavior, as we do now.
  • Else, if inside the rec layer but optimized out: unmask right away. This is a change from before, but it is required for any recurrent follow-up layers to behave correctly.
  • Else, if outside a rec layer (really outside, not just optimized out): keep the current behavior, i.e. we want the output to stay masked.

It also means this PR changes a bit: we do not need such automatic broadcasting logic in copy_compatible_to.
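Just to spell out that case distinction, a small Python sketch (not the actual implementation; all attribute names and the unmask helper here are hypothetical):

    def masked_computation_output(layer):
        # Pseudo-code for the intended MaskedComputationLayer output behavior.
        if layer.masked_from is not None:
            # Special case: the user explicitly wants the optimized/packed form.
            return layer.masked_output
        if layer.inside_rec_loop:
            # Canonical per-step behavior, as before.
            return layer.step_output
        if layer.inside_rec_layer:  # inside a rec layer, but optimized out of the loop
            # New: unmask right away, so recurrent follow-up layers behave as inside the loop.
            return unmask(layer.masked_output, mask=layer.mask)
        # Really outside any rec layer: keep the masked (packed) output.
        return layer.masked_output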

@albertz: this comment was marked as off-topic.

@robin-p-schmitt: this comment was marked as off-topic.

@albertz: this comment was marked as outdated.

@robin-p-schmitt: this comment was marked as resolved.

albertz (Member, Author) commented Mar 8, 2022

So I actually never have this case:

x_masked = masked_comp(x, mask=mask)
y = lstm(x_masked)

I always have something like:

x_masked = masked_comp(x, mask=mask)
x_unmasked = unmask(x_masked, mask=mask)
y = lstm(x_unmasked)

In this case, there shouldn't be any unexpected behavior, right?

So, you always have unmask right after the masked_computation? Then it should be correct. I was speaking about the case where there is anything else after masked_computation.
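For reference, that pattern would look roughly like this inside the rec unit (a sketch; layer names and dims are made up, but "unmask" is the existing UnmaskLayer, which repeats the last computed value for the skipped frames):

    "x_masked": {"class": "masked_computation", "mask": "prev:output_emit", "from": "x",
        "unit": {"class": "linear", "activation": "tanh", "n_out": 256}},
    # explicit unmasking: every step gets a value again, whether optimized out or not
    "x_unmasked": {"class": "unmask", "from": "x_masked", "mask": "prev:output_emit"},
    "lstm": {"class": "rec", "unit": "nativelstm2", "from": "x_unmasked", "n_out": 512},

With the explicit unmask in between, the behavior is the same whether the layers stay inside the loop or are optimized out, which is why this pattern is unaffected by the issue above.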

@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from cd37938 to 206577b on March 8, 2022 16:41
@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from 206577b to 79846ef on March 8, 2022 19:22
@albertz albertz force-pushed the albert-masked-comp-simpler-769 branch from 6848570 to c492582 on March 8, 2022 21:41
@albertz albertz marked this pull request as ready for review March 8, 2022 21:44
@albertz albertz requested a review from a team as a code owner March 8, 2022 21:44
@albertz albertz changed the title from "Automatic unmasking after MaskedComputationLayer" to "Fix MaskedComputationLayer for RETURNN principles" on Mar 8, 2022
@albertz albertz merged commit ce19760 into master Mar 8, 2022
@albertz albertz deleted the albert-masked-comp-simpler-769 branch March 8, 2022 21:59