
RecLayer multiple inputs via explicit unstacking #552

Merged — 5 commits merged into master on Oct 6, 2021

Conversation

@albertz (Member) commented on Jun 27, 2021

Instead of using from: x in the RecLayer, which gets unstacked (via TensorArray) such that inside the rec loop you get x[i], here is an approach to make this unstacking explicit. This would then also support multiple inputs to the RecLayer.
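To illustrate the idea, here is a minimal, hypothetical RETURNN net-dict sketch. The layer names and the "rec_unstack" layer-class string are assumptions for illustration, not copied from this PR:

```python
# Hypothetical sketch of a RETURNN network dict (illustrative only).

# Implicit form: the RecLayer gets "from": "data" and unstacks it itself,
# so inside the loop "data:source" is x[i].
net_implicit = {
    "loop": {
        "class": "rec", "from": "data",
        "unit": {
            "output": {"class": "linear", "activation": "tanh", "n_out": 7,
                       "from": "data:source"},  # x[i]
        },
    },
}

# Explicit form: unstack each input yourself inside the loop, which also
# allows multiple inputs (the loop length details are omitted here).
net_explicit = {
    "loop": {
        "class": "rec",
        "unit": {
            "x": {"class": "rec_unstack", "from": "base:data"},   # x[i]
            "y": {"class": "rec_unstack", "from": "base:data2"},  # y[i]
            "output": {"class": "linear", "activation": "tanh", "n_out": 7,
                       "from": ["x", "y"]},
        },
    },
}
```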

Review thread on returnn/tf/layers/rec.py (outdated, resolved)
@albertz mentioned this pull request on Sep 14, 2021
@patrick-wilken (Contributor) commented:
Related to #649, the question is whether the additional inputs should be considered for the sequence length (dynamic vs. fixed) in _SubnetworkCell.get_output(). So if we copy a base layer into a dynamic loop, does it inherit the fixed length? Can we allow several base layers with different lengths? Etc.

@albertz (Member, Author) commented on Sep 14, 2021

> Related to #649, the question is whether the additional inputs should be considered for the sequence length (dynamic vs. fixed) in _SubnetworkCell.get_output(). So if we copy a base layer into a dynamic loop, does it inherit the fixed length? Can we allow several base layers with different lengths? Etc.

I'm not sure I understand. You mean whether i is an explicit argument to the layer or implicit via the rec layer (so basically hardcoded to :i)?

But even when it is hardcoded to :i, this can still support any input which has the same length or is shorter, so the lengths don't need to be the same.

Or what do you mean by "fixed length"? The length can still be dynamic in any case (of shape [B]).

The number of iterations (max(i) + 1) the tf.while_loop does is of course a scalar.

I think I would prefer not to have too much magic here. This layer (IterLayer, or whatever we call it) should really just do x[i] (via TensorArray), nothing else. Anything else should be up to you via other layers.
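As a rough TensorFlow sketch (not the RETURNN implementation) of that behavior: unstack the input into a tf.TensorArray and read element i inside a tf.while_loop.

```python
import tensorflow as tf

def iterate_over_time(x):
    """Sketch only: unstack x ([T, B, D]) along time and read x[i] per step."""
    n_time = tf.shape(x)[0]
    xs = tf.TensorArray(dtype=x.dtype, size=n_time).unstack(x)  # what the layer would do
    ys = tf.TensorArray(dtype=x.dtype, size=n_time)

    def body(i, ys):
        x_i = xs.read(i)   # x[i], shape [B, D]
        y_i = x_i          # placeholder for whatever the rec step computes
        return i + 1, ys.write(i, y_i)

    _, ys = tf.while_loop(lambda i, ys: i < n_time, body, (0, ys))
    return ys.stack()      # [T, B, D]
```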

@albertz marked this pull request as ready for review on October 6, 2021, 08:51
@albertz (Member, Author) commented on Oct 6, 2021

Note on the layer name: I considered several options:

  • IterLayer: because we iterate over some tensor, so you could think of it as for x in xs: ....
  • UnstackLayer: because you unstack the tensor, and it basically wraps tf.TensorArray.unstack.

I did not want to have TensorArray in the name, as I felt that would be too implementation-specific. The name should reflect the logical operation, not the implementation details.

In the end, I chose RecUnstackLayer, to reflect that this is specific to a RecLayer. There is also a StackLayer, and people might mistake a plain UnstackLayer for its inverse, which would be misleading. There might also be some RecStackLayer later, although that is anyway what you get when you access a RecLayer sub layer from outside (when it is marked as an output layer).

@albertz merged commit e3fb72c into master on Oct 6, 2021
@albertz deleted the albert-rec-iter branch on October 6, 2021, 09:27