Solution for ambiguous dim tags #871

Merged
merged 3 commits into master from albert-dot-square-matrix
Feb 11, 2022

Conversation

albertz
Member

albertz commented Dec 19, 2021

Consider the example (also covered in the test) of using DotLayer with a square matrix, i.e. its two dimensions are the same dim tag.

from returnn.config import Config
from returnn.tf.util.data import batch_dim, SpatialDim, FeatureDim

time_dim = SpatialDim("time")
feat_dim = FeatureDim("feature", dimension=3)
config = Config({
  "extern_data": {
    "data": {"dim_tags": [batch_dim, time_dim, feat_dim]},
    "matrix_ambiguous": {"dim_tags": [feat_dim, feat_dim], "available_for_inference": True},
  },
})

Here, {"class": "dot", "from": ["data:data", "data:matrix_ambiguous"], "reduce": feat_dim} does not work because it is ambiguous.

Introducing Dim.match_priority solves this problem, by defining:

"matrix_non_ambiguous": {
  "dim_tags": [feat_dim.copy(match_priority=1), feat_dim], "available_for_inference": True},

This was suggested here: rwth-i6/returnn_common#17 (comment)

While this is maybe not the nicest solution, I don't really see a better one at the moment.

This PR also serves as a test of whether anything else breaks.

@Zettelkasten
Member

Technically this PR looks fine, but I'm not sure if this is really a clean solution in general.
In particular, it is not so clean that you have to specify a matching priority when you define your weight variable, which is not really specific to the variable itself but rather to the operation you later use it as an argument to (in your case, the dot layer).
For example, in our MT configs we often use the transposed target embeddings as the projection matrix. In this case I need my dot operation to reduce over exactly the other axis than before, i.e. the one with the lower matching priority.
Sure, there are solutions to this (e.g. reinterpreting the variable output to change the matching priorities), but I personally think that this is an indicator that the concept in general is not that complete and perhaps more confusing/error-prone.

I think the code you posted for your Linear module implementation is not too bad. I like about it that the variable has unique dim tags, I think that makes it pretty clear what is going on. I think I would prefer it to the somewhat less meaningful matching prios. Note that it would probably also be nice if it exposed the inner out dim tag (so that some other layer could e.g. transpose the variable).
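
(For readers without the linked code: the idea being discussed can be sketched roughly like this. This is only a sketch, not the exact posted code, and the returnn_common nn helper names used here are approximate / assumed.)

from returnn_common import nn


class Linear(nn.Module):
  """Linear transformation with an explicit inner out dim when in_dim == out_dim (sketch)."""

  def __init__(self, in_dim: nn.Dim, out_dim: nn.Dim):
    super().__init__()
    self.in_dim = in_dim
    self.out_dim = out_dim
    # Give the variable a distinct inner out dim if the matrix would otherwise be square,
    # so that its dim tags are unique.
    if in_dim == out_dim:
      self.out_dim_inner = nn.FeatureDim("linear-out-inner", out_dim.dimension)
    else:
      self.out_dim_inner = out_dim
    self.weights = nn.Parameter((self.in_dim, self.out_dim_inner))

  def __call__(self, source: nn.Tensor) -> nn.Tensor:
    out = nn.dot(source, self.weights, reduce=self.in_dim)  # unambiguous reduce
    if self.out_dim_inner != self.out_dim:
      # Map the inner dim back to the declared out_dim.
      # (nn.replace_dim is assumed here; the actual helper name/signature may differ.)
      out = nn.replace_dim(out, in_dim=self.out_dim_inner, out_dim=self.out_dim)
    return out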

I do agree though that it is not that nice, and that it would be very annoying if RETURNN configs often needed such extra handling. I don't really know a better solution right now either. Maybe just adding utility functions to make the code you already had with the inner dim tag less verbose would help. Maybe you have new thoughts since you opened this, too?

How often do you think one needs these extra checks? Only for layers introducing variables like Linear, and probably a few others?
Would this also be an issue for RETURNN code that the user usually writes? Or could the user just call what you have in your Linear module (or some related, somewhat elementary building block)?

@albertz
Member Author

albertz commented Dec 21, 2021

In particular, it is not so clean that you have to specify a matching priority when you define your weight variable, which is not really specific to the variable itself but rather to the operation you later use it as an argument to (in your case, the dot layer).

It would work for all operations which do some reduce, e.g. also gather, and that will always be the main purpose of this variable in this case.

For example, in our MT configs we often use the transposed target embeddings as the projection matrix. In this case I need my dot operation to reduce over exactly the other axis than before, i.e. the one with the lower matching priority.

There is no problem with the matching priority here, because those will be two different dims anyway: the target classes dim and some intermediate hidden dim, which would almost always be different.

Also, when dim order does not matter, there is no such thing as a transpose.

I think the code you posted for your Linear module implementation is not too bad. I like about it that the variable has unique dim tags, I think that makes it pretty clear what is going on. I think I would prefer it to the somewhat less meaningful matching prios. Note that it would probably also be nice if it exposed the inner out dim tag (so that some other layer could e.g. transpose the variable).

The code has several issues:

  • It is too complicated.
  • It is not straightforward. I only stumbled upon it because I got it wrong first. This is a bad sign. Ideally it should be designed such that the obvious and straightforward solution works. (Although you could argue, the matching priority is not straightforward either.)
  • Now using it for the transposed linear operation as you mentioned before would actually not work anymore, or only with further complicated code.
  • I was also not sure whether I should create this dummy dim always or only when in_dim == out_dim. Either way, it's not so nice. Always creating it is maybe easier to reason about, because then there is only ever one case instead of two, which makes it simpler.
  • Some other code which creates the variable, or any tensor which would be used in place of the variable, would also need this extra logic. When assigning linear.weights = ..., it would additionally also need to assign linear.out_dim_inner = ....

Maybe you have new thoughts since you opened this, too?

Not really.

How often do you think one needs these extra checks? Only for layers introducing variables like Linear, and probably a few others? Would this also be an issue for RETURNN code that the user usually writes? Or could the user just call what you have in your Linear module (or some related, somewhat elementary building block)?

That maybe depends on your research focus. But when you are writing new models or model parts (deriving new classes from nn.Module), you need to create variables all the time. Well, in many cases you might reuse existing modules to define the new model, but often enough it could also be more custom. It's hard to tell.

Now when you want to do e.g. param sharing between two different modules (e.g. Linear and LSTM), this is easy and straightforward to do in principle (just linear.weights = lstm.weights), but it would be complicated if you also need to deal with sth like out_dims_inner.

I think this case where we have in_dim == out_dim is actually not so rare.

The potential cases where such matching priorities could go wrong are probably very rare. Although when it happens, it will also be very annoying to detect as a bug (because no error will occur, it will just do sth) and to debug.

@albertz
Member Author

albertz commented Jan 5, 2022

@Zettelkasten Have you thought further about this? I really don't like the solution with out_dims_inner. It should be possible to write generic code which copies params around in whatever way. Sth like out_dims_inner would make this impossible, because you never know whether there is some xy_inner_temp_dummy_dim for some weight which also needs to be copied alongside.

While I totally see your concerns with the proposed solution here, I currently don't see a better solution. Any suggestions are very welcome.

@Zettelkasten
Member

Hm no, not really; I thought about this for some time, but didn't really come up with anything else.

Now when you want to do e.g. param sharing between two different modules (e.g. Linear and LSTM), this is easy and straightforward to do in principle (just linear.weights = lstm.weights), but it would be complicated if you also need to deal with sth like out_dims_inner.

For this concrete example with LSTM and Linear param sharing it's not a problem btw, because LSTM has no square params.
But I agree this is a problem in general when you write your own modules and re-use parameters.

@albertz
Member Author

albertz commented Feb 2, 2022

This is still an open issue. We need some solution here.

@JackTemaki
Collaborator

I do not see any better solution here; in the end it is always the question: what does the user actually want? So for regular network construction it is fine to throw an error and let the user set the match_priority, and I guess for returnn_common this will be set automatically in many cases. But the docstring and the error message definitely need to be more accurate. I will add some suggestions.

(Two review comments on returnn/tf/util/data.py: outdated, resolved.)
@albertz
Member Author

albertz commented Feb 11, 2022

Ok, due to the lack of any better solution, I'm merging this now.

albertz merged commit a10e882 into master on Feb 11, 2022
albertz deleted the albert-dot-square-matrix branch on February 11, 2022 at 11:01
albertz added a commit to rwth-i6/returnn_common that referenced this pull request Feb 11, 2022