Solution for ambiguous dim tags #871
Conversation
Technically this PR looks fine, but I'm not sure this is really a clean solution in general. I think the code you posted for your Linear module implementation is not too bad. What I like about it is that the variable has unique dim tags, which makes it pretty clear what is going on. I think I would prefer it to the somewhat less meaningful matching priorities. Note that it would probably also be nice if it exposed the inner out dim tag (so that some other layer could e.g. transpose the variable). I do agree, though, that it is not that nice, and that it would be very annoying if RETURNN configs often needed such extra handling. I don't really know a better solution right now either. Maybe just adding utility functions to make the code you already had with the inner dim tag less verbose would help. Maybe you also have new thoughts since you opened this? How often do you think one needs these extra checks? Only for layers introducing variables like Linear, and probably a few others?
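(For illustration, a rough sketch of the "unique inner dim tag" variant discussed above, written as a plain RETURNN net dict. The dim and layer names are made up for this example, and the final reinterpret_data step is just one assumed way to map the inner dim tag back to the externally visible out dim; it is not code from the actual discussion.)

```python
from returnn.tf.util.data import FeatureDim

in_dim = FeatureDim("in", 512)
out_dim = FeatureDim("out", 512)              # same size as in_dim, so a plain (in, out) weight would be ambiguous
out_dim_inner = FeatureDim("out-inner", 512)  # fresh dim tag, unique inside the module

network = {
    # the variable has unique dim tags, so reducing over in_dim is unambiguous
    "weight": {"class": "variable", "shape": [in_dim, out_dim_inner]},
    "linear": {"class": "dot", "from": ["data:data", "weight"], "reduce": in_dim},
    # the "extra handling": map the inner tag back to the external out_dim
    # (assuming set_dim_tags accepts the old dim tag as the axis key here)
    "output": {"class": "reinterpret_data", "from": "linear",
               "set_dim_tags": {out_dim_inner: out_dim}},
}
```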
It would work for all operations which do some reduce, e.g. also
There is no problem with the matching priority here, because those will be two different dims anyway: the target classes dim and some intermediate hidden dim, which would almost always be different. Also, when dim orders do not matter, there is no such thing as a transpose.
The code has several issues: …
Not really.
Depends maybe on your research focus. But when you are writing new models or model parts (deriving new classes from …), …

Now when you want to do e.g. param sharing between two different modules (e.g. …), …

I think this case that we have …

The potential cases where such matching priorities could go wrong are probably very rare. Although, when this happens, it will also be very annoying even to detect as a bug (because no error will occur, it will just do something) and to debug.
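(A tiny standalone NumPy illustration, not from the discussion, of why such a wrong match is silent rather than an error: with a square parameter, reducing over the wrong axis is the same as using the transposed matrix, so every shape still fits and nothing fails.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5))   # batch x feature
w = rng.standard_normal((5, 5))   # square parameter: in dim == out dim

y_intended = x @ w                # reduce over the first axis of w
y_wrong = x @ w.T                 # reduce over the second axis instead

# Both results have shape (3, 5), so no error is raised, but the values differ.
print(y_intended.shape, y_wrong.shape, np.allclose(y_intended, y_wrong))
```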
@Zettelkasten Have you thought further about this? I really don't like the solution with …

While I totally see your concerns with the proposed solution here, I currently don't see a better solution. Any suggestions are very welcome.
Hm no, not really. I thought about this for some time, but didn't really come up with anything else.

For this concrete example with LSTM and Linear param sharing, it's not a problem btw, because LSTM has no square params.
This is still an open issue. We need some solution here.
I do not see any better solution here; in the end, it is always the question: what does the user actually want? So for the regular network construction it is fine to throw an error and let the user set the match_priority, and I guess for returnn_common this will be set automatically in many cases. But the docstring and the error message definitely need to be more accurate. I will add some suggestions.
OK, due to the lack of any better solution, I'm merging this now.
Consider the example (also in the test) of using `DotLayer` with a square matrix, so its two dimensions are the same.

Here, `{"class": "dot", "from": ["data:data", "data:matrix_ambiguous"], "reduce": feat_dim}` does not work because it is ambiguous.

Introducing `Dim.match_priority` solves this problem, by having: …

This was suggested here: rwth-i6/returnn_common#17 (comment)
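(For reference, a hedged sketch of how this can look in a config, assembled from the description above. The exact API for setting the priority, assumed here as feat_dim.copy(match_priority=1), and which occurrence wins the match are inferred from the discussion, not copied from the actual test.)

```python
from returnn.tf.util.data import batch_dim, SpatialDim, FeatureDim

time_dim = SpatialDim("time")
feat_dim = FeatureDim("feature", 5)

extern_data = {
    "data": {"dim_tags": [batch_dim, time_dim, feat_dim]},
    # square matrix: both axes carry feat_dim; the copy with the higher
    # match_priority is (assumed to be) the occurrence that "reduce" matches
    "matrix_ambiguous": {"dim_tags": [feat_dim.copy(match_priority=1), feat_dim]},
}

network = {
    # no longer ambiguous: the reduce matches the higher-priority occurrence,
    # and the other feat_dim axis remains in the output
    "output": {"class": "dot", "from": ["data:data", "data:matrix_ambiguous"],
               "reduce": feat_dim},
}
```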
While maybe not the nicest solution, I don't really see any better solution at the moment.
This PR is also a test of whether anything else breaks.