Ideas for RETURNN behaviour changes #508
As discussed, this is just an example list. For every individual potential change, we should make a separate issue and discuss things there. Every such issue should use the `potential-new-behavior` tag (and maybe reference this issue here). This issue here can probably be closed once we have a first version of `behavior_version`.
There is also a full list of all possible config parameters. Many of those belong only to the Theano backend and can probably be ignored.
I would just start with one single change (e.g. one of the items from the list above). Also, I would prefer to keep each increment of `behavior_version` tied to exactly one such change.
Btw, as this came up now with a proposed change on enforcing stricter behavior: In principle, I would say that it should be easy for most users to take their existing setups and increase `behavior_version`. I'm not sure about changes which would break the majority of setups, e.g. enforcing something that almost every current config violates. For a user who wants to keep up by always using the latest `behavior_version`, such changes would mean real migration effort.
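For illustration, a minimal sketch of how this could look in a config (the concrete values are assumptions, not actual recommendations):

```python
# Minimal RETURNN config sketch: an existing setup pins the version it
# was written for and keeps working; bumping the number deliberately
# opts into the newer, stricter behavior.
behavior_version = 1

extern_data = {"data": {"dim": 40}}
```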
Ah, another thing: we should have dedicated documentation for this, where users can read what is being changed for each `behavior_version`.
Btw, one other idea, which maybe should be discussed now before we start working on any of this: instead of having one global `behavior_version`, it could maybe be specified in a more fine-grained way. I wonder about the case of a Sisyphus setup which consists of many experiments, where maybe some older experiments use some older `behavior_version`. Just an idea. Maybe it can also be added later.
Ah okay, no, I do not think this will ever become a problem. And if for some reason it should ever happen, the job can just get an assert that the `behavior_version` needs to be higher, and that's it.
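E.g., a sketch of such a guard (hypothetical job-side code, not an actual Sisyphus API):

```python
def check_behavior_version(config: dict, required: int = 5):
    """Hypothetical guard inside a job: refuse configs pinned below
    the behavior version this job was written for."""
    used = config.get("behavior_version", 0)
    assert used >= required, (
        f"job requires behavior_version >= {required}, config has {used}")
```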
Another aspect: sometimes (or even often), the discussion will not be about a behavior change, but just about removing some functionality in favor of some better way. We don't strictly need to disallow this then. We can just put a deprecation warning with information on what to use instead. The deprecation warning should probably be printed in any case, no matter the `behavior_version`. Maybe this again has to be decided individually per case.
I would prefer to strictly disallow this with new behavior versions, because warnings are often missed. This is of course no reason not to add warnings to everything "deprecated" even before adding something as a behavior version. I think this is what we discuss in each issue: should we put only a warning, or should this be added as a new behavior version and then be strictly enforced.
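A sketch of the difference (hypothetical helper, names assumed):

```python
import warnings

def check_deprecated(option: str, behavior_version: int, enforced_from: int):
    # Below the enforcing version, only warn (easy to miss in long logs);
    # from that version on, fail hard so the change cannot be overlooked.
    if behavior_version >= enforced_from:
        raise ValueError(
            f"config option {option!r} is disallowed "
            f"since behavior_version {enforced_from}")
    warnings.warn(f"config option {option!r} is deprecated", DeprecationWarning)
```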
I think all the open questions here have been clarified, and we already have the first couple of behavior versions. So we can close this.
This is required to ensure that layers can reorder axes in whatever way to allow for potential optimizations. Nothing should depend on the order of axes. See the [RETURNN principles](https://github.com/rwth-i6/returnn/wiki/RETURNN-principles). This is also for #792 to allow for an easier transition. This introduces a new behavior version (#508). While it probably requires changes for many configs, the changes should still be quite simple.
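At the config level, the consequence is that layers should select axes by name/tag rather than by integer index. A hedged sketch (the layer options follow common RETURNN usage; the network itself is made up):

```python
network = {
    # Select the time axis by name; this stays correct even if the
    # layer's input axes get reordered internally for optimization.
    "pooled": {"class": "reduce", "mode": "mean", "axes": "T", "from": "data"},
    # A fixed integer index like axis=1 would silently break instead.
}
```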
PyTorch might introduce something similar (specifically for their param init defaults): pytorch/pytorch#41638
Edit: They decided against it.
As there are now many different ways (as in parameter names, syntax styles, etc.) to make RETURNN do the same things, it might be a good idea to add a `behaviour_version` parameter to prohibit using "deprecated" parameters and suboptimal behaviour of RETURNN. It can also be used to change the default behaviour of some layers which is currently suboptimal, such as the batch norm layers. With each desired change (e.g. commit or PR) to the behaviour, the version number would have to be increased by one.

Here are some things the documentation already recommends avoiding, e.g.:

- `num_inputs` and `num_outputs` should not be used, in favor of `extern_data`
- the dict syntax for `extern_data` instead of the tuple syntax
- `optimizer = {"class": "adam"}` or maybe `optimizer = "adam"` instead of `adam = True`

A side-by-side sketch of the deprecated and recommended styles follows below.
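For illustration, the two styles in config form (the dims are placeholders):

```python
# Deprecated style (would be rejected under a new behavior version):
num_inputs = 40
num_outputs = {"classes": 5000}
adam = True

# Recommended style:
extern_data = {
    "data": {"dim": 40},
    "classes": {"dim": 5000, "sparse": True},
}
optimizer = {"class": "adam"}
```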
Then there are other redundant parameters which I cannot name explicitly, but I think we have some (`eval` versus `search_data`, or so).

And then we have sub-optimal default parameters that should be changed, or not have a default in the first place, e.g.:

- `ExtractAudioFeatures` should have no defaults, so that the feature settings are clearly visible in the config (see the sketch after this list)
- the default `lstm` unit for the rec layer
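E.g., spelling everything out instead of relying on defaults (the option names follow `ExtractAudioFeatures`; the concrete values are placeholders):

```python
# All feature-extraction settings explicit in the config, no hidden defaults:
audio = {
    "features": "mfcc",
    "window_len": 0.025,       # seconds
    "step_len": 0.010,         # seconds
    "num_feature_filters": 40,
}

# Likewise, name the rec unit explicitly instead of relying on a default:
network = {
    "lstm0": {"class": "rec", "unit": "nativelstm2", "n_out": 512, "from": "data"},
}
```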
Other things I would personally recommend doing in configs, e.g.:

- explicitly set `available_for_inference` for the `extern_data` entries
- explicitly set `seq_order_control_dataset` within a `MetaDataset`
- avoid `out_type` for any layer except maybe the eval layer. Although here I am not sure if this has any major implications; maybe there are some networks with rather unstable construction where you want to have this fixed by hand.

There is probably a lot more which needs to be collected/discussed, so this is just intended to give some initial ideas. A config sketch of the first two points above follows at the end of this post.