Ideas for RETURNN behaviour changes #508

Closed
JackTemaki opened this issue May 19, 2021 · 19 comments
Labels
potential-new-behavior Discussions about RETURNN behaviour

Comments

@JackTemaki
Collaborator

As there are now many different ways (as in parameter names, syntax styles, etc.) to make RETURNN do the same thing, it might be a good idea to add a behaviour_version parameter to prohibit using "deprecated" parameters and suboptimal RETURNN behaviour. It can also be used to change the default behaviour of some layers that is currently suboptimal, such as the batch norm layers. With each desired change (e.g. commit or PR) to the behaviour, the version number would be increased by one.
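For illustration, a minimal sketch of how such a config entry could look (the parameter name and the version number are part of the proposal, nothing existing yet):

    # RETURNN config (sketch): opt in to all behaviour changes up to a given version.
    behavior_version = 1  # hypothetical version number

    # With this set, RETURNN would e.g. reject deprecated parameters at
    # config-loading time instead of silently accepting them.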

Here are some things the documentation already recommends avoiding (a config sketch contrasting the styles follows the list), e.g.:

  • num_inputs and num_outputs should not be used; use extern_data instead
  • use the "dict" syntax for extern_data instead of the tuple syntax
  • prohibit using old Theano parameters such as "multiprocessing" when working with TensorFlow
  • prohibit direct optimizer parameters, i.e. write optimizer = {"class": "adam"} or maybe optimizer = "adam" instead of adam = True
  • always use explicit input and target definitions (no default fallback to "data" or "classes")
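For illustration, roughly how the deprecated and the recommended style compare in a config (a sketch; the dimension numbers are made up):

    # Deprecated style, to be rejected under a new behaviour version:
    num_inputs = 40
    num_outputs = {"classes": (5000, 1)}  # tuple syntax
    adam = True  # direct optimizer parameter

    # Recommended style:
    extern_data = {
        "data": {"dim": 40},
        "classes": {"dim": 5000, "sparse": True},
    }
    optimizer = {"class": "adam"}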

Then there are other redundant parameters, which I cannot all name explicitly, but I think we have (the learning-rate overlap is sketched below the list):

  • overlapping/redundant parameter names for the dataset (e.g. eval versus search_data or so)
  • learning_rate vs learning_rates
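E.g. for the learning rate, the overlap looks roughly like this (a sketch; the values are made up):

    learning_rate = 0.001  # single constant learning rate
    learning_rates = {1: 0.0001, 10: 0.001}  # per-epoch values, overlapping with the above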

And then we have sub-optimal default parameters that should be changed or should not have a default in the first place (sketched after the list), e.g.:

  • batch norm layer parameters
  • ExtractAudioFeatures should have no defaults, so that the feature settings are clearly visible in the config
  • no default lstm unit for rec layer
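As a sketch of what being explicit would look like in the network/dataset config (fragments; the unit name and all values are just examples):

    # rec layer with an explicit unit instead of relying on an implicit default:
    "encoder": {"class": "rec", "unit": "nativelstm2", "n_out": 512, "from": "data"},

    # explicit feature-extraction settings instead of the ExtractAudioFeatures defaults:
    "audio": {"features": "mfcc", "num_feature_filters": 40,
              "window_len": 0.025, "step_len": 0.010},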

Other things I would personally recommend doing in configs (sketched below), e.g.:

  • always provide an explicit available_for_inference for the extern_data entries
  • always use seq_order_control_dataset within a MetaDataset
  • not using out_type for any layer except maybe the eval layer, although I am not sure whether this has any major implications; maybe some networks have a rather unstable construction and you want to fix this by hand
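Sketched in config form (dims, file names and sub-dataset keys are made-up placeholders):

    extern_data = {
        "data": {"dim": 40, "available_for_inference": True},
        "classes": {"dim": 5000, "sparse": True, "available_for_inference": False},
    }

    train = {
        "class": "MetaDataset",
        "datasets": {
            "features": {"class": "HDFDataset", "files": ["train-features.hdf"]},
            "targets": {"class": "HDFDataset", "files": ["train-targets.hdf"]},
        },
        "data_map": {"data": ("features", "data"), "classes": ("targets", "data")},
        "seq_order_control_dataset": "features",
    }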

There is probably a lot more which needs to be collected/discussed, so this is just intended to give some initial ideas.

@JackTemaki JackTemaki added the potential-new-behavior Discussions about RETURNN behaviour label May 19, 2021
@albertz
Member

albertz commented May 19, 2021

As discussed, this is just an example list. For every individual potential change, we should make a separate issue, and discuss things there. Every such issue should use the potential-new-behavior tag (and maybe reference this issue here).

This issue here can probably be closed once we have a first version of behaviour_version.
The setup of behaviour_version can probably be done in a similar way to use_tensorflow.
(Btw, use_tensorflow is also a default we can change by this.)
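A sketch of what that setup could look like on the RETURNN side (hypothetical code; the exact Config API calls are assumptions):

    # during startup, given the loaded config object:
    behavior_version = config.int("behavior_version", 0)  # 0 = old behaviour
    if behavior_version >= 1 and config.has("num_inputs"):
        raise Exception("num_inputs is disallowed since behavior_version 1, use extern_data")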

@JackTemaki
Collaborator Author

JackTemaki commented May 21, 2021

Here is also a full list of all possible parameters:
https://gist.github.com/JackTemaki/b8c23b8eba61e4951ea22561f30e54a1

Many of those only belong to the Theano backend and can possibly be ignored.

@albertz
Member

albertz commented May 21, 2021

I would just start with one single change (e.g. num_inputs/num_outputs + disallowing tuples) and introduce behaviour_version=1 for that.

Also, I would prefer to keep each increment of behaviour_version to a single logical change, and not mix up different things (e.g. optimizer and num_inputs are totally independent of each other and should not be mixed up).

@albertz
Member

albertz commented May 21, 2021

Btw, this came up now with a proposed change on enforcing available_for_inference, but I want to put it more generally:

In principle, I would say that it should be easy for most users to use their existing setups and increase behaviour_version, and if they have followed good practice and our general recommendations, it would all still work (maybe just get better, by better random init defaults or so).

I'm not sure about changes which would break the majority of setups. Enforcing available_for_inference is such an example: I think really no-one explicitly specifies this (unless really needed, when the default is wrong). (Let's not discuss available_for_inference specifically here but leave that to a separate issue.)

So for a user who wants to keep up by always using the latest behaviour_version, how simple should that be? Should we still try to keep this simple, and avoid breaking changes unless there are very good reasons? But maybe this just has to be discussed and decided for each change individually, and there is no generic answer.

@albertz
Member

albertz commented May 21, 2021

Ah, another thing: we should have dedicated documentation for this, where users can read what is being changed with each behavior_version. (Maybe similar to the changelog.)

@albertz
Member

albertz commented May 21, 2021

Btw, one other idea that maybe should be discussed now, before we start working on any of this:

Instead of having behavior_version (an integer), maybe we should explicitly (or additionally) have flags for each change?

I wonder about the case of a Sisyphus setup which consists of many experiments, where maybe some older experiments use an older behavior_version, and some others use a newer one. When Sisyphus interacts with these configs, some interactions might then also depend on the behavior_version (a random example: the search job needs to use search_data, and something changed there). But then this might clash. Sisyphus is just an example; there might be other such cases (not sure how often, or how difficult it is to deal with that). If we also had explicit flags, we could maybe overwrite some of this, like behavior_enable = {flag_x} or so.

Just an idea. Maybe can also be added later.
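A sketch of how that could look in a config (both the parameter name and the flag name are hypothetical):

    behavior_version = 2
    behavior_enable = {"flag_x"}  # additionally opt in to one specific later change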

@JackTemaki
Collaborator Author

> This is not what I meant. I meant that e.g. some job, e.g. a search job, might require some specific behavior.

Ah okay, no, I do not think this will ever become a problem. And if for some reason this should ever happen, the job can just get an assert that the behaviour_version needs to be higher, and that's it.
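E.g. such an assert in a job could look roughly like this (a sketch; returnn_config as a plain dict is an assumption):

    assert returnn_config.get("behaviour_version", 0) >= 2, (
        "this search job requires behaviour_version >= 2")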

@albertz
Member

albertz commented May 25, 2021

Another aspect: sometimes (or even often) the discussion will not be about a behavior change, but just about removing some functionality in favor of some better way.

We don't strictly need to disallow this then. We can just emit a deprecation warning with information about what to use instead.

The deprecation warning should probably be printed in any case, no matter the behavior_version. So the question is just: should something be strictly disallowed with a new behavior_version, or is the deprecation warning enough?

Maybe this again has to be decided just individually per case.
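A sketch of a helper combining both options (the function and its signature are hypothetical, not an existing RETURNN API):

    import warnings

    def check_deprecated_param(name, replacement, disallowed_since, behavior_version):
        """Always warn about a deprecated config param; raise if the behavior version forbids it."""
        msg = "config param %r is deprecated, use %r instead" % (name, replacement)
        if behavior_version >= disallowed_since:
            raise ValueError(msg + " (disallowed since behavior_version %i)" % disallowed_since)
        warnings.warn(msg, DeprecationWarning)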

@JackTemaki
Collaborator Author

I would prefer to strictly disallow this with new behavior versions, because warnings are often missed.

This is of course no reason not to add warnings for everything "deprecated", even before adding something as a behavior version.

I think this is what we discuss in each issue: should we put only a warning, or should this be added as a new behavior version and then be strictly enforced?

@albertz
Member

albertz commented Nov 9, 2021

I think all the open questions here have been clarified, and we already have the first couple of behavior versions. So we can close this.

@albertz albertz closed this as completed Nov 9, 2021
albertz added a commit that referenced this issue Nov 29, 2021
This is required to ensure that layers can reorder axes in whatever way to allow for potential optimizations. Nothing should depend on the order of axes. See the [RETURNN principles](https://github.com/rwth-i6/returnn/wiki/RETURNN-principles).

This is also for #792 to allow for an easier transition.

This introduces a new behavior version (#508).

While it probably requires changes for many configs, the changes should still be quite simple.
albertz added a commit that referenced this issue Dec 7, 2021
Fix #787.

Introduces new behavior version (#508).
albertz pushed a commit that referenced this issue Dec 17, 2021
albertz added a commit that referenced this issue Jan 6, 2022
Fix #522

This introduces a new behavior version (#508).
@albertz
Member

albertz commented Jan 18, 2022

PyTorch might introduce something similar (specifically for their param init defaults): pytorch/pytorch#41638

torch.nn.init.set_init_version('1.7.1')

Edit: They decided against it:

> It doesn't look to me like this would fully work, so I'd be inclined to close it, even if I still think that this is one of PyTorch's murkier bits.

albertz added a commit that referenced this issue Oct 13, 2022
A check on matching time dim of RecLayer sub output layer
to the RecLayer time dim.

Fix #1140

This introduces a new behavior version 13 (#508).