Design of PyTorch step-functions #1290
Comments
The current API is described exactly in the initial post. We also suggest `eval_step` and `forward_step`, with `train_step` as the fallback. The difference between them is just what gets computed (losses vs. outputs). I think this is all we need.
I opened this discussion to challenge those decisions. The questions are: Is it okay to not distinguish between train and eval? Could there ever be the need to have code in eval which is different from train? (As I said, if not provided, we could always have train as the fallback.) Then for the `TrainCtx`: I do not like that for a pure PyTorch model you would have to call `mark_as_loss` on it.
What's the problem with that? We discussed this and came to the conclusion that it is simple enough, or actually even simpler that way. For the user, there is not really any difference anyway. Either you need to pass the train ctx around and call `train_ctx.mark_as_loss(...)`, vs. (which always works) `get_train_ctx().mark_as_loss(...)`. In case you use the global `get_train_ctx()`, nothing needs to be passed around at all. I'm not really sure what separation you mean, or whether you just mean this function itself.
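For illustration, a minimal sketch of the two variants (names like `TrainCtx`, `get_train_ctx` and `mark_as_loss` follow the discussion here; the exact signatures are my assumption, not the final API):

```python
import torch

class TrainCtx:
    """Hypothetical minimal context which just collects losses."""
    def __init__(self):
        self.losses = {}

    def mark_as_loss(self, loss: torch.Tensor, *, name: str):
        self.losses[name] = loss

_ctx = TrainCtx()

def get_train_ctx() -> TrainCtx:
    """Hypothetical global accessor, usable from anywhere in the model code."""
    return _ctx

# Variant 1: the engine passes the context into train_step explicitly.
def train_step(*, model: torch.nn.Module, data: dict, train_ctx: TrainCtx, **kwargs):
    logits = model(data["inputs"])
    loss = torch.nn.functional.cross_entropy(logits, data["targets"])
    train_ctx.mark_as_loss(loss, name="ce")

# Variant 2: no context argument; fetch it via the global accessor,
# which also works deep inside a module's forward().
def train_step_global(*, model: torch.nn.Module, data: dict, **kwargs):
    logits = model(data["inputs"])
    loss = torch.nn.functional.cross_entropy(logits, data["targets"])
    get_train_ctx().mark_as_loss(loss, name="ce")
```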
You likely would anyway use a separate config for that, in case you want to evaluate something different. In that separate config, you could define any other custom `eval_step`. Similarly for `forward_step`.
Thinking about the details, I'm noticing that the interface is really more complex than I thought. That's what I had in mind so far:
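(Roughly, a sketch of what such step functions could look like; the exact signatures, the `mark_as_loss` call and the `beam_search` helper are my assumptions, not necessarily the snippet referred to here:)

```python
import torch

def beam_search(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    # stand-in for whatever decoding you want; greedy argmax here
    return model(inputs).argmax(dim=-1)

def train_step(*, model: torch.nn.Module, data: dict, train_ctx, **kwargs):
    logits = model(data["inputs"])
    loss = torch.nn.functional.cross_entropy(logits, data["targets"])
    train_ctx.mark_as_loss(loss, name="ce")

def eval_step(*, model: torch.nn.Module, data: dict, train_ctx, **kwargs):
    # cross validation: by default the same losses as in training
    train_step(model=model, data=data, train_ctx=train_ctx, **kwargs)

def forward_step(*, model: torch.nn.Module, data: dict, **kwargs):
    # recognition / inference: no losses, only decoding
    return beam_search(model, data["inputs"])
```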
(Beam search is a placeholder here for however you want to decode.) But: just having a single `forward_step` is maybe not enough. For the maybe earlier intended use case of returnn_common, where you define a Module, construct the network dict from it and set the network in the config, the situation is a bit different.
Whether you have just a single step function or separate ones, and whether you call the loss computation inside the model or outside of it, is really up to you. Maybe look at my existing examples (using RC, but the API is very similar).
In those examples, I do all the loss computation in the `train_step`, outside of the model. But you are really free to do it however you want. I liked to have this separation, at least for these examples, but maybe there are other cases where you want to do the loss computation inside the model.
This is still possible with the current design. But I actually never really intended it to be done like this. But none of this discussion here is really anything about eager-based vs. graph-based, is it? All that I said here is just the same, no matter if graph-based or eager-based. Yes, in eager mode, everything is always calculated. You have to make sure you do not calculate it if you don't need it. You can simply check the training flag (or whether a train ctx is set) for that.
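A small sketch of such a guard in eager mode (the `self.training` flag is standard `torch.nn.Module` behavior; `get_train_ctx()` / `mark_as_loss` are the assumed API from the sketch above, and the module itself is made up):

```python
import torch

class Encoder(torch.nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.layer = torch.nn.Linear(dim, dim)
        self.aux_head = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.relu(self.layer(x))
        # In eager mode this code runs on every call, so training-only work
        # is guarded explicitly, here on the standard torch.nn.Module flag.
        if self.training:
            aux = self.aux_head(y)
            # placeholder auxiliary penalty, reported via the assumed global context
            get_train_ctx().mark_as_loss(aux.pow(2).mean(), name="aux_l2")
        return y
```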
Btw, also check how they do it in Fairseq or ESPnet. I would assume they also separate the model stuff from the loss calculation, or at least have that in a separate function.
Nope, for ESPnet it is hidden somewhere completely in the middle of a forward function in the model, e.g. https://github.com/espnet/espnet/blob/master/espnet2/gan_tts/vits/vits.py#L424. This also means it is completely hidden to the outside what the losses are. No idea how they track the losses then; definitely not an approach I like. But so my question is: why?
Why? If I have a duration predictor module in my TTS, I want the duration loss to be defined exactly in there. Otherwise I lose modularity.
Do not forget that the PyTorch way is to use `module.train()` / `module.eval()` (i.e. the `self.training` flag), so we would not even need a separate eval variant for that distinction. Whatever we do, RETURNN should not impose any limits. If you want to define your losses in the forward/call somewhere deep in the model, it should be possible. If you want to collect all losses in the `train_step`, it should be possible. If you want to store extra stuff in a global context, it should be possible. If you want to pass everything manually in the forward function, well, this is anyway always possible.

It should also be easy to do more custom stuff like turn-based training (https://github.com/espnet/espnet/blob/master/espnet2/train/gan_trainer.py#L147). But I think this is already working now: you just return different losses based on the step (see the sketch below). For really crazy stuff we could even allow that every single function of the engine can be overwritten by a function defined in the config, so if someone wants a custom training loop, that would also be possible.
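A minimal sketch of that turn-based case, assuming the step index is passed into `train_step` (the `global_train_step` argument and the loss helpers are assumptions):

```python
def train_step(*, model, data, train_ctx, global_train_step: int = 0, **kwargs):
    # Turn-based (GAN-style) training: just mark different losses
    # depending on the current step; the helpers below are hypothetical.
    if global_train_step % 2 == 0:
        loss = compute_generator_loss(model, data)      # hypothetical helper
        train_ctx.mark_as_loss(loss, name="generator")
    else:
        loss = compute_discriminator_loss(model, data)  # hypothetical helper
        train_ctx.mark_as_loss(loss, name="discriminator")
```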
I have to correct myself here: as we want cross validation in eval mode but WITH loss computation, we can of course not rely on that train/eval flag alone.
I just meant that I never intended to use that in my setups if possible, and that's what I would also recommend, although I also saw some potential exceptions for some unsupervised auxiliary losses or so. And I also was not 100% sure about it, and just wanted to gain some experience in actually using it. But I actually always intended to have the possibility to define it wherever you want, and you have that possibility in the current design via `get_train_ctx().mark_as_loss(...)`.
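For example, the duration-predictor case from above could then look roughly like this (my sketch; `get_train_ctx()` / `mark_as_loss` as assumed earlier, the module itself is made up):

```python
import torch

class DurationPredictor(torch.nn.Module):
    """Hypothetical TTS submodule that defines its own loss internally."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = torch.nn.Linear(dim, 1)

    def forward(self, enc: torch.Tensor, target_durations: torch.Tensor = None):
        pred = self.proj(enc).squeeze(-1)
        if self.training and target_durations is not None:
            # The loss stays inside the module (modularity) and is reported
            # via the assumed global context instead of being passed around.
            loss = torch.nn.functional.l1_loss(pred, target_durations)
            get_train_ctx().mark_as_loss(loss, name="duration")
        return pred
```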
Yes, although I personally do not like that the context is only available via the global `get_train_ctx()` function.
I implemented that partly now.
Related to #1120, I want to start the discussion on which step functions we want to have and how they should look (this should be independent of PyTorch vs. RF). Currently implemented is `train_step`. One possible approach is: `train_step`, `eval_step` and `forward_step`, with an automatic fallback to `train_step` if no customization is needed. Then the question is if all get the same parameters or not. Currently we have model, data and `TrainCtx`. I think this is enough for the start (any implementation should have `**kwargs` anyway to not break). Maybe `TrainCtx` can be renamed then if it is not only for training.
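For illustration, the automatic fallback could look roughly like this on the engine side (purely a sketch; the config attribute lookup and the function name are assumptions):

```python
def resolve_step_func(config, kind: str):
    """Hypothetical engine-side lookup: kind is "train", "eval" or "forward";
    eval falls back to train_step if no customization is defined."""
    func = getattr(config, f"{kind}_step", None)
    if func is None and kind == "eval":
        func = getattr(config, "train_step", None)  # automatic fallback
    if func is None:
        raise ValueError(f"config defines no {kind}_step")
    return func
```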