# Merge PR #2 and #3 and add compatibility with Theano and CNTK backend #5
**drauh** commented on Aug 7, 2017:
- This PR is a merge of my last two PRs (Question regarding loss calculation #2 and Remove bias in augmented loss + fix perplexity #3).
- It adds compatibility with the Theano and CNTK backends.
* Fix the perplexity calculation, following [the TensorFlow PTB tutorial](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L319).
* Fix the `categorical_crossentropy` argument ordering. This was probably the main bug of the perplexity implementation; the perplexity is now much smaller (see the metric sketch after this list).
* Fix the `categorical_crossentropy` argument ordering in the augmented loss as well.
* In TensorFlow, `keep_prob` parametrizes dropout, while in Keras it is the dropout rate, so `dropout_rate = 1 - keep_prob`. I also copied the other parameters from [here](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L226), but I think this is less important.
* Remove recurrent dropout and add dropout before the softmax layer. This replicates [this](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L128) implementation (see the model sketch after this list).
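As an illustration of the argument-order fix, here is a minimal sketch of a perplexity metric in Keras. This is my hedged reconstruction, not necessarily the exact code in this repository:

```python
from keras import backend as K

def perplexity(y_true, y_pred):
    # Keras hands metrics their arguments as (y_true, y_pred); feeding them
    # to categorical_crossentropy in the swapped order was the suspected bug.
    # Note that K.categorical_crossentropy itself expects (target, output)
    # only from Keras 2.0.7 on; earlier releases used the reverse order.
    cross_entropy = K.mean(K.categorical_crossentropy(y_true, y_pred))
    # Perplexity is the exponential of the mean cross-entropy.
    return K.exp(cross_entropy)
```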
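And a minimal sketch of the dropout placement described above, assuming illustrative sizes (vocabulary 10000, hidden size 200, sequence length 35); the actual values live in the repository's configuration:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

keep_prob = 0.5                # TensorFlow's parametrization
dropout_rate = 1 - keep_prob   # Keras's parametrization

model = Sequential()
model.add(Embedding(10000, 200, input_length=35))
model.add(Dropout(dropout_rate))
model.add(LSTM(200, return_sequences=True))  # no recurrent_dropout, per this PR
model.add(Dropout(dropout_rate))             # dropout right before the softmax
model.add(Dense(10000))
model.add(Activation('softmax'))
```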
I ran a small test overnight on the PTB dataset with the small configuration and the `--aug` and `--tying` options. Results after 15 epochs are:
---
Thank you for your pull request! Could you let me check a few points?

**About TimeDistributed**

You removed the `TimeDistributed` wrapper. This is different from the ordinary language model implementation (Chainer/TensorFlow/PyTorch etc.). To implement that, we have to use a stateful LSTM, and it is difficult (I tried to do this and you can see its result). (The PyTorch code is good for understanding this model.)

**About cross_entropy order**

In the current Keras release the argument order of `categorical_crossentropy` is confusing, so a fix was added recently. But it is not deployed to PyPI yet (it will be included in 2.0.7). I want to fix this order, so I'll merge your request after the above fix is available.
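For concreteness, a hedged sketch of what the stateful variant would look like in Keras; the batch size, sequence length, and layer sizes here are hypothetical:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation

batch_size, num_steps = 20, 35  # hypothetical values

model = Sequential()
model.add(Embedding(10000, 200, batch_input_shape=(batch_size, num_steps)))
model.add(LSTM(200, return_sequences=True, stateful=True))
model.add(Dense(10000))
model.add(Activation('softmax'))

# With stateful=True the cell state is carried from one batch to the next,
# so batches must be fed in order (shuffle=False in fit) and the state has
# to be cleared explicitly at epoch or document boundaries:
model.reset_states()
```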
---
**About TimeDistributed**

I thought that `TimeDistributed(Dense())` was needed back when the Keras `Dense` layer could only be applied to a rank 2 tensor, so that, to my understanding, applying `Dense` to a rank 3 tensor is now equivalent to wrapping it in `TimeDistributed`.
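Here is a minimal sketch of that equivalence (my illustration, with hypothetical shapes, assuming Keras 2 behavior):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Both models map (batch, 20, 32) -> (batch, 20, 10): in Keras 2, Dense on a
# rank 3 input applies the same (64 -> 10) kernel independently at each of
# the 20 timesteps, which is what TimeDistributed(Dense(...)) does explicitly.
model_a = Sequential([LSTM(64, return_sequences=True, input_shape=(20, 32)),
                      Dense(10)])
model_b = Sequential([LSTM(64, return_sequences=True, input_shape=(20, 32)),
                      TimeDistributed(Dense(10))])
```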
And that's what the documentation says here. However, I don't understand the interaction with the statefulness of the LSTM.

**About cross_entropy order**

Good catch! I was using the master version and didn't pay enough attention...

**About stateful LSTM**

Because you used a sequence size of 20, you are doing BPTT up to 20 timesteps, which seems enough. But statefulness would help to better initialize the LSTM cell state. Some research actually uses Conv1D to do language modeling, which is faster because it is easily parallelizable. And it is compatible with weight tying and the augmented loss, so I want to test it when I have the time (see the sketch below).
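A minimal sketch of that Conv1D idea (my illustration, not code from this repository; sizes are hypothetical). The key point is causal padding, which keeps the model autoregressive:

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Dense, Activation

model = Sequential()
model.add(Embedding(10000, 200, input_length=35))
# padding='causal' left-pads the sequence so the output at timestep t only
# depends on inputs at timesteps <= t, as a language model requires.
model.add(Conv1D(200, kernel_size=3, padding='causal', activation='relu'))
model.add(Conv1D(200, kernel_size=3, padding='causal', activation='relu'))
model.add(Dense(10000))
model.add(Activation('softmax'))
```

Unlike an LSTM, every timestep of the convolution can be computed in parallel during training, which is where the speedup comes from.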