Merge PR #2 and #3 and add compatibility with Theano and CNTK backend #5

Open

wants to merge 5 commits into master
Conversation

drauh commented Aug 7, 2017

drauh added 5 commits August 7, 2017 18:30
Fix perplexity calculation

From [there](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L319)
Fix categorical_crossentropy argument ordering

* this was probably the main bug in the perplexity implementation; the perplexity is now much smaller (see the sketch after this commit list)
Fix categorical_crossentropy argument ordering

* Fix categorical_crossentropy argument ordering in augmented loss
* In TensorFlow, dropout is parametrized by keep_prob, while in Keras it is the dropout rate, so dropout_rate = 1 - keep_prob
* I also copied the other parameters from [here](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L226), but I think this is less important
* Remove recurrent dropout
* Add dropout before the softmax layer

This replicates [this](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L128) implementation.
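For reference, a minimal sketch of the perplexity metric these commits describe, assuming the PTB-tutorial definition linked above (exp of the mean per-token cross-entropy) and a Keras ≥ 2.0.7 backend where `categorical_crossentropy` takes the target first; on 2.0.6 the two arguments are swapped (see the discussion below). The function name `perplexity` is illustrative, not necessarily the one used in the repository.

```python
import keras.backend as K

def perplexity(y_true, y_pred):
    # Perplexity = exp(mean token-level cross-entropy), as in the
    # TensorFlow PTB tutorial referenced in the commit message.
    # Keras >= 2.0.7 backend order: (target, output); swap on <= 2.0.6.
    cross_entropy = K.categorical_crossentropy(y_true, y_pred)
    return K.exp(K.mean(cross_entropy))
```

The dropout note above is just the unit-complement mapping: a TensorFlow keep_prob of 0.9 corresponds to a Keras dropout rate of 0.1.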
drauh (Author) commented Aug 8, 2017

I ran a small test overnight on the PTB dataset with the small configuration and the --aug and --tying options. Results after 15 epochs are:

  • val_acc: 0.2045
  • val_perplexity: 193.1504

icoxfog417 (Owner) commented Aug 8, 2017

Thank you for your pull request! Could you let me check a few points?

About TimeDistributed

You removed the TimeDistributed in one_hot_model.py, but it is necessary for a many-to-many (2) type network.

[image: many-to-many (2) network diagram]

It is different from the ordinary language model implementation.

The ordinary implementation of a language model is as follows (Chainer/TensorFlow/PyTorch etc.). To implement it, we have to use a stateful LSTM, which is difficult (I tried to do this, and you can see the result).

[image: ordinary language model diagram]

(The PyTorch code is a good way to understand this model.)
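For illustration, a minimal sketch of the many-to-many (2) setup referred to above, with hypothetical sizes; TimeDistributed wraps the softmax Dense so the same projection is applied at every timestep of the LSTM's sequence output.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

vocab_size, embed_dim, hidden, seq_len = 10000, 128, 256, 20  # illustrative sizes

model = Sequential([
    Embedding(vocab_size, embed_dim, input_length=seq_len),
    LSTM(hidden, return_sequences=True),                        # one vector per timestep
    TimeDistributed(Dense(vocab_size, activation="softmax")),   # per-timestep softmax
])
# Output shape: (batch, seq_len, vocab_size) -- a prediction at every position.
```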

About cross_entropy order

In K.backend, the argument order is (pred, true). You can confirm it here.

But it is so confusing that a fix was added recently. However, it is not yet deployed to PyPI (it will be included in 2.0.7).

I want to fix this order, so I'll merge your request after the fix above is available.
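To make the ordering concrete, a hedged sketch assuming the signatures described above (prediction-first up to Keras 2.0.6, target-first from 2.0.7); the function names are illustrative only.

```python
import keras.backend as K

# Keras <= 2.0.6 backend: K.categorical_crossentropy(output, target)
# Keras >= 2.0.7 backend: K.categorical_crossentropy(target, output)

def loss_keras_206(y_true, y_pred):
    return K.categorical_crossentropy(y_pred, y_true)  # prediction first

def loss_keras_207(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)  # target first
```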

PerplexityLogger is a good idea! I want to use it in another project.

drauh (Author) commented Aug 8, 2017

About TimeDistributed

I thought that TimeDistributed(Dense()) was only needed back when the Keras Dense layer could be applied only to rank-2 tensors. To my understanding, when applied to a rank-3 tensor:

  • Dense == Conv1D(kernel_size=1) == TimeDistributed(Dense())

And that's what they said here. However, I don't understand the interaction with the statefulness of the LSTM.
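A small sketch of that claimed equivalence, with illustrative shapes: on a rank-3 input, all three variants apply the same per-timestep projection, so the output shapes match (only the weights and initialization differ).

```python
from keras.layers import Input, Dense, Conv1D, TimeDistributed
from keras.models import Model

x = Input(shape=(20, 256))  # (timesteps, features), illustrative sizes

a = Dense(10000)(x)                      # Dense on rank-3: applied to the last axis
b = Conv1D(10000, kernel_size=1)(x)      # pointwise convolution over time
c = TimeDistributed(Dense(10000))(x)     # explicit per-timestep Dense

for out in (a, b, c):
    print(Model(x, out).output_shape)    # all: (None, 20, 10000)
```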

About cross_entropy order

Good catch! I was using the master version and didn't pay enough attention...

About stateful LSTM

Because you used a sequence size of 20, you are doing BPTT over up to 20 timesteps, which seems enough. But statefulness helps to better initialize the LSTM cell state.
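For illustration, a minimal sketch of the stateful variant being discussed, with hypothetical sizes: the cell and hidden state are carried over from one batch to the next instead of being reset to zeros, so each 20-step chunk starts from the state left by the previous chunk.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

batch_size, seq_len, vocab, hidden = 20, 20, 10000, 256  # illustrative sizes

model = Sequential([
    # Stateful layers need a fixed batch size, hence batch_input_shape.
    Embedding(vocab, 128, batch_input_shape=(batch_size, seq_len)),
    LSTM(hidden, return_sequences=True, stateful=True),  # state persists across batches
    TimeDistributed(Dense(vocab, activation="softmax")),
])

# Batches must be fed in corpus order; reset at epoch/document boundaries:
model.reset_states()
```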

Some research actually uses Conv1D for language modeling, which is faster because it is easily parallelizable. It is also compatible with weight tying and the augmented loss, so I want to test it when I have the time.
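A hedged sketch of what such a convolutional language model could look like, assuming a Keras version whose Conv1D supports causal padding; this is an illustration, not code from this repository.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Dense, TimeDistributed

vocab, embed_dim, filters, seq_len = 10000, 128, 256, 20  # illustrative sizes

model = Sequential([
    Embedding(vocab, embed_dim, input_length=seq_len),
    # Causal padding: position t only sees tokens <= t, yet every position
    # is computed in parallel, unlike a recurrent model.
    Conv1D(filters, kernel_size=3, padding="causal", activation="relu"),
    TimeDistributed(Dense(vocab, activation="softmax")),
])
```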
