# Merge PR #2 and #3 and add compatibility with Theano and CNTK backend #5
**drauh** commented on Aug 7, 2017:
- This PR is a merge of my last two PRs (Question regarding loss calculation #2 and Remove bias in augmented loss + fix perplexity #3).
- It adds compatibility with the Theano and CNTK backends.
* Fix the perplexity calculation, following [the TensorFlow PTB tutorial](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L319).
* Fix the `categorical_crossentropy` argument ordering. This was probably the main bug of the perplexity implementation; the perplexity is now much smaller (see the metric sketch after this list).
* Fix the `categorical_crossentropy` argument ordering in the augmented loss as well.
* In TensorFlow, `keep_prob` parametrizes dropout, while in Keras it is the dropout rate, so `dropout_rate = 1 - keep_prob`. I also copied the other parameters from [here](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L226), but I think this is less important.
* Remove recurrent dropout and add dropout before the softmax layer. This replicates [this](https://github.com/tensorflow/models/blob/3d792f935d652b2c7793b95aa3351a5551dc2401/tutorials/rnn/ptb/ptb_word_lm.py#L128) implementation (see the model sketch after this list).
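As an illustration of the argument-order fix, here is a minimal sketch of a perplexity metric in Keras. This is my hedged reconstruction, not necessarily the exact code in this repository:

```python
from keras import backend as K

def perplexity(y_true, y_pred):
    # Keras hands metrics their arguments as (y_true, y_pred); feeding them
    # to categorical_crossentropy in the swapped order was the suspected bug.
    # Note that K.categorical_crossentropy itself expects (target, output)
    # only from Keras 2.0.7 on; earlier releases used the reverse order.
    cross_entropy = K.mean(K.categorical_crossentropy(y_true, y_pred))
    # Perplexity is the exponential of the mean cross-entropy.
    return K.exp(cross_entropy)
```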
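And a minimal sketch of the dropout placement described above, assuming illustrative sizes (vocabulary 10000, hidden size 200, sequence length 35); the actual values live in the repository's configuration:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

keep_prob = 0.5                # TensorFlow's parametrization
dropout_rate = 1 - keep_prob   # Keras's parametrization

model = Sequential()
model.add(Embedding(10000, 200, input_length=35))
model.add(Dropout(dropout_rate))
model.add(LSTM(200, return_sequences=True))  # no recurrent_dropout, per this PR
model.add(Dropout(dropout_rate))             # dropout right before the softmax
model.add(Dense(10000))
model.add(Activation('softmax'))
```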
I ran a small test overnight on the PTB dataset with the small configuration and the `--aug` and `--tying` options. Results after 15 epochs are:
---
Thank you for your pull request! Could you let me check a few points?

**About TimeDistributed**

You removed the `TimeDistributed` wrapper. This is different from the ordinary language model implementation (Chainer/TensorFlow/PyTorch etc.). To implement that, we have to use a stateful LSTM, and it is difficult (I tried to do this and you can see its result). (The PyTorch code is good for understanding this model.)

**About cross_entropy order**

In the current Keras release the argument order of `categorical_crossentropy` is confusing, so a fix was added recently. But it is not deployed to PyPI yet (it will be included in 2.0.7). I want to fix this order, so I'll merge your request after the above fix is available.
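For concreteness, a hedged sketch of what the stateful variant would look like in Keras; the batch size, sequence length, and layer sizes here are hypothetical:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation

batch_size, num_steps = 20, 35  # hypothetical values

model = Sequential()
model.add(Embedding(10000, 200, batch_input_shape=(batch_size, num_steps)))
model.add(LSTM(200, return_sequences=True, stateful=True))
model.add(Dense(10000))
model.add(Activation('softmax'))

# With stateful=True the cell state is carried from one batch to the next,
# so batches must be fed in order (shuffle=False in fit) and the state has
# to be cleared explicitly at epoch or document boundaries:
model.reset_states()
```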
---
**About TimeDistributed**

I thought that `TimeDistributed(Dense())` was needed back when the Keras `Dense` layer could only be applied to a rank 2 tensor, so that, to my understanding, applying `Dense` to a rank 3 tensor is now equivalent to wrapping it in `TimeDistributed`.
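Here is a minimal sketch of that equivalence (my illustration, with hypothetical shapes, assuming Keras 2 behavior):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Both models map (batch, 20, 32) -> (batch, 20, 10): in Keras 2, Dense on a
# rank 3 input applies the same (64 -> 10) kernel independently at each of
# the 20 timesteps, which is what TimeDistributed(Dense(...)) does explicitly.
model_a = Sequential([LSTM(64, return_sequences=True, input_shape=(20, 32)),
                      Dense(10)])
model_b = Sequential([LSTM(64, return_sequences=True, input_shape=(20, 32)),
                      TimeDistributed(Dense(10))])
```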
And that's what the documentation says here. However, I don't understand the interaction with the statefulness of the LSTM.

**About cross_entropy order**

Good catch! I was using the master version and didn't pay enough attention...

**About stateful LSTM**

Because you used a sequence size of 20, you are doing BPTT up to 20 timesteps, which seems enough. But statefulness would help to better initialize the LSTM cell state. Some research actually uses Conv1D to do language modeling, which is faster because it is easily parallelizable. And it is compatible with weight tying and the augmented loss, so I want to test it when I have the time (see the sketch below).
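A minimal sketch of that Conv1D idea (my illustration, not code from this repository; sizes are hypothetical). The key point is causal padding, which keeps the model autoregressive:

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Dense, Activation

model = Sequential()
model.add(Embedding(10000, 200, input_length=35))
# padding='causal' left-pads the sequence so the output at timestep t only
# depends on inputs at timesteps <= t, as a language model requires.
model.add(Conv1D(200, kernel_size=3, padding='causal', activation='relu'))
model.add(Conv1D(200, kernel_size=3, padding='causal', activation='relu'))
model.add(Dense(10000))
model.add(Activation('softmax'))
```

Unlike an LSTM, every timestep of the convolution can be computed in parallel during training, which is where the speedup comes from.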