
Problems on MoS-AWD-LSTM-LM #2

Open
takase opened this issue Feb 7, 2019 · 14 comments
Labels
good first issue Good for newcomers

Comments

@takase

takase commented Feb 7, 2019

Hi,
I'm trying to reproduce the results from your paper but haven't been able to.
I tried two settings for Penn Treebank.
The first is based on the descriptions in the paper:

python main.py --batch_size 12 --data penn --dropouti 0.55 --dropouth 0.2 --seed 141 --nonmono 15 --epoch 500 --dropoutl 0.3 --lr 20 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv

With this setting I reached 59.47 on valid and 56.96 on test, but the scores reported in the paper are 57.55 on valid and 55.23 on test.

The second uses the MoS setting described on its GitHub page (https://github.com/zihangdai/mos) with FRAGE:

python main.py --batch_size 12 --data penn --dropouti 0.4 --dropouth 0.225 --seed 28 --nonmono 15 --epoch 500 --dropoutl 0.29 --lr 20 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv

This setting reached scores similar to the paper's, but the fine-tuning results fell short:
I got 56.11 on valid and 54.21 on test after fine-tuning, while the paper reports 55.52 on valid and 53.31 on test.

So, could you share any advice for reproducing the results?
I used a P100 GPU and CUDA 8.0.

@ChengyueGongR
Owner

ChengyueGongR commented Feb 22, 2019

I fixed the bugs and reproduced the result.

I used a P100 GPU, CUDA 9.0, and PyTorch 0.4.1.

The commands should be:

  • python3 -u main.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 20.0 --epoch 600 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu --moment --adv --switch 160
  • python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
  • cp PATH_TO_FOLDER/finetune_model.pt PATH_TO_FOLDER/model.pt and run python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
  • cp PATH_TO_FOLDER/finetune_model.pt PATH_TO_FOLDER/model.pt and run python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.4 --gaussian 0.0 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu


@takase
Author

takase commented May 1, 2019

Thank you for uploading the pre-trained model.
Did you find hyper-parameters for dynamic evaluation?
I applied grid search to the pre-trained model but could not achieve a good result.
I tried the following hyper-parameters:
lrlist = [0.00002, 0.00003, 0.00004, 0.00005, 0.00006, 0.00007, 0.0001]
lamblist = [0.001, 0.002, 0.003, 0.004, 0.005]
In this search space, I reached 48.84 on valid and 48.01 on test (vs. 47.38 and 46.54 in your paper).
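For reference, the grid above amounts to 35 runs of dynamiceval.py, one per (lr, lamb) pair; a minimal sketch of the enumeration (the lists are from my search, everything else is illustrative):

```python
from itertools import product

# Grid actually searched: 7 learning rates x 5 decay rates (lamb).
lrlist = [0.00002, 0.00003, 0.00004, 0.00005, 0.00006, 0.00007, 0.0001]
lamblist = [0.001, 0.002, 0.003, 0.004, 0.005]

def grid(lrs, lambs):
    """Return every (lr, lamb) pair to pass to a dynamiceval.py run."""
    return list(product(lrs, lambs))

configs = grid(lrlist, lamblist)
print(len(configs))  # 35 runs in total
```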

@ChengyueGongR
Owner

ChengyueGongR commented May 2, 2019

  1. First, I did a grid search (over lr, lamb, and bptt) and only improved the test PPL from 47.9 (using the original MoS hyper-parameters) to 47.7.
  2. Then I guessed that the problem is the PyTorch version. In PyTorch 0.3 and earlier, we can call model.eval() and still compute gradients in dynamiceval.py. In PyTorch 0.4, however, we can only call model.train() to do this. Although I have made some changes, there may still be a big gap.
    Therefore, I rolled back to PyTorch 0.2 and ran dynamic evaluation with the original MoS hyper-parameters, which gives 47.3 test PPL.
    So I believe the main reason is the change of PyTorch version, and you can do the grid search on PyTorch 0.2.
  3. To use PyTorch 0.2, you can add a patch to the related code:
try:
    # Newer PyTorch defines this helper; checkpoints saved there reference it.
    torch._utils._rebuild_tensor_v2
except AttributeError:
    # Back-port the helper so PyTorch 0.2 can load checkpoints saved with 0.4.
    def _rebuild_tensor_v2(storage, storage_offset, size, stride, requires_grad, backward_hooks):
        tensor = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor
    torch._utils._rebuild_tensor_v2 = _rebuild_tensor_v2
# Dump a patch file instead of failing when the loaded module's source has changed.
torch.nn.Module.dump_patches = True

Also, make some minor changes to the other parts.
  4. Do not use the original search space for dynamic evaluation. Try a search space around the original MoS hyper-parameters.
    I have also uploaded a pretrained model trained with PyTorch 0.2. It achieves test PPL 47.0 with the original MoS hyper-parameters.
  5. I will think about how to eliminate the performance gap between PyTorch 0.4 and earlier versions. If you have any advice, please feel free to contact me.
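Point 4 above can be sketched as a neighborhood search. The base values below are placeholders (the thread does not state the original MoS dynamic-evaluation hyper-parameters), so substitute the real ones:

```python
from itertools import product

# Hypothetical base point; replace with the original MoS dynamic-eval values.
base_lr, base_lamb = 0.002, 0.02

# Scale each base value rather than sweeping an unrelated range.
factors = [0.5, 0.75, 1.0, 1.5, 2.0]
candidates = [(base_lr * f, base_lamb * g) for f, g in product(factors, factors)]

print(len(candidates))  # 25 candidate (lr, lamb) pairs, centered on the base
```

The factor 1.0 keeps the base point itself in the grid, so the search can never do worse than the original MoS setting.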

@ChengyueGongR ChengyueGongR added the good first issue Good for newcomers label May 2, 2019
@ChengyueGongR ChengyueGongR changed the title Reproduce paper results Problems on MoS-AWD-LSTM-LM May 2, 2019