
Problems on MoS-AWD-LSTM-LM #2

Open
takase opened this issue Feb 7, 2019 · 14 comments
Labels
good first issue Good for newcomers

Comments

@takase

takase commented Feb 7, 2019

Hi,
I'm trying to reproduce the results from your paper but haven't been able to.
I tried two settings for Penn Treebank.
The first is based on the descriptions in the paper:

python main.py --batch_size 12 --data penn --dropouti 0.55 --dropouth 0.2 --seed 141 --nonmono 15 --epoch 500 --dropoutl 0.3 --lr 20 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv

With this setting I reached 59.47 on valid and 56.96 on test, but the scores reported in the paper are 57.55 on valid and 55.23 on test.

The second uses the MoS setting described on its GitHub page (https://github.com/zihangdai/mos) with FRAGE:

python main.py --batch_size 12 --data penn --dropouti 0.4 --dropouth 0.225 --seed 28 --nonmono 15 --epoch 500 --dropoutl 0.29 --lr 20 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv

This setting reached scores similar to the paper's, but the fine-tuning results fell short:
I got 56.11 on valid and 54.21 on test after fine-tuning, while the paper reports 55.52 on valid and 53.31 on test.

So, could you share any advice for reproducing the results?
I used a P100 GPU and CUDA 8.0.

@ChengyueGongR
Owner

ChengyueGongR commented Feb 22, 2019

I fixed the bugs and reproduced the result.

I used a P100 GPU, CUDA 9.0, and PyTorch 0.4.1.

The commands should be:

  • python3 -u main.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 20.0 --epoch 600 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu --moment --adv --switch 160
  • python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
  • cp PATH_TO_FOLDER/finetune_model.pt PATH_TO_FOLDER/model.pt and run python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
  • cp PATH_TO_FOLDER/finetune_model.pt PATH_TO_FOLDER/model.pt and run python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.4 --gaussian 0.0 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu


@takase
Author

takase commented May 1, 2019

Thank you for uploading the pre-trained model.
Did you find hyper-parameters for dynamic evaluation?
I applied grid search to the pre-trained model but could not achieve a good result.
I tried the following hyper-parameters:
lrlist = [0.00002, 0.00003, 0.00004, 0.00005, 0.00006, 0.00007, 0.0001]
lamblist = [0.001, 0.002, 0.003, 0.004, 0.005]
In this search space, I reached 48.84 on valid and 48.01 on test (vs. 47.38 and 46.54 in your paper).
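For reference, the grid above amounts to 35 runs of dynamiceval.py, one per (lr, lamb) pair; a minimal sketch of the enumeration (the lists are from my search, everything else is illustrative):

```python
from itertools import product

# Grid actually searched: 7 learning rates x 5 decay rates (lamb).
lrlist = [0.00002, 0.00003, 0.00004, 0.00005, 0.00006, 0.00007, 0.0001]
lamblist = [0.001, 0.002, 0.003, 0.004, 0.005]

def grid(lrs, lambs):
    """Return every (lr, lamb) pair to pass to a dynamiceval.py run."""
    return list(product(lrs, lambs))

configs = grid(lrlist, lamblist)
print(len(configs))  # 35 runs in total
```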

@ChengyueGongR
Owner

ChengyueGongR commented May 2, 2019

  1. First, I did a grid search (over lr, lamb, and bptt) and only improved the test PPL from 47.9 (using the original MoS hyper-parameters) to 47.7.
  2. Then I guessed that the problem is the PyTorch version. In PyTorch 0.3 and earlier, we can call model.eval() and still compute gradients in dynamiceval.py. In PyTorch 0.4, however, we can only call model.train() to do this. Although I have made some changes, there may still be a big gap.
    Therefore, I rolled back to PyTorch 0.2 and ran dynamic evaluation with the original MoS hyper-parameters, which gives 47.3 test PPL.
    So I believe the main reason is the change of PyTorch version, and you can do the grid search on PyTorch 0.2.
  3. To use PyTorch 0.2, you can add a patch to the related code:
try:
    # Newer PyTorch defines this helper; checkpoints saved there reference it.
    torch._utils._rebuild_tensor_v2
except AttributeError:
    # Back-port the helper so PyTorch 0.2 can load checkpoints saved with 0.4.
    def _rebuild_tensor_v2(storage, storage_offset, size, stride, requires_grad, backward_hooks):
        tensor = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor
    torch._utils._rebuild_tensor_v2 = _rebuild_tensor_v2
# Dump a patch file instead of failing when the loaded module's source has changed.
torch.nn.Module.dump_patches = True

Also, make some minor changes to the other parts.
  4. Do not use the original search space for dynamic evaluation. Try a search space around the original MoS hyper-parameters.
    I have also uploaded a pretrained model trained with PyTorch 0.2. It achieves test PPL 47.0 with the original MoS hyper-parameters.
  5. I will think about how to eliminate the performance gap between PyTorch 0.4 and earlier versions. If you have any advice, please feel free to contact me.
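Point 4 above can be sketched as a neighborhood search. The base values below are placeholders (the thread does not state the original MoS dynamic-evaluation hyper-parameters), so substitute the real ones:

```python
from itertools import product

# Hypothetical base point; replace with the original MoS dynamic-eval values.
base_lr, base_lamb = 0.002, 0.02

# Scale each base value rather than sweeping an unrelated range.
factors = [0.5, 0.75, 1.0, 1.5, 2.0]
candidates = [(base_lr * f, base_lamb * g) for f, g in product(factors, factors)]

print(len(candidates))  # 25 candidate (lr, lamb) pairs, centered on the base
```

The factor 1.0 keeps the base point itself in the grid, so the search can never do worse than the original MoS setting.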

@ChengyueGongR ChengyueGongR added the good first issue Good for newcomers label May 2, 2019
@ChengyueGongR ChengyueGongR changed the title Reproduce paper results Problems on MoS-AWD-LSTM-LM May 2, 2019