Problems on MoS-AWD-LSTM-LM #2
Comments
Fixed the bugs and reproduced the result. I used a P100 GPU, CUDA 9.0, and PyTorch 0.4.1. The command should be:
Thank you for uploading the pre-trained model.
Also made some minor changes to other parts.
Hi,
I'm trying to reproduce the results in your paper, but I haven't been able to match them.
I tried two settings for Penn Treebank.
The first is based on the description in the paper:
python main.py --batch_size 12 --data penn --dropouti 0.55 --dropouth 0.2 --seed 141 --nonmono 15 --epoch 500 --dropoutl 0.3 --lr 20 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv
With this setting, I got a perplexity of 59.47 on validation and 56.96 on test, but the paper reports 57.55 on validation and 55.23 on test.
The second uses the MoS setting described on its GitHub page (https://github.com/zihangdai/mos), with FRAGE:
python main.py --batch_size 12 --data penn --dropouti 0.4 --dropouth 0.225 --seed 28 --nonmono 15 --epoch 500 --dropoutl 0.29 --lr 20 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv
This setting achieved scores similar to the paper's, but the fine-tuning results fell short.
After fine-tuning I got 56.11 on validation and 54.21 on test, whereas your paper reports 55.52 on validation and 53.31 on test.
Could you suggest anything that might help me reproduce the results?
I used a P100 GPU and CUDA 8.0.
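To make the difference between the two settings explicit, here is a minimal sketch that parses both commands and lists the flags whose values differ. The helpers `parse_flags` and `diff_flags` are hypothetical names written for this comparison and are not part of the repository; the command strings are copied from above.

```python
import shlex


def parse_flags(command):
    """Parse '--flag value' pairs and bare '--flag' switches into a dict.

    Hypothetical helper for this comparison only; it does not mirror
    the argument parser used by main.py.
    """
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            # A flag followed by a non-flag token takes that token as its value.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                flags[tok] = tokens[i + 1]
                i += 2
                continue
            # Otherwise it is a bare switch such as --adv.
            flags[tok] = True
        i += 1
    return flags


def diff_flags(cmd_a, cmd_b):
    """Return {flag: (value_in_a, value_in_b)} for flags that differ."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


paper_cmd = (
    "python main.py --batch_size 12 --data penn --dropouti 0.55 --dropouth 0.2 "
    "--seed 141 --nonmono 15 --epoch 500 --dropoutl 0.3 --lr 20 --nhid 960 "
    "--nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv"
)
mos_cmd = (
    "python main.py --batch_size 12 --data penn --dropouti 0.4 --dropouth 0.225 "
    "--seed 28 --nonmono 15 --epoch 500 --dropoutl 0.29 --lr 20 --nhid 960 "
    "--nhidlast 620 --emsize 280 --n_experts 15 --single_gpu --moment --adv"
)

for flag, (paper_value, mos_value) in diff_flags(paper_cmd, mos_cmd).items():
    print(f"{flag}: paper={paper_value} mos={mos_value}")
```

Only the three dropout rates and the random seed differ between the two runs; every other flag is identical.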