Complete seq2seq for fluid #56
Conversation
[WIP] Add seq2seq model for fluid.
fluid/machine_translation.py
Outdated
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
    "--word_vector_dim",
embedding_dim better?
Followed.
import distutils.util

import paddle.v2 as paddle
import paddle.v2.fluid as fluid
As benchmark/demo code, we'd like to have only

import paddle.v2 as paddle
import paddle.v2.fluid as fluid

just like TensorFlow's

import tensorflow as tf

and nothing else.
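For illustration, a minimal sketch of how the script could keep only the two paddle imports (plus standard argparse) by replacing distutils.util.strtobool with a small local helper; str2bool is a hypothetical name, not code from this PR:

import argparse

import paddle.v2 as paddle
import paddle.v2.fluid as fluid


def str2bool(v):
    # Hypothetical local replacement for distutils.util.strtobool,
    # so the extra distutils import can be dropped.
    return str(v).lower() in ("yes", "true", "t", "1")


parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
    "--use_gpu",
    type=str2bool,
    default=True,
    help="Whether to use gpu. (default: %(default)d)")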
fluid/machine_translation.py
Outdated
help="The dictionary capacity. Dictionaries of source sequence and " | ||
"target dictionary have same capacity. (default: %(default)d)") | ||
parser.add_argument( | ||
"--pass_number", |
For consistency, pass_num.
Followed.
fluid/machine_translation.py
Outdated
    type=str,
    default='train',
    choices=['train', 'infer'],
    help="Do training or inference. (default: %(default)s)")
Thanks, followed.
fluid/machine_translation.py
Outdated
                   target_dict_dim,
                   is_generating=False,
                   beam_size=3,
                   max_length=250):
Leave the default values of max_length and beam_size to argparse.
Followed.
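A minimal sketch of the change under discussion; --max_length follows the flag shown later in this thread, while --beam_size and the way the values are threaded into the network function are illustrative assumptions:

parser.add_argument(
    "--beam_size",
    type=int,
    default=3,
    help="The width of beam expansion. (default: %(default)d)")
parser.add_argument(
    "--max_length",
    type=int,
    default=250,
    help="The maximum length of sequence when doing generation. "
    "(default: %(default)d)")

# The network function then takes the values from args instead of
# hard-coding its own defaults, e.g.:
# seq_to_seq_net(..., is_generating=True,
#                beam_size=args.beam_size, max_length=args.max_length)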
fluid/machine_translation.py
Outdated
"""Construct a seq2seq network.""" | ||
feeding_list = ["source_sequence", "target_sequence", "label_sequence"] | ||
|
||
def bi_lstm_encoder(input_seq, size): |
Maybe we need a note here: the LSTM unit has four sets of parameters (the input, forget and output gates plus the cell candidate), so the projection size needs to be multiplied by 4.
Added detailed comments.
fluid/machine_translation.py
Outdated
        size=size * 4,
        act='tanh')
    forward, _ = fluid.layers.dynamic_lstm(
        input=input_forward_proj, size=size * 4)
Double check whether dynamic_lstm needs the * 4; I roughly remember this is handled inside the LSTM layer.
Naming it gate_size is a good idea:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py#L27
Double check whether dynamic_lstm needs the * 4; I roughly remember this is handled inside the LSTM layer.
Naming it gate_size is a good idea.
Agree.
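A minimal sketch of the renaming being discussed, assuming the fluid.layers.fc and fluid.layers.dynamic_lstm calls shown in the diff above; the backward branch and the return value are illustrative assumptions:

def bi_lstm_encoder(input_seq, size):
    # dynamic_lstm expects its input projection to carry all four gate
    # transforms (input, forget, output gates and the cell candidate),
    # hence the factor of 4.
    gate_size = size * 4

    input_forward_proj = fluid.layers.fc(
        input=input_seq, size=gate_size, act='tanh')
    forward, _ = fluid.layers.dynamic_lstm(
        input=input_forward_proj, size=gate_size)

    input_backward_proj = fluid.layers.fc(
        input=input_seq, size=gate_size, act='tanh')
    backward, _ = fluid.layers.dynamic_lstm(
        input=input_backward_proj, size=gate_size, is_reverse=True)

    return forward, backward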
fluid/machine_translation.py
Outdated
    default=16,
    help="The sequence number of a batch data. (default: %(default)d)")
parser.add_argument(
    "--dict_size",
This value is determined by the dataset. Should it be an argument?
It seems this is an argument of the dataset API:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/dataset/wmt14.py#L106
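For reference, a minimal sketch of how the flag would be threaded into the WMT14 reader, assuming the paddle.v2 reader helpers and the wmt14.train creator linked above; buf_size and the batch settings are illustrative:

train_batch_generator = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.wmt14.train(args.dict_size), buf_size=1000),
    batch_size=args.batch_size)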
fluid/machine_translation.py
Outdated
"--max_length", | ||
type=int, | ||
default=250, | ||
help="The max length of sequence when doing generation. " |
max -> maximum
Followed.
fluid/machine_translation.py
Outdated
"--batch_size", | ||
type=int, | ||
default=16, | ||
help="The sequence number of a batch data. (default: %(default)d)") |
of a mini-batch
Followed.
"--encoder_size", | ||
type=int, | ||
default=512, | ||
help="The size of encoder bi-rnn unit. (default: %(default)d)") |
size -> dimension
I think both are OK, but size is shorter.
"--decoder_size", | ||
type=int, | ||
default=512, | ||
help="The size of decoder rnn unit. (default: %(default)d)") |
size -> dimension
I think both are OK, but size is shorter.
fluid/machine_translation.py
Outdated
"--use_gpu", | ||
type=distutils.util.strtobool, | ||
default=True, | ||
help="Whether use gpu. (default: %(default)d)") |
to use
Thanks, followed.
def lstm_decoder_with_attention(target_embedding, encoder_vec, encoder_proj,
                                decoder_boot, decoder_size):
    def simple_attention(encoder_vec, encoder_proj, decoder_state):
The attention mechanism is wrong. Where is the tanh operation that appears in the original formula?
I didn't catch your point. Why is tanh necessary for attention? There are several kinds of attention mechanisms. Please refer to
https://github.com/PaddlePaddle/Paddle/blob/9bfa3013891cf3da832307894acff919d6705cee/python/paddle/trainer_config_helpers/networks.py#L1400
https://github.com/PaddlePaddle/Paddle/blob/9bfa3013891cf3da832307894acff919d6705cee/python/paddle/trainer_config_helpers/networks.py#L1473
Here, the mixed_layer performs tanh. And for the attention mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate", tanh is used. Is this what you want to implement?
Why do you think it's wrong to apply a linear activation?
To keep it consistent, I will apply tanh in the next PR. Thanks.
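For reference, a minimal sketch of the additive (Bahdanau-style) attention with tanh that the discussion converges on, assuming the fluid layer helpers (fc, sequence_expand, sequence_softmax, sequence_pool, elementwise_mul, reshape) used elsewhere in this PR; it is an illustration, not the exact code of the next PR:

def simple_attention(encoder_vec, encoder_proj, decoder_state):
    # decoder_size is captured from the enclosing lstm_decoder_with_attention.
    # Project the previous decoder state and broadcast it over the
    # source positions of the current sequence.
    decoder_state_proj = fluid.layers.fc(
        input=decoder_state, size=decoder_size, bias_attr=False)
    decoder_state_expand = fluid.layers.sequence_expand(
        x=decoder_state_proj, y=encoder_proj)
    # Additive attention: e_j = v^T tanh(W*s + U*h_j); the tanh sits on
    # the combined projection, the final fc stays linear.
    mixed = fluid.layers.fc(
        input=[encoder_proj, decoder_state_expand],
        size=decoder_size,
        act='tanh',
        bias_attr=False)
    attention_weights = fluid.layers.fc(
        input=mixed, size=1, bias_attr=False)
    attention_weights = fluid.layers.sequence_softmax(
        input=attention_weights)
    # Weighted sum of the encoder states gives the context vector.
    scaled = fluid.layers.elementwise_mul(
        x=encoder_vec,
        y=fluid.layers.reshape(x=attention_weights, shape=[-1]),
        axis=0)
    return fluid.layers.sequence_pool(input=scaled, pool_type='sum')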
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
fetch_outs = exe.run(
    inference_program,
    feed=dict(zip(*[feeding_list, (src_seq, trg_seq, lbl_seq)])),
Please fix this issue. Even a plain dict literal is better than the *zip() function sugar.
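A minimal sketch of the explicit alternative, with the keys taken from feeding_list earlier in the diff (the remaining exe.run arguments such as fetch_list are omitted here):

feed_dict = {
    "source_sequence": src_seq,
    "target_sequence": trg_seq,
    "label_sequence": lbl_seq,
}
fetch_outs = exe.run(inference_program, feed=feed_dict)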
Resolves #55
Resolves #22