
Add unified RNN APIs #26588

Merged: 23 commits into PaddlePaddle:develop, Aug 27, 2020
Conversation

iclementine:

PR types

New features

PR changes

APIs

Describe

Add unified RNN APIs

  1. RNN cells: SimpleRNNCell, LSTMCell, GRUCell
  2. RNN networks: SimpleRNN, LSTM, GRU
  3. higher-order RNN wrappers: RNN, BiRNN (a usage sketch follows below)
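A minimal sketch of how the three levels fit together, with shapes like those used in the review below (not taken verbatim from the PR):

import paddle

# Level 1: a cell computes a single time step.
cell = paddle.nn.LSTMCell(16, 32)

# Level 2: the RNN/BiRNN wrappers run cells over a whole sequence.
rnn = paddle.nn.RNN(cell)
birnn = paddle.nn.BiRNN(paddle.nn.LSTMCell(16, 32),
                        paddle.nn.LSTMCell(16, 32))

# Level 3: full multi-layer networks.
lstm = paddle.nn.LSTM(input_size=16, hidden_size=32, num_layers=2)

x = paddle.rand((4, 23, 16))    # (batch_size, time_steps, input_size)
outputs, final_states = rnn(x)  # outputs: (4, 23, 32)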

@paddle-bot-old:

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

guoshengCS (Contributor) previously approved these changes Aug 26, 2020

@guoshengCS left a comment:

LGTM

self.is_reverse = is_reverse
self.time_major = time_major

def forward(self, inputs, initial_states=None, sequence_length=None):
Contributor:

Should **kwargs be added here?

Author:

Yes, this can be added. Done.
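A sketch of the agreed change, assuming the extra keyword arguments are simply forwarded to the underlying cell call (the exact final signature is in the PR diff; F is paddle.nn.functional, as in the surrounding hunks):

def forward(self, inputs, initial_states=None, sequence_length=None,
            **kwargs):
    # **kwargs is forwarded to cell.forward at every time step.
    final_outputs, final_states = F.rnn(
        self.cell, inputs, initial_states, sequence_length,
        time_major=self.time_major, is_reverse=self.is_reverse, **kwargs)
    return final_outputs, final_states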

outputs, final_states = F.birnn(self.cell_fw, self.cell_bw, inputs,
                                initial_states, sequence_length,
                                self.time_major)
return outputs, final_states
Contributor:

For F.birnn, do the final_states returned here need to be concatenated across the two directions, the way outputs are?

Author:

F.birnn is a fairly low-level interface. It cannot assume that the two cells have the same number of layers or the same hidden_size, nor that they are of the same type (the forward cell could be a SimpleRNN while the backward cell is an LSTM), so the final states cannot be concatenated this way.
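When the two cells do match (e.g. two LSTMCells with the same hidden_size), the caller can concatenate the final states manually. A sketch, assuming F.birnn returns the two directions' final states as a (states_fw, states_bw) pair and that each LSTM state is an (h, c) pair:

import paddle
import paddle.nn.functional as F

cell_fw = paddle.nn.LSTMCell(16, 32)
cell_bw = paddle.nn.LSTMCell(16, 32)
inputs = paddle.rand((2, 23, 16))  # (batch, time_steps, input_size)

outputs, (states_fw, states_bw) = F.birnn(cell_fw, cell_bw, inputs)
# Concatenate per state component along the last axis.
h = paddle.concat([states_fw[0], states_bw[0]], axis=-1)  # (2, 64)
c = paddle.concat([states_fw[1], states_bw[1]], axis=-1)  # (2, 64)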


outputs, final_states = F.birnn(self.cell_fw, self.cell_bw, inputs,
                                initial_states, sequence_length,
                                self.time_major)
Contributor:

Should **kwargs be added here as well?

Author:

Yes, this can be added. Done.

Contributor:

@swtkiwi left a comment:

1. Please submit the Chinese and English docs together for review.
2. Please attach preview screenshots.

**kwargs: Additional keyword arguments. Arguments passed to `cell.call`.

Returns:
outputs (Tensor): A (possibly nested structure of) tensor variable[s],
Contributor:

"variable" can be deleted.

Author:

Deleted.

cell_fw = LSTMCell(16, 32)
cell_bw = LSTMCell(16, 32)
inputs = paddle.rand((2, 23, 16))
outputs, final_states = paddle.nn.functional.birnn(cell_fw, cell_bw, inputs)

Contributor:

1. Example inputs generally should not be randomly generated; a concrete example is better.
2. Please comment the exact output.

Author:

Even without random generation, the output cannot be guaranteed.

Author:

That is why, for anything involving Layers with parameters, we never write out concrete inputs and outputs.


Please refer to `Finding Structure in Time
<https://crl.ucsd.edu/~elman/Papers/fsit.pdf>`_ for more details.

Contributor:

The default parameter initialization method needs to be documented, because the default initialization of the RNN family differs from the ParamAttr default. Same below.

Author:

How does it differ?

Author:

Do you mean the default Uniform(-std, std) initialization?

Contributor:

Yes. ParamAttr defaults to Xavier; a user following this doc to the ParamAttr description would mistakenly assume the RNN is Xavier-initialized too.
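So a user who wants a specific scheme has to request it explicitly. A sketch, assuming the weight_ih_attr/weight_hh_attr parameters quoted later in this review:

import paddle

# Explicitly request Xavier initialization instead of the RNN
# family's default Uniform(-std, std) scheme.
xavier = paddle.nn.initializer.XavierUniform()
cell = paddle.nn.SimpleRNNCell(
    16, 32,
    weight_ih_attr=paddle.ParamAttr(initializer=xavier),
    weight_hh_attr=paddle.ParamAttr(initializer=xavier))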


def forward(self, inputs, states=None):
r"""
Given the input and previous atate, compute the output and update state.
Contributor:

typo: `atate` should be `state`

Author:

Done.

hidden_size (int): The hidden size.
nonlinearity (str): The activation in the SimpleRNN cell. It can be
`tanh` or `relu`. Defaults to `tanh`.
weight_ih_attr(ParamAttr, optional): The parameter attribute for
Contributor:

For these parameter docs, saying "input-to-hidden weights" and "hidden-to-hidden weights" directly would be easier to understand, since the doc never explains weight_ih anywhere else.

Author:

A Parameters section has been added to explain how the parameters correspond to the symbols in the formulas.
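For illustration, the correspondence under discussion (attribute names as quoted in the diff; the shapes assume input_size=16, hidden_size=32):

import paddle

cell = paddle.nn.SimpleRNNCell(16, 32)
print(cell.weight_ih.shape)  # input-to-hidden weights W_ih: [32, 16]
print(cell.weight_hh.shape)  # hidden-to-hidden weights W_hh: [32, 32]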

`[time_steps, batch_size, ...]`. Defaults to False.

Inputs:
inputs (Tensor): A (possibly nested structure of) tensor variable[s].
Contributor:

"possibly nested structure of" is somewhat vague.

  • In this version, is this simply a batch * length * input_size Tensor?
  • Or, combined with the sequence_length argument of forward, is there another usage?

Author:

"possibly nested structure of" comes from the earlier design.

I believe inputs in both the (T, B, C) and (B, T, C) layouts are supported.
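A sketch of the two layouts, assuming the time_major flag quoted elsewhere in this review:

import paddle

cell = paddle.nn.GRUCell(16, 32)

# (B, T, C): batch-major, the default (time_major=False).
out, _ = paddle.nn.RNN(cell, time_major=False)(paddle.rand((4, 23, 16)))

# (T, B, C): time-major.
out_tm, _ = paddle.nn.RNN(cell, time_major=True)(paddle.rand((23, 4, 16)))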

in RNN.
initial_states (Tensor|list|tuple, optional): A (possibly nested structure of)
tensor[s], representing the initial state for the rnn cell.
If not provided, `cell.get_initial_states` would be used to produce
Contributor:

What is the default behavior of get_initial_states?

Author:

All-zero initialization.
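That is, omitting initial_states should be equivalent to passing zero states of the right shape. A sketch with assumed shapes:

import paddle

cell = paddle.nn.SimpleRNNCell(16, 32)
rnn = paddle.nn.RNN(cell)
x = paddle.rand((4, 23, 16))

out_a, _ = rnn(x)                                        # zeros via get_initial_states
out_b, _ = rnn(x, initial_states=paddle.zeros((4, 32)))  # explicit zeros
# out_a and out_b should be identical.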

class BiRNN(Layer):
r"""
Wrapper for bidirectional RNN. It assembles two RNN cells by performing
forward and backward RNN separately, and concat outputs.
Contributor:

The second English sentence feels hard for newcomers to understand. (I have not thought of a better phrasing either.)

Author:

Wrapper for bidirectional RNN. It takes two RNN cells as parameters and
builds a bidirectional RNN. A BiRNN applies the forward RNN and backward
RNN separately and concatenates the outputs along the last axis.

How about this?
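A usage sketch matching that description; the last axis of the outputs doubles because the two directions are concatenated:

import paddle

birnn = paddle.nn.BiRNN(paddle.nn.LSTMCell(16, 32),
                        paddle.nn.LSTMCell(16, 32))
x = paddle.rand((2, 23, 16))
outputs, final_states = birnn(x)  # outputs: (2, 23, 64)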

`tanh` or `relu`. Defaults to `tanh`.
direction (str): The direction of the network. It can be "forward",
"backward" and "bidirectional". Defaults to "forward".
dropout (float): The dropout probability. Dropout is applied to the
Contributor:

Default value?

Author:

Defaults to "forward".

"backward" and "bidirectional". Defaults to "forward".
dropout (float): The dropout probability. Dropout is applied to the
input of each layer except for the first layer.
time_major (bool): Whether the first dimension of the input means the
Contributor:

Default value?

Author:

Done.


for i, rnn_layer in enumerate(self):
    if i > 0:
        inputs = F.dropout(
Contributor:

Please confirm: with the functional form of dropout here, will Layer.train / Layer.eval be handled correctly?

Author:

Let me check the eval behavior of Layer and the clone behavior of Program.

Author:

The training argument has been added.
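A sketch of the agreed fix: functional dropout does not observe Layer.train()/Layer.eval() on its own, so the layer's own training flag is passed through explicitly (StackedRNN here is a hypothetical minimal wrapper, not the PR's class):

import paddle
import paddle.nn.functional as F

class StackedRNN(paddle.nn.LayerList):
    # Hypothetical minimal stacked-RNN wrapper, for illustration only.
    def __init__(self, layers, dropout=0.0):
        super(StackedRNN, self).__init__(layers)
        self.dropout = dropout

    def forward(self, inputs):
        for i, rnn_layer in enumerate(self):
            if i > 0:
                # self.training is toggled by Layer.train()/Layer.eval(),
                # so dropout is disabled during evaluation.
                inputs = F.dropout(inputs, self.dropout,
                                   training=self.training)
            inputs, _ = rnn_layer(inputs)
        return inputs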

"RNN",
"BiRNN",
"RNNCellBase",
"RNNCellBase.get_initial_states"
Contributor:

Only the base class needs to be added here, right?

Author:

Something went wrong with the doc parsing tool; it reports "no sample code" for every one of them.

XiaoguangHu01 (Contributor) previously approved these changes Aug 26, 2020

@XiaoguangHu01 left a comment:

LGTM

if nonlinearity == "tanh" \
else F.relu

def forward(self, inputs, states=None):
Contributor:

Should inputs be renamed to input here? The plural suggests multiple Tensors.

Author:

That depends on interpretation, since a batch is itself already plural.

def __init__(self,
input_size,
hidden_size,
nonlinearity="tanh",
Contributor:

For the activation function, activation="tanh" is recommended.

Author:

Changed.

input_size,
hidden_size,
num_layers=1,
nonlinearity="tanh",
Contributor:

activation is recommended here as well.

Author:

Changed.

and mostly used in RNN.
"""

def get_initial_states(self,
Contributor:

Does this API need to be exposed to developers? It looks like it could be an internal API.

Author (iclementine, Aug 27, 2020):

It is a base-class interface; users can call it themselves as well.

prev_h = paddle.randn((4, 32))

cell = paddle.nn.LSTMCell(16, 32)
y, h = cell(x, prev_h)
Contributor:

The return values do not match.

Author:

Fixed.
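For reference, the fixed unpacking: an LSTM cell's state is an (h, c) pair, so the example becomes (a sketch; x is assumed here, since it was cut off in the quote above):

import paddle

x = paddle.randn((4, 16))
prev_h = paddle.randn((4, 32))
prev_c = paddle.randn((4, 32))

cell = paddle.nn.LSTMCell(16, 32)
y, (h, c) = cell(x, (prev_h, prev_c))  # y, h: (4, 32); c: (4, 32)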

jzhang533 (Contributor) previously approved these changes Aug 27, 2020

@jzhang533 left a comment:

lgtm
some follow-ups:

  • make explanation to "nested structure of tensors", "padded sequence", "initial states" more clear.
  • only Base Class could be added to the white list (for CI).

raindrops2sea previously approved these changes Aug 27, 2020
Contributor:

@jzhang533 left a comment:

lgtm

Contributor:

@XiaoguangHu01 left a comment:

LGTM

@iclementine iclementine merged commit f408301 into PaddlePaddle:develop Aug 27, 2020