API Simplification #10248
The Current Problems

For more details, let us take a look at the current example program.

A Proposal

This idea came from @emailweixu. Here is a brief description with example code.
Following the proposal, an example train script could be:

```python
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def conv_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def main():
    trainer = fluid.Trainer(conv_network, optimizer=fluid.optimizer.SGD())

    def event_handler(event):
        if isinstance(event, fluid.EndIteration):
            print(event.metrics)
        elif isinstance(event, fluid.EndPass):
            test_metrics = trainer.test(reader=dataset.mnist.test())
            print(test_metrics)

    trainer.train(reader=dataset.mnist.train(), num_pass=100,
                  event_handler=event_handler)

if __name__ == '__main__':
    main()
```
For common models, this skeleton looks good (it still needs more polish and thinking). Overall, I think we need two levels of API: high level and low level. The high-level API simplifies network construction for common models, such as ResNet and LSTM. However, we need to be sure that users still have the flexibility to build complex models. One last point: when our design is more stable, we should ask our modeling team members (qingqing, yaming, yibing, etc.) for advice, to make sure our API has good coverage of current models.
I think the key problem that makes current Fluid hard to use is that users can hardly understand our "program". Furthermore, in Fluid most features require more than one program. For example, if a user needs to run inference on test data every 10 training batches, he has to build and maintain two programs: one for training and another for testing. Most users know neither why there should be two programs nor how to build them correctly. In my view, the most exciting point of this issue's proposal is to wrap the user's net config in a function and then pass that function to some other object. Based on this idea, maybe we can introduce a new abstraction. With this approach, users no longer need to understand programs, because they will not use them directly anymore. By the way, in the proposed design, how do we support GAN?
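The idea of wrapping the net config in a function, so the framework can build both the training and the testing program behind the scenes, can be sketched in plain Python. The `Trainer` and `Program` classes below are hypothetical stand-ins, not the real Fluid API; they only illustrate how one user-written function can yield two internal programs:

```python
# A minimal sketch of the "pass the net config as a function" idea.
# `Trainer` and `Program` are hypothetical; tracing is faked with op-name lists.

class Program:
    def __init__(self, name, ops):
        self.name = name
        self.ops = ops  # list of op names recorded while "tracing"

class Trainer:
    def __init__(self, network_fn):
        # Trace the user's function twice: once with backward ops appended
        # (training program) and once without (testing program).
        forward_ops = network_fn()
        self.train_program = Program("train", forward_ops + ["backward", "sgd_update"])
        self.test_program = Program("test", list(forward_ops))

def my_network():
    # The user only describes the forward pass.
    return ["conv", "fc", "cross_entropy"]

trainer = Trainer(my_network)
print(trainer.train_program.ops)  # ['conv', 'fc', 'cross_entropy', 'backward', 'sgd_update']
print(trainer.test_program.ops)   # ['conv', 'fc', 'cross_entropy']
```

The user never sees either program; the framework derives both from the single function.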
@JiayiFeng It seems that we need to allow users to write the train loop themselves. (I was taking the PyTorch version as a reference.) I am afraid that this simplified API cannot support that, and we might want it in the next milestone. What do you think?
Clearly, this high-level API cannot satisfy all needs (e.g. reinforcement learning, GAN). The current V2 API cannot either. It might be possible to tweak it a little (say, combining the model and the optimizer into one object to pass to the trainer) to make GAN possible. We need to think about to what level we can clean up the low-level API to support a user-written train loop in Python.
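A user-written train loop of the kind discussed here, alternating generator and discriminator updates as GAN training requires, might look like the following plain-Python sketch. The `*_step` functions are hypothetical placeholders for running the corresponding compiled programs:

```python
# Sketch of a user-controlled GAN train loop. The two "programs" are stood in
# for by plain functions; in a real framework each would run a compiled graph.

def discriminator_step(batch):
    # Placeholder: would run the discriminator program on real + fake data.
    return {"d_loss": 1.0 / (1 + batch)}

def generator_step(batch):
    # Placeholder: would run the generator program.
    return {"g_loss": 2.0 / (1 + batch)}

history = []
for batch in range(3):
    # The user, not the framework, decides the update order and frequency --
    # exactly the flexibility a fixed trainer.train() call cannot offer.
    d = discriminator_step(batch)
    g = generator_step(batch)
    history.append((d["d_loss"], g["g_loss"]))

print(len(history))  # 3
```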
@reyoung Do you have any suggestions on how inference will work with the paradigm you have shared? I am not sure whether this API style will be compatible with the inference engine work done in Q1.
How about this? I think it can support GAN:

```python
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def conv_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def train_conv_network():
    loss = conv_network()
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    sgd_optimizer.minimize(loss)
    return loss

def main():
    # `fluid.Compile` creates a program.
    # The program owns the program desc and a single scope.
    # Because the scope is shared by the different methods
    # (`conv_network`, `train_conv_network`), GAN should be supported.
    program = fluid.Compile(conv_network, train_conv_network)
    for i in range(100):
        for train_data in dataset.mnist.train():
            loss = program.run("train_conv_network",
                               {"image": train_data[0], "label": train_data[1]})
            print("train loss", loss)
        for test_data in dataset.mnist.test():
            loss = program.run("conv_network",
                               {"image": test_data[0], "label": test_data[1]})
            print("test loss", loss)

if __name__ == '__main__':
    main()
```
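The shared-scope behavior this proposal relies on can be illustrated with a small stand-alone sketch. Names like `Compiled` and `run` are modeled on the proposal above, not on any real API:

```python
# Sketch: several "methods" compiled together share one scope (parameter store),
# so running one method sees the parameters updated by another -- the property
# the proposal relies on for GAN support.

class Compiled:
    def __init__(self, *fns):
        self.scope = {}                      # single scope shared by all methods
        self.methods = {f.__name__: f for f in fns}

    def run(self, name, feed):
        # Every method receives the same scope dict.
        return self.methods[name](self.scope, feed)

def train_step(scope, feed):
    # "Training" updates a shared parameter.
    scope["w"] = scope.get("w", 0.0) + feed["x"]
    return scope["w"]

def eval_step(scope, feed):
    # "Evaluation" reads the parameter the train step wrote.
    return scope.get("w", 0.0) * feed["x"]

program = Compiled(train_step, eval_step)
program.run("train_step", {"x": 2.0})
out = program.run("eval_step", {"x": 3.0})
print(out)  # 6.0
```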
@helinwang how does your proposal handle distributed training?
@emailweixu For the trainer, the user can run:

```
TRAINING_ROLE=TRAINER PSERVERS=127.0.0.1:8000 python train.py
```

For the pserver, the user can do something like:

```
TRAINING_ROLE=PSERVER paddle run --file train.py --main train_conv_network
```

The key is that the entry point is no longer Python; instead, it is the `paddle` command.
@emailweixu maybe a simpler way to start the pserver is:

And now
@helinwang The problem is that the pserver does not have a loop over a data generator. With your design, user code has to do different things depending on whether it is a pserver or a trainer.
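This objection can be made concrete with a small sketch: if one script serves both roles, the user code itself must branch on the role. The `TRAINING_ROLE` variable name follows the examples above; everything else is hypothetical:

```python
# Sketch: a single entry script must branch on the role itself -- the
# awkwardness being pointed out. A pserver has no loop over training data.
import os

def run(role):
    if role == "PSERVER":
        # A pserver just serves parameters; no data-generator loop.
        return "serving parameters"
    else:
        # Only the trainer iterates over the data generator.
        steps = ["step %d" % i for i in range(2)]
        return "trained: " + ", ".join(steps)

# How the role would be chosen in practice; defaults to the trainer path.
role = os.environ.get("TRAINING_ROLE", "TRAINER")
print(run("PSERVER"))  # serving parameters
print(run("TRAINER"))  # trained: step 0, step 1
```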
Awesome discussion. I have some naive thoughts: for some complicated networks, we should handle naming carefully. For example:

```python
with net.module('generator') as generator:
    data1
    ...
    with network.name_scope('scope') as sub_scope:
        fc1 = fluid.layer.fc(...)
    ...

with net.module('discriminator') as discriminator:
    data1
    data2
    ...
    fc2 = fluid.layer.fc(input=generator.sub_scope.fc1)
    ...
```
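The hierarchical, attribute-addressable naming sketched above could be prototyped in plain Python with context managers. Everything here, including `module` and the `Scope` class, is hypothetical illustration, not a real Fluid API:

```python
# Sketch: named scopes whose layers can later be referenced as attributes,
# e.g. generator.scope.fc1, as in the pseudo-code above.
from contextlib import contextmanager

class Scope:
    def __init__(self, name):
        self.name = name
    def add(self, key, value):
        setattr(self, key, value)

class Net:
    def __init__(self):
        self.modules = {}
    @contextmanager
    def module(self, name):
        scope = Scope(name)
        self.modules[name] = scope
        yield scope

net = Net()
with net.module('generator') as generator:
    with net.module('scope') as sub:        # nested naming scope
        sub.add('fc1', 'generator/scope/fc1')
    generator.add('scope', sub)

with net.module('discriminator') as discriminator:
    # A layer in one module can reference a named layer in another.
    discriminator.add('fc2', 'fc(' + generator.scope.fc1 + ')')

print(discriminator.fc2)  # fc(generator/scope/fc1)
```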
@pkuyym |
@reyoung Thanks for your reminder. I paste a snippet here:

```python
with fluid.unique_name.guard():
    train_file_obj = fluid.layers.open_files(
        filenames=TRAIN_FILES,
        pass_num=pass_num,
        shapes=[[-1, 1], [-1, 1]],
        lod_levels=[1, 0],
        dtypes=['int64', 'int64'],
        thread_num=1)
```

I think it may make the API more friendly to extend the current `unique_name.guard`:

```python
# add prefix to make debug easier
with fluid.unique_name.guard('prefix_1') as scope_1:
    fc = fluid.layers.fc(...)

with fluid.unique_name.guard('prefix_2') as scope_2:
    fc = fluid.layers.fc(input=scope_1.fc)  # very convenient to refer to fc in scope_1
```
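A prefix-based unique-name generator of the kind proposed could be sketched like this. The `guard` here is a plain-Python illustration, not Fluid's actual `unique_name` implementation:

```python
# Sketch: a name generator whose guard() pushes a prefix, so names created
# inside it become debug-friendly, e.g. "prefix_1/fc_0".
from contextlib import contextmanager

class UniqueNames:
    def __init__(self):
        self.prefix = ""
        self.counters = {}

    @contextmanager
    def guard(self, prefix=""):
        old = self.prefix
        self.prefix = prefix + "/" if prefix else ""
        try:
            yield
        finally:
            self.prefix = old          # restore the enclosing prefix

    def generate(self, key):
        n = self.counters.get((self.prefix, key), 0)
        self.counters[(self.prefix, key)] = n + 1
        return "%s%s_%d" % (self.prefix, key, n)

names = UniqueNames()
with names.guard("prefix_1"):
    a = names.generate("fc")   # prefix_1/fc_0
    b = names.generate("fc")   # prefix_1/fc_1
with names.guard("prefix_2"):
    c = names.generate("fc")   # prefix_2/fc_0
print(a, b, c)
```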
@wangkuiyi In my opinion, even in GANs, the multiple nets have a definite running order. So maybe we can allow the trainer to take more than one net config (in the form of a list), generate multiple sets of programs, and use a for-loop inside the trainer to execute them in turn. This idea is similar to @helinwang's proposal. However, @helinwang proposes to compile all nets into a single program, while I tend to assign every net an independent program.
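This variant, one independent program per net, executed in a fixed order by the trainer's internal loop, could look like the following toy sketch. The `Trainer` and `Program` classes are hypothetical:

```python
# Sketch: the trainer owns a list of independent "programs" that share
# parameters, and executes them in turn for each batch.

class Program:
    def __init__(self, name, step_fn):
        self.name = name
        self.step_fn = step_fn
    def run(self, params, batch):
        return self.step_fn(params, batch)

class Trainer:
    def __init__(self, programs):
        self.params = {"w": 0}      # parameters shared across programs
        self.programs = programs
    def train(self, batches):
        log = []
        for batch in batches:
            for prog in self.programs:   # fixed running order, as in GANs
                log.append((prog.name, prog.run(self.params, batch)))
        return log

def d_step(params, batch):
    params["w"] += 1
    return params["w"]

def g_step(params, batch):
    return params["w"] * batch

trainer = Trainer([Program("D", d_step), Program("G", g_step)])
log = trainer.train([1, 2])
print(log)  # [('D', 1), ('G', 1), ('D', 2), ('G', 4)]
```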
@emailweixu Thanks for pointing that out; that is correct. Another possibility is that we handle it elsewhere. Still, it's somewhat unsatisfactory, because the user may have done something in the Python code beforehand. The cleanest way, I think, is to "extract" the Fluid program definition code from the Python glue code, and run only the program definition code. Following this logic, one way would be:

```
# assuming train.py is in the same folder
paddle run_pserver --main train.train_conv_network
```

Internally, `paddle run_pserver` could do something like:

```python
import os
import paddle.fluid as fluid
import train

os.environ['TRAINING_ROLE'] = "PSERVER"
program = fluid.Compile(train.train_conv_network)  # transpile happens inside
program.run()
```
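Resolving a dotted `--main` path like `train.train_conv_network` is straightforward with `importlib`. A minimal sketch of such a CLI entry point follows; the `train` module is faked in-line so the example is self-contained:

```python
# Sketch: resolve a "module.function" string, as a `--main` flag would supply,
# import the module, and call the function -- the core of a
# `paddle run_pserver --main train.train_conv_network` style entry point.
import importlib
import sys
import types

def resolve_main(dotted):
    module_name, _, func_name = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

# Stand-in for a user's train.py so the sketch runs on its own.
train = types.ModuleType("train")
train.train_conv_network = lambda: "program built"
sys.modules["train"] = train

main_fn = resolve_main("train.train_conv_network")
print(main_fn())  # program built
```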
All, we did some thinking about how inference can be done. Please review our proposal:

```python
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_network():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_network()
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def main():
    params = fluid.Params('./params')
    # If params is not None, it will be loaded into the Trainer.
    trainer = fluid.Trainer(train_network, optimizer=fluid.optimizer.SGD(), params=params)

    def event_handler(event):
        if isinstance(event, fluid.EndIteration):
            print(event.metrics)
        elif isinstance(event, fluid.EndPass):
            test_metrics = trainer.test(reader=dataset.mnist.test())
            print(test_metrics)

    # Train over 100 epochs
    trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)

    inferencer = fluid.Inferencer(inference_network, trainer.params)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()
```
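The event-handler pattern in this proposal, where the trainer dispatches typed `EndIteration`/`EndPass` events to a user callback, can be sketched in plain Python. The class and event names mirror the proposal but are otherwise hypothetical:

```python
# Sketch: a trainer that reports progress by dispatching typed events to a
# user-supplied handler, as in the proposed fluid.Trainer API.

class EndIteration:
    def __init__(self, metrics):
        self.metrics = metrics

class EndPass:
    def __init__(self, pass_id):
        self.pass_id = pass_id

class Trainer:
    def train(self, data, num_pass, event_handler):
        for p in range(num_pass):
            for batch in data:
                # Pretend "loss" shrinks with the batch value.
                event_handler(EndIteration({"loss": 1.0 / (1 + batch)}))
            event_handler(EndPass(p))

events = []
def handler(event):
    if isinstance(event, EndIteration):
        events.append(("iter", event.metrics["loss"]))
    elif isinstance(event, EndPass):
        events.append(("pass", event.pass_id))

Trainer().train(data=[0, 1], num_pass=2, event_handler=handler)
print(events)
```

The callback style keeps the training loop inside the framework while still letting users hook per-iteration and per-pass logic, such as testing or checkpointing.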
When we were trying to implement the Param class, we realized it was pretty ugly to implement with a shared scope. Therefore, we updated the syntax to the following:

```python
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost

def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(inference_program, param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()
```

I also noticed there is another design. The change is to have the `Trainer` take the inference program as well (via `infer_func`), so the `Inferencer` no longer needs it:

```python
def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            infer_func=inference_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})
```
Latest Syntax

```python
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost

def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            trainer.save_inference_model("image.model")
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(
        infer_func=inference_program,
        param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()
```
Hello, this issue has had no updates for nearly a month, so we will close it within the day. If you still need to follow up after it is closed, you can reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure. Thank you for your support of PaddlePaddle!
Our current implementation of Fluid is incomplete and exposes too many details. A consequence is that Fluid applications are lengthy and hard to comprehend.
Let us aim for a cleanup and simplification.