API Simplification #10248

Closed
wangkuiyi opened this issue Apr 27, 2018 · 23 comments

@wangkuiyi (Collaborator) commented Apr 27, 2018
Our current implementation of Fluid is incomplete and exposes too many details. As a consequence, Fluid applications are lengthy and hard to comprehend.

Let us aim for a cleanup and simplification.

  • Deadline: before July 5.
  • Goals:
    • Fluid applications are cleaner and more concise than v2 applications.
    • Try to keep the work from conflicting with the long-term goal of Fluid -- a differentiable DL language.
@wangkuiyi (Collaborator, Author)

The Current Problems

  1. Concepts like Executor should be hidden.
  2. Most of the complicated stuff is in the train-loop; we should encapsulate the loop in a train function.

For more details, let us take a look at the current example program fluid/test_fit_a_line.py, which has the following structure:

  1. Define the forward pass

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost)

  2. Generate the backward pass

    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    optimize_ops, params_grads = sgd_optimizer.minimize(avg_cost)

  3. Create the reader

    BATCH_SIZE = 20
    train_reader = paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.uci_housing.train(), buf_size=500),
        batch_size=BATCH_SIZE)

  4. Run the startup program

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)  # the user must create the Executor explicitly
    exe.run(fluid.default_startup_program())

  5. Run the Python train-loop, which calls the main program

    feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
    PASS_NUM = 100
    for pass_id in range(PASS_NUM):
        for data in train_reader():
            avg_loss_value, = exe.run(main_program,
                                      feed=feeder.feed(data),
                                      fetch_list=[avg_cost])

@wangkuiyi (Collaborator, Author) commented Apr 27, 2018

A Proposal

This idea came from @emailweixu. Here is a brief description with example code.

  1. Let us encapsulate the forward pass into a Python function:

    def F():
        x = fluid.layers.data(...)
        ...
        avg_cost = fluid.layers.mean(...)
  2. Let us invent a standard Fluid function fluid.train, which encapsulates the creation of the reader, the train-loop, and the generation of backward pass:

    def train(F, ...):
        F()  # fills in startup_program and main_program
        exe = fluid.Executor(...)
        exe.run(startup_program)
        for iter in xrange(1000):
            exe.run(main_program)
  3. So, the users could rewrite test_fit_a_line.py as

    def F():
        x = ...
        ...
        avg_cost = ...

    train(F, ...)

@reyoung (Collaborator) commented Apr 27, 2018

Following the proposal, an example train script could be

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def conv_network():
	image = fluid.layers.data(name='image', shape=[1, 28, 28])
	label = fluid.layers.data(name='label', shape=[1], dtype='int64')

	hidden = fluid.layers.simple_img_conv_pool(image, 
			num_filters=32, filter_size=3,
			pool_size=3, pool_stride=1, act='relu')
	hidden = fluid.layers.dropout(hidden, 0.1)
	hidden = fluid.layers.batch_norm(hidden)

	prediction = fluid.layers.fc(hidden, size=10, act='softmax')
	loss = fluid.layers.cross_entropy(prediction, label)

	return loss

def main():
	trainer = fluid.Trainer(conv_network, optimizer=fluid.optimizer.SGD())
	
	def event_handler(event):
		if isinstance(event, fluid.EndIteration):
			print event.metrics
		elif isinstance(event, fluid.EndPass):
			test_metrics = trainer.test(reader=dataset.mnist.test())
			print test_metrics
	
	trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)


if __name__ == '__main__':
	main()

reyoung closed this as completed Apr 27, 2018
reyoung reopened this Apr 27, 2018
@panyx0718 (Contributor)

For common models, this skeleton looks good. (It still needs more polish and thinking.)

Overall, I think we need two levels of APIs: high-level and low-level.

The high-level API simplifies network construction for common models, such as ResNet and LSTM.
The high-level API is built on top of the low-level APIs.

However, we need to make sure users still have the flexibility to build complex models with our low-level (more fine-grained) APIs.

One last point: when our design is more stable, we should ask our modeling team members (qingqing, yaming, yibing, etc.) for advice. We need to make sure our API has good coverage of current and future models.

@JiayiFeng (Collaborator) commented Apr 27, 2018

I think the key problem that makes current Fluid hard to use is that users can hardly understand our 'program'. Furthermore, in Fluid most features require more than one program. For example, if a user needs to run inference on test data every 10 training batches, he has to build and maintain two programs: one for training and another for testing. Most users know neither why there should be two programs nor how to build them correctly.

In my view, the most exciting point of this issue's proposal is to wrap the user's net config in a function and then pass the function to some other object. Based on this idea, maybe we can introduce the concept of a ProgramBuilder. A ProgramBuilder takes a forward net config function defined by the user (F() in the demo) and adds complementary ops (optimizers, gradient ops, ...) to generate specific programs (a training program, a testing program, and so on). Programs are built and maintained by the ProgramBuilder automatically. A trainer can take a ProgramBuilder and execute the corresponding program.

In this method, users no longer need to understand programs, for they will not directly use them anymore.
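
A minimal sketch of what such a ProgramBuilder might look like (the class and its members are hypothetical, not an existing Fluid API; it only reuses real primitives such as fluid.Program, fluid.program_guard, and Program.clone):

import paddle.fluid as fluid

class ProgramBuilder(object):
    """Hypothetical helper that derives all needed programs from one net config."""

    def __init__(self, net_func, optimizer):
        self.startup_program = fluid.Program()
        self.train_program = fluid.Program()
        with fluid.program_guard(self.train_program, self.startup_program):
            self.loss = net_func()
            # Clone the forward-only program *before* adding gradient ops,
            # so the user never assembles the test program by hand.
            self.test_program = self.train_program.clone(for_test=True)
            optimizer.minimize(self.loss)

A trainer could then accept a ProgramBuilder and internally choose between train_program and test_program, so users never touch a program directly.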

By the way, in the proposed design, how to support GAN?

@wangkuiyi (Collaborator, Author)

@JiayiFeng It seems that we need to allow users to write the train-loop themselves. (I was taking the PyTorch version as a reference.) I am afraid this simplified API cannot support that, and we might want it in the next milestone. What do you think?
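
For reference, the PyTorch-style user-written train-loop mentioned above has roughly this shape (standard PyTorch, shown for comparison only; model, criterion, optimizer, and data_loader are assumed to be defined elsewhere):

for epoch in range(num_epochs):
    for x, y in data_loader:
        optimizer.zero_grad()          # clear accumulated gradients
        loss = criterion(model(x), y)  # forward pass written by the user
        loss.backward()                # backward pass generated on the fly
        optimizer.step()               # parameter update in user code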

@emailweixu (Collaborator)

Clearly, this high-level API cannot satisfy all needs (e.g. reinforcement learning, GAN). The current V2 API cannot either. It might be possible to tweak it a little (say, combining the model and optimizer into one object to pass to the trainer) to make GAN possible. We need to think about to what extent we can clean up the low-level API to support user train-loops in Python.

@abhinavarora (Contributor)

@reyoung Do you have any suggestions on how inference will work with the paradigm you have shared? I am not sure this API style will be compatible with the inference engine work done in Q1.

@helinwang (Contributor) commented Apr 27, 2018

How about this? I think it can support GAN:

import paddle.fluid as fluid
import paddle.v2.dataset as dataset


def conv_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(prediction, label)

    return loss

def train_conv_network():
    loss = conv_network()
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    sgd_optimizer.minimize(loss)
    return loss

def main():
    # `fluid.Compile` creates a program,
    # the program owns the program desc, and a single scope.
    # Because the scope is shared by different methods (`conv_network`, `train_conv_network`),
    # GAN should be supported.
    program = fluid.Compile(conv_network, train_conv_network)
    for i in range(0, 100):
        for train_data in dataset.mnist.train():
            loss = program.run("train_conv_network", {"image": train_data[0], "label": train_data[1]})
            print("train loss", loss)

        for test_data in dataset.mnist.test():
            loss = program.run("conv_network", {"image": test_data[0], "label": test_data[1]})
            print("test loss", loss)


if __name__ == '__main__':
    main()

@emailweixu (Collaborator)

@helinwang how does your proposal handle distributed training?

@helinwang (Contributor) commented Apr 28, 2018

@emailweixu for the trainer, fluid.Compile can check the environment variables to determine whether this is distributed training and produce the correctly compiled program:

TRAINING_ROLE=TRAINER PSERVERS=127.0.0.1:8000 python train.py

For pserver, the user can do something like:

TRAINING_ROLE=PSERVER paddle run --file train.py --main train_conv_network

The key is that the entry point is no longer Python; instead it's the paddle binary, which parses the train_conv_network function into a pserver program and runs it.

@helinwang (Contributor) commented Apr 28, 2018

@emailweixu maybe a simpler way to start pserver is:

TRAINING_ROLE=PSERVER python train.py

Now fluid.Compile detects the PSERVER environment variable and produces a program such that program.run will run the pserver operators.
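
For illustration, a rough sketch of what that detection could look like inside fluid.Compile, built on the DistributeTranspiler API Fluid already has (the function name is hypothetical, and it assumes the default main program already contains the optimize ops):

import os
import paddle.fluid as fluid

def compile_role_aware():
    # Branch on the launcher-provided environment variables proposed above.
    role = os.environ.get('TRAINING_ROLE', 'TRAINER')
    pservers = os.environ.get('PSERVERS', '127.0.0.1:8000')
    t = fluid.DistributeTranspiler()
    t.transpile(trainer_id=0, pservers=pservers, trainers=1)
    if role == 'PSERVER':
        # Return the parameter-server program for immediate execution.
        return t.get_pserver_program(pservers.split(',')[0])
    return t.get_trainer_program()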

@emailweixu (Collaborator)

@helinwang The problem is that the pserver does not have a loop over the data generator. With your design, user code has to do different things depending on whether it is a pserver or a trainer.

@pkuyym (Contributor) commented Apr 28, 2018

Awesome discussion. I have some naive thoughts: for complicated networks, we should handle naming well. For example, fc layers may appear anywhere. I think an auto-naming mechanism is not enough, even though we can pass a specified parameter name; I think we can design this better.

with net.module('generator') as generator:
    data1
    ...
    with net.name_scope('scope') as sub_scope:
        fc1 = fluid.layers.fc(...)
    ...

with net.module('discriminator') as discriminator:
    data1
    data2
    ...
    fc2 = fluid.layers.fc(input=generator.sub_scope.fc1)
    ...

net.module holds a complete logic block. We may analyze the dependencies to decide whether to compile one ProgramDesc or several. We can require that all computation logic within a module or name_scope shares a naming space.

@reyoung (Collaborator) commented Apr 28, 2018

@pkuyym
Actually, Fluid already supports name scopes: fluid.unique_name.guard(). It is basically the same API as you proposed.

@pkuyym (Contributor) commented Apr 28, 2018

@reyoung Thanks for the reminder; I paste a snippet here:

with fluid.unique_name.guard():
        train_file_obj = fluid.layers.open_files(
            filenames=TRAIN_FILES,
            pass_num=pass_num,
            shapes=[[-1, 1], [-1, 1]],
            lod_levels=[1, 0],
            dtypes=['int64', 'int64'],
            thread_num=1)

I think it would make the API friendlier to extend the current unique_name.guard to support:

# add a prefix to make debugging easier
with fluid.unique_name.guard('prefix_1') as scope_1:
    fc = fluid.layers.fc(...)

with fluid.unique_name.guard('prefix_2') as scope_2:
    fc = fluid.layers.fc(input=scope_1.fc)  # very convenient to refer to fc in scope_1

@JiayiFeng (Collaborator) commented Apr 28, 2018

@wangkuiyi In my opinion, even in GANs the multiple nets have a definite running order. So maybe we can allow the trainer to take more than one net config (in the form of a list), generate multiple sets of programs, and use a for-loop inside the trainer to execute them in turn.

This idea is similar to @helinwang's proposal. However, @helinwang proposes to compile all nets into a single program; I tend to assign every net an independent program.
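
A minimal sketch of this multi-program idea (train_in_turn and its arguments are hypothetical; it assumes programs run by the same Executor share the global scope, so nets can share parameters by variable name):

import paddle.fluid as fluid

def train_in_turn(net_funcs, optimizers, num_iters):
    # One independent program per net config, executed in a fixed
    # order inside the trainer (e.g. discriminator, then generator).
    exe = fluid.Executor(fluid.CPUPlace())
    mains = []
    for net_func, opt in zip(net_funcs, optimizers):
        startup, main = fluid.Program(), fluid.Program()
        with fluid.program_guard(main, startup):
            opt.minimize(net_func())
        exe.run(startup)
        mains.append(main)
    for _ in range(num_iters):
        for main in mains:  # the definite running order mentioned above
            exe.run(main)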

@helinwang (Contributor) commented Apr 30, 2018

The problem is that pserver does not have a loop over data generator. With your design, user code has to do different things depends on whether it is pserver or trainer.

@emailweixu thanks for pointing that out; that is correct. Another possibility is to do it in fluid.Compile: when running as a pserver, fluid.Compile will compile the pserver program and run it immediately.

Still, it's somewhat unsatisfactory, because the user may have done something in the Python code before fluid.Compile with the assumption that it is used for training, not for running the pserver. Reusing fluid.train as the entry point for running pserver operators arguably has the same issue.

The cleanest way, I think, is to "extract" the Fluid program definition code from the Python glue code, and run only the Fluid program definition code. Following this logic, one way would be:

# assuming train.py is in the same folder
paddle run_pserver --main train.train_conv_network

Internally paddle run_pserver does something like:

import os
import paddle.fluid as fluid
import train

os.environ['TRAINING_ROLE'] = "PSERVER"
program = fluid.Compile(train.train_conv_network) # transpile happens inside
program.run()

@cs2be (Contributor) commented Apr 30, 2018

All, we did some thinking about how inference can be done. Please review our proposal:

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_network():
    # `label` must be defined here, where it is used, not in inference_network()
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_network()
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def main():
    params = fluid.Params('./params')
    # If params is not None, it will be loaded into the Trainer
    trainer = fluid.Trainer(train_network, optimizer=fluid.optimizer.SGD(), params=params)

    def event_handler(event):
        if isinstance(event, fluid.EndIteration):
            print event.metrics
        elif isinstance(event, fluid.EndPass):
            test_metrics = trainer.test(reader=dataset.mnist.test())
            print test_metrics

    # Train over 100 epochs
    trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)

    inferencer = fluid.Inferencer(inference_network, trainer.params)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()

@jetfuel (Contributor) commented May 10, 2018

When we were trying to implement the Params class, we realized it was pretty ugly to implement with a shared scope. Therefore we updated the syntax to the following.

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction


def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost


def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program, 
                                        optimizer=fluid.optimizer.SGD(),
                                        param_path="image.model",
                                        place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)
            pass

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(inference_program, param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()

I also noticed there is another design. The change is to have the Trainer handle the inference program. Is this version later than the one above?

def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            infer_func=inference_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)
            pass

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

@jetfuel (Contributor) commented May 11, 2018

Latest Syntax

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction


def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost

def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            trainer.save_inference_model("image.model")
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)
            pass

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(
        infer_func=inference_program,
        param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()

@daming-lu (Contributor)

Based on the discussion here, we should follow the pattern here

@shanyi15 (Collaborator)

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
