API Simplification #10248

Closed
wangkuiyi opened this issue Apr 27, 2018 · 23 comments

@wangkuiyi (Collaborator) commented Apr 27, 2018
Our current implementation of Fluid is incomplete and exposes too many details. As a consequence, Fluid applications are lengthy and hard to comprehend.

Let us aim for a cleanup and simplification.

  • Deadline: before July 5.
  • Goals:
    • Fluid applications are cleaner and more concise than v2 applications.
    • Try to keep the work from conflicting with the long-term goal of Fluid -- a differentiable DL language.
@wangkuiyi (Collaborator, Author)

The Current Problems

  1. Concepts like Executor should be hidden.
  2. Most of the complicated stuff is in the train-loop; we should encapsulate the loop in a train function.

For more details, let us take a look at the current example program fluid/test_fit_a_line.py, which has the following structure:

  1. Define the forward pass

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost)

  2. Generate the backward pass

    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    optimize_ops, params_grads = sgd_optimizer.minimize(avg_cost)

  3. Create the reader

    BATCH_SIZE = 20
    train_reader = paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.uci_housing.train(), buf_size=500),
        batch_size=BATCH_SIZE)

  4. Run the startup program

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)  # the user must create the Executor explicitly
    exe.run(fluid.default_startup_program())

  5. Run the Python train-loop, which calls the main program

    feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
    PASS_NUM = 100
    for pass_id in range(PASS_NUM):
        for data in train_reader():
            avg_loss_value, = exe.run(main_program,
                                      feed=feeder.feed(data),
                                      fetch_list=[avg_cost])

@wangkuiyi (Collaborator, Author) commented Apr 27, 2018

A Proposal

This idea came from @emailweixu. Here is a brief description with example code.

  1. Let us encapsulate the forward pass into a Python function:

    def F():
        x = fluid.layers.data(...)
        ...
        avg_cost = fluid.layers.mean(...)
  2. Let us invent a standard Fluid function fluid.train, which encapsulates the creation of the reader, the train-loop, and the generation of backward pass:

    def train(F, ...):
        F()  # fills in startup_program and main_program
        exe = fluid.Executor(...)
        exe.run(startup_program)
        for iter in xrange(1000):
            exe.run(main_program)
  3. So, the users could rewrite test_fit_a_line.py as

    def F():
        x = ...
        ...
        avg_cost = ...

    train(F, ...)

@reyoung (Collaborator) commented Apr 27, 2018

Following the proposal, an example train script could be

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def conv_network():
	image = fluid.layers.data(name='image', shape=[1, 28, 28])
	label = fluid.layers.data(name='label', shape=[1], dtype='int64')

	hidden = fluid.layers.simple_img_conv_pool(image, 
			num_filters=32, filter_size=3,
			pool_size=3, pool_stride=1, act='relu')
	hidden = fluid.layers.dropout(hidden, 0.1)
	hidden = fluid.layers.batch_norm(hidden)

	prediction = fluid.layers.fc(hidden, size=10, act='softmax')
	loss = fluid.layers.cross_entropy(prediction, label)

	return loss

def main():
	trainer = fluid.Trainer(conv_network, optimizer=fluid.optimizer.SGD())
	
	def event_handler(event):
		if isinstance(event, fluid.EndIteration):
			print event.metrics
		elif isinstance(event, fluid.EndPass):
			test_metrics = trainer.test(reader=dataset.mnist.test())
			print test_metrics
	
	trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)


if __name__ == '__main__':
	main()

reyoung closed this as completed Apr 27, 2018
reyoung reopened this Apr 27, 2018
@panyx0718 (Contributor)

For common models, this skeleton looks good. (It still needs more polish and thinking.)

Overall, I think we need two levels of APIs: high-level and low-level.

The high-level API simplifies network construction for common models, such as ResNet and LSTM.
The high-level API is built on top of the low-level APIs.

However, we need to make sure users still have the flexibility to build complex models with our low-level (more fine-grained) APIs.

One last point: when our design is more stable, we should ask our modeling team members (qingqing, yaming, yibing, etc.) for advice. We need to make sure our API has good coverage of current and future models.

@JiayiFeng (Collaborator) commented Apr 27, 2018

I think the key problem that makes current Fluid hard to use is that users can hardly understand our 'program'. Furthermore, in Fluid most features require more than one program. For example, if a user needs to run inference on test data every 10 training batches, he has to build and maintain two programs: one for training and another for testing. Most users know neither why there should be two programs nor how to build them correctly.

In my view, the most exciting point of this issue's proposal is to wrap the user's net config in a function and then pass the function to some other object. Based on this idea, maybe we can introduce the concept of a ProgramBuilder. A ProgramBuilder takes a forward net config function defined by the user (F() in the demo) and adds complementary ops (optimizers, gradient ops, ...) to generate specific programs (a training program, a testing program, and so on). Programs are built and maintained by the ProgramBuilder automatically. A trainer can take a ProgramBuilder and execute the corresponding program.

In this method, users no longer need to understand programs, for they will not directly use them anymore.
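
A minimal sketch of what such a ProgramBuilder might look like (the class and its members are hypothetical, not an existing Fluid API; it only reuses real primitives such as fluid.Program, fluid.program_guard, and Program.clone):

import paddle.fluid as fluid

class ProgramBuilder(object):
    """Hypothetical helper that derives all needed programs from one net config."""

    def __init__(self, net_func, optimizer):
        self.startup_program = fluid.Program()
        self.train_program = fluid.Program()
        with fluid.program_guard(self.train_program, self.startup_program):
            self.loss = net_func()
            # Clone the forward-only program *before* adding gradient ops,
            # so the user never assembles the test program by hand.
            self.test_program = self.train_program.clone(for_test=True)
            optimizer.minimize(self.loss)

A trainer could then accept a ProgramBuilder and internally choose between train_program and test_program, so users never touch a program directly.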

By the way, in the proposed design, how to support GAN?

@wangkuiyi (Collaborator, Author)

@JiayiFeng It seems that we need to allow users to write the train-loop themselves. (I was taking the PyTorch version as a reference.) I am afraid this simplified API cannot support that, and we might want it in the next milestone. What do you think?
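
For reference, the PyTorch-style user-written train-loop mentioned above has roughly this shape (standard PyTorch, shown for comparison only; model, criterion, optimizer, and data_loader are assumed to be defined elsewhere):

for epoch in range(num_epochs):
    for x, y in data_loader:
        optimizer.zero_grad()          # clear accumulated gradients
        loss = criterion(model(x), y)  # forward pass written by the user
        loss.backward()                # backward pass generated on the fly
        optimizer.step()               # parameter update in user code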

@emailweixu (Collaborator)

Clearly, this high-level API cannot satisfy all needs (e.g. reinforcement learning, GAN). The current V2 API cannot either. It might be possible to tweak it a little (say, combining the model and optimizer into one object to pass to the trainer) to make GAN possible. We need to think about to what extent we can clean up the low-level API to support user train-loops in Python.

@abhinavarora (Contributor)

@reyoung Do you have any suggestions on how inference will work with the paradigm you have shared? I am not sure this API style will be compatible with the inference engine work done in Q1.

@helinwang (Contributor) commented Apr 27, 2018

How about this? I think it can support GAN:

import paddle.fluid as fluid
import paddle.v2.dataset as dataset


def conv_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(prediction, label)

    return loss

def train_conv_network():
    loss = conv_network()
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    sgd_optimizer.minimize(loss)
    return loss

def main():
    # `fluid.Compile` creates a program,
    # the program owns the program desc, and a single scope.
    # Because the scope is shared by different methods (`conv_network`, `train_conv_network`),
    # GAN should be supported.
    program = fluid.Compile(conv_network, train_conv_network)
    for i in range(0, 100):
        for train_data in dataset.mnist.train():
            loss = program.run("train_conv_network", {"image": train_data[0], "label": train_data[1]})
            print("train loss", loss)

        for test_data in dataset.mnist.test():
            loss = program.run("conv_network", {"image": test_data[0], "label": test_data[1]})
            print("test loss", loss)


if __name__ == '__main__':
    main()

@emailweixu (Collaborator)

@helinwang how does your proposal handle distributed training?

@helinwang (Contributor) commented Apr 28, 2018

@emailweixu for the trainer, fluid.Compile can check the environment variables to determine whether this is distributed training and produce the correctly compiled program:

TRAINING_ROLE=TRAINER PSERVERS=127.0.0.1:8000 python train.py

For pserver, the user can do something like:

TRAINING_ROLE=PSERVER paddle run --file train.py --main train_conv_network

The key is that the entry point is no longer Python; instead it's the paddle binary, which parses the train_conv_network function into a pserver program and runs it.

@helinwang (Contributor) commented Apr 28, 2018

@emailweixu maybe a simpler way to start pserver is:

TRAINING_ROLE=PSERVER python train.py

Now fluid.Compile detects the PSERVER environment variable and produces a program such that program.run will run the pserver operators.
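
For illustration, a rough sketch of what that detection could look like inside fluid.Compile, built on the DistributeTranspiler API Fluid already has (the function name is hypothetical, and it assumes the default main program already contains the optimize ops):

import os
import paddle.fluid as fluid

def compile_role_aware():
    # Branch on the launcher-provided environment variables proposed above.
    role = os.environ.get('TRAINING_ROLE', 'TRAINER')
    pservers = os.environ.get('PSERVERS', '127.0.0.1:8000')
    t = fluid.DistributeTranspiler()
    t.transpile(trainer_id=0, pservers=pservers, trainers=1)
    if role == 'PSERVER':
        # Return the parameter-server program for immediate execution.
        return t.get_pserver_program(pservers.split(',')[0])
    return t.get_trainer_program()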

@emailweixu (Collaborator)

@helinwang The problem is that the pserver does not have a loop over the data generator. With your design, user code has to do different things depending on whether it is a pserver or a trainer.

@pkuyym (Contributor) commented Apr 28, 2018

Awesome discussion. I have some naive thoughts: for complicated networks, we should handle naming well. For example, fc layers may appear anywhere. I think an auto-naming mechanism is not enough, even though we can pass a specified parameter name; I think we can design this better.

with net.module('generator') as generator:
    data1
    ...
    with net.name_scope('scope') as sub_scope:
        fc1 = fluid.layers.fc(...)
    ...

with net.module('discriminator') as discriminator:
    data1
    data2
    ...
    fc2 = fluid.layers.fc(input=generator.sub_scope.fc1)
    ...

net.module holds a complete logic block. We may analyze the dependencies to decide whether to compile one ProgramDesc or several. We can require that all computation logic within a module or name_scope shares a naming space.

@reyoung (Collaborator) commented Apr 28, 2018

@pkuyym
Actually, Fluid already supports name scopes: fluid.unique_name.guard(). It is basically the same API as you proposed.

@pkuyym (Contributor) commented Apr 28, 2018

@reyoung Thanks for the reminder; I paste a snippet here:

with fluid.unique_name.guard():
        train_file_obj = fluid.layers.open_files(
            filenames=TRAIN_FILES,
            pass_num=pass_num,
            shapes=[[-1, 1], [-1, 1]],
            lod_levels=[1, 0],
            dtypes=['int64', 'int64'],
            thread_num=1)

I think it would make the API friendlier to extend the current unique_name.guard to support:

# add a prefix to make debugging easier
with fluid.unique_name.guard('prefix_1') as scope_1:
    fc = fluid.layers.fc(...)

with fluid.unique_name.guard('prefix_2') as scope_2:
    fc = fluid.layers.fc(input=scope_1.fc)  # very convenient to refer to fc in scope_1

@JiayiFeng (Collaborator) commented Apr 28, 2018

@wangkuiyi In my opinion, even in GANs the multiple nets have a definite running order. So maybe we can allow the trainer to take more than one net config (in the form of a list), generate multiple sets of programs, and use a for-loop inside the trainer to execute them in turn.

This idea is similar to @helinwang's proposal. However, @helinwang proposes to compile all nets into a single program; I tend to assign every net an independent program.
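
A minimal sketch of this multi-program idea (train_in_turn and its arguments are hypothetical; it assumes programs run by the same Executor share the global scope, so nets can share parameters by variable name):

import paddle.fluid as fluid

def train_in_turn(net_funcs, optimizers, num_iters):
    # One independent program per net config, executed in a fixed
    # order inside the trainer (e.g. discriminator, then generator).
    exe = fluid.Executor(fluid.CPUPlace())
    mains = []
    for net_func, opt in zip(net_funcs, optimizers):
        startup, main = fluid.Program(), fluid.Program()
        with fluid.program_guard(main, startup):
            opt.minimize(net_func())
        exe.run(startup)
        mains.append(main)
    for _ in range(num_iters):
        for main in mains:  # the definite running order mentioned above
            exe.run(main)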

@helinwang (Contributor) commented Apr 30, 2018

The problem is that pserver does not have a loop over data generator. With your design, user code has to do different things depends on whether it is pserver or trainer.

@emailweixu thanks for pointing that out; that is correct. Another possibility is to do it in fluid.Compile: when running as a pserver, fluid.Compile will compile the pserver program and run it immediately.

Still, it's somewhat unsatisfactory, because the user may have done something in the Python code before fluid.Compile with the assumption that it is used for training, not for running the pserver. Reusing fluid.train as the entry point for running pserver operators arguably has the same issue.

The cleanest way, I think, is to "extract" the Fluid program definition code from the Python glue code, and run only the Fluid program definition code. Following this logic, one way would be:

# assuming train.py is in the same folder
paddle run_pserver --main train.train_conv_network

Internally paddle run_pserver does something like:

import os
import paddle.fluid as fluid
import train

os.environ['TRAINING_ROLE'] = "PSERVER"
program = fluid.Compile(train.train_conv_network) # transpile happens inside
program.run()

@cs2be (Contributor) commented Apr 30, 2018

All, we did some thinking about how inference can be done. Please review our proposal:

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_network():
    # `label` must be defined here, where it is used, not in inference_network()
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_network()
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def main():
    params = fluid.Params('./params')
    # If params is not None, it will be loaded into the Trainer
    trainer = fluid.Trainer(train_network, optimizer=fluid.optimizer.SGD(), params=params)

    def event_handler(event):
        if isinstance(event, fluid.EndIteration):
            print event.metrics
        elif isinstance(event, fluid.EndPass):
            test_metrics = trainer.test(reader=dataset.mnist.test())
            print test_metrics

    # Train over 100 epochs
    trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)

    inferencer = fluid.Inferencer(inference_network, trainer.params)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()

@jetfuel (Contributor) commented May 10, 2018

When we were trying to implement the Params class, we realized it was pretty ugly to implement with a shared scope. Therefore we updated the syntax to the following.

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction


def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost


def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program, 
                                        optimizer=fluid.optimizer.SGD(),
                                        param_path="image.model",
                                        place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)
            pass

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(inference_program, param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()

I also noticed there is another design. The change is to have the Trainer handle the inference program. Is this version later than the one above?

def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            infer_func=inference_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)
            pass

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

@jetfuel (Contributor) commented May 11, 2018

Latest Syntax

import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])

    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction


def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost

def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            trainer.save_inference_model("image.model")
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)
            pass

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(
        infer_func=inference_program,
        param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()

@daming-lu (Contributor)

Based on the discussion here, we should follow the pattern here

@shanyi15 (Collaborator)

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
