
Add csp.md #7706

Merged
merged 2 commits into PaddlePaddle:develop
Jan 24, 2018
Conversation

wangkuiyi
Collaborator

@wangkuiyi wangkuiyi commented Jan 20, 2018

Fixes #7771


@kavyasrinet kavyasrinet left a comment


Thank you for the first version of the design doc. I had a few questions that I have posted in the review comments, along with a few suggested fixes.

@@ -0,0 +1,96 @@
# Design Doc: CSP in PaddlePaddle Fluid

## Motivations


Motivations => Motivation


## Motivations

Concurrent programming is important for deep learning. Example applications include


Example applications include => Few example applications are :


Concurrent programming is important for deep learning. Example applications include

1. A thread uses the GPU for computing while the main thread keeps loading the next minibatch, and


Maybe re-write to:

  1. The main thread keeps reading the next mini-batch while another thread uses the GPU for computing.
  2. The main thread performs the computation while another thread uploads the local gradients from each trainer to the parameter server.

1. A thread uses the GPU for computing while the main thread keeps loading the next minibatch, and
1. a thread uploads the local gradients to the parameter server while the main thread keeps computing.

Most DL systems, including TensorFlow, Caffe2, and MxNet, can asynchronously execute operators in a graph. However, Fluid doesn't have the concept graph at all, as the design goal of Fluid is a programming language.


concept => concept of a
is a => is that of a

Collaborator Author


Thanks! Memorized the expressions!

| message passing | MPI |
| bulk synchronous parallel (BSP) | Pregel distributed programming framework |

Because Fluid was designed to be a programming language, we would like to implement CSP.


Because => Since
implement CSP => implement CSP in Fluid.


The type *channel* is conceptually the blocking queue. In Go, its implemented is a [blocking circular queue](https://github.com/golang/go/blob/68ce117cf17b8debf5754bfd476345779b5b6616/src/runtime/chan.go#L31-L50), which supports send and recv. The challenge lies more in select.

The operation select has been in OS kernels long before Go language. All Unix kernels implement system calls *poll* and *select*. They work by inquiry all file descriptors under their monitoring. This takes O(N) time. Since Linux 2.6, a new system call, *epoll*, can do O(1). In BSD systems, there is a similar system call *kqueue*. Go's Linux implementation uses epoll.


operation select => select operation
They work by inquiry all file descriptors under their monitoring. => They monitor multiple file descriptors to see if I/O is possible on any of them.
can do O(1). => can do the same in O(1) time.
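The quoted design text describes a channel as conceptually a blocking queue, implemented in Go as a blocking circular queue. As a rough illustration only — the `Channel` class and its `send`/`recv` methods below are hypothetical, not Fluid's or Go's actual implementation — a bounded blocking channel can be modeled in Python:

```python
# A hypothetical sketch of a CSP channel as a bounded blocking queue.
# Not Fluid's or Go's real implementation; names are illustrative.
import queue
import threading

class Channel:
    def __init__(self, capacity=1):
        # A buffered channel of the given capacity. (A truly unbuffered,
        # rendezvous-style channel would need extra handshaking.)
        self._q = queue.Queue(maxsize=capacity)

    def send(self, item):
        self._q.put(item)     # blocks while the buffer is full

    def recv(self):
        return self._q.get()  # blocks while the buffer is empty

def producer(ch, n):
    for i in range(n):
        ch.send(i)

ch = Channel(capacity=2)
t = threading.Thread(target=producer, args=(ch, 5))
t.start()
received = [ch.recv() for _ in range(5)]
t.join()
print(received)  # [0, 1, 2, 3, 4]
```

The send/recv pair gives the backpressure the design relies on: the producer stalls once the buffer of 2 fills, until the consumer drains it.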


The operation select has been in OS kernels long before Go language. All Unix kernels implement system calls *poll* and *select*. They work by inquiry all file descriptors under their monitoring. This takes O(N) time. Since Linux 2.6, a new system call, *epoll*, can do O(1). In BSD systems, there is a similar system call *kqueue*. Go's Linux implementation uses epoll.

It might be a great idea to implement Fluid's select using epoll too. In this design doc, we start from the O(N) way, so could we focus on Python binding and the syntax.


great idea => good idea

Contributor


could we => we could
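The O(N) select strategy discussed in the quoted passage — inquire every channel under monitoring, rather than epoll-style O(1) readiness — can be sketched as a polling loop. This is an illustrative Python model using `queue.Queue`; the `select` function below is hypothetical, not Fluid's actual operator:

```python
# A hypothetical O(N) select: poll every case's channel in turn until one
# is ready, analogous to poll()/select() scanning N file descriptors.
import queue
import time

def select(cases, timeout=5.0):
    """cases: list of (channel, callback) pairs. Scans all N channels per
    iteration (the O(N) cost) and fires the first ready callback."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        for ch, callback in cases:
            try:
                item = ch.get_nowait()  # non-blocking readiness check
            except queue.Empty:
                continue
            return callback(item)
        time.sleep(0.001)  # avoid a hot spin between scans
    raise TimeoutError("no select case became ready")

a, b = queue.Queue(), queue.Queue()
b.put("hello")
result = select([(a, lambda x: ("a", x)), (b, lambda x: ("b", x))])
print(result)  # ('b', 'hello')
```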


Fluid supports many data types:

1. Tensor,


The numbering is all 1s.


### Select

## Exmaple Programs


Exmaple => Example


Fluid has two fundamental control-flows: *if-else* and *while*. If we are to implement CSP, we need:

1. a new data type: *channel*,


Do we also need something similar to the concept of a Go-routine ?

Collaborator Author


You are very right! I added goroutine.

Contributor

@sidgoyal78 sidgoyal78 left a comment


Thanks @wangkuiyi for the initial version of the design doc. After reading it, I got the answer to my question about the workloads and use-cases that we wish to address using concurrent programming in Fluid. Thanks 👍


The operation select has been in OS kernels long before Go language. All Unix kernels implement system calls *poll* and *select*. They work by inquiry all file descriptors under their monitoring. This takes O(N) time. Since Linux 2.6, a new system call, *epoll*, can do O(1). In BSD systems, there is a similar system call *kqueue*. Go's Linux implementation uses epoll.

It might be a great idea to implement Fluid's select using epoll too. In this design doc, we start from the O(N) way, so could we focus on Python binding and the syntax.
Contributor


could we => we could

@typhoonzero
Contributor

After some thinking, I came up with the idea that we could port Go directly to PaddlePaddle to make use of CSP, as below. This may not go into the main branch, but it could be an "experimental" thing.

  • Go side:
    • Program definitions, executors
    • Control operators, can use the channel, select etc.
    • Calculation operators, reference kernel calls
    • Wrapper types (Variable...)
  • C++ side:
    • kernels
    • memory management (CPU and GPU)
    • data types (Variable, Tensor...)

This enables reusing Go's CSP concurrent programming model, so we don't have to implement it again.

This makes C++ act as the driver of the computing devices and Go act as the user API.

The current Fluid implementation does not have to change, and the new Go implementation can live under a distinct directory as an "experimental" feature. TensorFlow has https://github.com/tensorflow/tensorflow/tree/master/tensorflow/go, but that is simply a "graph builder"; we can do more.

1. A thread uses the GPU for computing while the main thread keeps loading the next minibatch, and
1. a thread uploads the local gradients to the parameter server while the main thread keeps computing.

Most DL systems, including TensorFlow, Caffe2, and MxNet, can asynchronously execute operators in a graph. However, Fluid doesn't have the concept graph at all, as the design goal of Fluid is a programming language.
Contributor


Just a note: the vast majority of TensorFlow ops are synchronous; the TF executor runs non-dependent ops in parallel on different threads.

Collaborator Author


Thanks for the note!

In Fluid, we should be able to do the same:

```python
ch = fluid.make_chan(dtype=INT)
```
Contributor

@helinwang helinwang Jan 22, 2018


I think a very important element type that Fluid's channel should support is a pair; for example, we want to send pair(image, label) using one channel, rather than using multiple channels.

Contributor


I agree with @helinwang. I also think we can generalize a pair to an n-tuple.

Collaborator Author


Yes, great point @helinwang
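A sketch of the suggestion above — sending an (image, label) pair through a single channel. Here `queue.Queue` stands in for a hypothetical Fluid channel with a pair element type, since the actual `make_chan` signature for composite elements was still under discussion:

```python
# Illustrative only: a plain queue.Queue models a channel whose element
# type is a pair, so image and label travel together atomically.
import queue

ch = queue.Queue()  # stands in for a hypothetical pair-typed fluid channel

# Producer side: one send per (image, label) pair.
ch.put(([0.1, 0.2, 0.3], 7))

# Consumer side: unpacking keeps the pair intact -- no risk of image and
# label arriving out of sync on two separate channels.
image, label = ch.get()
print(image, label)  # [0.1, 0.2, 0.3] 7
```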

Contributor

@abhinavarora abhinavarora left a comment


I have added some suggestions. Would love your feedback on that.

1. A thread uses the GPU for computing while the main thread keeps loading the next minibatch, and
1. a thread uploads the local gradients to the parameter server while the main thread keeps computing.

Most DL systems, including TensorFlow, Caffe2, and MxNet, can asynchronously execute operators in a graph. However, Fluid doesn't have the concept graph at all, as the design goal of Fluid is a programming language.
Contributor


concept of graph


### CSP v.s. Actor Model

A well-known implementation of Actor Model is the Erlang programming language. In Actor Model, *processes* could send messages to and receive messages from another process given it ID. We can find the three ingredients, process with ID, send, and recv, in MPI too. Indeed, we can rewrite Erlang programs in Python + MPI with possibly fewer lines of code. Our concern with Actor Model is that it doesn't look reasonable to implement process management in a programming language's runtime library; instead, it seems the OS's responsibility to manage processes and libraries like MPI for send/recv.
Contributor


I think we should write processes/actor, because in actor model processes are actors.

Contributor


We should also mention in this doc that a major concern with the Actor model is that we need to define the concept of a mailbox. Hence every receiver should know its sender, which might be difficult in our paradigm.

In addition to that, we want channels that can hold more complex element types, e.g., Tensors of float16:

```python
ch = fluid.make_chan(dtype=Tensor, etype=float16)
```
Contributor


Do you think we should define a new type class in Python that can represent such a hierarchy? Using etype will not be scalable if the composition is long.

Collaborator Author


I guess that our VarDesc should be upgraded to describe such composite types. I am not a Python expert, but it might be reasonable to have a Python class hierarchy.
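One way such a Python class hierarchy might look — purely a hypothetical sketch, none of these class names exist in Fluid — is to make each type a composable object, so nested element types need no flat `etype` argument:

```python
# Hypothetical composable type descriptors; nesting replaces flat
# dtype=/etype= keyword arguments for channels of composite elements.
class Type:
    pass

class Scalar(Type):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Tensor(Type):
    def __init__(self, elem):
        self.elem = elem  # element type, itself a Type
    def __repr__(self):
        return "Tensor(%r)" % (self.elem,)

class Tuple(Type):
    def __init__(self, *fields):
        self.fields = fields
    def __repr__(self):
        return "Tuple(" + ", ".join(repr(f) for f in self.fields) + ")"

float16 = Scalar("float16")
int64 = Scalar("int64")

# A channel of (image, label) pairs composes naturally, however deep:
pair = Tuple(Tensor(float16), int64)
print(pair)  # Tuple(Tensor(float16), int64)
```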

In Fluid, we should be able to do the same:

```python
ch = fluid.make_chan(dtype=INT)
```
Contributor


I agree with @helinwang. I also think we can generalize a pair to an n-tuple.

@abhinavarora
Contributor

@typhoonzero I had the same idea, and I discussed it with @wangkuiyi on Friday. We came to the understanding that the prominent reason we do not want to do that is that we would like to attempt our own CSP implementation instead of wrapping Go's runtime.

@typhoonzero
Contributor

@abhinavarora I see, Thank you!

@wangkuiyi
Collaborator Author

wangkuiyi commented Jan 23, 2018

@typhoonzero When we were discussing whether we could wrap our Go RecordIO implementation into a C++ library, @reyoung pointed out that such a wrapper would need to link Go's runtime library, which is too heavy for a C++ library. I think we are facing a similar choice here.

Another reason, as @abhinavarora explained: I think we would implement CSP in C++ ourselves anyway, because we need to grasp the core ideas we present to our users.

@typhoonzero
Contributor

@wangkuiyi

When we were discussing whether we could wrap our Go RecordIO implementation into a C++ library, @reyoung pointed out that such a wrapper would need to link Go's runtime library, which is too heavy for a C++ library. I think we are facing a similar choice here.

I'm afraid this happens only when calling Go from C++ code. If we call C++ kernels from Go, I think it's okay.

Another reason, as @abhinavarora explained: I think we would implement CSP in C++ ourselves anyway, because we need to grasp the core ideas we present to our users.

Yep, definitely agree with this!

@wangkuiyi
Collaborator Author

wangkuiyi commented Jan 23, 2018

@typhoonzero When you say "port Go to C++", do you mean something like reusing Go's source code? If that is what you mean, I am sorry to tell you that the Go source code is a mixture of Go, C, and assembly, and the C and assembly parts are in Plan 9 syntax and cannot be built using gcc/gas. Also, Go's implementation assumes that the language runtime supports multi-threading, which implies that we would need to port Go's runtime into C++ as well. That is much more work than we might imagine. :-)

@typhoonzero
Contributor

@wangkuiyi "When you say "port Go to C++", do you mean something like reuse the Go's source code?" -- No. I mean implementing operators and executors in Go; operators written in Go can call kernels written in C++, which can run on any device. Then people can directly write a concurrent Go program and compile it into a binary that launches a Go runtime embedded with the current kernel implementations. The control flow is done by Go and the calculations are done by the kernels, which is easy to implement. But we still need to take care of memory allocation, so I put Variable allocation on the C++ side; all other memory can be handled by Go's GC.

@typhoonzero
Contributor

typhoonzero commented Jan 23, 2018

I think a channel should not be a server that send_op can send messages to. Operators can deal with channels in the current program, but not on a remote server. We may, however, put messages into a channel before sending them. Communication between nodes should be done by RPC ops like listen_and_serv and recv.

I have some code to describe how to use CSP with send/recv #6508


Sorry, I updated the comment to make it clearer.
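The pattern described in this comment — operators touch only a local channel, and messages are buffered in the channel before RPC ops ship them across nodes — can be sketched as follows. `rpc_send` is a hypothetical stand-in for the real RPC ops such as listen_and_serv and recv:

```python
# Illustrative sketch: ops enqueue gradients into a local channel; a
# sender thread drains the channel and performs the (simulated) RPC.
import queue
import threading

sent = []  # stands in for the remote parameter server reached via RPC

def rpc_send(msg):
    sent.append(msg)  # hypothetical RPC call

def sender_loop(ch):
    while True:
        msg = ch.get()
        if msg is None:   # sentinel: shut the sender down
            break
        rpc_send(msg)

ch = queue.Queue()  # the channel lives inside the current program
t = threading.Thread(target=sender_loop, args=(ch,))
t.start()

for grad in ["grad0", "grad1"]:
    ch.put(grad)    # ops only touch the channel, never the network
ch.put(None)
t.join()
print(sent)  # ['grad0', 'grad1']
```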

Contributor

@helinwang helinwang left a comment


LGTM. I saw @typhoonzero has some comments, but I could not quite understand #7706 (comment); maybe we can have a separate issue discussing it?

@abhinavarora abhinavarora merged commit 7ccbc70 into PaddlePaddle:develop Jan 24, 2018
@wangkuiyi wangkuiyi deleted the csp branch January 26, 2018 18:32