This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Engine refactor #55

Merged 31 commits on Sep 10, 2015
e214034
[engine-refactor] object pool optimization
hotpxl Sep 5, 2015
0eb739d
[engine-refactor] DAG_ENGINE_DEBUG macro
hotpxl Sep 5, 2015
25b9ee6
[engine-refactor] rename
hotpxl Sep 5, 2015
152b6de
[engine-refactor] rename to ThreadedEngine
hotpxl Sep 5, 2015
f23bc01
[engine-refactor] remove redundant lines
hotpxl Sep 5, 2015
a5c19c9
[engine-refactor] switch engine in dag_engine.cc
hotpxl Sep 5, 2015
5bdb007
[engine-refactor] switch back to threaded engine
hotpxl Sep 5, 2015
73f0110
[engine-refactor] remove executable permission
hotpxl Sep 5, 2015
94c70ed
[engine-refactor] refactor using var queue abstraction
hotpxl Sep 5, 2015
5b06ae3
[engine-refactor] lambdas are faster if inlined
hotpxl Sep 5, 2015
d6f8459
[engine-refactor] reindent
hotpxl Sep 5, 2015
aa91f4b
[engine-refactor] Merge branch 'master' into engine-refactor
hotpxl Sep 7, 2015
9794621
[engine-refactor] fix storage concurrency issue
hotpxl Sep 7, 2015
ad97494
[engine-refactor] DAGEngine -> Engine
hotpxl Sep 7, 2015
261c757
[engine-refactor] renaming [fixes #40]
hotpxl Sep 7, 2015
933b86f
[engine-refactor] rename in test file
hotpxl Sep 7, 2015
d64a83f
engine developer doc
jermainewang Sep 7, 2015
938aa2c
[engine-refactor] Merge branch 'master' into engine-refactor
hotpxl Sep 7, 2015
29b20f4
[engine-refactor] Merge commit 'd64a83f' into engine-refactor
hotpxl Sep 7, 2015
0a0d814
[engine-refactor] doc
hotpxl Sep 7, 2015
8e4d2e2
[engine-refactor] stream manager and travis script
hotpxl Sep 8, 2015
6f09066
[engine-refactor] integrate stream manager
hotpxl Sep 9, 2015
f38bca7
[engine-refactor] optimizations: inplace operation and waiting return…
hotpxl Sep 9, 2015
a202a30
[engine-refactor] less log
hotpxl Sep 9, 2015
d5475d5
[engine-refactor] optimize for IO operations
hotpxl Sep 9, 2015
56579ae
[engine-refactor] lint
hotpxl Sep 9, 2015
8fef335
[engine-refactor] Merge branch 'master' into engine-refactor
hotpxl Sep 9, 2015
8addb28
[engine-refactor] bump submodule version
hotpxl Sep 9, 2015
d4e14ac
[engine-refactor] fix compiling issues
hotpxl Sep 9, 2015
4a85448
[engine-refactor] Merge branch 'master' into engine-refactor
hotpxl Sep 10, 2015
036644a
[engine-refactor] fix compiling issue with new master
hotpxl Sep 10, 2015
1 change: 1 addition & 0 deletions .travis.yml
@@ -10,6 +10,7 @@ env:
- TASK=build CXX=g++
- TASK=python CXX=g++
- TASK=python3 CXX=g++
- TASK=python_naive CXX=g++
- TASK=unittest_gtest CXX=g++

# dependent apt packages
6 changes: 5 additions & 1 deletion Makefile
@@ -70,6 +70,10 @@ ifeq ($(USE_CUDNN), 1)
LDFLAGS += -lcudnn
endif

ifeq ($(USE_THREADED_ENGINE), 1)
CFLAGS += -DMXNET_USE_THREADED_ENGINE
endif

ifneq ($(ADD_CFLAGS), NONE)
CFLAGS += $(ADD_CFLAGS)
endif
@@ -80,7 +84,7 @@ endif

.PHONY: clean all test lint doc

BIN = tests/test_simple_engine
BIN = tests/test_threaded_engine
all: lib/libmxnet.a lib/libmxnet.so $(BIN)

SRC = $(wildcard src/*.cc src/*/*.cc)
83 changes: 77 additions & 6 deletions doc/developer-guide/engine.md
@@ -1,8 +1,79 @@
Execution Engine
================

MXNet's engine is not built only for deep learning or any other domain-specific problem. Rather, it is designed to solve a general problem: executing a set of functions while respecting their dependencies. Execution of any two functions with a dependency between them is serialized. Functions with no dependencies *may* be executed in parallel to improve performance.

Interface
---------
The core interface of the execution engine is:
```c++
virtual void Push(Fn exec_fun, Context exec_ctx,
std::vector<Variable> const& use_vars,
std::vector<Variable> const& mutate_vars) = 0;
```
This API allows users to push a function (`exec_fun`), along with its context information and dependencies, to the engine. `exec_ctx` is the context in which `exec_fun` should be executed, `use_vars` denotes the variables the function reads from, and `mutate_vars` are the variables it modifies. Regardless of the implementation details explained later, the engine guarantees the following order:

>*The execution of any two functions, where one of them modifies at least one variable common to both, is serialized in their push order.*

Function
--------
The function type of the engine is:
```c++
using Fn = std::function<void(RunContext)>;
```
`RunContext` contains runtime information determined by the engine:
```c++
struct RunContext {
// stream pointer which could be safely cast to
// cudaStream_t* type
void *stream;
};
```
Alternatively, one can use `mxnet::engine::DAGEngine::Fn`, which is the same type definition.

All functions are executed by the engine's internal threads. In this model, pushing *blocking* functions to the engine (typically those handling I/O tasks such as disk, web services, or UI) is discouraged, since they occupy an execution thread and reduce the total throughput. For such cases, we provide another, *asynchronous*, function type:
```c++
using Callback = std::function<void()>;
using AsyncFn = std::function<void(RunContext, Callback)>;
```
Inside an `AsyncFn`, users can hand the heavy part off to their own threads and safely exit the function body. The engine does not consider the function finished until the `Callback` is invoked.

Context
--------
Users can specify the `Context` in which the function should be executed. This usually includes whether the function should run on a CPU or a GPU, and, if a GPU, which one. `Context` is different from `RunContext`: `Context` contains the device type (gpu/cpu) and device id, while `RunContext` contains information that can only be decided at runtime, such as the stream on which the function should be executed.

Variable
--------
`Variable` is used to specify a function's dependencies. The MXNet engine is designed to be decoupled from the other modules in MXNet, so a `Variable` is like an engine-issued token representing the external resources a function may use or modify. Variables are designed to be lightweight: creating, deleting, or copying one incurs little overhead. When pushing a function, users specify the variables it will use (immutably) in the `use_vars` vector and the variables it will modify in the `mutate_vars` vector. The only rule the engine uses to resolve dependencies among pushed functions is:

>*The execution of any two functions, where one of them modifies at least one variable common to both, is serialized in their push order.*

For example, if `Fn1` and `Fn2` both mutate `V2`, and `Fn2` is pushed after `Fn1`, then `Fn2` is guaranteed to execute after `Fn1`. On the other hand, if `Fn1` and `Fn2` both only use `V2`, their actual execution order can be arbitrary.

This design allows the engine to schedule *non-functional* (state-mutating) operations. For example, a DNN weight-update function can now use the `+=` operator to update the weights in place, rather than generating a new weight array each time.

To create a variable, use the `NewVar()` API. To delete one, use the `PushDelete` API.

Push & Wait
-----------
**All `Push` APIs are asynchronous.** The API call returns immediately, regardless of whether the pushed `Fn` has finished. This allows the engine to start computing while the user thread is still pushing functions. The `Push` APIs are not thread-safe: only one thread should make engine API calls at a time.

If you want to wait for a specific `Fn` to finish, include a callback function in the closure and call it at the end of your `Fn`.

If you want to wait for all `Fn`s that involve (use or mutate) a certain variable to finish, use the `WaitForVar(var)` API.

If you want to wait for all pushed `Fn`s to finish, use the `WaitForAll()` API.

Save Object Creation Cost
----------------------------
In some cases, you need to push the same functions to the engine many times over. If the computation in these functions is light, the overhead of copying the lambdas and creating the use/mutate variable lists becomes relatively high. We provide an API to create an `OprHandle` beforehand:
```c++
virtual OprHandle NewOperator(AsyncFn fn,
std::vector<Variable> const& use_vars,
std::vector<Variable> const& mutate_vars) = 0;
```
This way you can keep pushing the `OprHandle` without repeatedly re-creating it:
```c++
virtual void Push(OprHandle op, Context exec_ctx) = 0;
```
To delete it, simply call `DeleteOperator(OprHandle op)`, but make sure the operator has finished computing first.
173 changes: 0 additions & 173 deletions include/mxnet/dag_engine.h

This file was deleted.
