From d88436e35ca105aaeaa1b27cabcc67bfc742d69b Mon Sep 17 00:00:00 2001
From: Zhi <5145158+zhiics@users.noreply.github.com>
Date: Thu, 6 Feb 2020 17:38:48 -0800
Subject: [PATCH] Fix doc after moving to unified IR (#4835)

---
 docs/dev/relay_pass_infra.rst | 10 ++++++----
 docs/dev/runtime.rst          | 36 ++++++++++++++++++------------------
 2 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/docs/dev/relay_pass_infra.rst b/docs/dev/relay_pass_infra.rst
index b4f3f6b0b7c9..8bd5a05534a1 100644
--- a/docs/dev/relay_pass_infra.rst
+++ b/docs/dev/relay_pass_infra.rst
@@ -78,7 +78,7 @@ C++ Backend
 We provide a ``PassInfo`` object to contain the basic information needed by
 a pass. ``name`` is the pass name, ``opt_level`` indicates at which optimization
 level the pass will be enabled, and ``required`` represents the passes that are
-required to execute a certain pass (see `include/tvm/relay/transform.h`_ for
+required to execute a certain pass (see `include/tvm/ir/transform.h`_ for
 more details). For example, during registration of a pass (will be covered in
 later), the pass developers can specify the name of the pass, the optimization
 level it will be performed at, and/or the passes that are required.
@@ -183,7 +183,7 @@ optimization passes, e.g., function-level passes, module-level passes, and
 sequential passes. Each subclass itself could act as a pass manager. For
 instance, they could collect the required passes and execute them or build
 a dependency graph based on the given metadata. The full definition of them
-can be found in `src/relay/pass/pass_manager.cc`_
+can be found in `src/relay/ir/transform.cc`_
+and `src/ir/transform.cc`_.
 
 Module-Level Passes
 ^^^^^^^^^^^^^^^^^^^
@@ -651,9 +651,11 @@ For more pass infra related examples in Python and C++, please refer to
 
 .. _Relay module: https://docs.tvm.ai/langref/relay_expr.html#module-and-global-functions
 
-.. _include/tvm/relay/transform.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/relay/transform.h
+.. _include/tvm/ir/transform.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/ir/transform.h
 
-.. _src/relay/pass/pass_manager.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/pass/pass_manager.cc
+.. _src/relay/ir/transform.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/ir/transform.cc
+
+.. _src/ir/transform.cc: https://github.com/apache/incubator-tvm/blob/master/src/ir/transform.cc
 
 .. _src/relay/pass/fold_constant.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/pass/fold_constant.cc
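On the Python side, the same ``PassInfo`` metadata referenced in the hunks above surfaces through the pass decorators. A minimal sketch, with an invented pass name, using the ``relay.transform.function_pass`` decorator:

.. code:: python

    import tvm
    from tvm import relay

    # A no-op function-level pass. The decorator records the PassInfo fields
    # discussed above: the pass name, the opt_level at which it is enabled,
    # and (optionally) the passes it requires.
    @relay.transform.function_pass(opt_level=1, name="IdentityPass")
    class IdentityPass:
        def transform_function(self, func, mod, ctx):
            # A real pass would rewrite func; this sketch returns it unchanged.
            return func

    identity_pass = IdentityPass()
    assert identity_pass.info.name == "IdentityPass"
    assert identity_pass.info.opt_level == 1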
diff --git a/docs/dev/runtime.rst b/docs/dev/runtime.rst
index ed8bf39432b0..bb129b038aa9 100644
--- a/docs/dev/runtime.rst
+++ b/docs/dev/runtime.rst
@@ -94,7 +94,7 @@ Here are the common ones:
 
 - PackedFunc itself
 - Module for compiled modules
 - DLTensor* for tensor object exchange
-- TVM Node to represent any object in IR
+- TVM Object to represent any object in IR
 
 The restriction makes the implementation simple without the need of serialization. Despite being minimum, the PackedFunc is sufficient for the use-case of deep learning deployment as
@@ -141,7 +141,7 @@ One fun fact about PackedFunc is that we use it for both compiler and deployment
 
 .. _here: https://github.com/apache/incubator-tvm/tree/master/src/api
 
-To keep the runtime minimum, we isolated the IR Node support from the deployment runtime. The resulting runtime takes around 200K - 600K depending on how many runtime driver modules (e.g., CUDA) get included.
+To keep the runtime minimal, we isolated the IR Object support from the deployment runtime. The resulting runtime takes around 200K - 600K depending on how many runtime driver modules (e.g., CUDA) get included.
 
 The overhead of calling into PackedFunc vs. a normal function is small, as it is only saving a few values on the stack. So it is OK as long as we don't wrap small functions.
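A minimal sketch of this calling convention from the Python side: a Python callback is registered as a global PackedFunc and called back through the runtime. The global name ``demo.addone`` is made up for this sketch.

.. code:: python

    import tvm

    # Register a Python callback as a global PackedFunc under an arbitrary name.
    @tvm.register_func("demo.addone")
    def addone(x):
        return x + 1

    # Look it up again and call it through the PackedFunc interface; the integer
    # argument and return value are converted by the runtime automatically.
    f = tvm.get_global_func("demo.addone")
    assert f(10) == 11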
@@ -182,7 +182,7 @@ RPC server on iPhone/android/raspberry pi or even the browser. The cross compila
 This instant feedback gives us a lot of advantages. For example, to test the correctness of generated code on iPhone, we no longer have to write test-cases in swift/objective-c from scratch -- We can use RPC to execute on iPhone, copy the result back and do verification on the host via numpy. We can also do the profiling using the same script.
 
-TVM Node and Compiler Stack
----------------------------
+TVM Object and Compiler Stack
+-----------------------------
 
 As we mentioned earlier, we build compiler stack API on top of the PackedFunc runtime system.
@@ -192,17 +192,17 @@ However, we don't want to change our API from time to time. Besides that, we als
 
 - be able to serialize any language object and IRs
 - be able to explore, print, and manipulate the IR objects in front-end language to do quick prototyping.
 
-We introduced a base class, called `Node`_ to solve this problem.
-All the language object in the compiler stack is a subclass of Node. Each node contains a string type_key that uniquely identifies
-the type of object. We choose string instead of int as type key so new Node class can be added in the decentralized fashion without
+We introduced a base class called `Object`_ to solve this problem.
+All the language objects in the compiler stack are subclasses of ``Object``. Each object contains a string type_key that uniquely identifies
+the type of the object. We choose string instead of int as the type key so that new ``Object`` classes can be added in a decentralized fashion without
 adding the code back to the central repo. To ease the speed of dispatching, we allocate an integer type_index at runtime for each type_key.
 
-.. _Node: https://github.com/dmlc/HalideIR/blob/master/src/tvm/node/node.h#L61
+.. _Object: https://github.com/apache/incubator-tvm/blob/master/include/tvm/runtime/object.h
 
-Since usually one Node object could be referenced in multiple places in the language, we use a shared_ptr to keep
-track of reference. We use NodeRef class to represent a reference to the Node.
-We can roughly view NodeRef class as shared_ptr to the Node container.
-We can also define subclass NodeRef to hold each subtypes of Node. Each Node class needs to define the VisitAttr function.
+Since usually one ``Object`` could be referenced in multiple places in the language, we use a shared_ptr to keep
+track of references. We use the ``ObjectRef`` class to represent a reference to an ``Object``.
+We can roughly view the ``ObjectRef`` class as a shared_ptr to the ``Object`` container.
+We can also define subclasses of ``ObjectRef`` to hold each subtype of ``Object``. Each subclass of ``Object`` needs to define the VisitAttrs function.
 
 .. code:: c
 
@@ -216,21 +216,21 @@ We can also define subclass NodeRef to hold each subtypes of Node. Each Node cla
     virtual void Visit(const char* key, std::string* value) = 0;
     virtual void Visit(const char* key, void** value) = 0;
     virtual void Visit(const char* key, Type* value) = 0;
-    virtual void Visit(const char* key, NodeRef* value) = 0;
+    virtual void Visit(const char* key, ObjectRef* value) = 0;
     // ...
   };
 
-  class Node {
+  class BaseAttrsNode : public Object {
   public:
-    virtual void VisitAttrs(AttrVisitor* visitor) {}
+    virtual void VisitAttrs(AttrVisitor* v) {}
     // ...
   };
 
-Each Node subclass will override this to visit its members. Here is an example implementation of TensorNode.
+Each ``Object`` subclass will override this to visit its members. Here is an example implementation of TensorNode.
 
 .. code:: c
 
-  class TensorNode : public Node {
+  class TensorNode : public Object {
   public:
     /*! \brief The shape of the tensor */
     Array shape;
@@ -251,7 +251,7 @@ Each Node subclass will override this to visit its members. Here is an example i
   }
 };
 
-In the above examples, both ``Operation`` and ``Array`` are NodeRef.
+In the above examples, both ``Operation`` and ``Array`` are subclasses of ``ObjectRef``.
 The VisitAttrs gives us a reflection API to visit each member of the object. We can use this function to visit the node and serialize any language object recursively.
 It also allows us to get members of an object easily in front-end language.
 For example, in the following code, we accessed the op field of the TensorNode.
 
    # access the op field of TensorNode
   print(x.op.name)
 
-New Node can be added to C++ without changing the front-end runtime, making it easy to make extensions to the compiler stack.
+New ``Object`` subclasses can be added to C++ without changing the front-end runtime, making it easy to extend the compiler stack.
 Note that this is not the fastest way to expose members to front-end language, but might be one of the simplest approaches possible.
 We also find that it fits our purposes as we mainly use python for testing and prototyping and still use c++ to do the heavy lifting job.
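A rough sketch of what this reflection enables from Python, using a ``te.placeholder`` purely as a convenient example object; both member access and JSON round-tripping go through the ``VisitAttrs``-based reflection described above.

.. code:: python

    import tvm
    from tvm import te

    # A small object graph: a placeholder tensor backed by TensorNode.
    x = te.placeholder((3, 4), name="x")

    # Member access from the front end goes through the reflection machinery.
    print(x.op.name)   # -> "x"
    print(x.shape)     # the Array of shape expressions

    # The same machinery drives recursive serialization of the object graph.
    json_str = tvm.ir.save_json(x)
    x2 = tvm.ir.load_json(json_str)
    print(x2.op.name)  # -> "x"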