..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

.. _relay-op-strategy:

Relay Operator Strategy
=======================

In order to lower Relay operators to the implementations defined in the TOPI
library, a compute and a schedule function need to be registered for each Relay
operator. However, compute and schedule functions are usually specialized for
each target, and further, even for the same target, we may have multiple
algorithms and implementations available. To deal with this complexity, we
introduce the operator strategy, which allows developers to define a flexible
lowering strategy for each operator and target.

Operator Strategy Design
------------------------

The basic element in an operator strategy is an ``OpImplementation``. It includes
a pair of compute and schedule functions, the name of the implementation,
and a priority level (the use of the priority level is explained in
`Select Implementation from Op Strategy`_).

The ``OpStrategy`` includes a list of ``OpSpecialization``. Each ``OpSpecialization``
contains a list of ``OpImplementation`` associated with a ``SpecializedCondition``
(see the definition in ``include/tvm/te/schedule.h``). The ``SpecializedCondition``
can be null, indicating the implementations are generally applicable;
otherwise, the implementations are only considered when the specialized
condition is satisfied. A ``SpecializedCondition`` consists of a list
of clauses defined in Tensor Expression in conjunctive normal form (CNF) and
only supports conditions on tensor shapes.

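To make the condition format concrete, here is a minimal, self-contained sketch
of constructing a ``SpecializedCondition`` from symbolic tensor shapes. The
variable names are purely illustrative; a complete usage inside a strategy
function appears later in this document.

.. code:: python

    from tvm import te

    n = te.var("n")
    A = te.placeholder((n, 128), name="A")

    # A CNF condition over the symbolic shape: every clause in the list must hold.
    # Implementations added within this scope are only considered when it holds.
    with te.SpecializedCondition([A.shape[0] > 64, A.shape[0] < 1024]):
        pass  # e.g., strategy.add_implementation(...) inside a strategy function
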

Last, a strategy function, or ``FTVMStrategy``, determines which pair(s) of
compute and schedule functions should be used given a workload, and needs to be
registered to each Relay operator. ``FTVMStrategy`` is a generic function (see
``include/tvm/target/generic_func.h``) that can be overwritten for each
target. The function signature is

.. code:: c

    OpStrategy(const Attrs& attrs, const Array<Tensor>& inputs, const Type& out_type, const Target& target)

That is, the function returns an ``OpStrategy`` given the op attributes, input
tensors, output types, and target to compile to.


Write A Strategy Function
-------------------------

We recommend developers write the strategy function in Python, as
most TOPI compute and schedule functions are written in Python.
In Python, we provide the ``OpStrategy`` class in ``python/tvm/relay/op/op.py``.
It only has one API, which adds an implementation to the strategy:

.. code:: python

    def add_implementation(self, compute, schedule, name="default", plevel=10)

We now take ``topk`` as an example to explain how to write the
``FTVMStrategy`` function:

.. code:: python

    # add to python/tvm/relay/op/strategy/generic.py
    @override_native_generic_func("topk_strategy")
    def topk_strategy(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.topk),
            wrap_topi_schedule(topi.generic.schedule_topk),
            name="topk.generic")
        return strategy

    # add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
    @topk_strategy.register(["cuda", "gpu"])
    def topk_strategy_cuda(attrs, inputs, out_type, target):
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.cuda.topk),
            wrap_topi_schedule(topi.cuda.schedule_topk),
            name="topk.cuda")
        return strategy

In this example, we use ``topi.cuda.topk`` and ``topi.cuda.schedule_topk``
as the compute and schedule function for the CUDA or GPU target, while using the
TOPI generic compute and schedule for the remaining targets.
Note that we use two wrapper functions that wrap the TOPI
compute and schedule to conform with the required function signatures (see
``FTVMCompute`` and ``FTVMSchedule`` in ``include/tvm/relay/op_attr_types.h``).
Usually we need to write a customized compute wrapper function for each operator
in order to extract different fields from the op attributes.

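For reference, a compute wrapper typically closes over a TOPI compute function
and unpacks the relevant op attributes into arguments. The following is only a
simplified sketch of what such a wrapper for ``topk`` could look like; the
actual wrapper in ``python/tvm/relay/op/strategy/generic.py`` differs in
details (for instance, it converts attribute objects to plain Python values).

.. code:: python

    def wrap_compute_topk(topi_compute):
        """Wrap a TOPI topk compute into the FTVMCompute signature (sketch)."""
        def _compute_topk(attrs, inputs, out_type):
            # pull the fields we need out of the op attributes
            # (a real wrapper converts these attribute objects to ints/bools)
            out = topi_compute(inputs[0], k=attrs.k, axis=attrs.axis,
                               ret_type=attrs.ret_type,
                               is_ascend=attrs.is_ascend, dtype=attrs.dtype)
            # FTVMCompute must return a list of output tensors
            return out if isinstance(out, list) else [out]
        return _compute_topk
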

The example above shows a very basic strategy function that only
adds one implementation to the strategy. But for many complicated operators,
we may need to add multiple implementations that use different algorithms.
For example, we can use both the direct and the Winograd algorithm to
compute a conv2d op. In order to achieve this, we can write the strategy function
as follows:

.. code:: python

    strategy.add_implementation(
        wrap_compute_conv2d(topi.cuda.conv2d_nchw),
        wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw),
        name="conv2d_nchw.cuda",
        plevel=10)
    if winograd_condition:
        strategy.add_implementation(
            wrap_compute_conv2d(topi.cuda.conv2d_nchw_winograd),
            wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw_winograd),
            name="conv2d_nchw_winograd.cuda",
            plevel=15)

In this example, we add two implementations to the conv2d strategy, where the
Winograd algorithm is only added when ``winograd_condition`` is true.
The implementation ``"conv2d_nchw_winograd.cuda"`` will be used to compile
conv2d when ``winograd_condition`` is true, as it has a higher
priority level (this could change if a certain implementation is an AutoTVM
template; see `Select Implementation from Op Strategy`_ for more
details). Otherwise, ``"conv2d_nchw.cuda"`` is used.

We can extend the example above to use a third-party library implementation. For
example, we can add an implementation that invokes a kernel from the cblas
library when cblas is included in the target.

.. code:: python

    if "cblas" in target.libs:
        strategy.add_implementation(
            wrap_compute_dense(topi.x86.dense_cblas),
            wrap_topi_schedule(topi.x86.schedule_dense_cblas),
            name="dense_cblas.x86",
            plevel=15)

Further, we can add an implementation specialized for a certain range of shapes.
The code below shows an example of a dense strategy that adds an implementation
specialized for ``m`` greater than 16. The main difference between hardcoding a
Python condition, as in the examples above, and using a specialized condition is
that the latter allows TVM to generate multiple kernels when the input tensors
have symbolic shapes. The compile engine will generate a dispatch function that
invokes the specialized kernel when the corresponding condition is met; otherwise,
it invokes the kernel that has no associated specialized condition (``dense_common``
in this example). This part is still work in progress. More details will be
provided after it is done.

.. code:: python

    def dense_strategy(attrs, inputs, out_type, target):
        m = inputs[0].shape[0]
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_dense(dense_compute1),
            wrap_topi_schedule(dense_schedule1),
            name="dense_common")
        with tvm.te.SpecializedCondition(m > 16):
            strategy.add_implementation(
                wrap_compute_dense(dense_compute2),
                wrap_topi_schedule(dense_schedule2),
                name="dense_for_large_m",
                plevel=15)
        return strategy

Register Strategy Function to An Operator
-----------------------------------------

After we define the strategy function for an operator, we can now
register the strategy function to this operator with

.. code:: python

    register_strategy("topk", strategy.topk_strategy)

However, it takes much effort to write a strategy function for an operator.
Therefore, we provide two other methods for simpler operators.

First, for operators that have an injective, broadcast, or reduction pattern, we
can call ``register_injective_schedule``, ``register_broadcast_schedule``, and
``register_reduce_schedule`` respectively. The schedule functions for these
patterns are already registered by each target and can be applied to these
operators. We assume the compute function is the same across all targets, and
``FTVMCompute`` needs to be registered to the op before invoking the register
schedule APIs.

.. code:: python

    register_broadcast_schedule("add")

Second, for operators that don't have these common patterns, but still share the
same compute function across all targets, we can use the ``register_schedule``
API. The ``FTVMSchedule`` function is easier to write, as we only need to decide
which schedule function to use. The following code snippet shows the
``FTVMSchedule`` function for pooling.

.. code:: python

    # add to python/tvm/relay/op/strategy/generic.py
    @generic_func
    def schedule_pool(attrs, outs, target):
        with target:
            return topi.generic.schedule_pool(outs, attrs.layout)

    # add to each target file in python/tvm/relay/op/strategy, e.g., x86.py, cuda.py, etc.
    @schedule_pool.register("cpu")
    def schedule_pool_cpu(attrs, outs, target):
        ...

After we have created the ``FTVMSchedule`` for an operator, we can
register the strategy using ``register_schedule``:

.. code:: python

    register_schedule("nn.max_pool2d", strategy.schedule_pool)

Register Strategies for A New Target
------------------------------------

There are two ways to register strategies for a new target. The more
straightforward one is adding a new target file to the directory
``python/tvm/relay/op/strategy``. You only need to customize the strategy for
ops that have been implemented for this new target and reuse the generic
strategies for the rest.

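As an illustration, such a target file might look roughly like the following
hypothetical sketch (the target name ``mytarget`` and the single op shown are
placeholders); it follows the same pattern as the ``topk`` example earlier in
this document:

.. code:: python

    # hypothetical contents of python/tvm/relay/op/strategy/mytarget.py
    # (import style follows the existing target files in that directory)
    import topi
    from .generic import *
    from .. import op as _op

    @topk_strategy.register("mytarget")
    def topk_strategy_mytarget(attrs, inputs, out_type, target):
        """topk strategy for mytarget, reusing the generic TOPI implementation."""
        strategy = _op.OpStrategy()
        strategy.add_implementation(
            wrap_compute_topk(topi.topk),
            wrap_topi_schedule(topi.generic.schedule_topk),
            name="topk.mytarget")
        return strategy
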

Alternatively, you can also register the strategy for the new target outside the
TVM Python library. The following code snippet shows an example of how to do
so. You can find more examples in ``vta/python/vta/top/op.py``.

.. code:: python

    @relay.op.strategy.conv2d_strategy.register("mytarget")
    def conv2d_strategy_mytarget(attrs, inputs, out_type, target):
        ...

Select Implementation from Op Strategy
--------------------------------------

During compilation, the Relay compile engine needs to determine which
implementation to use for an operator when there are multiple. The selection
policy works as follows.

When the input tensors to an operator or a fused op all have constant shapes,
the compile engine first finds the best implementation based on AutoTVM tuning
logs. If no implementation is an AutoTVM template, or all AutoTVM templates have
fallback configs, the implementation with the highest priority level is chosen.
Implementations with the same priority level in this case lead to undefined
behavior, and any of them might be selected.

The selection policy for ops with symbolic input shapes is still a work in
progress. Currently, if any input tensor has a symbolic shape, only the
implementation with the highest priority level will be used for this operator.
This will be updated after the implementation is finished.

For debugging purposes, you can add the following lines before you compile the
Relay model to learn which implementation is used for each operator.

.. code:: python

    import logging
    import sys

    logging.getLogger("compile_engine").setLevel(logging.INFO)
    logging.getLogger("compile_engine").addHandler(logging.StreamHandler(sys.stdout))