From 9b59b87d30ca40d6601227f2e701239b84036915 Mon Sep 17 00:00:00 2001
From: Luke Hutton <luke.hutton@arm.com>
Date: Fri, 24 Jul 2020 10:57:45 +0100
Subject: [PATCH] Address comments

Change-Id: I88db6d9d539a8f06e2dfe1b9a0a3ac7a4b46cece
---
 docs/deploy/arm_compute_lib.rst | 37 ++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/docs/deploy/arm_compute_lib.rst b/docs/deploy/arm_compute_lib.rst
index 7c550ab85564..eaffc0a565d8 100644
--- a/docs/deploy/arm_compute_lib.rst
+++ b/docs/deploy/arm_compute_lib.rst
@@ -15,8 +15,9 @@
     specific language governing permissions and limitations
     under the License.
 
-Relay Arm|reg| Compute Library Integration
-==========================================
+Relay Arm :sup:`®` Compute Library Integration
+==============================================
+**Author**: `Luke Hutton <https://github.com/lhutton1>`_
 
 Introduction
 ------------
@@ -29,6 +30,10 @@ a performance boost on such devices.
 Installing Arm Compute Library
 ------------------------------
 
+Before installing Arm Compute Library, you need to know which architecture to build for. One way
+to determine this is to run `lscpu` and note the "Model name" of the CPU. You can then look up that
+model name online to find the architecture it implements.
+
 We recommend two different ways to build and install ACL:
 
 * Use the script located at `docker/install/ubuntu_install_arm_compute_library.sh`. You can use this
@@ -38,7 +43,14 @@ We recommend two different ways to build and install ACL:
   `install_path`.
 * Alternatively, you can download and use pre-built binaries from:
   https://github.com/ARM-software/ComputeLibrary/releases. When using this package, you will need to
-  select the binaries for the architecture you require and make sure they are visible to cmake.
+  select the binaries for the architecture you require and make sure they are visible to cmake. One
+  way to do this is to move them into the package's top-level `lib` directory:
+
+  .. code:: bash
+
+      cd <acl-prebuilt-package>/lib
+      mv ./linux-<architecture-to-build-for>-neon/* .
+
 
 In both cases you will need to set USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME to the path where the ACL package
 is located. Cmake will look in /path-to-acl/ along with /path-to-acl/lib and /path-to-acl/build for the
@@ -60,8 +72,9 @@ to compile an ACL module on an x86 machine and then run the module on a remote A
 need to use USE_ARM_COMPUTE_LIB=ON on the x86 machine and USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON on the remote
 AArch64 device.
 
-Using USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON will mean that ACL binaries are searched for by cmake in the
-default locations (see https://cmake.org/cmake/help/v3.4/command/find_library.html). In addition to this,
+By default, both options are set to OFF. Setting USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON means that cmake
+searches for the ACL binaries in its default locations
+(see https://cmake.org/cmake/help/v3.4/command/find_library.html). In addition to this,
 /path-to-tvm-project/acl/ will also be searched. It is likely that you will need to set your own path to
 locate ACL. This can be done by specifying a path in the place of ON.
 
@@ -105,7 +118,7 @@ max_pool2d operator).
 
 Annotate and partition the graph for ACL.
 
-..code:: python
+.. code:: python
 
     from tvm.relay.op.contrib.arm_compute_lib import partition_for_arm_compute_lib
     module = partition_for_arm_compute_lib(module)
@@ -131,7 +144,7 @@ Export the module.
 
 Run Inference. This must be on an Arm device. If compiling on x86 device and running on AArch64,
 consider using the RPC mechanism. Tutorials for using the RPC mechanism:
-https://tvm.apache.org/docs/tutorials/cross_compilation_and_rpc.html#sphx-glr-tutorials-cross-compilation-and-rpc-py
+https://tvm.apache.org/docs/tutorials/get_started/cross_compilation_and_rpc.html
 
 .. code:: python
 
@@ -186,12 +199,12 @@ what needs to be changed and where, it will not however dive into the complexiti
 individual operator. This is left to the developer.
 
 There are a series of files we need to make changes to:
+
 * `python/relay/op/contrib/arm_compute_lib.py` In this file we define the operators we wish to offload using the
-`op.register` decorator. This will mean the annotation pass recognizes this operator as ACL
-offloadable.
+  `op.register` decorator. This means the annotation pass recognizes the operator as offloadable to ACL.
 * `src/relay/backend/contrib/arm_compute_lib/codegen.cc` Implement `Create[OpName]JSONNode` method. This is where we
-declare how the operator should be represented by JSON. This will be used to create the ACL module.
+  declare how the operator should be represented in JSON. This representation is used to create the ACL module.
 * `src/runtime/contrib/arm_compute_lib/acl_runtime.cc` Implement `Create[OpName]Layer` method. This is where we
-define how the JSON representation can be used to create an ACL function. We simply define how to
-translate from the JSON representation to ACL API.
+  define how the JSON representation is used to create an ACL function. We simply define how to
+  translate from the JSON representation to the ACL API.
 * `tests/python/contrib/test_arm_compute_lib` Add unit tests for the given operator.