diff --git a/docs/deploy/index.rst b/docs/deploy/index.rst
index a43cce728f61..db2938635b82 100644
--- a/docs/deploy/index.rst
+++ b/docs/deploy/index.rst
@@ -23,7 +23,7 @@ Deploy and Integration
 This page contains guidelines on how to deploy TVM to various platforms
 as well as how to integrate it with your project.
 
-.. image:: http://www.tvm.ai/images/release/tvm_flexible.png
+.. image:: https://tvm.apache.org/images/release/tvm_flexible.png
 
 Unlike traditional deep learning frameworks. TVM stack is divided into two major components:
diff --git a/docs/dev/runtime.rst b/docs/dev/runtime.rst
index 0d2a92538d31..7a001fa9b4ab 100644
--- a/docs/dev/runtime.rst
+++ b/docs/dev/runtime.rst
@@ -23,7 +23,7 @@ TVM Runtime System
 TVM supports multiple programming languages for the compiler stack development and deployment.
 In this note, we explain the key elements of the TVM runtime.
 
-.. image:: http://www.tvm.ai/images/release/tvm_flexible.png
+.. image:: https://tvm.apache.org/images/release/tvm_flexible.png
 
 We need to satisfy quite a few interesting requirements:
@@ -174,7 +174,7 @@ Remote Deployment
 The PackedFunc and Module system also makes it easy to ship the function into remote devices directly.
 Under the hood, we have an RPCModule that serializes the arguments to do the data movement and launches the computation on the remote.
 
-.. image:: http://www.tvm.ai/images/release/tvm_rpc.png
+.. image:: https://tvm.apache.org/images/release/tvm_rpc.png
 
 The RPC server itself is minimum and can be bundled into the runtime. We can start a minimum TVM RPC server on iPhone/android/raspberry pi or even the browser. The cross compilation on server and shipping of the module for testing can be done in the same script. Checkout
diff --git a/docs/vta/dev/hardware.rst b/docs/vta/dev/hardware.rst
index 6eb30407997f..4d06826a22ab 100644
--- a/docs/vta/dev/hardware.rst
+++ b/docs/vta/dev/hardware.rst
@@ -36,7 +36,7 @@ In addition the design adopts decoupled access-execute to hide memory access lat
 To a broader extent, VTA can serve as a template deep learning accelerator design for full stack optimization,
 exposing a generic tensor computation interface to the compiler stack.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png
    :align: center
    :width: 80%
@@ -175,7 +175,7 @@ Finally, the ``STORE`` instructions are executed by the store module exclusively
 The fields of each instruction is described in the figure below.
 The meaning of each field will be further explained in the :ref:`vta-uarch` section.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/vta_instructions.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/vta_instructions.png
    :align: center
    :width: 100%
@@ -191,7 +191,7 @@ VTA relies on dependence FIFO queues between hardware modules to synchronize the
 The figure below shows how a given hardware module can execute concurrently from its producer and consumer modules in a dataflow fashion through the use of dependence FIFO queues, and single-reader/single-writer SRAM buffers.
 Each module is connected to its consumer and producer via read-after-write (RAW) and write-after-read (WAR) dependence queues.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/dataflow.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/dataflow.png
    :align: center
    :width: 100%
@@ -258,7 +258,7 @@ There are two types of compute micro-ops: ALU and GEMM operations.
 To minimize the footprint of micro-op kernels, while avoiding the need for control-flow instructions such as conditional jumps, the compute module executes micro-op sequences inside a two-level nested loop that computes the location of each tensor register location via an affine function.
 This compression approach helps reduce the micro-kernel instruction footprint, and applies to both matrix multiplication and 2D convolution, commonly found in neural network operators.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/gemm_core.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/gemm_core.png
    :align: center
    :width: 100%
@@ -269,7 +269,7 @@ This tensorization intrinsic is defined by the dimensions of the input, weight a
 Each data type can have a different integer precision: typically both weight and input types are low-precision (8-bits or less), while the accumulator tensor has a wider type to prevent overflows (32-bits).
 In order to keep the GEMM core busy, each of the input buffer, weight buffer, and register file have to expose sufficient read/write bandwidth.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/alu_core.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/alu_core.png
    :align: center
    :width: 100%
@@ -289,7 +289,7 @@ The micro-code in the context of tensor ALU computation only takes care of speci
 Load and Store Modules
 ~~~~~~~~~~~~~~~~~~~~~~
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/2d_dma.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/developer/2d_dma.png
    :align: center
    :width: 100%
diff --git a/docs/vta/dev/index.rst b/docs/vta/dev/index.rst
index 575a9d4b2c51..0ba3bd1ec2a4 100644
--- a/docs/vta/dev/index.rst
+++ b/docs/vta/dev/index.rst
@@ -20,7 +20,7 @@ VTA Design and Developer Guide
 This developer guide details the complete VTA-TVM hardware-software stack.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png
+.. image:: https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png
    :align: center
    :width: 60%
diff --git a/docs/vta/index.rst b/docs/vta/index.rst
index 9d786fdd9dea..357c0616eb4a 100644
--- a/docs/vta/index.rst
+++ b/docs/vta/index.rst
@@ -22,7 +22,7 @@ VTA: Deep Learning Accelerator Stack
 The Versatile Tensor Accelerator (VTA) is an open, generic, and customizable deep learning accelerator with a complete TVM-based compiler stack.
 We designed VTA to expose the most salient and common characteristics of mainstream deep learning accelerators.
 Together TVM and VTA form an end-to-end hardware-software deep learning system stack that includes hardware design, drivers, a JIT runtime, and an optimizing compiler stack based on TVM.
 
-.. image:: http://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png
+.. image:: https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png
    :align: center
    :width: 60%