[VTA] Refactor to increase platform coverage (Ultra96 etc.) #3496

Merged
merged 64 commits into from
Jul 29, 2019
21f20d4
hardware refactor for increased FPGA coverage, small optimizations
tmoreau89 Jul 2, 2019
7577717
fix header
tmoreau89 Jul 4, 2019
9ed1535
cleaning up parameters that won't be needed for now
tmoreau89 Jul 4, 2019
66f27e8
streamlining makefile, and simplifying tcl scripts
tmoreau89 Jul 5, 2019
d631d72
moving parameter derivation into pkg_config.py, keeping tcl scripts l…
tmoreau89 Jul 6, 2019
ef83ceb
refactoring tcl script to avoid global variables
tmoreau89 Jul 8, 2019
19c5b75
deriving AXI signals in pkg_config.py
tmoreau89 Jul 8, 2019
0b55f88
unifying address map definition for hardware and software drivers
tmoreau89 Jul 9, 2019
073be04
single channel design for ultra96 to simplify build
tmoreau89 Jul 9, 2019
1506708
enable alu by default, no mul opcode for now
tmoreau89 Jul 9, 2019
c0ac967
hardware fix
tmoreau89 Jul 9, 2019
54755f4
new bitstream; vta version
tmoreau89 Jul 9, 2019
a68758c
avoid error when env variable is not set
tmoreau89 Jul 9, 2019
f00ab18
ultra96 cleanup
tmoreau89 Jul 9, 2019
689dd54
further cleaning up tcl script for bitstream generation
tmoreau89 Jul 10, 2019
001561d
preliminary rpc server support on ultra96
tmoreau89 Jul 10, 2019
47f25fb
rpc server tracker scripts
tmoreau89 Jul 10, 2019
bec1a4d
ultra96 ldflag
tmoreau89 Jul 11, 2019
76f9d84
ultra96 support
tmoreau89 Jul 11, 2019
389b7e7
ultra96 support
tmoreau89 Jul 11, 2019
14163c1
cleanup line
tmoreau89 Jul 12, 2019
1874192
cmake support for ultra96
tmoreau89 Jul 12, 2019
3fd595e
simplify memory instantiation
tmoreau89 Jul 12, 2019
382669d
cleaning up IP parameter initialization
tmoreau89 Jul 12, 2019
2ab49ef
fix queue instantiation
tmoreau89 Jul 12, 2019
c31927f
2019.1 transition
tmoreau89 Jul 13, 2019
b61f546
fix macro def
tmoreau89 Jul 13, 2019
5c101b4
removing bus width from config
tmoreau89 Jul 13, 2019
9600436
cleanup
tmoreau89 Jul 13, 2019
caf230b
fix
tmoreau89 Jul 13, 2019
4c70b0e
turning off testing for now
tmoreau89 Jul 14, 2019
4fb9121
cleanup ultra96 ps insantiation
tmoreau89 Jul 14, 2019
0e3fb59
minor refactor
tmoreau89 Jul 14, 2019
3c35f14
adding comments
tmoreau89 Jul 18, 2019
2a7985d
upgrading to tophub v0.6
tmoreau89 Jul 18, 2019
7c74cb4
model used in TVM target now refers to a specific version of VTA for …
tmoreau89 Jul 18, 2019
240ec61
revert change due to bug
tmoreau89 Jul 19, 2019
d8f4c90
rename driver files to be for zynq-type devices
tmoreau89 Jul 19, 2019
15558cd
streamlining address mapping
tmoreau89 Jul 22, 2019
095e623
unifying register map offset values between driver and hardware gener…
tmoreau89 Jul 22, 2019
150b661
rely on cma library for cache flush/invalidation
tmoreau89 Jul 24, 2019
0107b06
coherence management
tmoreau89 Jul 25, 2019
ba90bce
not make buffer packing depend on data types that can be wider than 6…
tmoreau89 Jul 25, 2019
01caa90
refactor config derivation to minimize free parameters
tmoreau89 Jul 25, 2019
37d7d37
fix environment/pkg config interaction
tmoreau89 Jul 25, 2019
a6d2a0a
adding cfg dump property to pkgconfig:
tmoreau89 Jul 25, 2019
cac6b4a
fix rpc reconfig
tmoreau89 Jul 25, 2019
be70933
fix spacing
tmoreau89 Jul 26, 2019
9e230e4
cleanup
tmoreau89 Jul 26, 2019
efcb849
fix spacing
tmoreau89 Jul 26, 2019
c11bd8e
long line fix
tmoreau89 Jul 26, 2019
4633e6b
fix spacing and lint
tmoreau89 Jul 26, 2019
fc6cf79
fix line length
tmoreau89 Jul 26, 2019
03916b4
cmake fix
tmoreau89 Jul 26, 2019
ca68ff4
environment fix
tmoreau89 Jul 26, 2019
300a581
renaming after pynq since the driver stack relies on the pynq library…
tmoreau89 Jul 26, 2019
109f3de
update doc
tmoreau89 Jul 26, 2019
965ce98
adding parameterization to name
tmoreau89 Jul 26, 2019
d94a751
space
tmoreau89 Jul 27, 2019
12ab252
removing reg width
tmoreau89 Jul 27, 2019
714b885
vta RPC
tmoreau89 Jul 27, 2019
f4234f3
update doc on how to edit vta_config.json
tmoreau89 Jul 28, 2019
84c3405
fix path
tmoreau89 Jul 28, 2019
46db39b
fix path
tmoreau89 Jul 28, 2019
File renamed without changes.
@@ -17,7 +17,10 @@
# under the License.
PROJROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../../" && pwd )"

# Derive target specified by vta_config.json
VTA_CONFIG=${PROJROOT}/vta/config/vta_config.py
TARGET=$(python ${VTA_CONFIG} --target)

export PYTHONPATH=${PYTHONPATH}:${PROJROOT}/python:${PROJROOT}/vta/python
export PYTHONPATH=${PYTHONPATH}:/home/xilinx/pynq
python3 -m vta.exec.rpc_server --tracker fleet:9190 --key pynq
python3 -m vta.exec.rpc_server --tracker fleet:9190 --key $TARGET
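
The script above delegates target lookup to `vta_config.py --target`, whose implementation is not shown in this diff. As a minimal sketch, assuming the flag simply reports the `TARGET` field of a `vta_config.json`-style document, the lookup could look roughly like this (`read_target` is a hypothetical helper name, not part of VTA):

```python
import json


def read_target(config_text: str) -> str:
    """Return the TARGET field of a vta_config.json-style document.

    Hypothetical helper: the real vta_config.py may do more than this.
    """
    cfg = json.loads(config_text)
    return cfg["TARGET"]


# Example input modelled on the sample JSON files in this PR.
sample = '{"TARGET": "ultra96", "HW_VER": "0.0.1"}'
print(read_target(sample))
```

Deriving the RPC key from the config this way means the same launch script can serve both Pynq and Ultra96 boards without edits.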
17 changes: 12 additions & 5 deletions cmake/modules/VTA.cmake
@@ -38,18 +38,25 @@ elseif(PYTHON)
string(REGEX MATCHALL "(^| )-D[A-Za-z0-9_=.]*" VTA_DEFINITIONS "${__vta_defs}")

file(GLOB VTA_RUNTIME_SRCS vta/src/*.cc)
file(GLOB __vta_target_srcs vta/src/${VTA_TARGET}/*.cc)
# Add sim driver sources
if(${VTA_TARGET} STREQUAL "sim")
file(GLOB __vta_target_srcs vta/src/sim/*.cc)
endif()
# Add pynq driver sources
if(${VTA_TARGET} STREQUAL "pynq" OR ${VTA_TARGET} STREQUAL "ultra96")
file(GLOB __vta_target_srcs vta/src/pynq/*.cc)
endif()
list(APPEND VTA_RUNTIME_SRCS ${__vta_target_srcs})

add_library(vta SHARED ${VTA_RUNTIME_SRCS})

# Add tsim driver sources
if(${VTA_TARGET} STREQUAL "tsim")
target_compile_definitions(vta PUBLIC USE_TSIM)
include_directories("vta/include")
file(GLOB RUNTIME_DPI_SRCS vta/src/dpi/module.cc)
list(APPEND RUNTIME_SRCS ${RUNTIME_DPI_SRCS})
endif()

add_library(vta SHARED ${VTA_RUNTIME_SRCS})

target_include_directories(vta PUBLIC vta/include)

foreach(__def ${VTA_DEFINITIONS})
@@ -62,7 +69,7 @@ elseif(PYTHON)
endif(APPLE)

# PYNQ rules for Pynq v2.4
if(${VTA_TARGET} STREQUAL "pynq")
if(${VTA_TARGET} STREQUAL "pynq" OR ${VTA_TARGET} STREQUAL "ultra96")
find_library(__cma_lib NAMES cma PATH /usr/lib)
target_link_libraries(vta ${__cma_lib})
endif()
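
The CMake change above replaces the generic `vta/src/${VTA_TARGET}/*.cc` glob with explicit per-target checks, so that `ultra96` can reuse the `pynq` driver sources. The selection logic can be restated as a small lookup. This is only an illustration of the hunk, not part of the build; `driver_source_dir` is a hypothetical name, and `tsim` is left out because the hunk handles it separately through `vta/src/dpi/module.cc`:

```python
from typing import Optional

# Mirrors the STREQUAL checks in the CMake hunk above.
_DRIVER_DIRS = {
    "sim": "vta/src/sim",
    "pynq": "vta/src/pynq",
    "ultra96": "vta/src/pynq",  # ultra96 shares the pynq driver sources
}


def driver_source_dir(vta_target: str) -> Optional[str]:
    """Return the driver source directory for a target, or None if unmapped."""
    return _DRIVER_DIRS.get(vta_target)


for target in ("sim", "pynq", "ultra96", "tsim"):
    print(target, "->", driver_source_dir(target))
```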
23 changes: 5 additions & 18 deletions docs/vta/dev/config.rst
@@ -36,10 +36,6 @@ below.
+=======================+============+========================================================+
| ``TARGET`` | String | The TVM device target. |
+-----------------------+------------+--------------------------------------------------------+
| ``HW_TARGET`` | Int | FPGA frequency in MHz. |
+-----------------------+------------+--------------------------------------------------------+
| ``HW_CLK_TARGET`` | Int | FPGA clock period in ns target for HLS tool. |
+-----------------------+------------+--------------------------------------------------------+
| ``HW_VER`` | String | VTA hardware version number. |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_INP_WIDTH`` | Int (log2) | Input data type signed integer width. |
@@ -48,13 +44,9 @@ below.
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_ACC_WIDTH`` | Int (log2) | Accumulator data type signed integer width. |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_OUT_WIDTH`` | Int (log2) | Output data type signed integer width. |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_BATCH`` | Int (log2) | VTA matrix multiply intrinsic output dimension 0. |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_BLOCK_IN`` | Int (log2) | VTA matrix multiply reduction dimension. |
| ``LOG_BATCH`` | Int (log2) | VTA matrix multiply intrinsic input/output dimension 0.|
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_BLOCK_OUT`` | Int (log2) | VTA matrix multiply intrinsic output dimension 1. |
| ``LOG_BLOCK`` | Int (log2) | VTA matrix multiply inner dimensions. |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_UOP_BUFF_SIZE`` | Int (log2) | Micro-op on-chip buffer in Bytes. |
+-----------------------+------------+--------------------------------------------------------+
@@ -75,13 +67,8 @@ below.

We provide additional detail below regarding each parameter:

- ``TARGET``: Can be set to ``"pynq"`` or ``"sim"``.
- ``HW_TARGET``: In pynq mode, can be set to ``100``, ``142``, ``167``, or ``200`` MHz.
- ``HW_CLK_TARGET``: The lower the target, the more pipeline stages HLS will insert to achieve timing closure during place and route (this can also slightly decrease performance).
- ``TARGET``: Can be set to ``"pynq"``, ``"ultra96"``, ``"sim"`` (fast simulator), or ``"tsim"`` (cycle accurate sim with verilator).
- ``HW_VER``: Hardware version which increments every time the VTA hardware design changes. This parameter is used to uniquely identify hardware bitstreams.
- ``LOG_OUT_WIDTH``: We recommend matching ``LOG_OUT_WIDTH`` to ``LOG_INP_WIDTH``.
- ``LOG_BATCH``: Equivalent to A in multiplication of shape (A, B) x (B, C), or typically, the batch dimension.
- ``LOG_BATCH``: Equivalent to A in multiplication of shape (A, B) x (B, C), or typically, the batch dimension.
- ``LOG_BLOCK_IN``: Equivalent to B in multiplication of shape (A, B) x (B, C), or typically, the input channel dimension.
- ``LOG_BLOCK_OUT``: Equivalent to C in multiplication of shape (A, B) x (B, C), or typically, the output channel dimension.
- ``LOG_BATCH``: Equivalent to A in multiplication of shape (A, B) x (B, C), or typically, the batch dimension of inner tensor computation.
- ``LOG_BLOCK``: Equivalent to B and C in multiplication of shape (A, B) x (B, C), or typically, the input/output channel dimensions of the inner tensor computation.
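
Since every size parameter in the table is stored as a log2 value, the concrete widths, buffer sizes, and GEMM shape follow from a left shift. The sketch below works through the sample values used in this PR; `derive_shapes` and the keys of its result are hypothetical names chosen for illustration, not the names used by `pkg_config.py`:

```python
def derive_shapes(cfg: dict) -> dict:
    """Expand log2-encoded VTA config fields into concrete sizes."""
    batch = 1 << cfg["LOG_BATCH"]   # A in (A, B) x (B, C)
    block = 1 << cfg["LOG_BLOCK"]   # both B and C after this PR
    return {
        "inp_width_bits": 1 << cfg["LOG_INP_WIDTH"],
        "wgt_width_bits": 1 << cfg["LOG_WGT_WIDTH"],
        "acc_width_bits": 1 << cfg["LOG_ACC_WIDTH"],
        "gemm_shape": ((batch, block), (block, block)),
        "uop_buff_bytes": 1 << cfg["LOG_UOP_BUFF_SIZE"],
        "inp_buff_bytes": 1 << cfg["LOG_INP_BUFF_SIZE"],
        "wgt_buff_bytes": 1 << cfg["LOG_WGT_BUFF_SIZE"],
        "acc_buff_bytes": 1 << cfg["LOG_ACC_BUFF_SIZE"],
    }


# Values taken from the sample JSON files in this PR.
sample_cfg = {
    "LOG_INP_WIDTH": 3,
    "LOG_WGT_WIDTH": 3,
    "LOG_ACC_WIDTH": 5,
    "LOG_BATCH": 0,
    "LOG_BLOCK": 4,
    "LOG_UOP_BUFF_SIZE": 15,
    "LOG_INP_BUFF_SIZE": 15,
    "LOG_WGT_BUFF_SIZE": 18,
    "LOG_ACC_BUFF_SIZE": 17,
}
shapes = derive_shapes(sample_cfg)
for name, value in shapes.items():
    print(name, value)
```

With these values the inner computation is a (1, 16) x (16, 16) multiply over 8-bit inputs and weights with 32-bit accumulation.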

25 changes: 12 additions & 13 deletions docs/vta/install.md
@@ -61,7 +61,7 @@ To do so,

```bash
cd <tvm root>
cp vta/config/vta_config.json vta_config.json
vim vta/config/vta_config.json
# edit vta_config.json
make vta
```
@@ -118,7 +118,7 @@ cd /home/xilinx/tvm
mkdir build
cp cmake/config.cmake build/.
# Copy pynq specific configuration
cp vta/config/pynq_sample.json build/vta_config.json
cp vta/config/pynq_sample.json vta/config/vta_config.json
cd build
cmake ..
make runtime vta -j2
@@ -147,13 +147,12 @@ export VTA_PYNQ_RPC_PORT=9091
```

In addition, you'll need to edit the `vta_config.json` file on the host to indicate that we are targeting the Pynq platform, by setting the `TARGET` field to `"pynq"`.
Alternatively, you can copy the default `vta/config/pynq_sample.json` into the TVM root as `vta_config.json`.
> Note: in contrast to our simulation setup, there are no libraries to compile on the host side since the host offloads all of the computation to the Pynq board.

```bash
# On the Host-side
cd <tvm root>
cp vta/config/pynq_sample.json vta_config.json
cp vta/config/pynq_sample.json vta/config/vta_config.json
```

This time again, we will run the 2D convolution testbench.
@@ -187,28 +186,28 @@ This third and last guide allows users to generate custom VTA bitstreams using f

### Xilinx Toolchain Installation

We recommend using `Vivado 2018.2` since our scripts have been tested to work on this version of the Xilinx toolchains.
We recommend using `Vivado 2019.1` since our scripts have been tested to work on this version of the Xilinx toolchains.
Our guide is written for Linux (Ubuntu) installation.

You’ll need to install Xilinx's FPGA compilation toolchain, [Vivado HL WebPACK 2018.2](https://www.xilinx.com/products/design-tools/vivado.html), which is a license-free version of the Vivado HLx toolchain.
You’ll need to install Xilinx's FPGA compilation toolchain, [Vivado HL WebPACK 2019.1](https://www.xilinx.com/products/design-tools/vivado.html), which is a license-free version of the Vivado HLx toolchain.

#### Obtaining and Launching the Vivado GUI Installer

1. Go to the [download webpage](https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/2018-2.html), and download the Linux Self Extracting Web Installer for Vivado HLx 2018.2: WebPACK and Editions.
1. Go to the [download webpage](https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/2019-1.html), and download the Linux Self Extracting Web Installer for Vivado HLx 2019.1: WebPACK and Editions.
2. You’ll have to sign in with a Xilinx account. If you don't have one, creating it takes about 2 minutes.
3. Complete the Name and Address Verification by clicking “Next”, and you will get the opportunity to download a binary file, called `Xilinx_Vivado_SDK_Web_2018.2_0614_1954_Lin64.bin`.
3. Complete the Name and Address Verification by clicking “Next”, and you will get the opportunity to download a binary file, called `Xilinx_Vivado_SDK_Web_2019.1_0524_1430_Lin64.bin`.
4. Now that the file is downloaded, go to your `Downloads` directory, and change the file permissions so it can be executed:
```bash
chmod u+x Xilinx_Vivado_SDK_Web_2018.2_0614_1954_Lin64.bin
chmod u+x Xilinx_Vivado_SDK_Web_2019.1_0524_1430_Lin64.bin
```
5. Now you can execute the binary:
```bash
./Xilinx_Vivado_SDK_Web_2018.2_0614_1954_Lin64.bin
./Xilinx_Vivado_SDK_Web_2019.1_0524_1430_Lin64.bin
```

#### Xilinx Vivado GUI Installer Steps

At this point you've launched the Vivado 2018.2 Installer GUI program.
At this point you've launched the Vivado 2019.1 Installer GUI program.

1. Click “Next” on the *Welcome* screen.
2. On the *Select Install Type* screen, enter your Xilinx user credentials under the “User Authentication” box and select the “Download and Install Now” option before clicking “Next”.
@@ -230,8 +229,8 @@ At this point you've launched the Vivado 2018.2 Installer GUI program.

The last step is to update your `~/.bashrc` with the following lines. This will include all of the Xilinx binary paths so you can launch compilation scripts from the command line.
```bash
# Xilinx Vivado 2018.2 environment
export XILINX_VIVADO=${XILINX_PATH}/Vivado/2018.2
# Xilinx Vivado 2019.1 environment
export XILINX_VIVADO=${XILINX_PATH}/Vivado/2019.1
export PATH=${XILINX_VIVADO}/bin:${PATH}
```

2 changes: 1 addition & 1 deletion python/tvm/autotvm/tophub.py
@@ -44,7 +44,7 @@
'opencl': "v0.02",
'mali': "v0.05",

'vta': "v0.05",
'vta': "v0.06",
}

logger = logging.getLogger('autotvm')
10 changes: 3 additions & 7 deletions vta/config/pynq_sample.json
@@ -1,17 +1,13 @@
{
"TARGET" : "pynq",
"HW_FREQ" : 100,
"HW_CLK_TARGET" : 8,
"HW_VER" : "0.0.0",
"HW_VER" : "0.0.1",
"LOG_INP_WIDTH" : 3,
"LOG_WGT_WIDTH" : 3,
"LOG_ACC_WIDTH" : 5,
"LOG_OUT_WIDTH" : 3,
"LOG_BATCH" : 0,
"LOG_BLOCK_IN" : 4,
"LOG_BLOCK_OUT" : 4,
"LOG_BLOCK" : 4,
"LOG_UOP_BUFF_SIZE" : 15,
"LOG_INP_BUFF_SIZE" : 15,
"LOG_INP_BUFF_SIZE" :15,
"LOG_WGT_BUFF_SIZE" : 18,
"LOG_ACC_BUFF_SIZE" : 17
}
13 changes: 13 additions & 0 deletions vta/config/ultra96_sample.json
@@ -0,0 +1,13 @@
{
"TARGET" : "ultra96",
"HW_VER" : "0.0.1",
"LOG_INP_WIDTH" : 3,
"LOG_WGT_WIDTH" : 3,
"LOG_ACC_WIDTH" : 5,
"LOG_BATCH" : 0,
"LOG_BLOCK" : 4,
"LOG_UOP_BUFF_SIZE" : 15,
"LOG_INP_BUFF_SIZE" :15,
"LOG_WGT_BUFF_SIZE" : 18,
"LOG_ACC_BUFF_SIZE" : 17
}
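
The sample files in this PR drop `HW_FREQ`, `HW_CLK_TARGET`, and `LOG_OUT_WIDTH`, and fold `LOG_BLOCK_IN` and `LOG_BLOCK_OUT` into a single `LOG_BLOCK`. A user with an older custom `vta_config.json` might update it along the following lines. This is a hedged sketch: `migrate_config` is a hypothetical helper, not something the PR provides, and it assumes the old config used equal input and output block sizes (as the old samples did).

```python
# Keys that the sample JSON files no longer carry after this PR.
REMOVED_KEYS = ("HW_FREQ", "HW_CLK_TARGET", "LOG_OUT_WIDTH")
MERGED_KEYS = ("LOG_BLOCK_IN", "LOG_BLOCK_OUT")


def migrate_config(old: dict) -> dict:
    """Convert a pre-refactor config dict to the post-refactor key set."""
    new = {k: v for k, v in old.items()
           if k not in REMOVED_KEYS and k not in MERGED_KEYS}
    if "LOG_BLOCK" not in new:
        block_in = old["LOG_BLOCK_IN"]
        block_out = old["LOG_BLOCK_OUT"]
        if block_in != block_out:
            # A single LOG_BLOCK cannot represent unequal block sizes.
            raise ValueError("LOG_BLOCK_IN and LOG_BLOCK_OUT differ")
        new["LOG_BLOCK"] = block_in
    return new


# Old-style values copied from the removed lines of pynq_sample.json.
old_cfg = {
    "TARGET": "pynq",
    "HW_FREQ": 100,
    "HW_CLK_TARGET": 8,
    "HW_VER": "0.0.0",
    "LOG_INP_WIDTH": 3,
    "LOG_WGT_WIDTH": 3,
    "LOG_ACC_WIDTH": 5,
    "LOG_OUT_WIDTH": 3,
    "LOG_BATCH": 0,
    "LOG_BLOCK_IN": 4,
    "LOG_BLOCK_OUT": 4,
}
new_cfg = migrate_config(old_cfg)
print(sorted(new_cfg))
```

The helper leaves `HW_VER` untouched; the samples in this PR move it from `0.0.0` to `0.0.1`, so it should be updated by hand to match the bitstream in use.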
8 changes: 2 additions & 6 deletions vta/config/vta_config.json
@@ -1,15 +1,11 @@
{
"TARGET" : "sim",
"HW_FREQ" : 100,
"HW_CLK_TARGET" : 7,
"HW_VER" : "0.0.0",
"HW_VER" : "0.0.1",
"LOG_INP_WIDTH" : 3,
"LOG_WGT_WIDTH" : 3,
"LOG_ACC_WIDTH" : 5,
"LOG_OUT_WIDTH" : 3,
"LOG_BATCH" : 0,
"LOG_BLOCK_IN" : 4,
"LOG_BLOCK_OUT" : 4,
"LOG_BLOCK" : 4,
"LOG_UOP_BUFF_SIZE" : 15,
"LOG_INP_BUFF_SIZE" : 15,
"LOG_WGT_BUFF_SIZE" : 18,