
Custom accelerators

Tinsel supports external custom accelerators written in Verilog or SystemVerilog when the UseCustomAccelerator configuration option is set. With this option enabled, each tile in the Tinsel overlay includes a custom accelerator, and the mailbox and the accelerator share connections to the NoC router.

Custom accelerator interface

The accelerator interface is currently quite simple:

module ExternalTinselAccelerator
  ( // Clock and reset
    // By default, BSV synchronises on negedge(clock) and uses negative reset
    input wire clk
  , input wire rst_n

    // Coordinates of this FPGA board in the board mesh
  , input wire [`TinselMeshXBits-1:0] board_x
  , input wire [`TinselMeshYBits-1:0] board_y

    // Stream of flits coming in
  , input wire [$bits(Flit)-1:0] in_data
  , input wire in_valid
  , output wire in_ready

    // Stream of flits going out
  , output wire [$bits(Flit)-1:0] out_data
  , output wire out_valid
  , input wire out_ready
  );

  // Compile-time NoC coordinates of this accelerator
  parameter TILE_X = 0;
  parameter TILE_Y = 0;

  // Module body here
  // ...
endmodule
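
As a rough illustration of how the two streams relate, here is a small C++ model of a combinational pass-through accelerator. It assumes the usual valid/ready convention, in which a flit moves in a cycle when valid and ready are both high; the interface listing does not state this explicitly. The names StreamIn, PassThroughOutputs, passThrough and transfers are hypothetical, and the 64-bit data field is only a placeholder for the real $bits(Flit) width.

```cpp
#include <cstdint>

// One incoming stream sample for a single clock cycle.
// The 64-bit payload is a placeholder; the real width is $bits(Flit).
struct StreamIn {
  uint64_t data;   // in_data
  bool     valid;  // in_valid
};

// The three signals the accelerator drives.
struct PassThroughOutputs {
  uint64_t outData;   // out_data
  bool     outValid;  // out_valid
  bool     inReady;   // in_ready
};

// Combinational pass-through: offer the incoming flit downstream, and
// accept from upstream only when downstream is ready to take it.
inline PassThroughOutputs passThrough(StreamIn in, bool outReady) {
  return PassThroughOutputs{in.data, in.valid, outReady};
}

// Assumed handshake rule: a transfer happens when both signals are high.
inline bool transfers(bool valid, bool ready) { return valid && ready; }
```

A real accelerator would normally buffer flits rather than wire the two streams together, but the same rule for when a flit moves would apply on each side.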

Note the use of Verilog macros such as TinselMeshXBits and TinselMeshYBits. These can be generated automatically by running config.py with the vpp option (which stands for Verilog pre-processor).

Flit format

Here is the flit format, as a SystemVerilog structure:

typedef struct packed {
  // Destination address
  NetAddr dest;
  // Payload
  logic [`TinselBitsPerFlit-1:0] payload;
  // Set when more flits follow in the message; clear on the final flit
  logic notFinalFlit;
  // Is this a special packet for idle-detection?
  logic isIdleToken;
} Flit;

Address format

Here is the address format for the dest field in a flit. Note the acc field, which determines whether a packet is destined for a custom accelerator or a mailbox.

typedef struct packed {
  logic acc;
  logic isKey;
  logic host;
  logic hostDir;
  logic [`TinselMeshYBits-1:0] boardY;
  logic [`TinselMeshXBits-1:0] boardX;
  logic [`TinselMailboxMeshYBits-1:0] tileY;
  logic [`TinselMailboxMeshXBits-1:0] tileX;
  logic [`TinselLogCoresPerMailbox-1:0] coreId;
  logic [`TinselLogThreadsPerCore-1:0] threadId;
} NetAddr;
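
Because both structures are packed, the first member occupies the most significant bits. The following C++ sketch works out the resulting widths and bit positions for one set of parameter values. All of the widths below are placeholders chosen for illustration, not Tinsel defaults; the real values come from the macros generated by config.py.

```cpp
#include <cstdio>

// Placeholder widths for illustration only; take the real values from
// the macros that config.py generates.
constexpr unsigned MeshXBits = 3, MeshYBits = 3;
constexpr unsigned MailboxMeshXBits = 1, MailboxMeshYBits = 1;
constexpr unsigned LogCoresPerMailbox = 2, LogThreadsPerCore = 4;
constexpr unsigned BitsPerFlit = 128;

// $bits(NetAddr): four 1-bit flags plus the coordinate and id fields.
constexpr unsigned NetAddrBits =
    4 + MeshYBits + MeshXBits + MailboxMeshYBits + MailboxMeshXBits +
    LogCoresPerMailbox + LogThreadsPerCore;

// $bits(Flit): dest, payload, notFinalFlit, isIdleToken.
constexpr unsigned FlitBits = NetAddrBits + BitsPerFlit + 2;

// Bit positions inside a Flit, counting up from the last member (bit 0).
constexpr unsigned IsIdleTokenBit  = 0;
constexpr unsigned NotFinalFlitBit = 1;
constexpr unsigned PayloadLsb      = 2;
constexpr unsigned DestLsb         = PayloadLsb + BitsPerFlit;
constexpr unsigned AccBit          = FlitBits - 1;  // acc is the first member of dest

// Print the layout for the placeholder widths above.
inline void printFlitLayout() {
  std::printf("NetAddr: %u bits, Flit: %u bits\n", NetAddrBits, FlitBits);
  std::printf("dest = [%u:%u], payload = [%u:%u]\n",
              FlitBits - 1, DestLsb, DestLsb - 1, PayloadLsb);
}
```

With these placeholder widths, the acc flag is the top bit of in_data and out_data, so an accelerator can test a single bit to tell accelerator traffic from mailbox traffic.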

Tinsel API extensions

The following Tinsel API function returns the address of a specified custom accelerator, for use from software.

inline uint32_t tinselAccId(
  uint32_t boardX, uint32_t boardY,
  uint32_t tileX, uint32_t tileY);
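
To make the link with the address format concrete, here is a sketch of how such an address could be assembled by following the NetAddr field order. It is derived from the struct layout only and is not the actual body of tinselAccId, which may differ. The names accAddrSketch and appendField are hypothetical, and the field widths are placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder widths for illustration only.
constexpr unsigned MeshXBits = 3, MeshYBits = 3;
constexpr unsigned MailboxMeshXBits = 1, MailboxMeshYBits = 1;
constexpr unsigned LogCoresPerMailbox = 2, LogThreadsPerCore = 4;

// Shift the bits gathered so far up by `width` and place `value` below them.
inline uint32_t appendField(uint32_t addr, unsigned width, uint32_t value) {
  assert(value < (1u << width));  // the value must fit in its field
  return (addr << width) | value;
}

// Hypothetical stand-in for tinselAccId, following the NetAddr field order.
inline uint32_t accAddrSketch(uint32_t boardX, uint32_t boardY,
                              uint32_t tileX, uint32_t tileY) {
  uint32_t addr = 0x8;  // acc=1, isKey=0, host=0, hostDir=0
  addr = appendField(addr, MeshYBits, boardY);
  addr = appendField(addr, MeshXBits, boardX);
  addr = appendField(addr, MailboxMeshYBits, tileY);
  addr = appendField(addr, MailboxMeshXBits, tileX);
  // coreId and threadId are left at zero in this sketch.
  addr <<= LogCoresPerMailbox + LogThreadsPerCore;
  return addr;
}
```

In application code, call tinselAccId itself rather than a hand-rolled version, so that the address stays consistent with the configured parameters.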

Full walkthrough

This walkthrough assumes you are using the POETS box Zitura -- a 2-FPGA box containing one bridge FPGA and one worker FPGA. For testing distributed applications, we could easily extend Zitura with a second worker FPGA -- please request this if you would find it helpful. We can look at using the big POETS boxes in future.

Step 1. Modify config.py to your requirements. For example, if you want a 2x2 mesh of tiles, i.e. 4 Tinsel tiles and 4 accelerators, then choose the following parameters.

# Number of bits in mailbox mesh X coord
p["MailboxMeshXBits"] = 1
 
# Number of bits in mailbox mesh Y coord
p["MailboxMeshYBits"] = 1

# Log of number of caches per DRAM port
p["LogDCachesPerDRAM"] = 1

# Enable custom accelerators (experimental feature)
p["UseCustomAccelerator"] = True
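
The arithmetic behind the 2x2 figure can be checked with a few lines of C++. This sketch assumes the mesh uses every value each coordinate can encode, which matches the 2x2 example above; the constant names simply mirror the config.py keys.

```cpp
// Values taken from the configuration above.
constexpr unsigned MailboxMeshXBits = 1;
constexpr unsigned MailboxMeshYBits = 1;

// A coordinate with n bits can take 2^n values.
constexpr unsigned tilesX = 1u << MailboxMeshXBits;
constexpr unsigned tilesY = 1u << MailboxMeshYBits;

// One accelerator per tile, so this is also the accelerator count.
constexpr unsigned tilesPerBoard = tilesX * tilesY;

static_assert(tilesPerBoard == 4, "a 2x2 mesh has 4 tiles");
```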

If using a single worker box such as Zitura, then also specify:

# Mesh X length within a box
p["MeshXLenWithinBox"] = 1
 
# Mesh Y length within a box
p["MeshYLenWithinBox"] = 1

Step 2. We made a sample accelerator for testing purposes: ExampleAccelerator.sv. This accelerator receives any flit and sends it to the address specified in the first word of the flit's payload.
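
The behaviour described in Step 2 can be modelled in software as follows. This is a sketch of the described behaviour, not a translation of ExampleAccelerator.sv. It assumes a payload of four 32-bit words and that the first word can be used directly as the destination address; FlitModel, forwardFlit and WordsPerFlit are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical payload size, in 32-bit words.
constexpr unsigned WordsPerFlit = 4;

// Software stand-in for the Flit structure.
struct FlitModel {
  uint32_t dest;                               // stands in for NetAddr
  std::array<uint32_t, WordsPerFlit> payload;  // payload words
  bool notFinalFlit;
  bool isIdleToken;
};

// Forward the flit unchanged apart from its destination, which is
// taken from the first payload word.
inline FlitModel forwardFlit(const FlitModel& in) {
  FlitModel out = in;
  out.dest = in.payload[0];
  return out;
}
```

Under these assumptions, software that wants a reply would place its own address in the first payload word before sending a flit to the accelerator.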

Step 3. Update de5/Golden_top.qsf to specify the location of the custom accelerator. For example, add the line

set_global_assignment -name SYSTEMVERILOG_FILE ../doc/custom/ExampleAccelerator.sv

to use the sample accelerator.

Step 4. Use the config.py script to generate Verilog macros containing all the Tinsel parameters. Make sure the generated file is in the same directory as any file that includes it.

python config.py vpp > config.v

Step 5. Build the Quartus project on a box of your choice, but avoid using Zitura for this -- it is too old and slow. In the de5 subdirectory, simply type make one to do a single build, or make to do multiple builds with different seeds (good if timing is tight).

Step 6. Copy the bitfile generated by Quartus (Golden_top.sof) into your home directory on Zitura. Then, on Zitura:

/local/tinsel/bin/fpga-power.sh on     # Turn worker FPGA on
/local/flash/flash.sh Golden_top.sof 1 # Flashing will take around 20 mins
/local/tinsel/bin/fpga-power.sh off    # Turn worker FPGA off

Users of Zitura are expected to coordinate with one another when doing a reflash. For example, announcing that you are doing a reflash on the POETS slack group (channel #zitura) would help avoid clashes and confusion.

Step 7. We have made a sample software application to go with the example accelerator. It can be found in the apps/custom/ directory of the Tinsel repository. To run this:

cd tinsel/apps/custom
make
./run

You should see the following output:

Booting
Starting
Waiting for message from accelerator
Got it: 8000

Future plans

Currently, custom accelerators break the idle-detection feature. If you would like me to fix this, please ask, as it should be fairly straightforward to do.