Tinsel supports external custom accelerators written in Verilog or
SystemVerilog when the UseCustomAccelerator configuration option is
set. With this option enabled, each tile in the Tinsel overlay
includes a custom accelerator. The mailbox and the custom
accelerator share connections to the NoC router.
The accelerator interface is currently quite simple:
module ExternalTinselAccelerator
( // Clock and reset
// By default, BSV synchronises on posedge(clock) and uses an active-low reset
input wire clk
, input wire rst_n
// Coordinates of this FPGA board in the board mesh
, input wire [`TinselMeshXBits-1:0] board_x
, input wire [`TinselMeshYBits-1:0] board_y
// Stream of flits coming in
, input wire [$bits(Flit)-1:0] in_data
, input wire in_valid
, output wire in_ready
// Stream of flits going out
, output wire [$bits(Flit)-1:0] out_data
, output wire out_valid
, input wire out_ready
);
// Compile-time NoC coordinates of this accelerator
parameter TILE_X = 0;
parameter TILE_Y = 0;
// Module body here
// ...
endmodule
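The port names suggest a conventional valid/ready stream handshake, in which a flit moves on a clock edge only when both valid and ready are high. That reading is an assumption, since the interface above does not spell out the protocol. The Python sketch below models it at cycle level for a pass-through accelerator with a single-flit buffer; `PassThroughModel` and `run` are hypothetical names used only for illustration.

```python
class PassThroughModel:
    """Cycle-level model of a one-flit buffer between the in and out streams."""

    def __init__(self):
        self.buf = None  # holds at most one flit

    def step(self, in_valid, in_data, out_ready):
        """Advance one clock cycle; return (in_ready, out_valid, out_data)."""
        # Outputs depend only on the state at the start of the cycle.
        in_ready = self.buf is None
        out_valid = self.buf is not None
        out_data = self.buf
        # A transfer fires only when valid and ready are both high.
        if out_valid and out_ready:
            self.buf = None
        if in_valid and in_ready:
            self.buf = in_data
        return in_ready, out_valid, out_data


def run(flits, stall_cycles=()):
    """Push flits through the model, holding out_ready low on stall_cycles."""
    model, pending, received, cycle = PassThroughModel(), list(flits), [], 0
    while pending or model.buf is not None:
        out_ready = cycle not in stall_cycles
        in_valid = bool(pending)
        in_data = pending[0] if pending else None
        in_ready, out_valid, out_data = model.step(in_valid, in_data, out_ready)
        if in_valid and in_ready:
            pending.pop(0)          # the input flit was accepted this cycle
        if out_valid and out_ready:
            received.append(out_data)  # the output flit was consumed
        cycle += 1
    return received
```

Stalling the consumer delays flits but never drops or reorders them, which is the property a real accelerator body would also need to preserve.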
Note the use of Verilog macros such as TinselMeshXBits and
TinselMeshYBits. These can be generated automatically by running
config.py with the vpp option (which stands for Verilog pre-processor).
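To give a feel for what such generated macros look like, the sketch below turns a parameter dictionary into Verilog define directives. It is not the actual implementation in config.py: the `emit_vpp` helper is hypothetical, the `Tinsel` prefix is inferred from the macro names used above, the parameter values are made up, and rendering booleans as 1/0 is a guess.

```python
def emit_vpp(params, prefix="Tinsel"):
    """Return Verilog `define lines, one per parameter, sorted by name."""
    lines = []
    for name in sorted(params):
        value = params[name]
        # Assumption: booleans are rendered as 1/0 so they work in expressions.
        if isinstance(value, bool):
            value = int(value)
        lines.append("`define {}{} {}".format(prefix, name, value))
    return "\n".join(lines)


# Illustrative parameters only, not taken from a real build.
example = {"MeshXBits": 2, "MeshYBits": 2, "UseCustomAccelerator": True}
print(emit_vpp(example))
```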
Here is the flit format, as a SystemVerilog structure:
typedef struct packed {
// Destination address
NetAddr dest;
// Payload
logic [`TinselBitsPerFlit-1:0] payload;
// Set when more flits follow, i.e. this is not the final flit in the message
logic notFinalFlit;
// Is this a special packet for idle-detection?
logic isIdleToken;
} Flit;
Here is the address format for the dest field in a flit. Note the
acc field, which determines whether a packet is destined for a
custom accelerator or a mailbox.
typedef struct packed {
logic acc;
logic isKey;
logic host;
logic hostDir;
logic [`TinselMeshYBits-1:0] boardY;
logic [`TinselMeshXBits-1:0] boardX;
logic [`TinselMailboxMeshYBits-1:0] tileY;
logic [`TinselMailboxMeshXBits-1:0] tileX;
logic [`TinselLogCoresPerMailbox-1:0] coreId;
logic [`TinselLogThreadsPerCore-1:0] threadId;
} NetAddr;
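Because both structures are packed, each field has a fixed bit position: in a SystemVerilog packed struct the first member occupies the most significant bits. The sketch below computes the resulting offsets and the total widths of NetAddr and Flit. The widths in `WIDTHS` are illustrative assumptions, not values from a real build, and `layout` is a hypothetical helper.

```python
# Illustrative widths only (assumptions); real values come from the
# macros generated by config.py.
WIDTHS = {
    "TinselMeshYBits": 2,
    "TinselMeshXBits": 2,
    "TinselMailboxMeshYBits": 1,
    "TinselMailboxMeshXBits": 1,
    "TinselLogCoresPerMailbox": 2,
    "TinselLogThreadsPerCore": 4,
    "TinselBitsPerFlit": 128,
}


def layout(fields):
    """Return ({name: (lsb_offset, width)}, total_width) for a packed struct.

    Fields are listed in declaration order, most significant first.
    """
    total = sum(width for _, width in fields)
    offsets, next_msb = {}, total
    for name, width in fields:
        next_msb -= width
        offsets[name] = (next_msb, width)
    return offsets, total


net_addr_fields = [
    ("acc", 1), ("isKey", 1), ("host", 1), ("hostDir", 1),
    ("boardY", WIDTHS["TinselMeshYBits"]),
    ("boardX", WIDTHS["TinselMeshXBits"]),
    ("tileY", WIDTHS["TinselMailboxMeshYBits"]),
    ("tileX", WIDTHS["TinselMailboxMeshXBits"]),
    ("coreId", WIDTHS["TinselLogCoresPerMailbox"]),
    ("threadId", WIDTHS["TinselLogThreadsPerCore"]),
]
net_addr_offsets, net_addr_bits = layout(net_addr_fields)

flit_fields = [
    ("dest", net_addr_bits),
    ("payload", WIDTHS["TinselBitsPerFlit"]),
    ("notFinalFlit", 1),
    ("isIdleToken", 1),
]
flit_offsets, flit_bits = layout(flit_fields)

print("NetAddr bits:", net_addr_bits, "acc at", net_addr_offsets["acc"])
print("Flit bits:", flit_bits, "dest at", flit_offsets["dest"])
```

With these example widths the acc flag is the top bit of the address, and the width of in_data and out_data is the address width plus the payload width plus two flag bits.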
The following Tinsel API function is provided for obtaining the address of a specified custom accelerator from software.
inline uint32_t tinselAccId(
uint32_t boardX, uint32_t boardY,
uint32_t tileX, uint32_t tileY);
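One plausible reading is that the returned ID is simply a NetAddr bit pattern with the acc flag set, the other flags clear, and the core and thread fields zero. That is an assumption, and the real tinselAccId may encode things differently. Under that assumption, and with made-up field widths, the value could be built as in this sketch (`acc_id_sketch` is a hypothetical name):

```python
# Illustrative field widths (assumptions, not values from a real build).
MESH_Y_BITS = 2
MESH_X_BITS = 2
MAILBOX_MESH_Y_BITS = 1
MAILBOX_MESH_X_BITS = 1
LOG_CORES_PER_MAILBOX = 2
LOG_THREADS_PER_CORE = 4


def acc_id_sketch(board_x, board_y, tile_x, tile_y):
    """Build a NetAddr-shaped integer with acc=1 and all other flags clear."""
    checks = (
        (board_x, MESH_X_BITS), (board_y, MESH_Y_BITS),
        (tile_x, MAILBOX_MESH_X_BITS), (tile_y, MAILBOX_MESH_Y_BITS),
    )
    for value, bits in checks:
        if not 0 <= value < (1 << bits):
            raise ValueError("coordinate out of range")
    addr = 0b1000  # acc=1, isKey=0, host=0, hostDir=0
    addr = (addr << MESH_Y_BITS) | board_y
    addr = (addr << MESH_X_BITS) | board_x
    addr = (addr << MAILBOX_MESH_Y_BITS) | tile_y
    addr = (addr << MAILBOX_MESH_X_BITS) | tile_x
    # coreId and threadId are left as zero in this sketch.
    addr <<= LOG_CORES_PER_MAILBOX + LOG_THREADS_PER_CORE
    return addr


print(hex(acc_id_sketch(0, 0, 1, 1)))
```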
This walkthrough assumes you are using the POETS box Zitura -- a 2-FPGA box containing one bridge FPGA and one worker FPGA. For testing distributed applications, we could easily extend Zitura with a second worker FPGA -- please request this if you would find it helpful. We can look at using the big POETS boxes in future.
Step 1. Modify config.py to your requirements. For example, if
you want a 2x2 mesh of tiles, i.e. 4 Tinsel tiles and 4 accelerators,
then choose the following parameters.
# Number of bits in mailbox mesh X coord
p["MailboxMeshXBits"] = 1
# Number of bits in mailbox mesh Y coord
p["MailboxMeshYBits"] = 1
# Log of number of caches per DRAM port
p["LogDCachesPerDRAM"] = 1
# Enable custom accelerators (experimental feature)
p["UseCustomAccelerator"] = True
If using a single worker box such as Zitura, then also specify:
# Mesh X length within a box
p["MeshXLenWithinBox"] = 1
# Mesh Y length within a box
p["MeshYLenWithinBox"] = 1
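The mesh-size parameters in Step 1 are coordinate widths in bits, so each dimension holds two to the power of that many tiles. A quick check of the 2x2 example, using a hypothetical `mesh_tiles` helper:

```python
def mesh_tiles(x_bits, y_bits):
    """Return (width, height, tile_count) for a mesh sized by coordinate bits."""
    width, height = 1 << x_bits, 1 << y_bits
    return width, height, width * height


# Values from the 2x2 example in Step 1.
p = {"MailboxMeshXBits": 1, "MailboxMeshYBits": 1}
print(mesh_tiles(p["MailboxMeshXBits"], p["MailboxMeshYBits"]))  # (2, 2, 4)
```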
Step 2. We made a sample accelerator for testing purposes: ExampleAccelerator.sv. This accelerator receives any flit and sends it to the address specified in the first word of the flit's payload.
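The forwarding behaviour described in Step 2 can be modelled in a few lines. This is a behavioural sketch only, not a translation of ExampleAccelerator.sv. It assumes that the "first word" is the least significant 32 bits of the payload, that the new destination is that word truncated to the NetAddr width, and that messages are a single flit; the 16-bit address width is illustrative and `reroute` is a hypothetical name.

```python
WORD_BITS = 32      # assumed word size
NET_ADDR_BITS = 16  # illustrative NetAddr width, not from a real build


def reroute(flit):
    """Return a copy of the flit whose dest is taken from payload word 0."""
    first_word = flit["payload"] & ((1 << WORD_BITS) - 1)
    out = dict(flit)  # leave the caller's flit untouched
    out["dest"] = first_word & ((1 << NET_ADDR_BITS) - 1)
    return out


incoming = {
    "dest": 0x8000,                        # addressed to the accelerator
    "payload": (0xDEADBEEF << 32) | 0x40,  # word 0 carries the return address
    "notFinalFlit": 0,
    "isIdleToken": 0,
}
print(hex(reroute(incoming)["dest"]))
```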
Step 3. Update de5/Golden_top.qsf to specify the location of the
custom accelerator. For example, add the line
set_global_assignment -name SYSTEMVERILOG_FILE ../doc/custom/ExampleAccelerator.sv
to use the sample accelerator.
Step 4. Use the config.py script to generate Verilog macros
containing all the Tinsel parameters. Make sure the generated file is
in the same directory as any file that includes it.
python config.py vpp > config.v
Step 5. Build the Quartus project on a box of your choice, but
avoid using Zitura for this -- it is too old and slow. In the de5
subdirectory, simply type make one to do a single build, or make
to do multiple builds with different seeds (good if timing is tight).
Step 6. Copy the bitfile generated by Quartus (Golden_top.sof)
into your home directory on Zitura. Then, on Zitura:
/local/tinsel/bin/fpga-power.sh on # Turn worker FPGA on
/local/flash/flash.sh Golden_top.sof 1 # Flashing will take around 20 mins
/local/tinsel/bin/fpga-power.sh off # Turn worker FPGA off
Users of Zitura are expected to coordinate with one another when
doing a reflash. For example, announcing a reflash on the POETS
Slack group (channel #zitura) helps avoid clashes and confusion.
Step 7. We have made a sample software application to go with the
example accelerator. It can be found in the apps/custom/ directory
of the Tinsel repository. To run this:
cd tinsel/apps/custom
make
./run
You should see the following output:
Booting
Starting
Waiting for message from accelerator
Got it: 8000
Currently, custom accelerators break the idle-detection feature. If you would like me to fix this, please ask, as it should be fairly straightforward to do.