NetThreadsRE
NetThreads allows you to run threaded software on the NetFPGA with very little effort. Researchers who want to prototype ideas and test new theories, algorithms or applications will find that it is much easier to write a few lines of C code than to write, synthesize and debug Verilog. The programs are cross-compiled (tools included in the release) and executed by custom soft processors instantiated in the FPGA. By using NetThreads you can create applications for the NetFPGA without having to write a single line of Verilog.
This release of NetThreads is optimized for routing: it has a merged input and output buffer, so packets do not need to be copied from the input to the output, which speeds up forwarding immensely. It also has thread scheduling for fast critical section execution (see the paper). A simulator for detailed debugging and performance evaluation is also included in this release.
NetThreads is part of a larger project studying soft processor architectures, with the goal of allowing software programmers to easily take advantage of the FPGA fabric. The bitfile released corresponds to a 4-way multithreaded, 5-pipeline-stage, two-processor system described in the paper. This version of the processor uses the default round-robin thread interleaving, which should give you better performance than a comparable single-threaded 2-processor system when all the threads compute in parallel.
- Status :
- Version :
- Authors :
- NetFPGA base source :
- Release document :
First, make sure that you installed the NetFPGA base properly (i.e. it is listed as a network device using ifconfig), that you can load a bitfile on the FPGA and verify that packets flow as expected through the design.
To run a program on the NetFPGA, you need to obtain the compiler package, install it, and run the sample applications. Once you establish that those applications work, you can move on to edit those applications or create your own.
Download from NetThreads-RE
Decompress the tarball,
tar xzvf netthreads_2.0.tar.gz
you should obtain the following result:
netthreads-re/
  bit/
  compiler/
  loader/
  sim/
  src/
    bench/
      pipe/
      ping/
      reference_router
      template/
      common/
Reprogram the CPCI.
cpci_reprogram.pl --all
This needs to be done once every time the computer is booted (doing it more often doesn't hurt though). The NetThreads system was built against the Verilog files from an older version of the NetFPGA distribution (1.2.5), and it is recommended to use a matching CPCI bit-file (there are however unverified reports of it working with version 2.0).
Load the bitfile onto the FPGA:
nf2_download netthreads-re/bit/netthreads-re.bit
UPDATE:
If you get a message similar to this:
Error Registers: 0
Good, after resetting programming interface the FIFO is empty
Download completed - 2377668 bytes. (expected 2377668).
DONE went high - chip has been successfully programmed.
CPCI Information
----------------
Version: 4 (rev 1)
Device (Virtex) Information
---------------------------
Device info not found
WARNING: NetFPGA device info not found. Cannot verify that the CPCI version matches.
######################################################################
Do not be alarmed, all is well. While the host software has new checks to determine whether the code is loaded properly (that I didn't take into consideration when I built the bitfile), the FPGA was nonetheless programmed successfully and NetThreads-RE should be operational.
This test consists of executing a simple application in which packets are echoed from one port of the NetFPGA to another.
Wire your computers:
Wire eth1 to nf2c0 and eth2 to nf2c1
Obtain a program such as tcpreplay to send packets directly through a network interface. Also obtain a packet trace to replay: you can record one easily with tcpdump -w <file> -s0 -i <dev>, or you can just take the trace called pcap0.trace in src/bench/pipe.
Compile the loader, i.e. the program used to upload the program to the NetFPGA. Note that NF2_ROOT (the root folder of the NetFPGA distribution) must be defined in the environment; this should already be set up if you followed the NetFPGA installation instructions.
cd netthreads/loader
make
cd ../../
Get ready to monitor the two wires that you connected above. In two separate terminals, run:
tcpdump -i eth1 -n -vv -XX -s 0
tcpdump -i eth2 -n -vv -XX -s 0
The pipe program takes packets from one port and sends them to port XOR 2 (e.g. packets received on MAC port 0 will be sent on MAC port 1, because destination ports are one-hot encoded and the DMA and MAC ports are interleaved; see the Wiki).
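The port mapping described above can be sketched in C. This is only an illustration, not code from the release: the helper names `pipe_dst_index` and `one_hot` are hypothetical, and the port numbering (MAC ports on the even indices 0, 2, 4, 6 and DMA ports on the odd indices) is an assumption based on the interleaving mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the port that mirrors src_index.
 * Assumes interleaved numbering: MAC0=0, DMA0=1, MAC1=2, DMA1=3, ... */
static unsigned pipe_dst_index(unsigned src_index)
{
    assert(src_index < 8);   /* 4 MAC ports + 4 DMA ports */
    return src_index ^ 2u;   /* flips bit 1: MAC0<->MAC1, MAC2<->MAC3 */
}

/* Hypothetical helper: one-hot encoding of a port index. */
static uint16_t one_hot(unsigned index)
{
    assert(index < 16);
    return (uint16_t)(1u << index);
}
```

Under that assumed numbering, a packet arriving on MAC port 0 (index 0) leaves through index 2, which is MAC port 1 and is encoded as the one-hot value 0x4.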
Compile the pipe application:
cd netthreads/src/bench/pipe
make exec
Your program is now compiled and ready to run.
The program is loaded in three stages. To make things easier we provide a script to load your program. Once loaded, it will start working almost instantly. Assuming you run the script from the compiled benchmark folder (the benchmark name is grabbed by the script from the Makefile):
../../../loader/run.sh
Note that you will have to run this command as the root user or prefix it with 'sudo'.
Send packets through eth1; they will be received by nf2c0 and sent to nf2c1:
tcpreplay -i eth1 some_packet_trace.trace
In your two terminals, you will see packets being copied from one port to the other: keep in mind that packet ordering is not enforced in this application.
The pipe application above is very simple. You can now try the ping application and finally the reference_router application, which should behave like the NetFPGA reference_router (take a look at the README file first). You will find more information on making your own applications in netthreads/compiler/src/bench/template/.
TIP:
- The compiler package provided contains the full compiler tool-chain. If you want to dig deeper into which instructions are actually executed, you might want to take a look at the disassembler.
When compiling with
make clean
make CONTEXT=sw
an executable for the machine you are using will be produced. It will be single threaded and has the option of reading packets either from a packet trace or from the network (by default using tap devices, see the sw_* files in the netthreads/compiler/src/bench/common/ folder). Using this mechanism, you can run the exact same code on the host machine (no changes necessary) to verify that the functionality is correct. Note that the execution will be single-threaded on the PC.
For example, to debug using a pre-recorded packet trace:
Make sure the line in src/bench/bench.mk after the one that says "# this is with a packet trace" is uncommented.
cd src/bench/pipe
make CONTEXT=sw
setenv PACKET_TRACE some_packet_trace.trace
./pipe
A cycle-accurate simulator/debugger that models the parallelism of the processor is included in the sim folder of this release.
cd netthreads/src/bench/pipe
make clean
make exec
../../../sim/trace -limit 20 -pcapmac0 pcap0.trace -prio backoff -sched static pipe
To enable the printing of debug information, you must compile in the sim CONTEXT. To do so, uncomment the first line and comment out the second line of the following pair in compiler/bin/embed.sh before running "make exec":
#FLAG="CONTEXT=sim"
FLAG=
Don't forget to revert that flag to its original value before compiling to run on the NetFPGA.
The simulator is extremely powerful and has too many features to describe here. Please refer to the HOWTO in the sim folder for its usage. The simulator can be very effective at debugging parallel bugs such as race conditions.
NetThreads has two CPUs, each of which has 4 threads. There is only a small software library and no operating system. Unlike threads and processes in a normal CPU, there is no context-switch overhead between the threads. They execute in round-robin order when no thread is waiting for synchronization or for a packet (in which case they are de-scheduled). The CPUs use network byte-order regardless of the endian-ness of the host CPU.
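Because the soft processors use network byte order, code that is also built to run on a little-endian PC should not read multi-byte packet fields through plain casts. The standard ntohs() and ntohl() functions handle this; the sketch below shows the same idea done by hand. The helper name `read_be16` is hypothetical and is not part of the NetThreads library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit big-endian (network order) value.
 * Gives the same result on big- and little-endian machines because it
 * assembles the value byte by byte instead of casting the pointer. */
static uint16_t read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}
```

For example, the two bytes 0x05 0xDC read this way give 1500 on any host.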
The CPUs in NetThreads fit into the same hardware pipeline used by other NetFPGA applications like the NIC and router, so a description of the reference pipeline also applies to NetThreads. In the diagram of the pipeline, the NetThreads CPUs replace the Output Port Lookup module and sit between the Input Arbiter and the Output Queues. In the reference pipeline, there are 4 input queues (labelled "CPU RxQ" in the diagram) for packets copied over the PCI bus from the host computer's CPU. In NetThreads, however, only one of these queues is connected to the input arbiter. The remaining 3 are not connected, so packets sent to them will never be read by NetThreads; NetThreads can, however, send packets to them. The 4 MAC RxQs are also connected to the input arbiter.
Developing NetThreads programs requires a cross-compiler and software library. The tools and compiler are available here.
Here is an overview of the important parts of the source code:
src/            the root of the source repository
  bench/        software that runs on the NetFPGA
    common/     library used by all NetThreads applications
    pipe/       simple application that resends all received packets
    template/   example skeleton of a typical application
NetThreads applications have multiple threads with a single entry point. Threads are not dynamically created or destroyed. Instead, they all begin executing the same function at the same time. To change the behaviour of a specific thread, call nf_tid() to get the current thread's unique thread id and branch based on the result.
#include "support.h"

int main(void)
{
    if (nf_tid() == 2) {
        log(" Hello world. I am the wonderful thread 2\n");
    } else {
        log(" Hello world. I am thread %d\n", nf_tid());
    }
    return 0;
}
Behind the scenes, the build system for NetThreads applications is a bit complicated. Fortunately, the complexities are mostly hidden and the individual Makefiles in the application directories are relatively simple. Here is pipe's Makefile:
TARGETS=pipe
include ../bench.mk
pipe: process.o pktbuff.o memcpy.o
The TARGETS variable should contain the names of the one or more resulting binaries, and the variable must be set before including bench.mk. For each target in TARGETS, the makefile must explicitly say which object files the application depends on. These object files, plus one or two other default ones, are linked together to build the binary. Note that although pipe depends on process.o, pktbuff.o, and memcpy.o, only process.c is in the pipe directory. The other source files, pktbuff.c and memcpy.c, are in bench/common. Any object file that builds from a source file in the common directory can be used in this way. Also, the headers in the common directory can be included in source files as if they were local. It's not necessary to add "../common/" to the paths.
In this release, NetThreads applications can be compiled for three different contexts:
nf: | Builds the app to run on the NetFPGA. This is the default context. All calls to the log function are ignored. |
sim: | Builds the app to run in the simulator. All calls to the log function are performed. |
sw: | Builds the app to run as software on the host computer. Instead of compiling with a cross-compiler, this uses the native gcc toolchain. Many of the normal NetThreads functions, such as sending or receiving packets, can be no-ops in this context, use packet traces, or use live network devices on the host (see ../common/sw_* files). This is useful when porting existing applications to NetThreads to verify the functionality: the same code running on the NetFPGA should run on the host (the difference is that the supplied C function library is not as extensive as the one on your host computer, but you can always supply your own implementation). |
See the sections nf, sw and sim for instructions on how to compile in each mode.
The default context is nf. When switching between contexts, run make clean to ensure all object files are recompiled.
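One common way to make logging calls disappear in one build and print in another is a preprocessor switch. The sketch below illustrates that general technique only; it is not how the NetThreads library defines log, and the names `APP_CONTEXT_NF` and `APP_LOG` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical names: the real build system may select the context differently. */
#ifdef APP_CONTEXT_NF
#define APP_LOG(...) (0)                  /* compiled out: arguments are never evaluated */
#else
#define APP_LOG(...) printf(__VA_ARGS__)  /* other contexts: behaves like printf */
#endif
```

Because the arguments are not evaluated when the macro is compiled out, logging calls written this way should not have side effects that the program depends on.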
The bench/common directory contains functions and structs useful for writing NetThreads applications. This document describes the most important and commonly used functions, but the best and most up-to-date source of information is still the code itself.
uint nf_time() | Returns the current NetFPGA time. It increments once every clock cycle at a frequency of 125MHz. The time is returned as a 32-bit number, so it will wrap around to zero roughly every 34 seconds. |
int nf_tid() | Returns the current thread's unique id. The id is a number in the range [0,]. |
void log(char *frmt, ...) | Prints a string. The arguments of log are the same as printf's. This function only has an effect in the sim and sw contexts. In the nf context, the function is defined away. |
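The wrap-around period of nf_time() follows from the counter width and the clock rate: 2^32 cycles divided by 125,000,000 cycles per second is about 34.36 seconds. Intervals shorter than that can still be measured across a wrap by using unsigned subtraction. The sketch below is an illustration only: `elapsed_cycles` and `cycles_to_us` are hypothetical helpers, not part of the NetThreads library, and the two timestamps are assumed to come from two calls to nf_time().

```c
#include <stdint.h>

#define NF_CLOCK_HZ 125000000u  /* 125 MHz, as stated for nf_time() */

/* Cycles elapsed between two 32-bit timestamps. The result is correct
 * across one wrap because unsigned arithmetic is modulo 2^32. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds (125 cycles per microsecond). */
static uint32_t cycles_to_us(uint32_t cycles)
{
    return cycles / (NF_CLOCK_HZ / 1000000u);
}
```

The result is only meaningful if fewer than 2^32 cycles passed between the two readings, so longer intervals need an extra software counter.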
NetThreads places arriving packets into the input memory.
void nf_pktin_init() | Initializes NetThreads for receiving packets by dividing the input memory into ten slots of 1600 bytes each. Must be called at most once from a single thread. |
t_addr nf_pktin_pop()* | Checks if a packet has been received. Arriving packets will be returned by nf_pktin_pop() only once and are returned in the order they arrived. If a packet is waiting, then this function returns a pointer to the IOQ header at the start of the packet. Use nf_pktin_is_valid() to determine whether a packet was actually returned. |
int nf_pktin_is_valid(t_addr addr)* | Determines if a pointer returned by nf_pktin_pop() is actually a packet or not. Returns true if the pointer is a valid packet, false otherwise. |
void nf_pktin_free(t_addr val)* | Tells NetThreads that the application has finished reading a packet returned by nf_pktin_pop(). After calling this function, an application should not read the packet contents again. Do not call this function on sent packets since the hardware will take care of recycling the buffer when it has been sent. It is important to call this function as soon as possible for buffers that are not forwarded. If packets in the input memory are not freed then arriving packets cannot be stored, which quickly leads to packet drops. |
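A typical receive loop built from these functions might look like the sketch below. To keep it self-contained and runnable on a PC, the nf_pktin_* functions are replaced by trivial host-side stand-ins that serve three fake packets. The stand-ins, the `t_addr` typedef and the `drain_input` helper are assumptions made for illustration; they are not the library's implementation, and the real declarations are in bench/common.

```c
#include <stddef.h>

typedef char *t_addr;   /* assumption: the real typedef lives in bench/common */

/* ---- host-side stand-ins, for illustration only ---- */
#define FAKE_PACKETS 3
static char fake_slots[FAKE_PACKETS][1600];
static int next_slot = 0;     /* next fake packet to hand out */
static int freed_slots = 0;   /* how many slots were given back */

static void nf_pktin_init(void) { next_slot = 0; freed_slots = 0; }

static t_addr nf_pktin_pop(void)
{
    return next_slot < FAKE_PACKETS ? fake_slots[next_slot++] : NULL;
}

static int nf_pktin_is_valid(t_addr addr) { return addr != NULL; }

static void nf_pktin_free(t_addr val) { (void)val; freed_slots++; }

/* ---- the pattern itself ---- */
/* Pops packets until none is waiting, frees each one and returns the count.
 * A real application would loop forever and inspect or forward each packet. */
static int drain_input(void)
{
    int handled = 0;

    nf_pktin_init();                 /* once, from a single thread */
    for (;;) {
        t_addr pkt = nf_pktin_pop();
        if (!nf_pktin_is_valid(pkt))
            break;                   /* nothing waiting */
        /* ... read the packet here, before freeing it ... */
        nf_pktin_free(pkt);          /* not forwarded: recycle the slot promptly */
        handled++;
    }
    return handled;
}
```

The important point is the ordering: the packet is read first and freed afterwards, and every packet that is not forwarded is freed so that its slot can receive a new packet.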
All packets received and sent by NetThreads start with an 8 byte header that is added and removed by the NetFPGA itself and does not exist on the wire. This header is called the IOQ Modules Header (or just IOQ Header) and is described here.
The IOQ header specifies both the length of a packet and which NetFPGA port it was received on and which port it will be sent to. The file bench/common/pktbuff.h contains the definition of the IOQ header.
If an input packet is sent without modifying the IOQ header, it will be sent by default to the first MAC port (queue 0).
With this version of NetThreads, the packet in the input memory is ready to be sent as soon as it is received. The hardware can only send packets that are stored in the merged input/output memory. The easiest approach is to send a received packet. Otherwise, you may use a slot that has not been given to the hardware to use. You may set the destination port in the ioq_header (if different from 0), compute the ctrl word for the sending hardware logic, then call the function in src/bench/common/iomem_send.c.
void do_send(char* start_addr, char* end_addr, uint ctrl) | Sends a packet. The argument start_addr points to the start of the packet and end_addr points to the last byte of the packet, as follows. |
if (nf_pktin_is_valid(next_packet)) {
    struct ioq_header *dioq = (struct ioq_header *)next_packet;
    unsigned int size = ntohs(dioq->byte_length);
    char *start_addr = next_packet;
    /* one past the last byte: IOQ header plus packet data */
    char *end_addr = (char *)(start_addr + size + sizeof(struct ioq_header));
    uint ctrl = calc_ctrl(start_addr, end_addr);
    end_addr--;  /* do_send() expects a pointer to the last byte */
    if (want_to_send)
        do_send(start_addr, end_addr, ctrl);  // packet slots sent are recycled automatically
    else
        nf_pktin_free(next_packet);  // recycle the packet slot
}
Note that the functions nf_pktout_init()/nf_pktout_alloc() of NetThreads v1.0 are no longer useful but they may be used as an example for someone wanting to manage the allocation of a portion of the merged input/output buffer.
NetThreads offers 16 mutexes for synchronizing between threads. Each mutex or lock is identified by an integer between 0 and 15 (higher numbers simply wrap around and identify the same 16 locks).
void nf_lock(int id) | Acquires a lock. |
void nf_unlock(int id) | Releases a lock. |
Think of the locks as 16 booleans. If a lock is false, then nf_lock() will set the lock to true and return immediately. If a lock is already true, then nf_lock() will block the calling thread until the lock is false. Calling nf_unlock() sets a lock to false and never blocks.
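The boolean model above can be written down as a small single-threaded C sketch. It is a model for reasoning about the semantics, not the hardware implementation: `lock_index`, `model_try_lock` and `model_unlock` are hypothetical names, and instead of blocking, the model reports whether nf_lock() would have returned immediately. The wrap-around of lock ids is modelled with a modulo, which is an assumption about how higher numbers map onto the 16 locks.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LOCKS 16
static bool lock_held[NUM_LOCKS];   /* false = free, true = taken */

/* Map any non-negative id onto one of the 16 locks (ids wrap around). */
static int lock_index(int id)
{
    assert(id >= 0);
    return id % NUM_LOCKS;
}

/* Returns true if the lock was free and is now taken (nf_lock() would
 * return immediately); false if nf_lock() would block the caller. */
static bool model_try_lock(int id)
{
    int i = lock_index(id);
    if (lock_held[i])
        return false;
    lock_held[i] = true;
    return true;
}

/* Releasing a lock never blocks. */
static void model_unlock(int id)
{
    lock_held[lock_index(id)] = false;
}
```

In this model, ids 3 and 19 name the same lock, so a thread that takes lock 19 while another holds lock 3 would block until it is released.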
Other versions of the processors exist (for example, a single-threaded version and versions with newer features). There also exists an interface to trace processors as they run in hardware. If you have a serious interest in using these other processors or the additional packages, please contact the authors. The source code can be made available.
We are always on the lookout for interesting applications to run on the NetFPGA: if you think that your program could be a good soft processor benchmark, please contact the authors.
For all NetFPGA-related problems, consult the NetFPGA forum.
Only a few hard-working researchers are involved in this project. Unless you have a specific bug report about a specific piece of code that doesn't run the way it is expected to, packaged in a way that we can easily reproduce, it is unlikely that we will be able to assist you.