Cheetah is a system that optimizes queries using programmable switches.
- Cheetah: Accelerating Database Queries with Switch Pruning (ACM SIGMOD 2020) [link]
- Extended version (arXiv 2020) [link]
- Poster (ACM SIGCOMM 2019) [link]
The code in this repository is organized into three directories.
The `data_plane` directory contains code designed for Intel's Barefoot Tofino programmable switch. The code is written in a Tofino-specific variant of the P4-14 programming language. See here for more information on standard P4.
The `control_plane` directory contains the control plane rules you need to install to enable pruning for the queries implemented in the data plane. They are in a markdown file with separate sections for each query. The rules in each section are intended to be installed in the same order they are presented.
The `host` directory contains code used to serialize / deserialize any given list of values into a Cheetah packet. This code is generic and does not make any assumptions about the query engine you are using. You can make it work with any query engine or key-value store as long as you write some glue code that lets Cheetah's packet serializer / deserializer understand the file format(s) of the system you are integrating Cheetah with. Cheetah's packet serializer and deserializer are optimized using Intel's Data Plane Development Kit (DPDK), so you can only run this code on a NIC supported by DPDK. For our evaluation, we used version 18.11 of Intel's DPDK along with Mellanox NICs.
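For reference, here is a minimal sketch in plain C (without DPDK) of the kind of packing the serializer performs: a list of 32-bit values placed behind a small Cheetah header. The header fields (`query_id`, `flow_id`, `value_count`) and the `cheetah_serialize` helper are illustrative assumptions for this example, not the actual on-wire format used by the code in this directory.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <arpa/inet.h>  /* htonl / htons */

/* Hypothetical Cheetah header: a query identifier, a flow identifier, and the
 * number of 32-bit values that follow. The real layout used by the DPDK
 * serializer in this directory may differ. */
struct cheetah_hdr {
    uint16_t query_id;
    uint16_t flow_id;
    uint32_t value_count;
} __attribute__((packed));

/* Pack `count` 32-bit values behind the header into `buf` (network byte order).
 * Returns the number of bytes written, or -1 if `buf` is too small. */
static int cheetah_serialize(uint8_t *buf, size_t buf_len,
                             uint16_t query_id, uint16_t flow_id,
                             const uint32_t *values, uint32_t count)
{
    size_t needed = sizeof(struct cheetah_hdr) + (size_t)count * sizeof(uint32_t);
    if (buf_len < needed)
        return -1;

    struct cheetah_hdr hdr = {
        .query_id    = htons(query_id),
        .flow_id     = htons(flow_id),
        .value_count = htonl(count),
    };
    memcpy(buf, &hdr, sizeof(hdr));

    uint8_t *p = buf + sizeof(hdr);
    for (uint32_t i = 0; i < count; i++) {
        uint32_t v = htonl(values[i]);
        memcpy(p, &v, sizeof(v));
        p += sizeof(v);
    }
    return (int)needed;
}

int main(void)
{
    uint32_t values[] = { 7, 42, 1337 };
    uint8_t buf[128];
    int n = cheetah_serialize(buf, sizeof(buf), /*query_id=*/1, /*flow_id=*/3,
                              values, 3);
    printf("serialized %d bytes\n", n);
    return 0;
}
```

In the DPDK-based host code, the destination buffer would be a packet buffer (mbuf) payload rather than a stack array, but the byte-packing idea is the same.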
You need a proprietary compiler (the P4 compiler packaged with Intel's Barefoot Software Development Environment) to compile this code, regardless of whether you intend to run it on real hardware. This is why we have chosen to release the code under the MIT License instead of one of the GNU GPL family of licenses (relevant).
To deploy our implementation on hardware, we recommend using the first generation of Intel's Barefoot Tofino switches. We did not test Cheetah on the more recent Intel Barefoot Tofino 2 chip and do not know whether our implementation is compatible with it. Note that our P4 scripts do not include the code required for TCP/IP forwarding or match-action table placement. You will need to add your switch deployment's TCP/IP forwarding implementation to the P4 scripts before you can deploy our implementation. Depending on your hardware, you may also need to add some code related to match-action table placement to conform to switch constraints (discussed in our ACM SIGMOD publication and arXiv report).
You need to define the following constants based on your hardware:
- For DISTINCT, `LRU_WIDTH`
- For GROUP-BY and JOIN, `REGISTER_COUNT_MAX`
- For TOP-N, `CHEETAH_TOPN_INT_MAX`
You also need to define the following constants based on the particular query you are optimizing (placeholder definitions for all of these constants follow this list):
- For TOP-N, `CHEETAH_TOPN_VALUE`
- For FILTERING, `FILTERING_CUTOFF`
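For concreteness, one way to provide these constants is through preprocessor definitions in the P4 sources (P4-14, like C, uses `#define`). The values below are placeholders chosen only for illustration; they are not recommended settings, and you would replace them based on your switch's memory budget and the query at hand.

```c
/* Hardware-dependent constants (placeholder values; tune to your switch). */
#define LRU_WIDTH            16          /* entries per LRU cache for DISTINCT       */
#define REGISTER_COUNT_MAX   65536       /* register array size for GROUP-BY / JOIN  */
#define CHEETAH_TOPN_INT_MAX 0x7FFFFFFF  /* sentinel "maximum" value for TOP-N       */

/* Query-dependent constants (placeholder values; tune per query). */
#define CHEETAH_TOPN_VALUE   100         /* the N in TOP-N                           */
#define FILTERING_CUTOFF     1000        /* threshold used by the FILTERING query    */
```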
We have included the default implementation of Cheetah's queries. However, to get optimal performance, you need to tune these queries based on your hardware constraints, the workload you are running, the query engine you are using and other factors. Here is a (not necessarily complete) set of knobs you can tune in our data plane implementation. Note that these will also require (fairly intuitive) modifications / additions to control plane rules.
- For DISTINCT, you can increase or decrease the number of LRU caches used (see the `cheetah_lru_import` files) or assign more memory to each cache (tune `LRU_WIDTH`).
- For JOIN, you can increase or decrease the number of bloom filter blocks (see the definition of `CREATE_BLOOM`) or the memory assigned to each block (tune `REGISTER_COUNT_MAX`).
- For SKYLINE, you can change the heuristic you use for pruning (see our arXiv report for more details on this) by modifying `alu_heuristic`.
- For TOP-N, you can increase or decrease the number of cutoffs (see our arXiv report for more details on this) by modifying the definition of, or the number of times you use, the `GENERATE_PACKET_COUNTER_ALU` and `GENERATE_TOPN_MIN_ALU` macros.
- For DISTINCT, JOIN and GROUP-BY, you can try changing the hash function(s) you use; our implementation uses CRC32 with offsets by default (a sketch of this idea appears after this list).
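As an illustration of the last point, the sketch below shows how a CRC32 hash combined with per-block offsets can produce several different indices for the same key, one per bloom filter block or LRU cache. The `bloom_index` helper and the way the offset is mixed into the key are assumptions made for this example; in the P4 code, hashing is done with the switch's built-in CRC support and the exact combination may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit. */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}

/* Derive an index into one block of size `block_size` for `key` by prepending
 * a per-block offset before hashing, so each block sees an effectively
 * independent hash of the same key. */
static uint32_t bloom_index(uint32_t key, uint32_t offset, uint32_t block_size)
{
    uint8_t buf[8];
    for (int i = 0; i < 4; i++) buf[i]     = (uint8_t)(offset >> (8 * i));
    for (int i = 0; i < 4; i++) buf[4 + i] = (uint8_t)(key >> (8 * i));
    return crc32_bytes(buf, sizeof(buf)) % block_size;
}

int main(void)
{
    uint32_t key = 123456;
    for (uint32_t block = 0; block < 3; block++)
        printf("block %u -> index %u\n",
               block, bloom_index(key, /*offset=*/block * 0x1000u, 4096));
    return 0;
}
```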
Please feel free to use the Cheetah project mailing list: harvard-cns-cheetah AT googlegroups DOT com. However, this is a low-volume mailing list. You are more likely to receive a helpful response if your question is specific, self-contained, and concise.