RaftLib is an open-source C++ Library that provides a framework for implementing parallel and concurrent data processing pipelines. It is designed to simplify the development of high-performance data processing applications by abstracting away the complexities of parallelism, concurrency, and data flow management.
It enables stream/data-flow parallel computation by linking parallel compute kernels together using simple right shift operators, similar to C++ streams for string manipulation. RaftLib eliminates the need for explicit usage of traditional threading libraries such as pthreads, std::thread, or OpenMP, which can lead to non-deterministic behavior when misused.
RaftLib's model provides a lock-free FIFO-like access to communication channels that connect each compute kernel. This design ensures efficient and deterministic data transfer between kernels. The library incorporates auto-parallelization, optimization, and convenient features to simplify the development of performant applications.
Key Components:
-
Pipelines: The core concept in RaftLib is a pipeline, which represents a sequence of data processing stages. Each stage in the pipeline is implemented as a separate computational unit called a "kernel." Kernels are connected together to form a directed acyclic graph (DAG) that represents the data flow between stages.
-
Kernels: Kernels are the building blocks of a pipeline and encapsulate the computation performed on the input data. Each kernel can have one or more input ports and output ports, allowing data to flow between stages. Kernels are responsible for processing data, applying transformations, and generating output for downstream stages.
-
Schedulers: RaftLib provides different schedulers to control the execution and parallelism of the pipeline. Schedulers determine how and when kernels are executed, taking into account factors like data dependencies, load balancing, and available computational resources. The library offers various scheduling strategies, including static scheduling, dynamic scheduling, and hybrid scheduling.
-
Data Flow Management: RaftLib handles the data flow between kernels automatically. It manages the movement of data between stages, ensuring that input data is available when needed and that output data is delivered to the correct destination. The library provides mechanisms for handling backpressure and buffering, allowing efficient processing of data streams.
-
Parallelism and Concurrency: RaftLib enables parallel execution of pipeline stages by utilizing the available computational resources efficiently. It supports multi-threading and takes advantage of multiple CPU cores to achieve parallelism. Additionally, it can leverage GPU acceleration for certain kernels, further boosting performance.
-
Integration and Extensibility: RaftLib provides an API and set of tools for integrating the library into existing applications. It supports interoperability with other libraries and frameworks, making it possible to combine RaftLib with domain-specific tools. The library is extensible, allowing developers to define custom kernels and schedulers to fit specific application requirements.
-
Fault Tolerance: RaftLib offers mechanisms for handling failures and recovering from errors. It supports fault-tolerant execution by providing checkpointing and recovery capabilities, allowing pipelines to resume from a previous state in case of failures.
Overall, RaftLib simplifies the development of parallel and concurrent data processing applications by providing a high-level abstraction for building data flow pipelines. It allows developers to focus on the computation and data transformations, while the library handles the complexities of parallel execution, data flow management, and fault tolerance.
Feel free to give RaftLib a try! If you encounter any issues, please create an issue request. For minor issues, we recommend joining our Slack group for quick resolutions. We also welcome pull requests from the community! If you're interested in benchmarking, you can send the authors an email. We have started a benchmark collection, but it's a work in progress, and we would be delighted to include your code.
User Group / Mailing List: slack channel
- Compiler: c++17 capable -> Clang, GNU GCC 5.0+, or Intel icc
- Latest build runs under Linux with above compilers on both x86 and AArch64, with both pthreads and QThreads.
- Compiler: c++17 capable -> Clang, GNU GCC 5.0+, or Intel icc
- OS X M1 runs, compiles, but has some test case hiccups on templates, but doesn't seem to impact functionality.
- Note for OS X users without a /user/local, specify an install prefix when using CMake.
- Builds and runs under Win10
Clone using the --recurse-submodules to download the library including all submodules and other libraries
git clone --recurse-submodules https://github.com/RaftLib/RaftLib.git
Using a build directory called e.g.: "build":
mkdir build
cd build
cmake ..
make && make test
sudo make install
NOTE: The default prefix in the makefile is:
PREFIX ?= /usr/local
If you want to build the OpenCV example, then you'll need to add to your cmake invocation:
-DBUILD_WOPENCV=true
Building the examples can be enabled using:
-DBUILD_EXAMPLES=true
Building the benchmarks can be enabled using:
-DBUILD_BENCHMARKS=false
Building tests can be disabled using:
-DBUILD_TESTS=false
To use the QThreads User space HPC threading library you will need to use the version with the RaftLib org and follow the RaftLib specific readme. This QThreads version has patches for hwloc2.x applied and fixes for test cases. To compile RaftLib with QThreads linked, add the following (assumes the QThreads library is in your path):
-DUSEQTHREAD=1
This is still an experimental feature. Default is to use legacy string-named ports.
-DSTRING_NAMES=1
Set the pkg-config path where to install the raftlib.pc
configuration file. Leave empty for the application to figure it out.
-DPKG_CONFIG_PATHWAY="<path>"
Building the library with position independet code
-DBUILD_FPIC=1
When building applications with RaftLib, on Linux it is best to use the pkg-config file, as an example, using the poc.cpp example,
g++ `pkg-config --cflags raftlib` poc.cpp -o poc `pkg-config --libs raftlib`
Feel free to substitute your favorite build tool. I use Ninja and make depending on which machine I'm on. To change out, use cmake to generate the appropriate build files with the -Gxxx flag.
The primary use of pkg-config is to provide the necessary details for compiling and linking a program to a library. This metadata is stored in pkg-config files. These files have the suffix .pc and reside in specific locations known to the pkg-config tool. RaftLib provides a configuration which which is installed together with the library. Once the configuration file is installed, the command pkg-config --cflags raftlib
can be used to provide the compiling details.
Following is an example of what is returned by above command:
-std=c++14 -DL1D_CACHE_LINE_SIZE=64 -DPLATFORM_HAS_NUMA=0 -I/usr/local/include
If you use this framework for something that gets published, please cite it as:
@article{blc16,
author = {Beard, Jonathan C and Li, Peng and Chamberlain, Roger D},
title = {RaftLib: A C++ Template Library for High Performance Stream Parallel Processing},
year = {2016},
doi = {http://dx.doi.org/10.1177/1094342016672542},
eprint = {http://hpc.sagepub.com/content/early/2016/10/18/1094342016672542.full.pdf+html},
journal = {International Journal of High Performance Computing Applications}
}
- OpenCV wrappers for RaftLib
- Project web page
- Project wiki page
- Blog post intro
- Jonathan Beard's thesis
- Views on parallel computing, general philosphy
Feel free to e-mail one of the authors of the repo