WindFlow is a C++17 header-only library for parallel data stream processing targeting heterogeneous shared-memory architectures equipped with multi-core CPUs and NVIDIA GPUs. The library provides traditional stream processing operators like map, flatmap, filter, reduce as well as window-based operators. The API allows building streaming applications through the MultiPipe and the PipeGraph programming constructs. The first is used to create parallel pipelines (with shuffle connections), while the second allows several MultiPipe instances to be interconnected through merge and split operations, in order to create complex directed acyclic graphs of interconnected operators.
Analogously to existing popular stream processing engines like Apache Storm and FLink, WindFlow supports general-purpose streaming applications by enabling operators to run user-defined code. The WindFlow runtime system has been designed to be suitable for embedded architectures equipped with low-power multi-core CPUs and integrated NVIDIA GPUs (like the Jetson family of NVIDIA boards). However, it works well also on traditional multi-core servers equipped with discrete NVIDIA GPUs.
At the moment WindFlow is for single-node execution. We are working to a distributed implementation.
The web site of the library is available at: https://paragroup.github.io/WindFlow/.
The library requires the following dependencies:
- a C++ compiler with full support for C++17 (WindFlow tests have been successfully compiled with both GCC and CLANG)
- FastFlow version >= 3.0 (https://github.com/fastflow/fastflow)
- CUDA (version >= 11.5 is preferred for using operators targeting GPUs)
- libtbb-dev required by GPU operators only
- libgraphviz-dev and rapidjson-dev when compiling with -DWF_TRACING_ENABLED to report statistics and to use the Web Dashboard for monitoring purposes
- librdkafka-dev for using the integration with Kafka (special Kafka_Source and Kafka_Sink operators)
- librocksdb-dev for using the suite of persistent operators keeping their internal state in RocksDB KVS
- doxygen (to generate the documentation)
Important about the FastFlow dependency -> after downloading FastFlow, the user needs to configure the library for the underlying multi-core environment. By default, FastFlow pins its threads onto the cores of the machine. To make FastFlow aware of the ordering of cores, and their correspondence in CPUs and NUMA regions, it is important to run (just one time) the script "mapping_string.sh" in the folder fastflow/ff before compiling your WindFlow programs.
WindFlow, and its underlying level FastFlow, come with some important macros that can be used during compilation to enable specific behaviors. Some of them are reported below:
- -DWF_TRACING_ENABLED -> enables tracing (logging) at the WindFlow level (operator replicas), and allows streaming applications to continuously report statistics to a Web Dashboard (which is a separate sub-project). Outputs are also written in log files at the end of the processing
- -DTRACE_FASTFLOW -> enables tracing (logging) at the FastFlow level (raw threads and FastFlow nodes). Outputs are written in log files at the end of the processing
- -DFF_BOUNDED_BUFFER -> enables the use of bounded lock-free queues for pointer passing between threads. Otherwise, queues are unbounded (no backpressure mechanism)
- -DDEFAULT_BUFFER_CAPACITY=VALUE -> set the size of the lock-free queues capacity. The default size of the queues is of 2048 entries
- -DNO_DEFAULT_MAPPING -> if this macro is enabled, FastFlow threads are not pinned onto CPU cores, but they are scheduled by the Operating System
- -DBLOCKING_MODE -> if this macro is enabled, FastFlow queues use the blocking concurrency mode (pushing to a full queue or polling from an empty queue might suspend the underlying thread). If not set, waiting conditions are implemented by busy-waiting spin loops.
Some macros are useful to configure the runtime system when GPU operators are utilized in your application. The default version of the GPU support is based on explicit CUDA memory management and overlapped data transfers, which is a version suitable for a wide range of NVIDIA GPU models. However, the developer might want to switch to a different implementation that makes use of the CUDA unified memory support. This can be done by compiling with the macro -DWF_GPU_UNIFIED_MEMORY. Alternatively, the user can configure the runtime system to use pinned memory on NVIDIA System-on-Chip devices (e.g., Jetson Nano and Jetson Xavier), where pinned memory is directly accessed by CPU and GPU without extra copies. This can be done by compiling with the macro -DWF_GPU_PINNED_MEMORY.
WindFlow is a header-only template library. To build your applications you have to include the main header of the library (windflow.hpp). For using the operators targeting GPUs, you further have to include the windflow_gpu.hpp header file and compile using the nvcc
CUDA compiler (or through clang
with CUDA support). The source code in this repository includes several examples that can be used to understand the use of the API and the advanced features of the library. The examples can be found in the tests folder. To compile them:
$ cd <WINDFLOW_ROOT>
$ mkdir ./build
$ cd build
$ cmake ..
$ make -j<no_cores> # compile all the tests (not the doxygen documentation)
$ make all_cpu -j<no_cores> # compile only CPU tests
$ make all_gpu -j<no_cores> # compile only GPU tests
$ make docs # generate the doxygen documentation (if doxygen has been installed)
In order to use the Kafka integration, consisting of special Source and Sink operators, the developer has to include the additional header kafka/windflow_kafka.hpp and properly link the library librdkafka-dev. Analogously, to use persistent operators, you need to include the header persistent/windflow_rocksdb.hpp and link the library librocksdb-dev.
Two Docker images are available in the WindFlow GitHub repository. The images contain all the synthetic tests compiled and ready to be executed. To build the first image (the one without tests using GPU operators) execute the following commands:
$ cd <WINDFLOW_ROOT>
$ cd dockerimages
$ docker build -t windflow_nogpu -f Dockerfile_nogpu .
$ docker run windflow_nogpu ./bin/graph_tests/test_graph_1 -r 1 -l 10000 -k 10
The last command executes one of the synthetic experiments (test_graph_1). You can execute any of the compiled tests in the same mannner.
The second image contains all synthetic tests with GPU operators. To use your GPU device with Docker, please follow the guidelines in the following page (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). Then, you can build the image and run the container as follows:
$ cd <WINDFLOW_ROOT>
$ cd dockerimages
$ docker build -t windflow_gpu -f Dockerfile_gpu .
$ docker run --gpus all windflow_gpu ./bin/graph_tests_gpu/test_graph_gpu_1 -r 1 -l 10000 -k 10
Again, the last command executes one of the synthetic experiments (test_graph_gpu_1). You can execute any of the compiled tests in the same mannner.
WindFlow has its own Web Dashboard that can be used to profile and monitor the execution of running WindFlow applications. The dashboard code is in the sub-folder WINDFLOW_ROOT/dashboard. It is a Java package based on Spring (for the Web Server) and developed using React for the front-end part. To start the Web Dashboard run the following commands:
cd <WINDFLOW_ROOT>/dashboard/Server
mvn spring-boot:run
The web server listens on the default port 8080 of the machine. To change the port, and other configuration parameters, users can modify the configuration file WINDFLOW_ROOT/dashboard/Server/src/main/resources/application.properties for the Spring server (e.g., to change the HTTP port), and the file WINDFLOW_ROOT/dashboard/Server/src/main/java/com/server/CustomServer/Configuration/config.json for the internal server receiving reports of statistics from the WindFlow applications (e.g., to change the port used by applications to report statistics to the dashboard).
WindFlow applications compiled with the macro -DWF_TRACING_ENABLED try to connect to the Web Dashboard and report statistics to it every second. By default, the applications assume that the dashboard is running on the local machine. To change the hostname and the port number, developers can use the macros WF_DASHBOARD_MACHINE=hostname/ip_addr and WF_DASHBOARD_PORT=port_number.
From version 3.1.0, WindFlow is released with a double license: LGPL-3 and MIT. Programmers should check the licenses of the other libraries used as dependencies.
In order to cite our work, we kindly ask interested people to use the following references:
@article{WindFlow,
author={Mencagli, Gabriele and Torquati, Massimo and Cardaci, Andrea and Fais, Alessandra and Rinaldi, Luca and Danelutto, Marco},
journal={IEEE Transactions on Parallel and Distributed Systems},
title={WindFlow: High-Speed Continuous Stream Processing With Parallel Building Blocks},
year={2021},
volume={32},
number={11},
pages={2748-2763},
doi={10.1109/TPDS.2021.3073970}
}
@article{WindFlow-GPU,
title = {General-purpose data stream processing on heterogeneous architectures with WindFlow},
journal = {Journal of Parallel and Distributed Computing},
volume = {184},
pages = {104782},
year = {2024},
issn = {0743-7315},
doi = {https://doi.org/10.1016/j.jpdc.2023.104782},
url = {https://www.sciencedirect.com/science/article/pii/S0743731523001521},
author = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Alessandra Fais and Marco Danelutto},
}
If you are using WindFlow for your purposes and you are interested in specific modifications of the API (or of the runtime system), please send an email to the maintainer.
The main developer and maintainer of WindFlow is Gabriele Mencagli (Department of Computer Science, University of Pisa, Italy).