The Bumblebee
is a set of Krylov subspace solvers including: CG, BiCG, BiCGStab and BiCGStab(2).
All solvers have two versions - unpreconditioned and preconditioned.
The library relies on the Epetra
and ML
libs from the Trilinos
project (see https://trilinos.org/).
The Epetra
library is used to represent distributed matrices and vectors, as well as to
perform Mat-Vec product. The ML
library is used as an AMG preconditioner.
The Bumblebee
supports external preconditioners if they can be provided as an object of
a class with two methods: ::solve()
and ::IsBuilt()
. The ::solve()
method should take
reference to the matrix object, references to the LHS and RHS vectors and a boolean indicating
that the method is calles from BiCG method and special treatment of preconditioning matrix
might be applied. The method ::IsBuit()
should simply return true
if preconditioning matrix
is built and false
otherwise.
As an example of preconditioner class implementation see folder AMG
, where ML library from
the Trilinos project is wrapped in order to provide required interface.
The Bumblebee
utilizes low-level intrinsics if symbol USE_MAGIC_POWDER
is added as a
compilation flag. Explicit vectorization is implemented for main vector-based operations:
update, copy, assing, multiply, divide, etc. All these operations are implemented as stand-alone
functions in order to simplify couling with old C-based codes (see mpi_wrp::
namespace).
The Bumblebee
is created to support MPI parallelization strategy at first, but can also work
with hybrid parallelization. Just include BUMBLEBEE_USE_OPENMP
symbol into the makefile and
be sure that Trilinos packages are compiled with OpenMP support (add -D Trilinos_ENABLE_OpenMP:BOOL=ON
to the do-configure script)
The script is a simple call for CMake in order to build Trilinos. Use it in conjunction with the
Trilinos
manuals and tutorials to build all required libraries.
Parallel performance tests were done for the BiCGStab(2) solver accelerated with AMG. The solver scales well up to 1536 tested cored (strong scaling). An example is shown in the figure below. Tested problem consisted of a solution of a heat diffusion problem (with constant diffusion coefficient) discretized on a uniform Cartesian grid with 27M DoF. Tests were performed on 2x12 cores Intel Xeon E5-2690 v3 CPU with basic clock speed 2.6 GHz. Code was compiled with Intel C++ compiler v16.0.3 and Intel MPI v5.1.3.