Building and testing
Cyclops can be built in a few different ways, depending on your intended usage. For the most basic library build, only MPI header files are required as a prerequisite. The following components can be built:
- Cyclops C++ library (static or dynamic)
  - with/without OpenMP
  - with/without HPTT (high performance tensor transpose)
  - with/without MKL sparse BLAS functionality
  - with/without ScaLAPACK functionality
  - with/without CUDA offloading (experimental)
- Cyclops Python library
- C++ test suite and examples
- Python test suite
The first step of any build is to run the configure script, which sets up config.mk and setup.py files with appropriate build parameters, flags, and libraries. The script tests some of the configuration parameters and attempts standard Linux defaults (such as -llapack and -lblas). The script can also download and build optional external dependencies via --build-hptt and --build-scalapack. For a full, up-to-date list of configuration parameters, execute
./configure --help
Usage: configure [options]
--install-dir=dir Specify where to install header and lib files (default is /usr/local/,
so headers will be installed to /usr/local/include and libs to /usr/local/lib)
--build-dir=dir Specify where to build object files, library, and executable (default is .)
--with-lapack Tells CTF build to enable LAPACK functionality regardless of whether LAPACK libs have been given.
--with-scalapack Tells CTF build to enable ScaLAPACK functionality regardless of whether ScaLAPACK libs have been given.
--build-scalapack Tells CTF to download and build ScaLAPACK library.
--with-hptt Tells CTF build to enable HPTT functionality.
--build-hptt Tells CTF to download and build HPTT library.
--with-cuda Tells CTF to setup and use NVCC, NVCCFLAGS, and CUBLAS libs
--no-dynamic Turns off configuration and build of dynamic (shared) libraries (these are needed for Python codes and some C++ codes)
--no-static Turns off configuration and build of static libraries (these are needed for C++ codes)
--verbose Does not suppress tests of compile/link commands
LIB_PATH=-Ldir Specify paths to static libraries, e.g. -L/usr/local/lib/ to be used for test executables
LD_LIB_PATH=-Ldir Specify paths to dynamic libraries to be used for test executables as part of LD_LIBRARY_PATH
LIBS=-llibname Specify list of static libraries to link to, by default "-lblas -llapack -lscalapack" or a subset
(for each -l<name> ensure lib<name>.a is found in LIBRARY_PATH or LIB_PATH)
LD_LIBS=-llibname Specify list of dynamic libraries to link to, by default "-lblas -llapack -lscalapack" or a subset
(for each -l<name> ensure lib<name>.a is found in LD_LIBRARY_PATH or LD_LIB_PATH)
CXX=compiler Specify the C++ compiler (e.g. mpicxx)
CXXFLAGS=flags Specify the compiler flags (e.g. "-g -O0" for a lightweight debug build),
can also be used to specify macros like -DOMP_OFF (turn off OpenMP) -DPROFILE -DPMPI (turn on performance profiling)
-DVERBOSE=1 -DDEBUG, see docs and generated config.mk for details)
LINKFLAGS=flags Specify flags for creating the static library
LDFLAGS=flags Specify flags for creating the dynamic library
INCLUDES=-Idir Specify directories and header files to include (e.g. -I/path/to/mpi/header/file/)
Additionally, the variables AR, NVCC, NVCCFLAGS, and WARNFLAGS
can be set on the command line, e.g. ./configure CXX=g++ CXXFLAGS="-fopenmp -O2 -g".
Each time the configure script is executed successfully, a line is appended to the file how-did-i-configure in the build directory.
Once the configure script has executed successfully (you can review/change flags in config.mk and setup.py, or reconfigure, until these are satisfactory), the C++ and Python Cyclops libraries can be built via GNU make,
make #build static and dynamic libraries
make -j4 #... using four threads
make ctflib #build static library
make ctflibso #build dynamic library
make test #build test_suite using ctflib and execute
make python #use Cython to build the Cyclops python library
make python_test #run suite of Python tests using (locally built) Cyclops python library
Numerous sub-tests and examples can also be built via make (typically, for a file examples/<some_example>.cxx, one can build the code with make <some_example>, and similarly for sub-tests). Locally built libraries are stored in the lib, lib_shared, and lib_python subdirectories of the build directory (which may be specified by running the configure script from the desired build directory or via ./configure --build-dir=...).
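For instance, to build and run one of the example executables (matmul is used here since it appears in the target list below; the ./bin output path and the mpirun invocation mirror the test commands later on this page, and the example may accept optional command-line arguments not shown),
make matmul #build the matmul example against the locally built CTF library
mpirun -np 4 ./bin/matmul #run the example on 4 MPI processes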
For Apple/macOS machines it may be necessary to set the MACOSX_DEPLOYMENT_TARGET environment variable to build the Python library, e.g. export MACOSX_DEPLOYMENT_TARGET=10.8.
The CTF C++ libraries and headers can be installed system-wide (to /usr/local/ by default, change via ./configure --install-dir=...), via
make install #may need superuser permissions (sudo)
If using the default /usr/local/ directory, to make the shared library visible (necessary for Python usage of CTF without extra path specifications), you may need to add /usr/local/lib to LD_LIBRARY_PATH (to do this permanently on a standard Linux system, add LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib to your ~/.bashrc file).
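For example, the line to append to ~/.bashrc (assuming the default /usr/local/ install location) would be
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib #make the CTF shared library visible to the dynamic loader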
The Python library can be installed via pip using
make python_install
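A quick way to check that the installation succeeded (assuming the module is importable as ctf, as in the Python tests below) is, e.g.,
python -c "import ctf; print(ctf.__file__)" #prints the location of the installed ctf module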
See the Makefile in the main directory for additional build targets, or, if auto-completion for make is configured in your shell, view the available targets via make <TAB>, which currently (June 2018) yields,
algebraic_multigrid dft model_trainer scalar
all dft_3D multi_tsr_sym scan
ao_mo_transf diag_ctr neural_network shared
apsp diag_sym nonsq_pgemm_bench sparse_mp3
bench endomorphism nonsq_pgemm_test sparse_permuted_slice
bench_contraction endomorphism_cust particle_interaction spectral_element
bench_nosym_transp endomorphism_cust_sp permute_multiworld speye
bench_redistribution examples pip spmv
bitonic_sort executables python sptensor_sum
bivar_function fast_3mm python_base_test sssp
bivar_transform fast_as_as_sy_tensor_ctr python_dot_test strassen
block_sparse fast_diagram python_einsum_test studies
btwn_central fast_sy_as_as_tensor_ctr python_fancyindex_test subworld_gemm
ccsd fast_sym python_install svd
ccsdt_map_test fast_sym_4D python_la_test sy_times_ns
ccsdt_t3_to_t2 fast_tensor_ctr python_test test
checkpoint fft python_ufunc_test test_live
clean force_integration python_uninstall tests
clean_bin force_integration_sparse qinformatics test_suite
clean_lib gemm_4D qr trace
clean_obj hosvd readall_test uninstall
clean_py install readwrite_test univar_function
ctf_ext_objs jacobi recursive_matmul weigh_4D
ctflib matmul reduce_bcast
ctflibso mis repack
ctf_objs mis2 scalapack_tests
The Cyclops library requires a BLAS library and an MPI library. For Python, Cython (typically available via pip) and numpy are necessary. Optionally, Cyclops can be built alongside LAPACK and ScaLAPACK, providing distributed matrix factorization functionality. Cyclops uses basic MPI routines and is compatible with most MPI libraries (MPICH/OpenMPI). For a static Cyclops build, static versions of these libraries are necessary; for a dynamic (Python) build, dynamic libraries are necessary.
On an Apple machine, all necessary dependencies can be installed via brew
brew install gcc wget cmake openblas mpich2
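On a Debian/Ubuntu Linux system, a roughly equivalent set of dependencies may be installed via apt (the package names below are a suggestion and may vary by distribution and version),
sudo apt-get install g++ wget cmake libopenblas-dev liblapack-dev mpich cython3 python3-numpy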
All library paths and libraries can be provided to the configure script; for example, a static library like /path/to/static_libraries/libmyfastblas.a may be specified as follows,
./configure LIB_PATH="-L/path/to/static_libraries" LIBS="-lmyfastblas -lgfortran"
(-lgfortran may or may not be necessary). If the shared library is in /path/to/dynamic_libraries/libmyfastblas.so, the (additional) specification of LD_LIB_PATH and LD_LIBS is required as
./configure LD_LIB_PATH="-L/path/to/dynamic_libraries" LD_LIBS="-lmyfastblas"
The functionality provided by and performance of Cyclops depend on whether and which of these libraries are provided. MKL routines are used for sparse matrix multiplication, and yield significantly higher performance than the reference kernels. ScaLAPACK functionality enables routines such as QR and SVD to be executed on CTF::Matrix<float>, CTF::Matrix<double>, CTF::Matrix< std::complex<float> >, and CTF::Matrix< std::complex<double> >. The presence of the necessary symbols is tested when configure is executed (creating flags like -DUSE_MKL and -DUSE_SCALAPACK in config.mk). ScaLAPACK can be downloaded and built (via cmake) automatically via
./configure --build-scalapack
The library can also be built without specifying a ScaLAPACK library, by providing the flag --with-scalapack to ./configure.
Building with MKL and an Intel compiler (e.g. ./configure CXX="mpicxx -cxx=icpc") typically requires simply -mkl (which can be augmented with -mkl=parallel for threading or -mkl=cluster for ScaLAPACK). The configure script attempts -mkl automatically when building with an Intel compiler, but it can also be provided as part of CXXFLAGS or LIBS. Custom (e.g. GNU-compiler or parallel+cluster) MKL library link-lines may be provided via LIB_PATH, LIBS, LD_LIB_PATH, and LD_LIBS. For example, to build only the dynamic libraries (e.g. for Cyclops Python usage) with GNU compilers and MKL ScaLAPACK on a 64-bit system, the following link-line may be appropriate,
./configure 'LD_LIB_PATH=-L/opt/intel2016/mkl/lib/intel64/' 'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_rt -lpthread' '--no-static'
The C++ static Cyclops library can be tested via
make test #build test_suite and run sequential test
make testN #build test_suite and run via mpirun with N processors
make test_suite && ./bin/test_suite #manually build test_suite and run it
Additional tests (e.g. for ScaLAPACK QR and SVD) can be built via make qr, make svd, etc., or in bulk via make examples, make tests, make scalapack_tests.
The Python library may be tested by running
make python_test #run the Python test suite sequentially
make python_testN #run the tests with N mpi processes via mpirun python
mpirun python ./test/python/<...>.py #run test <...> manually
make test_live #launch ipython with numpy and ctf pre-imported
Cyclops uses internal performance models to select between algorithm variants. These models attempt to predict the performance of every subkernel within Cyclops. The model coefficients are set to reasonable defaults, but better performance may be achieved by optimizing them for a particular architecture and number of threads per MPI process. To do this, the CTF static library must be built with -DPROFILE -DPMPI -DTUNE (the appropriate lines can be uncommented in config.mk) and the model_trainer executable should be built via make model_trainer. This executable should then be run for a suitable amount of time on the largest desired number of nodes (e.g. for an hour on 256 nodes), via, e.g.
export OMP_NUM_THREADS=4; mpirun -np 256 ./bin/model_trainer --time 3600 #run model_trainer for roughly 3600 seconds
The actual execution time of the model_trainer may differ substantially (be smaller or greater) from what is specified by --time. When executed successfully, the model_trainer executable outputs a set of model coefficients that can be provided to a subsequent build of CTF for the application (which should no longer use -DTUNE).
Some benchmarks are provided as part of the Cyclops source distribution in bench/ (some examples also include benchmarks). In particular, the bench_contraction target/executable can be used to benchmark an arbitrary sparse/dense/mixed tensor contraction.
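A minimal sketch of building and launching this benchmark (its command-line arguments, which select the contraction to benchmark, are omitted here; consult its source in bench/ or its usage output for the expected parameters),
make bench_contraction #build the contraction benchmark executable
mpirun -np 4 ./bin/bench_contraction #launch on 4 MPI processes (benchmark arguments omitted)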
The following configuration makes sense for the Stampede cluster with Intel 17.0.4,
module list
1) intel/17.0.4 2) impi/17.0.3 3) git/2.9.0 4) autotools/1.1 5) python/2.7.13 6) xalt/2.0.7 7) TACC
For a static-only C++ build that uses MKL with both threading (usually -mkl=parallel) and ScaLAPACK (usually -mkl=cluster), which can be tricky to combine,
./configure --no-dynamic LIB_PATH="-L$MKLROOT/lib/intel64/" LIBS="-lmkl_scalapack_lp64 -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -lpthread -lm"
For a dynamic-only Python build of CTF that uses sequential MKL, one can use the link-line recommended by the MKL link-line advisor plus -lmkl_rt to avoid issues with incorrect linkage to static libs. We also set CXXFLAGS="-O2 -no-ipo" to avoid an Intel internal error in compilation,
./configure '--no-static' 'LD_LIB_PATH=-L/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64 -Wl,--no-as-needed' 'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_def -liomp5 -lpthread -lm -ldl' 'CXXFLAGS=-O2 -no-ipo'
The use of -lmkl_rt for dynamic library linking seems to have some limitations. On a standard Linux configuration, it may link to C++ as opposed to Fortran MKL libraries, which leads to errors in MKL routines for complex types. To fix this, the necessary Fortran MKL libraries need to be specified manually, which also entails other caveats. The following configure specification is valid for a typical Linux system configuration using MKL,
./configure --build-scalapack --build-hptt --no-static LD_LIB_PATH="-L/opt/intel/mkl/lib/intel64/ -Wl,-no-as-needed" LD_LIBS="-lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -lmkl_def -liomp5" LDFLAGS="-lpthread -lm -ldl -lgfortran"
After running configure in this way, run make python to build the Python libs, and/or make python_test2 to test them with a 2-process execution.
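Putting these steps together, a sketch of the workflow for this Python-only build (using the configure invocation given just above) is
make python #build the Cyclops Python library (requires the dynamic/shared build)
make python_test2 #run the Python test suite with 2 MPI processes via mpirun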