
attractivechaos/matmul


This repo compares the following matrix multiplication implementations on two large square matrices (2000-by-2000 and 4000-by-4000 in the results below):

| Implementation | Description |
|:---|:---|
| Naive | The most obvious implementation (plain triple loop) |
| Transposed | Transposing the second matrix for cache efficiency |
| sdot w/o hints | Replacing the inner loop with sdot(), the BLAS single-precision dot product |
| sdot with hints | sdot() with a partially unrolled loop |
| SSE sdot | sdot() vectorized with explicit SSE instructions |
| SSE+tiling sdot | SSE sdot() with loop tiling |
| OpenBLAS sdot | sdot() provided by OpenBLAS |
| OpenBLAS sgemm | sgemm() provided by OpenBLAS |
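To make the first few rows concrete, here is a minimal sketch of the naive loop and the transposed variant, assuming n-by-n row-major `float` matrices. The function names `matmul_naive`, `matmul_transposed` and `sdot_unrolled` are hypothetical, and this is not the repo's own code. The unroll factor of 4 in `sdot_unrolled` is also an assumption; with a plain one-accumulator loop in its place, `matmul_transposed` would correspond to the "Transposed" row instead of "sdot with hints".

```c
#include <assert.h>
#include <stdlib.h>

/* "Naive": textbook triple loop on n-by-n row-major matrices.
 * The inner loop reads B down a column (stride n floats), so for
 * large n most accesses to B are cache-unfriendly. */
void matmul_naive(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float s = 0.0f;
            for (size_t k = 0; k < n; ++k)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Hypothetical sdot-style kernel: dot product of two contiguous
 * vectors, unrolled by 4 (an assumed factor) with independent
 * partial sums so the compiler can overlap the additions. */
static float sdot_unrolled(size_t n, const float *x, const float *y)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t k = 0;
    for (; k + 4 <= n; k += 4) {
        s0 += x[k] * y[k];
        s1 += x[k + 1] * y[k + 1];
        s2 += x[k + 2] * y[k + 2];
        s3 += x[k + 3] * y[k + 3];
    }
    for (; k < n; ++k) /* leftover elements when n is not a multiple of 4 */
        s0 += x[k] * y[k];
    return (s0 + s1) + (s2 + s3);
}

/* Transpose B into a temporary buffer first, so every inner product
 * reads a row of A and a row of B^T, both contiguous in memory.
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
int matmul_transposed(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    if (n == 0)
        return 0;
    float *bt = malloc(n * n * sizeof *bt);
    if (bt == NULL)
        return -1;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            bt[j * n + i] = b[i * n + j];
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            c[i * n + j] = sdot_unrolled(n, a + i * n, bt + j * n);
    free(bt);
    return 0;
}
```

The transposition costs O(n^2) extra work and memory, which is negligible next to the O(n^3) multiplication for the matrix sizes used here.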

To compile the evaluation program:

make CBLAS=/path/to/cblas/prefix

or omit the CBLAS setting if you don't have CBLAS installed. After compilation, run

./matmul -h

to see the available options. Here are the results on my machines (the `-a` column gives the value of the `-a` option that selects each implementation):

| Implementation | -a | Linux, -n2000 (sec) | Linux, -n4000 (sec) | Linux/icc, -n4000 (sec) | Mac, -n2000 (sec) |
|:---|:---:|---:|---:|---:|---:|
| Naive | 0 | 7.53 | 188.85 | 173.76 | 77.45 |
| Transposed | 1 | 6.66 | 55.48 | 21.04 | 9.73 |
| sdot w/o hints | 4 | 6.66 | 55.04 | 21.35 | 9.70 |
| sdot with hints | 3 | 2.41 | 29.47 | 21.69 | 2.92 |
| SSE sdot | 2 | 1.36 | 21.79 | 22.18 | 2.92 |
| SSE+tiling sdot | 7 | 1.11 | 10.84 | 10.97 | 1.90 |
| OpenBLAS sdot | 5 | 2.69 | 28.87 | | 5.61 |
| OpenBLAS sgemm | 6 | 0.63 | 4.91 | | 0.86 |
| uBLAS | | 7.43 | 165.74 | | |
| Eigen | | 0.61 | 4.76 | 5.01 | 0.85 |
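To compare rows across matrix sizes, the timings can be converted to throughput. A rough worked example, assuming the standard count of about $2n^3$ floating-point operations per $n \times n$ product ($n^3$ multiplications and roughly as many additions); the benchmark itself reports only seconds:

$$
\text{GFLOPS} \approx \frac{2n^3}{t \cdot 10^{9}}, \qquad
n = 2000:\; 2n^3 = 1.6 \times 10^{10}, \qquad
n = 4000:\; 2n^3 = 1.28 \times 10^{11}
$$

On Linux, OpenBLAS sgemm thus reaches about $1.6 \times 10^{10} / 0.63\,\text{s} \approx 25.4$ GFLOPS at $n = 2000$ and $1.28 \times 10^{11} / 4.91\,\text{s} \approx 26.1$ GFLOPS at $n = 4000$, while Naive falls from about $2.12$ to $0.68$ GFLOPS, so the gap between them widens from roughly 12x to 38x.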

The machine configurations are as follows:

| Machine | CPU | OS | Compiler |
|:---|:---|:---|:---|
| Linux | 2.6 GHz Xeon E5-2697 | CentOS 6 | gcc-4.4.7/icc-15.0.3 |
| Mac | 1.7 GHz Intel Core i5-2557M | OS X 10.9.5 | clang-600.0.57/LLVM-3.5svn |

On both machines, OpenBLAS-0.2.18 is compiled with the following options (no AVX or multithreading):

TARGET=CORE2
BINARY=64
USE_THREAD=0
NO_SHARED=1
ONLY_CBLAS=1
NO_LAPACK=1
NO_LAPACKE=1

About

Benchmarking matrix multiplication implementations
