Skip to content

OpenBLAS 0.3.11 version

Pre-release
Pre-release
Compare
Choose a tag to compare
@martin-frbg martin-frbg released this 17 Oct 20:15
· 3496 commits to release-0.3.0 since this release
51c2261

NOTE there appear to be several defects in this version unfortunately - this should not be redistributed or used in a production environment

common:

  • API change:

        the newly added BFLOAT16 functions were renamed to use the
        letter "B" instead of "H" to avoid potential confusion with
        the IEEE "half precision float" type, i.e. the 0.3.10
        SHGEMM is now SBGEMM and the corresponding build option
        was changed from "BUILD_HALF" to "BUILD_BFLOAT16".
    
  • Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
    limit for placing temporary arrays on the stack) to be compatible
    with a stack size of 1mb (as imposed by the JAVA runtime library)
  • Added mixed-precision dot function SBDOT and utility functions
    shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
    single or double precision float arrays and bfloat16 arrays
  • Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
    in lapack.h
  • Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
    (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)
  • Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)
  • Fixed several bugs in the LAPACK testsuite
  • Improved performance of TRMM and TRSM for certain problem sizes
  • Fixed infinite recursions and workspace miscalculations in ReLAPACK
  • CMAKE builds no longer require pkg-config for creating the .pc file
  • Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
    enabling these options
  • Fixed detection of gfortran when invoked through an mpi wrapper
  • Improve thread reinitialization performance with OpenMP after a fork
  • Added support for building only the subset of the library required
    for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE
  • Optional function name prefixes and suffixes are now correctly
    reflected in the generated cblas.h
  • Added CMAKE build support for the LAPACK and multithreading tests

POWER:

  • Added optimized support for POWER10
  • Added support for compiling for POWER8 in 32bit mode
  • Added support for compilation with LLVM/clang
  • Added support for compilation with NVIDIA/PGI compilers
  • Fixed building on big-endian POWER8
  • Fixed miscompilation of ZDOTC by gcc10
  • Fixed alignment errors in the POWER8 SAXPY kernel
  • Improved CPU detection on AIX
  • Supported building with older compilers on POWER9

x86_64:

  • Added support for Intel Cooperlake
  • Added autodetection of AMD Renoir/Matisse/Zen3 cpus
  • Added autodetection of Intel Comet Lake cpus
  • Reimplemented ?sum, ?dot and daxpy using universal intrinsics
  • Reset the fpu state before using the fpu on Windows as a workaround
    for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
  • Fixed potentially undefined behaviour in the dot and gemv_t kernels
  • Fixed a potential segmentation fault in DYNAMIC_ARCH builds
  • Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers

ARMV7:

  • Fixed cpu detection on BSD-like systems

ARMV8:

  • Added preliminary support for Apple Vortex cpus
  • Added support for the Cavium ThunderX3T110 cpu
  • Fixed cpu detection on BSD-like systems
  • Fixed compilation in -std=C18 mode

IBM Z:

  • Added support for compiling with the clang compiler
  • Improved GEMM performance on Z14

Download OpenBLAS

md5sums:
dd211b73398383a44ebd75fffabd937a OpenBLAS-0.3.11.tar.gz
a76bfee7c125071bce6b24eae5b07468 OpenBLAS-0.3.11.zip
bad36be9fe4fe40372b06d326cfc5a2f OpenBLAS-0.3.11-x64.zip