OpenBLAS version 0.3.11
Pre-release
martin-frbg released this 17 Oct 20:15
NOTE: this version unfortunately appears to contain several defects. It should not be redistributed or used in a production environment.
common:
- API change: the newly added BFLOAT16 functions were renamed to use the letter "B" instead of "H" to avoid potential confusion with the IEEE "half precision float" type, i.e. the 0.3.10 SHGEMM is now SBGEMM and the corresponding build option was changed from "BUILD_HALF" to "BUILD_BFLOAT16".
- Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
  limit for placing temporary arrays on the stack) to be compatible
  with a stack size of 1 MB (as imposed by the Java runtime library)
- Added mixed-precision dot function SBDOT and utility functions
  shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
  single or double precision float arrays and bfloat16 arrays
- Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
  in lapack.h
- Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
  (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)
- Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)
- Fixed several bugs in the LAPACK testsuite
- Improved performance of TRMM and TRSM for certain problem sizes
- Fixed infinite recursions and workspace miscalculations in ReLAPACK
- CMAKE builds no longer require pkg-config for creating the .pc file
- Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
  enabling these options
- Fixed detection of gfortran when invoked through an mpi wrapper
- Improved thread reinitialization performance with OpenMP after a fork
- Added support for building only the subset of the library required
  for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE
- Optional function name prefixes and suffixes are now correctly
  reflected in the generated cblas.h
- Added CMAKE build support for the LAPACK and multithreading tests
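The "H" to "B" rename listed above rests on bfloat16 and IEEE half precision being different 16-bit formats: bfloat16 is simply the upper 16 bits of a single-precision float (1 sign bit, 8 exponent bits, 7 mantissa bits), so it keeps the full float exponent range. The following is a minimal sketch of such a conversion, not the code OpenBLAS uses; the type name bf16_t and both function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type for this sketch: the raw 16 bits of a bfloat16. */
typedef uint16_t bf16_t;

/* Convert a single-precision float to bfloat16 by keeping its upper 16 bits,
   rounding to nearest with ties to even. */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret without aliasing issues */
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: set a mantissa bit that survives truncation so it stays a NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }
    uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)((bits + rounding) >> 16);
}

/* Convert bfloat16 back to float: the 16 bits become the upper half,
   the lower 16 mantissa bits are zero. This direction is exact. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With this sketch, 1.0f maps to the bit pattern 0x3f80, and 65536.0f survives a round trip unchanged even though it lies outside the range of IEEE half precision (whose largest finite value is 65504). The price is precision: 3.14159265f comes back as 3.140625.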
POWER:
- Added optimized support for POWER10
- Added support for compiling for POWER8 in 32bit mode
- Added support for compilation with LLVM/clang
- Added support for compilation with NVIDIA/PGI compilers
- Fixed building on big-endian POWER8
- Fixed miscompilation of ZDOTC by gcc10
- Fixed alignment errors in the POWER8 SAXPY kernel
- Improved CPU detection on AIX
- Added support for building with older compilers on POWER9
x86_64:
- Added support for Intel Cooperlake
- Added autodetection of AMD Renoir/Matisse/Zen3 cpus
- Added autodetection of Intel Comet Lake cpus
- Reimplemented ?sum, ?dot and daxpy using universal intrinsics
- Reset the fpu state before using the fpu on Windows as a workaround
  for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
- Fixed potentially undefined behaviour in the dot and gemv_t kernels
- Fixed a potential segmentation fault in DYNAMIC_ARCH builds
- Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers
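For readers unfamiliar with the kernel names in the list above, the following scalar loops sketch what the double-precision variants of sum, dot and axpy compute. This is a reference illustration only, assuming unit strides; it is not the intrinsics-based kernel code, and the ref_ function names are hypothetical. The actual BLAS routines also take stride arguments (incx, incy).

```c
#include <stddef.h>

/* Sum of the elements of x (plain sum, not the sum of absolute values). */
static double ref_dsum(size_t n, const double *x) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) acc += x[i];
    return acc;
}

/* Dot product of x and y. */
static double ref_ddot(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) acc += x[i] * y[i];
    return acc;
}

/* DAXPY: y[i] = alpha * x[i] + y[i], updating y in place. */
static void ref_daxpy(size_t n, double alpha, const double *x, double *y) {
    for (size_t i = 0; i < n; i++) y[i] += alpha * x[i];
}
```

An optimized kernel processes several elements per instruction and may accumulate in a different order, so its floating-point results can differ from these loops in the last bits.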
ARMV7:
- Fixed cpu detection on BSD-like systems
ARMV8:
- Added preliminary support for Apple Vortex cpus
- Added support for the Cavium ThunderX3T110 cpu
- Fixed cpu detection on BSD-like systems
- Fixed compilation in -std=c18 mode
IBM Z:
- Added support for compiling with the clang compiler
- Improved GEMM performance on Z14
md5sums:
dd211b73398383a44ebd75fffabd937a OpenBLAS-0.3.11.tar.gz
a76bfee7c125071bce6b24eae5b07468 OpenBLAS-0.3.11.zip
bad36be9fe4fe40372b06d326cfc5a2f OpenBLAS-0.3.11-x64.zip