Append 64_ suffix to all F77 exported routines #4463

Conversation
Append 64_ suffix to all F77 exported routines.

Resolves JuliaLinearAlgebra/libblastrampoline#36.

Currently only the level-1/2/3 S/D/C/Z routines are exported with the `64_` suffix, while libblastrampoline identifies the BLAS integer suffix by probing `isamax`. This causes the issue above. The patch here appends the suffix to all F77-exported routines, which should resolve the issue and get lbt + libblis working together.
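To illustrate what goes wrong, here is a minimal sketch (not LBT's actual code) of probing a BLAS library for the ILP64-suffixed symbol, using only standard Libdl calls; the library path is an assumption:

```julia
using Libdl

# Hypothetical illustration of the suffix probe: LBT detects the integer
# convention by looking up a known routine such as `isamax`. Before this
# patch only the S/D/C/Z-prefixed routines carried the `64_` suffix, so
# `isamax_64_` was missing from libblis and the detection failed.
lib = dlopen("./libblis.so")        # path is an assumption
for sym in (:isamax_, :isamax_64_)
    ptr = dlsym_e(lib, sym)         # returns C_NULL if the symbol is absent
    println(sym, " => ", ptr == C_NULL ? "missing" : "exported")
end
```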
---

Nice:

```
julia> using LinearAlgebra

julia> peakflops(5000)
1.606979535336095e11

julia> BLAS.lbt_forward("./libblis.so", clear=true)
155

julia> peakflops(5000)
3.5732036026861916e10
```

Edit: I realised after posting that BLIS is actually slower here, I didn't notice the different order of magnitude 😬 (≈161 GFLOPS with OpenBLAS vs ≈36 GFLOPS with BLIS). Also for other operations OpenBLAS is faster:

```
julia> using BenchmarkTools

julia> LinearAlgebra.__init__()

julia> @benchmark BLAS.axpy!(a, x, y) setup=(T=Float32; N=Int(1e6); a=randn(T); x=randn(T, N); y=randn(T, N)) evals=1
BenchmarkTools.Trial: 208 samples with 1 evaluation.
 Range (min … max):  123.510 μs …   4.651 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     176.543 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   262.716 μs ± 386.684 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁█▃▁
  ████▆▄▃▃▄▃▂▄▃▄▃▃▃▄▂▂▃▁▁▂▁▁▁▂▃▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
  124 μs          Histogram: frequency by time          1.15 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> BLAS.lbt_forward("./libblis.so", clear=true)
155

julia> @benchmark BLAS.axpy!(a, x, y) setup=(T=Float32; N=Int(1e6); a=randn(T); x=randn(T, N); y=randn(T, N)) evals=1
BenchmarkTools.Trial: 412 samples with 1 evaluation.
 Range (min … max):  330.814 μs … 761.048 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     484.763 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   476.162 μs ±  86.159 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▂ ▁          ▇▆▅▆ ▄  ▁▁   ▁▁
  ████▇█▅▆▄▃▃▁▄▃▃▃▁▃▄▅▆▅█████▆▆██████████▅▅▆▅▅▄▃▁▁▃▃▃▃▁▁▃▁▁▃▁▁▃ ▄
  331 μs          Histogram: frequency by time           700 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

Does BLIS do runtime detection of features on all architectures? In particular I'm interested in SVE for A64FX; I saw you worked on that.
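Aside: one way to double-check which backend LBT is forwarding to after an `lbt_forward` call is `BLAS.lbt_get_config()`, available since Julia 1.7. A small sketch; the printed configuration depends on the local setup, so output is omitted:

```julia
julia> using LinearAlgebra

julia> BLAS.lbt_forward("./libblis.so", clear=true);  # forward symbols to BLIS

julia> BLAS.lbt_get_config()  # lists the libraries LBT currently forwards to
```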
---

@giordano Thanks for the info. Right now BLIS has no specialized optimizations for level-2 BLAS operations; I guess that's where the slowdown comes from. SVE is not compiled in for now due to …
---

Ok. We do have support for multiple microarchitectures, but we still need to flesh out some details, and I need to fix some compiler flags for aarch64. With JuliaLang/julia#44194 we'll eventually be able to target A64FX, too. While we're here: do you happen to know whether A64FX requires AES? 🙂
---

Wait, would BLIS build a "fat" library for all the targets into a single file, like OpenBLAS does? Because in that case it's ok to disable the check for …
---

I'm afraid I do not know about this.

Exactly. Those asm compiled with …
---

Ok, then you can add …
---

Excellent! Do I need to somehow stick to GCC 8 for max compatibility? Or can I push GCC to 10 for …? Both approaches would work for SVE processors, though.
---

The main compatibility concern we usually have is when compiling C++ code, which would end up requiring a too-new libstdc++ at runtime. However, I don't see symbols tagged with GLIBCXX in libblis:

```
% nm libblis.so | grep GLIBCXX
%
```

so I think it should be ok to use GCC 10 for this. We also have GCC 11.
---

BLIS uses C only. Upgrading to GCC 10 would save me some source-screening work, then. Thanks.
---

Seen when compiling for aarch64-apple 😉. Nice work!