
make arraymancer work with devel #505

Closed
ringabout opened this issue Apr 18, 2021 · 19 comments · Fixed by #542

Comments

@ringabout
Contributor

ringabout commented Apr 18, 2021

https://pipelines.actions.githubusercontent.com/ZRinn1OrR0LWxU3iWy1StaQcZRN2kXW9lHWwDJ3esfasrmfdRn/_apis/pipelines/1/runs/18644/signedlogcontent/8?urlExpires=2021-04-18T15%3A29%3A02.1515769Z&urlSigningMethod=HMACV1&urlSignature=ES5o3LHv4%2Bky4FEHjiSAzReQNwqIZDrkd2X%2B6tFYxcU%3D

2021-04-18T13:40:09.5027645Z                  from /usr/lib/gcc/x86_64-linux-gnu/7/include/x86intrin.h:48,
2021-04-18T13:40:09.5028566Z                  from /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c:11:
2021-04-18T13:40:09.5031638Z /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c: In function ‘gebb_ukernel_int64_x86_AVX512_tensorZtest95operators95blas_37425’:
2021-04-18T13:40:09.5034234Z /usr/lib/gcc/x86_64-linux-gnu/7/include/avx512fintrin.h:282:1: error: inlining failed in call to always_inline ‘_mm512_setzero_si512’: target specific option mismatch
2021-04-18T13:40:09.5035235Z  _mm512_setzero_si512 (void)
2021-04-18T13:40:09.5035871Z  ^~~~~~~~~~~~~~~~~~~~
2021-04-18T13:40:09.5036752Z /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c:23985:9: note: called from here
2021-04-18T13:40:09.5037563Z   AB13_1 = _mm512_setzero_si512();
2021-04-18T13:40:09.5038016Z   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
2021-04-18T13:40:09.5039014Z In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:45:0,
2021-04-18T13:40:09.5040103Z                  from /usr/lib/gcc/x86_64-linux-gnu/7/include/x86intrin.h:48,
2021-04-18T13:40:09.5041044Z                  from /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c:11:
2021-04-18T13:40:09.5042887Z /usr/lib/gcc/x86_64-linux-gnu/7/include/avx512fintrin.h:282:1: error: inlining failed in call to always_inline ‘_mm512_setzero_si512’: target specific option mismatch
2021-04-18T13:40:09.5043892Z  _mm512_setzero_si512 (void)
2021-04-18T13:40:09.5044319Z  ^~~~~~~~~~~~~~~~~~~~
2021-04-18T13:40:09.5045218Z /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c:23982:9: note: called from here
2021-04-18T13:40:09.5046020Z   AB13_0 = _mm512_setzero_si512();
2021-04-18T13:40:09.5046491Z   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
2021-04-18T13:40:09.5047498Z In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:45:0,
2021-04-18T13:40:09.5048610Z                  from /usr/lib/gcc/x86_64-linux-gnu/7/include/x86intrin.h:48,
2021-04-18T13:40:09.5049536Z                  from /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c:11:
2021-04-18T13:40:09.5051321Z /usr/lib/gcc/x86_64-linux-gnu/7/include/avx512fintrin.h:282:1: error: inlining failed in call to always_inline ‘_mm512_setzero_si512’: target specific option mismatch
2021-04-18T13:40:09.5052304Z  _mm512_setzero_si512 (void)
2021-04-18T13:40:09.5052743Z  ^~~~~~~~~~~~~~~~~~~~
2021-04-18T13:40:09.5053612Z /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c:23979:9: note: called from here
2021-04-18T13:40:09.5054412Z   AB12_1 = _mm512_setzero_si512();
2021-04-18T13:40:09.5054871Z   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
2021-04-18T13:40:09.5055686Z compilation terminated due to -fmax-errors=3.
2021-04-18T13:40:09.5058447Z Error: execution of an external compiler program 'gcc -c  -w -fmax-errors=3   -I/home/runner/work/Nim/Nim/lib -I/home/runner/work/Nim/Nim/pkgstemp/arraymancer/tests -o /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c.o /home/runner/.cache/nim/tests_cpu_d/@mtensor@stest_operators_blas.nim.c' failed with exit code: 1

and https://github.com/nim-lang/Nim/runs/2374356299

@auxym
Contributor

auxym commented Dec 20, 2021

Currently getting the same error with Nim 1.4.8, 1.6.0, 1.6.2...

Is there a known workaround?

Running Arch Linux; the error happens whether I use --d:blas=cblas or not. Example program that causes the error:

import arraymancer
let d = [[1, 2, 3], [4, 5, 6]].toTensor()
let x = d * d

git bisect gave me this commit as the culprit: d6b7984

Which isn't particularly recent. Running an i5-4310U cpu, which doesn't have avx512. Something going wrong with cpu detection maybe?

@auxym
Contributor

auxym commented Dec 20, 2021

auxym@48b721e

Tracked it down a bit. This will compile AND print "BRANCH 2" on my PC. This shows that cpuinfo seems to be working correctly. HOWEVER, if we change the 1st branch to dispatch(x86_AVX512), then compiling fails.

I'm not sure what's going on. Maybe a Nim compiler bug leading to the 1st branch being codegen'd even though it shouldn't?

@Vindaar
Collaborator

Vindaar commented Dec 20, 2021

Thanks for digging into this. I've seen the error myself, but never encountered it in practice. I never attempted to fix it, because I was hoping it'd be a trivial fix for mratsim. At the same time, I simply lack experience dealing with stuff like AVX, and the error message is rather opaque.

Your git bisect and further digging is really appreciated though! Maybe that's enough for me to figure out a solution.

@SteadBytes

I've been able to reproduce this issue on an i7-8665U CPU, which also does not support AVX-512, and have also been able to reproduce the behaviour from @auxym's patch. I've done some further digging but have come up empty so far 😓. I may well be wrong here, but the error messages seem to suggest that gcc is rejecting compilation of the AVX-512-specific C code that Nim generates (even though it wouldn't actually be called at runtime):

/home/ben/.cache/nim/t_d/@mt.nim.c:1976:18: note: called from here
 1976 |         AB13_0 = _mm512_setzero_si512();

That function in particular:

/* Create vectors with repeated elements */

static  __inline __m512i __DEFAULT_FN_ATTRS512
_mm512_setzero_si512(void)
{
  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
}

@Vindaar
Collaborator

Vindaar commented Dec 22, 2021

Thanks for looking into it @SteadBytes as well!

Just dug myself a bit: If I pass -mavx512dq to the C compiler it compiles successfully.

@auxym
Contributor

auxym commented Dec 22, 2021

Great news! Can confirm that --passC:-mavx512dq does work. Thanks for figuring that out.

Out of curiosity, why? Wouldn't that enable avx512 extensions? How does that work on a cpu without avx512?

Next step, what do we do about it? Document it? Can it be added automatically to build flags when arraymancer is imported? Does that limit arraymancer to usage on x86 and/or gcc compiler only?

@SteadBytes

SteadBytes commented Dec 22, 2021

Fantastic - nice find @Vindaar!

@auxym I think what's happening is that passing -mavx512dq allows GCC to generate the AVX-512 code regardless of whether the native platform supports it. Things would then go badly if those code paths were hit at runtime which I think may be the case here.

diff --git a/arraymancer.nimble b/arraymancer.nimble
index 3a021e9..93448cf 100644
--- a/arraymancer.nimble
+++ b/arraymancer.nimble
@@ -188,7 +188,7 @@ task all_tests, "Run all tests - Intel MKL + Cuda + OpenCL + OpenMP":
 #   test "tests_cpu_remainder", switch, split = true

 task test, "Run all tests - Default BLAS & Lapack":
-  test "tests_cpu", "", split = false
+  test "tests_cpu", " --passC:-mavx512dq", split = false

 task test_arc, "Run all tests under ARC - Default BLAS & Lapack":
   test "tests_cpu", "--gc:arc", split = false
nimble test
...
Hint: /usr/src/app/tests/ml/test_metrics  [Exec]

[Suite] [ML] Metrics
Traceback (most recent call last)
/usr/src/app/tests/ml/test_metrics.nim(46) test_metrics
/usr/src/app/tests/ml/test_metrics.nim(25) main
/usr/src/app/src/arraymancer/ml/metrics/accuracy_score.nim(26) accuracy_score
/usr/src/app/src/arraymancer/tensor/ufunc.nim(29) astype
/usr/src/app/src/arraymancer/tensor/higher_order_applymap.nim(43) map
/usr/src/app/src/arraymancer/tensor/ufunc.nim(29) :anonymous
SIGILL: Illegal operation.
Illegal instruction (core dumped)
Error: execution of an external program failed: '/usr/src/app/tests/ml/test_metrics '

The failing test calls accuracy_score which performs floating point operations that I think are causing the Illegal instruction error due to the -mavx512dq flag.

result = (y_true ==. y_pred).astype(float).mean

Inspecting a core dump using GDB shows that the illegal instruction was vmovdqu64 %zmm2, (%rsi):

Program terminated with signal SIGILL, Illegal instruction.
#0  0x000000000044c61f in astype.t_1514 ()
(gdb) layout asm

[screenshot: gdb `layout asm` view of the faulting instruction]

VMOVDQU64 with zmm registers is AVX-512 only.

@Vindaar
Collaborator

Vindaar commented Dec 23, 2021

I don't fully see a good solution. As things stand right now:

  • we only detect CPU features at runtime
  • to support AVX512 we need an additional compilation flag
  • that compilation flag makes it look like AVX512 is supported even if it is not

So this leads me to believe that the best solution for now is to simply disable AVX512 by default and add a Nim -d:avx512 flag that the user has to hand if their CPU supports it. In that case we pass the -mavx512dq flag to the C compiler. Unfortunately, I don't have an AVX512 capable CPU to test if the code even works there.

@SteadBytes

SteadBytes commented Dec 23, 2021

What seems very odd here is that I can reproduce the original compilation error on a CPU with AVX-512:

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 140
Model name:            11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Stepping:              1
CPU MHz:               2995.201
BogoMIPS:              5990.40
Hypervisor vendor:     Microsoft
Virtualization type:   full
L1d cache:             48K
L1i cache:             32K
L2 cache:              1280K
L3 cache:              12288K
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid avx512_vp2intersect flush_l1d arch_capabilities

Using the same reproducer saved as avx512.nim:

nim c -r avx512.nim
...
inlining failed in call to ‘always_inline’ ‘_mm512_setzero_si512’: target specific option mismatch
  334 | _mm512_setzero_si512 (void)
      | ^~~~~~~~~~~~~~~~~~~~
/home/bsteadman/.cache/nim/t_d/@mt.nim.c:1955:18: note: called from here
 1955 |         AB13_1 = _mm512_setzero_si512();
      |                  ^~~~~~~~~~~~~~~~~~~~~~
...

Compiling with the -mavx512dq flag succeeds as does executing the resulting binary (because AVX 512 is actually available this time).

@auxym
Contributor

auxym commented Dec 23, 2021

So this leads me to believe that the best solution for now is to simply disable AVX512 by default and add a Nim -d:avx512 flag that the user has to hand if their CPU supports it. In that case we pass the -mavx512dq flag to the C compiler. Unfortunately, I don't have an AVX512 capable CPU to test if the code even works there.

This seems like a good solution to me, and something that can be implemented via the `{.booldefine.}` and `{.passC.}` pragmas, then gating the AVX-512 code behind compile-time `when`s instead of runtime `if`s. Or maybe even in a nim.cfg file, though its syntax for conditionals etc. seems to be undocumented (?).

The main downside is that -mavx512dq is gcc-specific, yes? Then we lose compatibility with clang, MSVC, etc.? TBH that sounds like a reasonable tradeoff to me, at least as a quick initial fix: use gcc if you want AVX-512 support. Perhaps later we can check which C compiler is used and add different flags for AVX-512.

Another option would be using cpuinfo at compile time to automatically discover CPU features, but IMO that might not be the best idea. Even if the user is on an AVX-512 CPU, they might not want an AVX-512 build, e.g. if they want to share the binary with others. A "vanilla" build as the default sounds like the best choice, IMO.

@SteadBytes

Unfortunately, I don't have an AVX512 capable CPU to test if the code even works there.

Take a look at my earlier comment #505 (comment)

I agree that disabling AVX-512 by default is probably a good idea. However I feel like something is missing here - this was working, right? 🤔

The main downside of it is that -mavx512dq is gcc-specific, yes?

FWIW, clang also has -mavx512dq.

Even if the user is on a avx512 cpu, they might not want an avx512 build, eg if they want to share it with others. Sounds like a "vanilla" build as a default would be the best choice, IMO.

For GCC/clang this is handled with the -march compiler flag e.g. -march=native, -march=skylake-avx512 etc. These set a bunch of #defines for the available CPU features allowing for the corresponding conditional code to be written.

It seems like disabling AVX-512 by default and then opting in at compile time rather than runtime CPU feature detection will more reliably produce working code - at the expense of having to compile with specific options and generating less "universal" code. I'm personally fine with that trade off but I, of course, cannot speak for everyone 😅

Vindaar added a commit to Vindaar/Arraymancer that referenced this issue Dec 29, 2021
This commit fixes issue mratsim#505, which is a codegen issue arising from
generated C code that is only valid if the `-mavx512dq` compilation
flag is handed to the C compiler.

The issue is that:
- we cannot generate the code without the compilation flag (and
compile it successfully)
- handing the compilation flag makes the code compile, but forces
every CPU to attempt to run the AVX512 code, which is an illegal
operation for CPUs that do not support it.

For this reason AVX512 support is hidden behind a `-d:avx512`
compilation flag that needs to be handed. The resulting binary then
*will* use AVX512 for gemm.
Vindaar added a commit that referenced this issue Dec 29, 2021
* do not compile with AVX512 support by default

* update README with a note about `-d:avx512`

* fix whitespace in README
@Vindaar
Collaborator

Vindaar commented Dec 29, 2021

Now we disable AVX512 by default. If anyone comes up with a solution that makes it work automagically if available, but still compiles on all targets, I'm all ears.

@mratsim
Owner

mratsim commented Jan 3, 2022

I think there is a change in how Nim passes file-specific compilation flags in 1.6.
This suffers from the same:

ringabout added a commit to nim-lang/Nim that referenced this issue Jan 3, 2022
The cause of arraymancer failure has been tracked here: mratsim/Arraymancer#505
And it was fixed by mratsim/Arraymancer#542
@auxym
Contributor

auxym commented Jan 5, 2022

@mratsim I did reproduce with 1.4.8 (and some older versions too if I recall correctly)

@Vindaar
Collaborator

Vindaar commented Jan 5, 2022

I think there is a change in how Nim passes file-specific compilation flags in 1.6. This suffers from the same:

I'm not sure this is related.
Before, we didn't even have any file-specific compilation flags related to AVX512, no?

@mratsim
Owner

mratsim commented Jan 5, 2022

We do

Arraymancer/nim.cfg

Lines 77 to 83 in 20e0dc3

gemm_ukernel_sse.always = "-msse"
gemm_ukernel_sse2.always = "-msse2"
gemm_ukernel_sse4_1.always = "-msse4.1"
gemm_ukernel_avx.always = "-mavx"
gemm_ukernel_avx_fma.always = "-mavx -mfma"
gemm_ukernel_avx2.always = "-mavx2"
gemm_ukernel_avx512.always = "-mavx512f -mavx512dq"

@SteadBytes

SteadBytes commented Jan 5, 2022

@mratsim I did reproduce with 1.4.8 (and some older versions too if I recall correctly)

That's interesting. I'm able to compile the example fine on 1.4.8 🤔 1.6.0 and above fails as @mratsim suggested.

status-im/nim-blscurve#133 (comment)

@Vindaar
Collaborator

Vindaar commented Jan 8, 2022

For reference also here (posted initially here status-im/nim-blscurve#133 (comment)):

I just did a git bisect on one of the test cases that are broken with the AVX512 error with arraymancer before #542 was merged. The Nim PR that introduced the regression is:

nim-lang/Nim#17311

which nicely enough already has a still open "fix arraymancer regression" TODO!

@SteadBytes

Nice find @Vindaar!

which nicely enough already has a still open "fix arraymancer regression" TODO!

Haha, the infamous TODO that doesn't get done strikes again 😆 On the plus side, hopefully that means there's a fix to be had sooner rather than later 🤞

PMunch pushed a commit to PMunch/Nim that referenced this issue Mar 28, 2022
The cause of arraymancer failure has been tracked here: mratsim/Arraymancer#505
And it was fixed by mratsim/Arraymancer#542