Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274

nemothenoone · 2019-07-05T23:55:02Z

Build for UNIX on Skylake (e.g. G4400 or G4500) fails because of AVX instruction marked as "black magic" presence in add_n.as and sub_n.as. Commenting out such an instruction makes no changes in tests passing.

Are there some explanation what kind of "black magic" is this?

Because such a "black magic" block the library from functioning on entire set of CPUs, maybe consider commenting it and moving such assemblies to AVX directory? (I could make a pull-request)

The text was updated successfully, but these errors were encountered:

wbhart · 2019-07-06T00:44:49Z

I sincerely doubt that the code is doing the right thing with instructions commented out. In fact, I would say this is absolutely impossible.

Skylake is supposed to have AVX, but Intel continues to insist on selling cut down versions of their processors under the Pentium brand with various features missing.

We certainly can't remove AVX instructions from Skylake code since AVX makes a big difference to performance on proper server chips.

I recommend building your MPIR for a different processor that doesn't support AVX.

If Intel happened to use a different model and family number for this particular chip, we could work around it easily enough. But unless someone wants to write AVX detection code and do all the rearranging necessary to support this properly on a per CPU basis, it really has to stay the way it is.

nemothenoone · 2019-07-06T00:53:55Z

No, I actually meant commenting out two lines commented as "black magic" (

mpir/mpn/x86_64/skylake/add_n.as

Line 56 in 55fe6a9

vpblendd YMM0, YMM0, YMM0, 0 ; This one is black magic. Beware.

and

mpir/mpn/x86_64/skylake/sub_n.as

Line 56 in 55fe6a9

vpblendd YMM0, YMM0, YMM0, 0 ; This one is black magic. Beware.

), because they persist in non-AVX assemblies. All the AVX-enabled assemblies are inside the avx directory, which is perfectly fine.

These ones, I mentioned, are just not having any effect in these "black magic" instructions.

wbhart · 2019-07-06T01:00:34Z

Ok, I see what you are saying. As the AVX registers are not used elsewhere in the code, they are just spurious.

When I get a chance I'll look into it carefully. It's ages since I looked at this assembly code, so it might take some time for me to check it. It'll also need testing for hours with the "try" test program.

alexjbest · 2019-07-09T05:11:52Z

I don't recognise these but I'd guess they are a very specific bit-length noop or something like that so there would only be a potential speed issue if removed. Maybe a same length single instruction non avx noop can be found?

wjblanke · 2020-06-13T05:38:49Z

We had a G4400 user having problems with the Skylake non-AVX build. GC worked fine for him, as did Broadwell non-AVX so we will be switching to using the latter for Skylake non-AVX.

nemothenoone · 2020-06-13T06:09:56Z

I got these problems as well. Thinking of nop-ing same amount of bits.

…

On 13 Jun 2020, at 08:39, wjblanke ***@***.***> wrote: We had a G4400 user having problems with the Skylake non-AVX build. GC worked fine for him, as did Broadwell non-AVX so we will be switching to using the latter for Skylake non-AVX. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#274 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAVDIJDHEGAEVIA7SVUM5BDRWMGHJANCNFSM4H6QS73Q>.

hocheung20 · 2020-08-23T20:36:58Z

vpblendd is copying DWORDs from 2nd operand to 1st operand based on which bits are set in 3rd (immediate) operand.

Note that the 2nd operand (so called source operand) is the combination of the two bolded operands below.

vpblendd YMM0, **YMM0**, **YMM0**, 0

So in fact, the instruction does exactly nothing from an ISA standpoint.

It appears that the Jens Nurman who is one of the authors of this file (add_n.s) in the copyright at the top discovered that there is a way to hint to Skylake processors to use port 7 which results in a measurable latency speedup which he documented in this post:

https://www.agner.org/optimize/blog/read.php?i=415#796

Unfortunately my understanding of how this works ends there so it appears that the "black magic" admonition is well deserved.

However, for those looking for a workaround because they are on Skylake but don't have AVX, removing the vpblendd line probably isn't going to produce any correctness bugs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274

Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274

nemothenoone commented Jul 5, 2019

wbhart commented Jul 6, 2019

nemothenoone commented Jul 6, 2019 •

edited

Loading

wbhart commented Jul 6, 2019

alexjbest commented Jul 9, 2019

wjblanke commented Jun 13, 2020

nemothenoone commented Jun 13, 2020 via email

hocheung20 commented Aug 23, 2020

Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274

Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274

Comments

nemothenoone commented Jul 5, 2019

wbhart commented Jul 6, 2019

nemothenoone commented Jul 6, 2019 • edited Loading

wbhart commented Jul 6, 2019

alexjbest commented Jul 9, 2019

wjblanke commented Jun 13, 2020

nemothenoone commented Jun 13, 2020 via email

hocheung20 commented Aug 23, 2020

nemothenoone commented Jul 6, 2019 •

edited

Loading