Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274

Open
nemothenoone opened this issue Jul 5, 2019 · 7 comments
Open

Comments

@nemothenoone
Copy link

Build for UNIX on Skylake (e.g. G4400 or G4500) fails because of AVX instruction marked as "black magic" presence in add_n.as and sub_n.as. Commenting out such an instruction makes no changes in tests passing.

Are there some explanation what kind of "black magic" is this?

Because such a "black magic" block the library from functioning on entire set of CPUs, maybe consider commenting it and moving such assemblies to AVX directory? (I could make a pull-request)

@wbhart
Copy link
Owner

wbhart commented Jul 6, 2019

I sincerely doubt that the code is doing the right thing with instructions commented out. In fact, I would say this is absolutely impossible.

Skylake is supposed to have AVX, but Intel continues to insist on selling cut down versions of their processors under the Pentium brand with various features missing.

We certainly can't remove AVX instructions from Skylake code since AVX makes a big difference to performance on proper server chips.

I recommend building your MPIR for a different processor that doesn't support AVX.

If Intel happened to use a different model and family number for this particular chip, we could work around it easily enough. But unless someone wants to write AVX detection code and do all the rearranging necessary to support this properly on a per CPU basis, it really has to stay the way it is.

@nemothenoone
Copy link
Author

nemothenoone commented Jul 6, 2019

No, I actually meant commenting out two lines commented as "black magic" (

vpblendd YMM0, YMM0, YMM0, 0 ; This one is black magic. Beware.
and
vpblendd YMM0, YMM0, YMM0, 0 ; This one is black magic. Beware.
), because they persist in non-AVX assemblies. All the AVX-enabled assemblies are inside the avx directory, which is perfectly fine.

These ones, I mentioned, are just not having any effect in these "black magic" instructions.

@wbhart
Copy link
Owner

wbhart commented Jul 6, 2019

Ok, I see what you are saying. As the AVX registers are not used elsewhere in the code, they are just spurious.

When I get a chance I'll look into it carefully. It's ages since I looked at this assembly code, so it might take some time for me to check it. It'll also need testing for hours with the "try" test program.

@alexjbest
Copy link

I don't recognise these but I'd guess they are a very specific bit-length noop or something like that so there would only be a potential speed issue if removed. Maybe a same length single instruction non avx noop can be found?

@wjblanke
Copy link

We had a G4400 user having problems with the Skylake non-AVX build. GC worked fine for him, as did Broadwell non-AVX so we will be switching to using the latter for Skylake non-AVX.

@nemothenoone
Copy link
Author

nemothenoone commented Jun 13, 2020 via email

@hocheung20
Copy link

vpblendd is copying DWORDs from 2nd operand to 1st operand based on which bits are set in 3rd (immediate) operand.

Note that the 2nd operand (so called source operand) is the combination of the two bolded operands below.

vpblendd YMM0, **YMM0**, **YMM0**, 0

So in fact, the instruction does exactly nothing from an ISA standpoint.

It appears that the Jens Nurman who is one of the authors of this file (add_n.s) in the copyright at the top discovered that there is a way to hint to Skylake processors to use port 7 which results in a measurable latency speedup which he documented in this post:

https://www.agner.org/optimize/blog/read.php?i=415#796

Unfortunately my understanding of how this works ends there so it appears that the "black magic" admonition is well deserved.

However, for those looking for a workaround because they are on Skylake but don't have AVX, removing the vpblendd line probably isn't going to produce any correctness bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants