-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Building for Skylake with no AVX support (e.g G4400 or G4500) fails. #274
Comments
I sincerely doubt that the code is doing the right thing with instructions commented out. In fact, I would say this is absolutely impossible. Skylake is supposed to have AVX, but Intel continues to insist on selling cut down versions of their processors under the Pentium brand with various features missing. We certainly can't remove AVX instructions from Skylake code since AVX makes a big difference to performance on proper server chips. I recommend building your MPIR for a different processor that doesn't support AVX. If Intel happened to use a different model and family number for this particular chip, we could work around it easily enough. But unless someone wants to write AVX detection code and do all the rearranging necessary to support this properly on a per CPU basis, it really has to stay the way it is. |
No, I actually meant commenting out two lines commented as "black magic" ( mpir/mpn/x86_64/skylake/add_n.as Line 56 in 55fe6a9
mpir/mpn/x86_64/skylake/sub_n.as Line 56 in 55fe6a9
These ones, I mentioned, are just not having any effect in these "black magic" instructions. |
Ok, I see what you are saying. As the AVX registers are not used elsewhere in the code, they are just spurious. When I get a chance I'll look into it carefully. It's ages since I looked at this assembly code, so it might take some time for me to check it. It'll also need testing for hours with the "try" test program. |
I don't recognise these but I'd guess they are a very specific bit-length noop or something like that so there would only be a potential speed issue if removed. Maybe a same length single instruction non avx noop can be found? |
We had a G4400 user having problems with the Skylake non-AVX build. GC worked fine for him, as did Broadwell non-AVX so we will be switching to using the latter for Skylake non-AVX. |
I got these problems as well. Thinking of nop-ing same amount of bits.
… On 13 Jun 2020, at 08:39, wjblanke ***@***.***> wrote:
We had a G4400 user having problems with the Skylake non-AVX build. GC worked fine for him, as did Broadwell non-AVX so we will be switching to using the latter for Skylake non-AVX.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub <#274 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAVDIJDHEGAEVIA7SVUM5BDRWMGHJANCNFSM4H6QS73Q>.
|
vpblendd is copying DWORDs from 2nd operand to 1st operand based on which bits are set in 3rd (immediate) operand. Note that the 2nd operand (so called source operand) is the combination of the two bolded operands below.
So in fact, the instruction does exactly nothing from an ISA standpoint. It appears that the Jens Nurman who is one of the authors of this file (add_n.s) in the copyright at the top discovered that there is a way to hint to Skylake processors to use port 7 which results in a measurable latency speedup which he documented in this post: https://www.agner.org/optimize/blog/read.php?i=415#796 Unfortunately my understanding of how this works ends there so it appears that the "black magic" admonition is well deserved. However, for those looking for a workaround because they are on Skylake but don't have AVX, removing the |
Build for UNIX on Skylake (e.g. G4400 or G4500) fails because of AVX instruction marked as "black magic" presence in
add_n.as
andsub_n.as
. Commenting out such an instruction makes no changes in tests passing.Are there some explanation what kind of "black magic" is this?
Because such a "black magic" block the library from functioning on entire set of CPUs, maybe consider commenting it and moving such assemblies to AVX directory? (I could make a pull-request)
The text was updated successfully, but these errors were encountered: