-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to choose fine-grained CPU intrinsics on as CMake options #849
Conversation
For awareness: @kpu @XapaJIaMnu @ugermann |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
CMakeLists.txt
Outdated
if(BUILD_ARCH STREQUAL "native") | ||
# @TODO: if we are building "-march=native" anyway is the whole shebang here even useful? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Native would enable all the supported flags, so specifying the additional things won't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the only thing that's useful is that the XXX_FOUND vars get added to the build and can be displayed with --build-info
. I think I will leave the messages in but remove the flags maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the messages here should rather inform the user that with march=native their request to build with e.g. -DCOMPILE_AVX512=off will be essentially ignored if avx512 was detected since the compiler will add it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long run I'd still like to see fat binaries that determine CPU features at run time, so that we can actually deliver binaries that run everywhere with the best code path for the given architecture. For dockerization, I always have to compile with the lowest common set, because I don't know ahead of time what architecture the container will run on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/google/cpu_features is a start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth taking a look at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually might have similar requirements for something like that in the very near future. We might want to sync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be quite time consuming, but very rewarding potentially.
The way we do it in intgemm is that at runtime you have a bunch of function ptrs that get initiated to the the kernel
that corresponds to your architecture. We have to do that for all the performance critical marian functions and the make sure that the non-performance critical parts only generate generic x86 instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to be able to distribute binaries that make the most of the available hardware, you either have to maintain a zoo of binaries and educate users how to determine which of the many versions available is the right one for them, or have the software make that decision for them. In that sense it's not only rewarding, but inevitable. We should focus on decoding first (anyone with the technical knowledge to set up training will be competent to compile).
My hunch is that we can replace pre-compiler switches by specialization of (inline) template functions. MKL is currently another obstacle, as it insists on linking to a dynamic system library. It's currently not possible to create a fully static executable even if you know the CPU intrinsics available and or are willing to go with the minimum set of intrinsics required.
…native if there is compiler support
OK, since no one screamed that this is horrible, I am pulling it in. |
New motto. |
Description
Checklist