-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to choose fine-grained CPU intrinsics on as CMake options #849
Merged
Merged
Changes from 1 commit
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Native would enable all the supported flags, so specifying the additional things won't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the only thing that's useful is that the XXX_FOUND vars get added to the build and can be displayed with
--build-info
. I think I will leave the messages in but remove the flags maybe.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the messages here should rather inform the user that with march=native their request to build with e.g. -DCOMPILE_AVX512=off will be essentially ignored if avx512 was detected since the compiler will add it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long run I'd still like to see fat binaries that determine CPU features at run time, so that we can actually deliver binaries that run everywhere with the best code path for the given architecture. For dockerization, I always have to compile with the lowest common set, because I don't know ahead of time what architecture the container will run on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/google/cpu_features is a start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth taking a look at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually might have similar requirements for something like that in the very near future. We might want to sync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be quite time consuming, but very rewarding potentially.
The way we do it in intgemm is that at runtime you have a bunch of function ptrs that get initiated to the the
kernel
that corresponds to your architecture. We have to do that for all the performance critical marian functions and the make sure that the non-performance critical parts only generate generic x86 instructions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to be able to distribute binaries that make the most of the available hardware, you either have to maintain a zoo of binaries and educate users how to determine which of the many versions available is the right one for them, or have the software make that decision for them. In that sense it's not only rewarding, but inevitable. We should focus on decoding first (anyone with the technical knowledge to set up training will be competent to compile).
My hunch is that we can replace pre-compiler switches by specialization of (inline) template functions. MKL is currently another obstacle, as it insists on linking to a dynamic system library. It's currently not possible to create a fully static executable even if you know the CPU intrinsics available and or are willing to go with the minimum set of intrinsics required.