Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to choose fine-grained CPU intrinsics on as CMake options #849

Merged
merged 3 commits into from
Apr 9, 2021

Conversation

emjotde
Copy link
Member

@emjotde emjotde commented Apr 9, 2021

Description

  • Allow for fine-grained CPU intrinsics overrides when BUILD_ARCH != native e.g. -DBUILD_ARCH=x86-64 -DCOMPILE_AVX512=off.
  • For BUILD_ARCH != native enable all intrinsics types by default, can be disabled like this: -DCOMPILE_AVX512=off

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@emjotde
Copy link
Member Author

emjotde commented Apr 9, 2021

For awareness: @kpu @XapaJIaMnu @ugermann
Let me know if this causes issues for you.

@emjotde emjotde requested a review from aaronpburke April 9, 2021 01:22
Copy link
Member

@ykim362 ykim362 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

CMakeLists.txt Outdated
if(BUILD_ARCH STREQUAL "native")
# @TODO: if we are building "-march=native" anyway is the whole shebang here even useful?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native would enable all the supported flags, so specifying the additional things won't do anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the only thing that's useful is that the XXX_FOUND vars get added to the build and can be displayed with --build-info. I think I will leave the messages in but remove the flags maybe.

Copy link
Member Author

@emjotde emjotde Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the messages here should rather inform the user that with march=native their request to build with e.g. -DCOMPILE_AVX512=off will be essentially ignored if avx512 was detected since the compiler will add it anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the long run I'd still like to see fat binaries that determine CPU features at run time, so that we can actually deliver binaries that run everywhere with the best code path for the given architecture. For dockerization, I always have to compile with the lowest common set, because I don't know ahead of time what architecture the container will run on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth taking a look at.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually might have similar requirements for something like that in the very near future. We might want to sync?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be quite time consuming, but very rewarding potentially.

The way we do it in intgemm is that at runtime you have a bunch of function ptrs that get initiated to the the kernel that corresponds to your architecture. We have to do that for all the performance critical marian functions and the make sure that the non-performance critical parts only generate generic x86 instructions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to be able to distribute binaries that make the most of the available hardware, you either have to maintain a zoo of binaries and educate users how to determine which of the many versions available is the right one for them, or have the software make that decision for them. In that sense it's not only rewarding, but inevitable. We should focus on decoding first (anyone with the technical knowledge to set up training will be competent to compile).

My hunch is that we can replace pre-compiler switches by specialization of (inline) template functions. MKL is currently another obstacle, as it insists on linking to a dynamic system library. It's currently not possible to create a fully static executable even if you know the CPU intrinsics available and or are willing to go with the minimum set of intrinsics required.

@emjotde
Copy link
Member Author

emjotde commented Apr 9, 2021

OK, since no one screamed that this is horrible, I am pulling it in.

@emjotde emjotde merged commit be65065 into master Apr 9, 2021
@kpu
Copy link
Member

kpu commented Apr 9, 2021

OK, since no one screamed that this is horrible, I am pulling it in.

New motto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants