Allow to choose fine-grained CPU intrinsics on as CMake options #849

emjotde · 2021-04-09T01:21:05Z

Description

Allow fine-grained overrides of CPU intrinsics when BUILD_ARCH != native, e.g. -DBUILD_ARCH=x86-64 -DCOMPILE_AVX512=off.
For BUILD_ARCH != native, all intrinsics types are enabled by default; individual ones can be disabled like this: -DCOMPILE_AVX512=off

Checklist

I have tested the code manually
I have run regression tests
I have read and followed CONTRIBUTING.md
I have updated CHANGELOG.md

emjotde · 2021-04-09T01:22:21Z

For awareness: @kpu @XapaJIaMnu @ugermann
Let me know if this causes issues for you.

ykim362

Looks good to me.

ykim362 · 2021-04-09T05:28:48Z

CMakeLists.txt

  if(BUILD_ARCH STREQUAL "native")
+    # @TODO: if we are building "-march=native" anyway is the whole shebang here even useful?


maybe not? :)

Native would enable all the supported flags, so specifying the additional things won't do anything.

Yeah, I think the only thing that's useful is that the XXX_FOUND vars get added to the build and can be displayed with --build-info. I think I will leave the messages in but remove the flags maybe.

OK, the messages here should rather inform the user that with -march=native their request to build with e.g. -DCOMPILE_AVX512=off will essentially be ignored if AVX512 was detected, since the compiler will add it anyway.
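To make the point about -march=native concrete: GCC and Clang predefine macros such as `__AVX2__` and `__AVX512F__` whenever the corresponding instruction set is enabled for a translation unit, regardless of how it was enabled. The sketch below only reports those macros; the function name `enabledSimdExtensions` is made up for illustration and is not part of Marian.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not Marian code): lists the x86 SIMD extensions the
// compiler was allowed to use for this translation unit, based on macros
// that GCC and Clang predefine. With -march=native on an AVX-512 machine,
// __AVX512F__ is defined no matter what a build option like COMPILE_AVX512
// was set to.
std::vector<std::string> enabledSimdExtensions() {
  std::vector<std::string> result;
#if defined(__SSE2__)
  result.push_back("SSE2");
#endif
#if defined(__SSSE3__)
  result.push_back("SSSE3");
#endif
#if defined(__AVX__)
  result.push_back("AVX");
#endif
#if defined(__AVX2__)
  result.push_back("AVX2");
#endif
#if defined(__AVX512F__)
  result.push_back("AVX512F");
#endif
  return result;
}
```

Compiling this once with -march=x86-64 and once with -march=native and printing the result shows which extensions the compiler enabled on its own.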

In the long run I'd still like to see fat binaries that determine CPU features at run time, so that we can actually deliver binaries that run everywhere with the best code path for the given architecture. For dockerization, I always have to compile with the lowest common set, because I don't know ahead of time what architecture the container will run on.

https://github.com/google/cpu_features is a start.

Worth taking a look at.

We actually might have similar requirements for something like that in the very near future. We might want to sync?

This would be quite time consuming, but very rewarding potentially.

The way we do it in intgemm is that at runtime you have a bunch of function pointers that get initialized to the kernel that corresponds to your architecture. We would have to do that for all the performance-critical Marian functions and then make sure that the non-performance-critical parts only generate generic x86 instructions.
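The function-pointer approach described above could look roughly like the following. This is a sketch under assumptions, not intgemm's actual code: the kernel names `dotGeneric` and `dotAvx2`, the `DotFn` alias and `chooseDot` are invented here, and the AVX2 path relies on the GCC/Clang `target` attribute so that only that one function is compiled with wider instructions.

```cpp
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#include <immintrin.h>
#endif

// Portable fallback: compiled with the baseline flags only.
float dotGeneric(const float* a, const float* b, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

#ifdef SKETCH_X86_DISPATCH
// Only this function may contain AVX/AVX2 instructions.
__attribute__((target("avx2")))
float dotAvx2(const float* a, const float* b, std::size_t n) {
  __m256 acc = _mm256_setzero_ps();
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    acc = _mm256_add_ps(
        acc, _mm256_mul_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
  }
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  float sum = 0.0f;
  for (int k = 0; k < 8; ++k) sum += lanes[k];
  for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail
  return sum;
}
#endif

using DotFn = float (*)(const float*, const float*, std::size_t);

// Picks the kernel that matches the CPU the process is running on.
DotFn chooseDot() {
#ifdef SKETCH_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return dotAvx2;
#endif
  return dotGeneric;
}

// Initialized once at start-up; callers just write dot(a, b, n).
const DotFn dot = chooseDot();
```

Callers never mention an instruction set, so the rest of the program can be compiled for generic x86 while `dot` still resolves to the fastest kernel available on the machine.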

If you want to be able to distribute binaries that make the most of the available hardware, you either have to maintain a zoo of binaries and educate users how to determine which of the many versions available is the right one for them, or have the software make that decision for them. In that sense it's not only rewarding, but inevitable. We should focus on decoding first (anyone with the technical knowledge to set up training will be competent to compile).

My hunch is that we can replace pre-compiler switches by specialization of (inline) template functions. MKL is currently another obstacle, as it insists on linking to a dynamic system library. It's currently not possible to create a fully static executable even if you know the CPU intrinsics available and/or are willing to go with the minimum set of intrinsics required.
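One way to read the template idea: instead of selecting an implementation with `#ifdef`, every variant exists as a specialization of the same function template, keyed on an instruction-set tag. The sketch below shows only the mechanics. The names `Isa`, `scale` and `scaleBest` are hypothetical, and the "AVX2" specialization is plain C++ standing in for real intrinsics so that the example runs on any machine.

```cpp
#include <cstddef>

// Hypothetical tag type naming the code paths a binary ships with.
enum class Isa { Generic, Avx2 };

// Primary template: portable implementation, valid for every Isa value.
template <Isa isa>
inline void scale(float* data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

// Specialization for the AVX2 path. A real version would use intrinsics and
// be compiled with AVX2 enabled; this stand-in only mimics the 8-wide
// blocking so the sketch stays portable.
template <>
inline void scale<Isa::Avx2>(float* data, std::size_t n, float factor) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    for (std::size_t k = 0; k < 8; ++k) data[i + k] *= factor;
  }
  for (; i < n; ++i) data[i] *= factor;  // tail shorter than one block
}

// The run-time decision happens once, here, instead of in the preprocessor.
// cpuHasAvx2 is assumed to come from some run-time CPU feature check.
void scaleBest(float* data, std::size_t n, float factor, bool cpuHasAvx2) {
  if (cpuHasAvx2) {
    scale<Isa::Avx2>(data, n, factor);
  } else {
    scale<Isa::Generic>(data, n, factor);
  }
}
```

Both variants are compiled into the same binary, which is the property a fat binary needs and a preprocessor switch cannot provide.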

emjotde · 2021-04-09T16:02:18Z

OK, since no one screamed that this is horrible, I am pulling it in.

kpu · 2021-04-09T16:04:58Z

> OK, since no one screamed that this is horrible, I am pulling it in.

New motto.

allow to choose fine-grained CPU intrinsics on as CMake options

606d86c

emjotde requested review from ykim362, snukky, kpu, ugermann and XapaJIaMnu April 9, 2021 01:21

emjotde requested a review from aaronpburke April 9, 2021 01:22

ykim362 approved these changes Apr 9, 2021

emjotde added 2 commits April 9, 2021 14:52

inform user that e.g. -DCOMPILE_AVX2=off will be ignored with -march=native if there is compiler support

161257e

Merge branch 'master' into mjd/choose

7a50746

emjotde merged commit be65065 into master Apr 9, 2021

This was referenced Apr 15, 2021

Jenkins marian-dev-cuda-10.2 #78 failed #854

Closed

Jenkins marian-dev-cpu-avx512 #122 failed #855

Closed

Jenkins marian-dev-cpu-avx2 #107 failed #857

Closed

jerinphilip mentioned this pull request Apr 22, 2021

Enabling ccache on github builds for Ubuntu browsermt/bergamot-translator#95

Merged

marianminion mentioned this pull request May 25, 2021

Jenkins marian-dev-cpu-clang-8 #105 failed #867

Closed

snukky deleted the mjd/choose branch February 15, 2022 12:35
