-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM: GCC optimal/generic architecture compiler flags #1974
Conversation
systemtools.POWER: 'mcpu=generic-arch', # no support for -march on POWER | ||
systemtools.AARCH32: 'mcpu=generic-armv7', # implies -march=armv7 and -mtune=generic-armv7 | ||
systemtools.AARCH64: 'mcpu=generic', # implies -march=armv8-a and -mtune=generic | ||
systemtools.POWER: 'mcpu=generic-arch', # no support for -march on POWER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JackPerdue Did you ever test this? I could not find any indication in the GCC manuals that generic-arch
is a valid cpu_type value for the -mcpu
option on POWER...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pescobar ping on the question above?
A quick test rebuilding HPL-2.1-foss-2016b.eb with --optarch=GENERIC shows a definite fail. I'm not used to using the GENERIC flag but, yes, GCC won't work on Power as is. |
-mcpu=native -mtune=native might be the best way to go for now if the GENERIC needs to be differentiated in some way... but generic-arch is right out (as you suspected) |
gcc: note: valid arguments to ‘-mtune=’ are: 401 403 405 405fp 440 440fp 464 464fp 476 476fp 505 601 602 603 603e 604 604e 620 630 740 7400 7450 750 801 821 823 8540 8548 860 970 G3 G4 G5 a2 cell e300c2 e300c3 e500mc e500mc64 e5500 e6500 ec603e native power3 power4 power5 power5+ power6 power6x power7 power8 powerpc powerpc64 rs64 titan I don't have any suggestions beyond those not understanding what "generic" means on Power. |
@@ -84,14 +84,12 @@ class Clang(Compiler): | |||
|
|||
# used when 'optarch' toolchain option is enabled (and --optarch is not specified) | |||
COMPILER_OPTIMAL_ARCHITECTURE_OPTION = { | |||
systemtools.INTEL : 'march=native', | |||
systemtools.AMD : 'march=native', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer it seems unwise to drop these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, wait, this relates to your change in compiler.py
where you define self.arch
differently...
# Construct -mcpu option | ||
option = 'mcpu=%s' % '.'.join([i[1] for i in sorted_core_types]) | ||
|
||
self.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[systemtools.AARCH64] = option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer don't we want some kind of warning in case option
is an empty string here?
# big.LITTLE setups: sort core types to have big core (higher model number) first | ||
core_types = [core.strip().lower() for core in cpu_model.split('+')] | ||
sorted_core_types = sorted([(int(re.search('\d+', i).group(0)), i) for i in core_types], | ||
reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's lots that could go wrong here, and there's no error handling whatsoever; aren't we making too much assumption here on what get_cpu_model
returns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel To be honest, I copy/paste-engineered this based on some code snippet I found on the web (StackOverflow, IIRC). I have no clue how to add error handling here...
Regarding your concerns about assumptions: This is certainly an issue, yes. However, I currently see no other way to optimize for the host architecture if -mcpu=native
is not supported. If you have any better suggestions, please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's mostly a case of error handling: what if our assumption is wrong? Then we should fall back to not picking a value for option
(which we should rename to optarch_opt
here?).
The only real issue/assumption you're making here is that i) get_cpu_model()
returns something that starts with ARM
, and ii) that we can find find a number in every item in core_types
.
So, I think this should become something like this:
optarch_opt = ''
cpu_vendor, cpu_model = systemtools.get_cpu_vendor(), systemtools.get_cpu_model()
if cpu_vendor == systemtools.ARM and cpu_model.startswith('ARM '):
self.log.debug("Determining architecture-specific compiler optimization flag for ARM (model: %s)", cpu_model)
core_types = []
for core_type in [ct.strip().lower() for ct in cpu_model[4:].split('+')]:
# determine numeric id for each core type, since we need to sort them later numerically
res = re.search('\d+$', core_type) # note: numeric ID is expected at the end
if res:
core_id = int(res.group(0))
core_types.append((core_id, core_type))
self.log.debug("Extracted numeric ID for ARM core type '%s': %s", core_type, core_id)
else:
# if we can't determine numeric id, bail out
core_types = None
self.log.debug("Failed to extract numeric ID for ARM core type '%s', bailing out", core_type)
break
if core_types:
# example: 'mcpu=cortex-a72.cortex-a53' for "ARM Cortex-A53 + Cortex-A72"
optarch_opt = 'mcpu=%s' '.'.join([ct[1] for ct in sorted(core_types, reverse=True)])
self.log.debug("Using architecture-specific compiler optimization flag '%s'", optarch_opt)
self.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[systemtools.AARCH64] = option
sorted_core_types = sorted([(int(re.search('\d+', i).group(0)), i) for i in core_types], | ||
reverse=True) | ||
# Construct -mcpu option | ||
option = 'mcpu=%s' % '.'.join([i[1] for i in sorted_core_types]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you provide a couple of examples in a comment of how what value would be specified to mcpu
with this approach (just in case we get something entirely different at some point)?
@@ -146,8 +146,11 @@ def set_options(self, options): | |||
|
|||
def set_variables(self): | |||
"""Set the variables""" | |||
if self.arch is None: | |||
self.arch = systemtools.get_cpu_architecture() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, aren't there any other locations where we rely on the .arch
attribute of a Toolchain
instance (potentially) that we should worry about?
playing the devil's advocate here, since this approach makes more sense anyway (defining self.arch
with the result of get_cpu_family
was already an indication that something wasn't 100% right...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel As far as I could figure out using grep -r '\.arch'
on framework, easyblocks, and easyconfigs, Compiler.arch
(it's not defined in the Toolchain
base class) is only used in tools/toolchain/compiler.py
, in the toolchain unit tests, and now in toolchains/compiler/gcc.py
.
BTW: Is there any reason why it isn't set in the constructor already? In this case, lazy initialization doesn't make much sense to me. But I may be overlooking something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not seeing any other use of .arch
elsewhere is good, but we don't control all easyblocks out there, and people have scripts around that hook into the EasyBuild framework, who knows...
That shouldn't hold us back from making this change though, since it makes a whole lot of sense, and if anyone relies on .arch
somewhere, they probably shouldn't be anyway (they should call the appropriate function from systemtools
instead).
I see no reason either for lazy init of self.arch
, so feel free to change that.
@@ -339,6 +353,20 @@ def test_optarch_generic(self): | |||
else: | |||
self.assertFalse(generic_flags in tc.get_variable(var)) | |||
|
|||
def test_optarch_aarch64_heuristic(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer don't we want to test the default behaviour for AARCH64
as well here?
you can put a dummy module in place for GCC 6.x to check that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel This is causing some trouble I don't understand... I've added a module file for GCCcore 6.2.0 and added
tc = self.get_toolchain("GCCcore", version="6.2.0")
tc.prepare()
self.assertEqual(...)
at the end of this test. But it is still giving me the modified mcpu
flag based on the heuristic, even though it is definitely not taking the "modify option with heuristic" branch in _set_optimal_architecture
. Why don't I get a fresh instance with COMPILER_OPTIMAL_ARCHITECTURE_OPTION
initialized to the defaults???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer Hmm, you should get a fresh instance... Do make sure you also call tc.set_options()
before tc.prepare()
though, even if you're not setting any custom toolchain options. Maybe that explains it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel Nope. Here is my current test (with a GCCcore/6.2.0
module in place):
def test_optarch_aarch64_heuristic(self):
"""Test whether AArch64 pre-GCC-6 optimal architecture flag heuristic works."""
st.get_cpu_architecture = lambda: st.AARCH64
st.get_cpu_model = lambda: 'ARM Cortex-A53'
st.get_cpu_vendor = lambda: st.ARM
tc = self.get_toolchain("GCC", version="4.7.2")
tc.prepare()
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=cortex-a53')
st.get_cpu_model = lambda: 'ARM Cortex-A53 + Cortex-A72'
tc = self.get_toolchain("GCC", version="4.7.2")
tc.prepare()
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=cortex-a72.cortex-a53')
tc = self.get_toolchain("GCCcore", version="6.2.0")
tc.prepare()
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=native')
I've added
print(self.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[systemtools.AARCH64])
to _set_optimal_architecture
in easybuild-framework/toolchains/compiler/gcc.py
right at the beginning after the check that we're on AArch64 (i.e., before modifying the default value). And it prints
mcpu=native
mcpu=cortex-a53
mcpu=cortex-a72.cortex-a53
Calling tc.set_options({})
right before tc.prepare()
in the test doesn't change anything. So it keeps the modified value even when creating a new instance. What am I doing wrong here? I'm confused...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer Taking a look at this right now, I'll keep you posted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for completeness sake: this problem was fixed in geimer#12
@@ -71,13 +71,11 @@ class IntelIccIfort(Compiler): | |||
|
|||
# used when 'optarch' toolchain option is enabled (and --optarch is not specified) | |||
COMPILER_OPTIMAL_ARCHITECTURE_OPTION = { | |||
systemtools.INTEL : 'xHost', | |||
systemtools.AMD : 'xHost', | |||
systemtools.X86_64: 'xHost', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer hmm, I'm not so sure we want to drop the distinction between AMD
and INTEL
here... right now, it doesn't matter, but it may sometime in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel Hmm... for ARM, one needs to distinguish by architecture (as both AArch32 and AArch64 are of CPU family ARM
), for X64_64 you want a distinction by CPU family. I can see various ways to address this:
- Use the architecture as dictionary key (like now) and overwrite the value in
_set_optimal_architecture
if needed - Do it exactly the other way around (family as key, overwrite on architecture)
- Use a tuple
(architecture, family)
as dictionary key
If you really want to keep the family distinction, I'm leaning towards the last option. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going with the latter makes sense to me!
@geimer Looks like this will have to be moved to the |
@boegel I still have no clue why this unit test is failing. I expect to get a fresh |
…dating class constant
Again... I don't know who added mcpu=generic-arch for POWER or why and I have no idea what GENERIC is supposed to be but --optarch=GENERIC on Power7 is broken in 3.0.... there is no "generic-arch" option for mcpu (see above). |
@JackPerdue yeah, the plan is to fix that in this PR as well, since we're fiddling in that area. Not quite sure what We could just spit out an error stating this is not supported on POWER. |
pass down default optarch value for AAarch64 & GCC 6.x rather than updating class constant
@JackPerdue, @boegel: Looking through the GCC manuals of various versions, these may be reasonable values for POWER:
As far as I can see, |
tc = self.get_toolchain("GCCcore", version="6.2.0") | ||
tc.set_options({}) | ||
tc.prepare() | ||
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=native') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to compare with tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION
here? In the other cases we compare against tc.options.options_map['optarch']
.
tc = self.get_toolchain("GCCcore", version="6.2.0") | ||
tc.set_options({}) | ||
tc.prepare() | ||
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[(tc.arch, tc.cpu_family)], 'mcpu=native') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... One should refresh the page before commenting ;-) So, again:
Is it OK to compare with tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION here? In the other cases we compare against tc.options.options_map['optarch'].
@boegel The failing test seems to be due to an inconsistent behavior of TCL modules. Any suggestions how to deal with this? Edit: Maybe run |
(currently not required due to the way it is implemented, but may cause subtle errors in the future if the implementation changes)
rather than tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION
@boegel I believe this PR is ready for another review. However, there is still three issues:
|
…vironmentModulesC modules tools
fix broken test_avail, GCCcore is only included in result for Lmod/EnvironmentModulesC modules tools
(systemtools.AARCH64, systemtools.ARM): 'mcpu=native', # since GCC 6; implies -march=native and -mtune=native | ||
(systemtools.POWER, systemtools.POWER): 'mcpu=native', # no support for -march on POWER; implies -mtune=native | ||
(systemtools.X86_64, systemtools.AMD): 'march=native', # implies -mtune=native | ||
(systemtools.X86_64, systemtools.INTEL): 'march=native', # implies -mtune=native |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer this makes me wonder, should we also be using -mcpu=native
on x86_84, since that implies both -march=native
and -mtune=native
for AARCH32?
Which one is closer to Intel's -xHost
, mcpu=native
or -march=native
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel: These kind of options are a mess in GCC, as they are inconsistent between architectures... -march=native
is the way to go on x86_64.
On x86_64, -march=native
implies -mtune=native
, but -mcpu
is a deprecated synonym for -mtune
. So using -mcpu=native
would only tune for the host, but not enable all CPU features (as -march
does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer That sounds like a mess indeed, thanks for clarifying!
…t_optimal_architecture in gcc.py
# On AArch64, -mcpu=native is not supported prior to GCC 6. In this case, try to optimize for the detected | ||
# CPU model (vanilla ARM cores only). Note that this heuristic may fail if the CPU model is not supported | ||
# by the GCC version being used. | ||
gcc_version = self.get_software_version(self.COMPILER_MODULE_NAME)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geimer: did you ever test this with a GCC
module that is actually a bundle of GCCcore
and binutils
?
As far as I can tell, grabbing the GCC version will fail in that case, as least with this current implementation (which only takes into account GCC
as module name)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see why this worked: we usually make sure in the GCC
bundles that $EBROOTGCCCORE
and $EBVERSIONGCCCORE
are also defined for the GCCcore
module, by defining altroot
/altversion
in the GCC
bundle easyconfig, see https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/g/GCC/GCC-6.2.0-2.27.eb#L21
However, it would be better to take both GCCcore
(first) and GCC
(second)into account for determining the GCC version, like we already do for the installation prefix in
_set_compiler_vars`.
Are you up for making that change @geimer?
edit: $EBROOTGCC
and $EBVERSIONGCC
=> $EBROOTGCCCORE
and $EBVERSIONGCCCORE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel: I have to admit that I don't remember whether I tested this with a GCC
bundle. But...
As far as I can see, self.COMPILER_MODULE_NAME
corresponds to GCC
when using the GCC
toolchain, and to GCCcore
when using this toolchain. So it should actually do the right thing, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, no, since when GCC
is being used as a toolchain, the compiler is actually provided via GCCcore
, so we should grab the version directly from there (without relying on a GCC
bundle also defining $EBROOTGCCCORE
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boegel Just to clarify: You are proposing to use get_software_version
from easybuild.tools.modules
here rather than self.get_software_version
, right? I don't think I understand the difference right now (it's been a long day...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I propose to try obtaining the version using GCCcore
first, and fall back to use GCC
if that doesn't work. Whether you use self.get_software_version
or get_software_version
doesn't really matter here, that's only relevant when multiple modules are in play (self.get_software_version
has a way of dealing with that).
So, something like this:
gcc_ver = get_software_version('GCCcore')
if gcc_ver is None:
gcc_ver = get_software_version('GCC')
(switching to get_software_version
makes sense here since we know we'll never be dealing with multiple names at the same time here)
cpu_vendor = systemtools.get_cpu_vendor() | ||
cpu_model = systemtools.get_cpu_model() | ||
if cpu_vendor == systemtools.ARM and cpu_model.startswith('ARM '): | ||
self.log.debug("Determining architecture-specific compiler optimization flag for ARM (model: %s)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# On AArch64, -mcpu=native is not supported prior to GCC 6. In this case, try to optimize for the detected | ||
# CPU model (vanilla ARM cores only). Note that this heuristic may fail if the CPU model is not supported | ||
# by the GCC version being used. | ||
gcc_version = self.get_software_version(self.COMPILER_MODULE_NAME)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see why this worked: we usually make sure in the GCC
bundles that $EBROOTGCCCORE
and $EBVERSIONGCCCORE
are also defined for the GCCcore
module, by defining altroot
/altversion
in the GCC
bundle easyconfig, see https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/g/GCC/GCC-6.2.0-2.27.eb#L21
However, it would be better to take both GCCcore
(first) and GCC
(second)into account for determining the GCC version, like we already do for the installation prefix in
_set_compiler_vars`.
Are you up for making that change @geimer?
edit: $EBROOTGCC
and $EBVERSIONGCC
=> $EBROOTGCCCORE
and $EBVERSIONGCCCORE
flesh out _guess_arm_aarch64_default_optarch method to lighten up _set_optimal_architecture in gcc.py
@geimer I don't mind at all, I only really care about the test result of the last commit; but it's good to know Travis screws up if you push multiple commits quickly enough. Going in, thanks! |
This PR adds support for optimal/generic architecture compiler flags for the GCC compiler. This required basing the flag selection on the CPU architecture rather than the CPU family.
AArch32: The generic option assumes an ARMv7 platform. Using EasyBuild on older ARM architectures probably isn't of interest anyhow.
AArch64: Since the
-mcpu=native
option is only supported since GCC 6, a simple heuristic to target the detected CPU model for vanilla ARM cores has been implemented for older compilers. This is a "best effort" approach that may fail if the actual compiler does not (yet) support the corresponding CPU model. However, in this case the user still has the option to manually specify a reasonable option via--optarch=...
.