Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: add GOAMD64 environment variable #45453

Closed
mdempsky opened this issue Apr 8, 2021 · 41 comments
Closed

all: add GOAMD64 environment variable #45453

mdempsky opened this issue Apr 8, 2021 · 41 comments

Comments

@mdempsky
Copy link
Contributor

mdempsky commented Apr 8, 2021

This proposal is to add a GOAMD64 environment variable, with the initial options of "baseline" (default), "v2", and "v3".

Most Go architectures support a GO[arch] environment variable to control architecture-specific options: GO386, GOARM, GOMIPS, GOMIPS64, GOPPC64, GOWASM. However, the AMD64 port (presumably the most common architecture Go is deployed on) still limits itself to the original, now-20-year-old instruction set, with some occasional runtime CPUID detection when the savings is significant enough to merit it. (For comparison, GOPPC64 supports optimizing for power9, which only became available in 2017.)

This is further complicated by x86-64 having accumulated many, many instruction set extensions, with each processor revision having a different set of supported extensions. Making users responsible for deciding what set of extensions to enable doesn't feel very Go-like.

However, in 2020, the x86-64 psABI added four named microarchitecture levels to help group the extensions: "x86-64 (baseline)", "x86-64-v2", "x86-64-v3", and "x86-64-v4". See https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels or https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level/ for further details.

The "baseline" corresponds to what Go already supports, while "v2" and "v3" each add some new instructions that could be useful for Go programs (e.g., POPCNT in v2, BMI1/BMI2 in v3).

v2 CPUs appear commonplace today. E.g., RHEL9 will only support v2, per the above blog post; all GCE CPUs support v2, and I believe all AWS and Azure CPUs too.

v3 CPUs are also increasingly common. E.g., only GCE's Ivy Bridge and Sandy Bridge CPUs are limited to v2; Haswell (launched 2013) and newer support v3.

On issue #25489, I reported results from two optimization attempts at using Haswell's BMI instructions (PEXT for varint decoding, LZCNT and a couple others for scanobject). These are optimizations that could benefit from targeting v3 CPUs specifically, but probably wouldn't be worthwhile if they needed to rely on runtime CPUID detection.

It's also been suggested that at process startup, the Go runtime should throw if it's been compiled to assume instruction set extensions that aren't available on the CPU. I think that's a good idea.

Questions:

  • Are "baseline", "v2", and "v3" the best names? "v1" would perhaps be better than "baseline", but the psABI doesn't formally name it that. We could suggest that though?

  • Should we add "v4" too? This only adds AVX512 instructions, which the Go compiler/runtime don't immediately have any use for, and which seem a bit contentious about whether to use them on current processors anyway.

@gopherbot gopherbot added this to the Proposal milestone Apr 8, 2021
@ianlancetaylor
Copy link
Member

Since v4 is defined I would be inclined to say that we should accept v4 as a valid GOAMD64 value but treat it as v3.

@mdempsky
Copy link
Contributor Author

mdempsky commented Apr 8, 2021

Since v4 is defined I would be inclined to say that we should accept v4 as a valid GOAMD64 value but treat it as v3.

I think it's okay if we accept GOAMD64=v4 and don't actually use any of the AVX512 instructions. But then I think the runtime should probably still check that they're available at runtime, so if we decide to start using AVX512 in the future we won't have to worry about users erroneously running v4 binaries on v3 CPUs.

@martisch
Copy link
Contributor

martisch commented Apr 12, 2021

I would like to note that the Pentium and Celerons (often used in low tier laptops, NUCs and NAS devices) do not support AVX/AVX2 and while based on Haswell and newer (and are categorized with the same architecture names) are v2 and not v3. So for server farms where performance matters v3 is likely a good choice but for general computing even on newer chips v2 is still relevant.

@mvdan
Copy link
Member

mvdan commented Apr 28, 2021

I think it's also worth noting that some mainstream Linux distros are looking at adopting the same microarchitecture levels for their binary packages. For example, Arch will add v3 to their mirrors on top of the existing "baseline": https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/2/diffs

Assuming they ship this soon, I imagine any packages building with GCC or LLVM would benefit, and Go packages would be left behind without this proposal.

@rsc rsc changed the title proposal: cmd/*: add GOAMD64 microarchitecture environment variable proposal: all: add GOAMD64 environment variable Apr 28, 2021
@rsc
Copy link
Contributor

rsc commented Apr 28, 2021

It's nice to see Intel and AMD coalescing on fewer configuration knobs.

@rsc
Copy link
Contributor

rsc commented Apr 28, 2021

"baseline" is an unfortunate name because it sounds like "Go's default".
(Compare with the mentions of baseline in discussions of GOEXPERIMENT.)
It does seem like "v1" is the obvious choice for the base configuration.
Maybe someone can suggest that to Intel?

@rsc
Copy link
Contributor

rsc commented Apr 28, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@beoran
Copy link

beoran commented Apr 29, 2021

While I agree that adding support for different API levels of the Go AMD64 port, I would like v1 to stay the default for Go applications and for the Go compiler itself at least for the next 10 years. My family and I use old refurbished computers with Linux, since that still works fine, and I think there must be many others around the world who are in the situation of not having access to recent hardware.

@rsc
Copy link
Contributor

rsc commented May 5, 2021

@beoran, what the default or minimum requirements are for Go would be different proposals. As I understand it, no one is proposing to change the default or the minimum requirement away from v1 in this issue. This is just about adding an architecture setting similar to GOARM and others.

@rsc
Copy link
Contributor

rsc commented May 5, 2021

If we can call the current baseline "v1" instead of "baseline" then it seems like everyone is on board.
Do I have that right?

@rsc
Copy link
Contributor

rsc commented May 12, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 19, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: all: add GOAMD64 environment variable all: add GOAMD64 environment variable May 19, 2021
@rsc rsc modified the milestones: Proposal, Backlog May 19, 2021
nimelehin added a commit to nimelehin/go that referenced this issue Sep 13, 2021
The variable represents the microarchitecture levels for which to compile.
Valid values are v1 (default), v2, v3, v4.

Updates golang#45453
gopherbot pushed a commit that referenced this issue Oct 18, 2021
encoding/binary benchmark on my laptop:
name                      old time/op    new time/op     delta
ReadSlice1000Int32s-8       4.42µs ± 5%     4.20µs ± 1%   -4.94%  (p=0.046 n=9+8)
ReadStruct-8                 359ns ± 8%      368ns ± 5%   +2.35%  (p=0.041 n=9+10)
WriteStruct-8                349ns ± 1%      357ns ± 1%   +2.15%  (p=0.000 n=8+10)
ReadInts-8                   235ns ± 1%      233ns ± 1%   -1.01%  (p=0.005 n=10+10)
WriteInts-8                  265ns ± 1%      274ns ± 1%   +3.45%  (p=0.000 n=10+10)
WriteSlice1000Int32s-8      4.61µs ± 5%     4.59µs ± 5%     ~     (p=0.986 n=10+10)
PutUint16-8                 0.56ns ± 4%     0.57ns ± 4%     ~     (p=0.101 n=10+10)
PutUint32-8                 0.83ns ± 2%     0.56ns ± 6%  -32.91%  (p=0.000 n=10+10)
PutUint64-8                 0.81ns ± 3%     0.62ns ± 4%  -23.82%  (p=0.000 n=10+10)
LittleEndianPutUint16-8     0.55ns ± 4%     0.55ns ± 3%     ~     (p=0.926 n=10+10)
LittleEndianPutUint32-8     0.41ns ± 4%     0.42ns ± 3%     ~     (p=0.148 n=10+9)
LittleEndianPutUint64-8     0.55ns ± 2%     0.56ns ± 6%     ~     (p=0.897 n=10+10)
ReadFloats-8                60.4ns ± 4%     59.0ns ± 1%   -2.25%  (p=0.007 n=10+10)
WriteFloats-8               72.3ns ± 2%     71.5ns ± 7%     ~     (p=0.089 n=10+10)
ReadSlice1000Float32s-8     4.21µs ± 3%     4.18µs ± 2%     ~     (p=0.197 n=10+10)
WriteSlice1000Float32s-8    4.61µs ± 2%     4.68µs ± 7%     ~     (p=1.000 n=8+10)
ReadSlice1000Uint8s-8        250ns ± 4%      247ns ± 4%     ~     (p=0.324 n=10+10)
WriteSlice1000Uint8s-8       227ns ± 5%      229ns ± 2%     ~     (p=0.193 n=10+7)
PutUvarint32-8              15.3ns ± 2%     15.4ns ± 4%     ~     (p=0.782 n=10+10)
PutUvarint64-8              38.5ns ± 1%     38.6ns ± 5%     ~     (p=0.396 n=8+10)

name                      old speed      new speed       delta
ReadSlice1000Int32s-8      890MB/s ±17%    953MB/s ± 1%   +7.00%  (p=0.027 n=10+8)
ReadStruct-8               209MB/s ± 8%    204MB/s ± 5%   -2.42%  (p=0.043 n=9+10)
WriteStruct-8              214MB/s ± 3%    210MB/s ± 1%   -1.75%  (p=0.003 n=9+10)
ReadInts-8                 127MB/s ± 1%    129MB/s ± 1%   +1.01%  (p=0.006 n=10+10)
WriteInts-8                113MB/s ± 1%    109MB/s ± 1%   -3.34%  (p=0.000 n=10+10)
WriteSlice1000Int32s-8     868MB/s ± 5%    872MB/s ± 5%     ~     (p=1.000 n=10+10)
PutUint16-8               3.55GB/s ± 4%   3.50GB/s ± 4%     ~     (p=0.093 n=10+10)
PutUint32-8               4.83GB/s ± 2%   7.21GB/s ± 6%  +49.16%  (p=0.000 n=10+10)
PutUint64-8               9.89GB/s ± 3%  12.99GB/s ± 4%  +31.26%  (p=0.000 n=10+10)
LittleEndianPutUint16-8   3.65GB/s ± 4%   3.65GB/s ± 4%     ~     (p=0.912 n=10+10)
LittleEndianPutUint32-8   9.74GB/s ± 3%   9.63GB/s ± 3%     ~     (p=0.222 n=9+9)
LittleEndianPutUint64-8   14.4GB/s ± 2%   14.3GB/s ± 5%     ~     (p=0.912 n=10+10)
ReadFloats-8               199MB/s ± 4%    203MB/s ± 1%   +2.27%  (p=0.007 n=10+10)
WriteFloats-8              166MB/s ± 2%    168MB/s ± 7%     ~     (p=0.089 n=10+10)
ReadSlice1000Float32s-8    949MB/s ± 3%    958MB/s ± 2%     ~     (p=0.218 n=10+10)
WriteSlice1000Float32s-8   867MB/s ± 2%    857MB/s ± 6%     ~     (p=1.000 n=8+10)
ReadSlice1000Uint8s-8     4.00GB/s ± 4%   4.06GB/s ± 4%     ~     (p=0.353 n=10+10)
WriteSlice1000Uint8s-8    4.40GB/s ± 4%   4.36GB/s ± 2%     ~     (p=0.193 n=10+7)
PutUvarint32-8             262MB/s ± 2%    260MB/s ± 4%     ~     (p=0.739 n=10+10)
PutUvarint64-8             208MB/s ± 1%    207MB/s ± 5%     ~     (p=0.408 n=8+10)

Updates #45453

Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/354670
Run-TryBot: Alberto Donizetti <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Trust: Matthew Dempsky <[email protected]>
@dmitshur dmitshur modified the milestones: Backlog, Go1.18 Oct 26, 2021
@dmitshur
Copy link
Contributor

Aside from the TODO to document this new environment variable in the 1.18 release notes, what is left to do for 1.18 here?

@randall77
Copy link
Contributor

I think this is otherwise done. #48506 was the last coding task.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/364174 mentions this issue: runtime: check GOAMD64 compatibility after setting up TLS

gopherbot pushed a commit that referenced this issue Nov 16, 2021
We need TLS set up to be able to print an error without crashing.

Fixes #49586
Update #45453

Change-Id: I97f0efcd716a8dca614e82ab73f2d855b7277599
Reviewed-on: https://go-review.googlesource.com/c/go/+/364174
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Martin Möhrmann <[email protected]>
Trust: Martin Möhrmann <[email protected]>
Trust: Keith Randall <[email protected]>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/365395 mentions this issue: runtime: check GOAMD64 v4 compatibility on Darwin

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/386116 mentions this issue: dashboard: add linux-amd64-goamd64v3 builder

gopherbot pushed a commit to golang/build that referenced this issue Feb 18, 2022
The purpose of this builder will be to test Go with a non-default
value of the new GOAMD64 environment variable.

For golang/go#48505.
Updates golang/go#45453.

Change-Id: Ic7bf0bd45f3539530ac6540cc3609f87cfdab6f7
Reviewed-on: https://go-review.googlesource.com/c/build/+/386116
Run-TryBot: Dmitri Shuralyov <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Alex Rakoczy <[email protected]>
Reviewed-by: Matthew Dempsky <[email protected]>
Trust: Dmitri Shuralyov <[email protected]>
gopherbot pushed a commit that referenced this issue Apr 1, 2022
Darwin requires a different approach to check AVX512 support.

Update #45453

Change-Id: Ia3dfecc04b47aab16f472000e92e46d4fc6d596d
Reviewed-on: https://go-review.googlesource.com/c/go/+/365395
Reviewed-by: Keith Randall <[email protected]>
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Trust: Heschi Kreinick <[email protected]>
@VVD
Copy link

VVD commented Jul 8, 2022

Core 2 Duo/Quad (Conroe/Kentsfield) support SSE1/2/3/SSSE3, Core 2 Duo/Quad (Penryn) support SSE1/2/3/SSSE3/4.1.
Both support CMPXCHG16B, LAHF and SAHF. And both doesn't support POPCNT.
So both are v1 only, but support almost all instructions from v2 list.
For example clang have even different -target-cpu for both: core2 and penryn.
Can you add more levels?

Thanks.

@randall77
Copy link
Contributor

I don't think we're going to be more granular, sorry. More granular means more complication, more testing, etc.
Core 2 processors were discontinued 10 years ago (https://en.wikipedia.org/wiki/Intel_Core_2). It doesn't seem worth the additional work to make a 10-year-old processor run a tad faster.

@VVD
Copy link

VVD commented Jul 9, 2022

Also I have Pentium 4 630 Prescott (https://ark.intel.com/content/www/us/en/ark/products/27478/intel-pentium-4-processor-630-supporting-ht-technology-2m-cache-3-00-ghz-800-mhz-fsb.html) probably from 1st model line of the Intel x86 CPU with support of 64bit.
It support: CMPXCHG16B and SSE1/2/3.
Doesn't support: SSSE3, SSE4.1/4.2, LAHF, SAHF, POPCNT.

Maybe move CMPXCHG16B and SSE1/2/3 to v1? Or some old AMD CPUs doesn't support these instructions?

@martisch
Copy link
Contributor

martisch commented Jul 9, 2022

One reason to have GOAMD64 is to maximise performance on new CPUs that are likely to be the biggest uses of Go in terms of total compute power while still having the option to compile GO in a way that works on every AMD64 compatible CPU.

It thereby does not give much return to try to optimize a bit more for specific very old CPUs which are likely only a tiny fraction of Go use. I would also think that if performance is of essence the users of these CPUs should upgrade to more modern CPUs which are likely more efficient and faster.

The levels chosen for GOAMD64 are made to match common definitions:
https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

As such we should not misalign them with GOs definition of levels which would cause confusion and does not allow GO to align with other compilers to compile e.g. both GO and C at the same target level.

As for v1: it needs to support the minimum standard and the minimum standard dont support some of the features in v2 (e.g. SSE3 was not available on AMD Sledgehammer). We would thereby raise Go's minimum AMD64 requirements and not support them on some AMD64 CPus anymore. That seems a bigger loss and change then to skip some optimizations for slighly newer but in absolute terms still very old CPUs.

@VVD
Copy link

VVD commented Jul 9, 2022

The levels chosen for GOAMD64 are made to match common definitions:
https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

Thanks. At least SSE2 is here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests