-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
all: add GOAMD64 environment variable #45453
Comments
Since v4 is defined I would be inclined to say that we should accept v4 as a valid |
I think it's okay if we accept GOAMD64=v4 and don't actually use any of the AVX512 instructions. But then I think the runtime should probably still check that they're available at runtime, so if we decide to start using AVX512 in the future we won't have to worry about users erroneously running v4 binaries on v3 CPUs. |
I would like to note that the Pentium and Celerons (often used in low tier laptops, NUCs and NAS devices) do not support AVX/AVX2 and while based on Haswell and newer (and are categorized with the same architecture names) are v2 and not v3. So for server farms where performance matters v3 is likely a good choice but for general computing even on newer chips v2 is still relevant. |
I think it's also worth noting that some mainstream Linux distros are looking at adopting the same microarchitecture levels for their binary packages. For example, Arch will add v3 to their mirrors on top of the existing "baseline": https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/2/diffs Assuming they ship this soon, I imagine any packages building with GCC or LLVM would benefit, and Go packages would be left behind without this proposal. |
It's nice to see Intel and AMD coalescing on fewer configuration knobs. |
"baseline" is an unfortunate name because it sounds like "Go's default". |
This proposal has been added to the active column of the proposals project |
While I agree that adding support for different API levels of the Go AMD64 port, I would like v1 to stay the default for Go applications and for the Go compiler itself at least for the next 10 years. My family and I use old refurbished computers with Linux, since that still works fine, and I think there must be many others around the world who are in the situation of not having access to recent hardware. |
@beoran, what the default or minimum requirements are for Go would be different proposals. As I understand it, no one is proposing to change the default or the minimum requirement away from v1 in this issue. This is just about adding an architecture setting similar to GOARM and others. |
If we can call the current baseline "v1" instead of "baseline" then it seems like everyone is on board. |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
The variable represents the microarchitecture levels for which to compile. Valid values are v1 (default), v2, v3, v4. Updates golang#45453
encoding/binary benchmark on my laptop: name old time/op new time/op delta ReadSlice1000Int32s-8 4.42µs ± 5% 4.20µs ± 1% -4.94% (p=0.046 n=9+8) ReadStruct-8 359ns ± 8% 368ns ± 5% +2.35% (p=0.041 n=9+10) WriteStruct-8 349ns ± 1% 357ns ± 1% +2.15% (p=0.000 n=8+10) ReadInts-8 235ns ± 1% 233ns ± 1% -1.01% (p=0.005 n=10+10) WriteInts-8 265ns ± 1% 274ns ± 1% +3.45% (p=0.000 n=10+10) WriteSlice1000Int32s-8 4.61µs ± 5% 4.59µs ± 5% ~ (p=0.986 n=10+10) PutUint16-8 0.56ns ± 4% 0.57ns ± 4% ~ (p=0.101 n=10+10) PutUint32-8 0.83ns ± 2% 0.56ns ± 6% -32.91% (p=0.000 n=10+10) PutUint64-8 0.81ns ± 3% 0.62ns ± 4% -23.82% (p=0.000 n=10+10) LittleEndianPutUint16-8 0.55ns ± 4% 0.55ns ± 3% ~ (p=0.926 n=10+10) LittleEndianPutUint32-8 0.41ns ± 4% 0.42ns ± 3% ~ (p=0.148 n=10+9) LittleEndianPutUint64-8 0.55ns ± 2% 0.56ns ± 6% ~ (p=0.897 n=10+10) ReadFloats-8 60.4ns ± 4% 59.0ns ± 1% -2.25% (p=0.007 n=10+10) WriteFloats-8 72.3ns ± 2% 71.5ns ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 4.21µs ± 3% 4.18µs ± 2% ~ (p=0.197 n=10+10) WriteSlice1000Float32s-8 4.61µs ± 2% 4.68µs ± 7% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 250ns ± 4% 247ns ± 4% ~ (p=0.324 n=10+10) WriteSlice1000Uint8s-8 227ns ± 5% 229ns ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 15.3ns ± 2% 15.4ns ± 4% ~ (p=0.782 n=10+10) PutUvarint64-8 38.5ns ± 1% 38.6ns ± 5% ~ (p=0.396 n=8+10) name old speed new speed delta ReadSlice1000Int32s-8 890MB/s ±17% 953MB/s ± 1% +7.00% (p=0.027 n=10+8) ReadStruct-8 209MB/s ± 8% 204MB/s ± 5% -2.42% (p=0.043 n=9+10) WriteStruct-8 214MB/s ± 3% 210MB/s ± 1% -1.75% (p=0.003 n=9+10) ReadInts-8 127MB/s ± 1% 129MB/s ± 1% +1.01% (p=0.006 n=10+10) WriteInts-8 113MB/s ± 1% 109MB/s ± 1% -3.34% (p=0.000 n=10+10) WriteSlice1000Int32s-8 868MB/s ± 5% 872MB/s ± 5% ~ (p=1.000 n=10+10) PutUint16-8 3.55GB/s ± 4% 3.50GB/s ± 4% ~ (p=0.093 n=10+10) PutUint32-8 4.83GB/s ± 2% 7.21GB/s ± 6% +49.16% (p=0.000 n=10+10) PutUint64-8 9.89GB/s ± 3% 12.99GB/s ± 4% +31.26% (p=0.000 n=10+10) LittleEndianPutUint16-8 3.65GB/s ± 4% 3.65GB/s ± 4% ~ (p=0.912 n=10+10) LittleEndianPutUint32-8 9.74GB/s ± 3% 9.63GB/s ± 3% ~ (p=0.222 n=9+9) LittleEndianPutUint64-8 14.4GB/s ± 2% 14.3GB/s ± 5% ~ (p=0.912 n=10+10) ReadFloats-8 199MB/s ± 4% 203MB/s ± 1% +2.27% (p=0.007 n=10+10) WriteFloats-8 166MB/s ± 2% 168MB/s ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 949MB/s ± 3% 958MB/s ± 2% ~ (p=0.218 n=10+10) WriteSlice1000Float32s-8 867MB/s ± 2% 857MB/s ± 6% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 4.00GB/s ± 4% 4.06GB/s ± 4% ~ (p=0.353 n=10+10) WriteSlice1000Uint8s-8 4.40GB/s ± 4% 4.36GB/s ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 262MB/s ± 2% 260MB/s ± 4% ~ (p=0.739 n=10+10) PutUvarint64-8 208MB/s ± 1% 207MB/s ± 5% ~ (p=0.408 n=8+10) Updates #45453 Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/354670 Run-TryBot: Alberto Donizetti <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Keith Randall <[email protected]> Trust: Matthew Dempsky <[email protected]>
Aside from the TODO to document this new environment variable in the 1.18 release notes, what is left to do for 1.18 here? |
I think this is otherwise done. #48506 was the last coding task. |
Change https://golang.org/cl/364174 mentions this issue: |
We need TLS set up to be able to print an error without crashing. Fixes #49586 Update #45453 Change-Id: I97f0efcd716a8dca614e82ab73f2d855b7277599 Reviewed-on: https://go-review.googlesource.com/c/go/+/364174 Run-TryBot: Keith Randall <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Martin Möhrmann <[email protected]> Trust: Martin Möhrmann <[email protected]> Trust: Keith Randall <[email protected]>
Change https://golang.org/cl/365395 mentions this issue: |
Change https://go.dev/cl/386116 mentions this issue: |
The purpose of this builder will be to test Go with a non-default value of the new GOAMD64 environment variable. For golang/go#48505. Updates golang/go#45453. Change-Id: Ic7bf0bd45f3539530ac6540cc3609f87cfdab6f7 Reviewed-on: https://go-review.googlesource.com/c/build/+/386116 Run-TryBot: Dmitri Shuralyov <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Alex Rakoczy <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> Trust: Dmitri Shuralyov <[email protected]>
Darwin requires a different approach to check AVX512 support. Update #45453 Change-Id: Ia3dfecc04b47aab16f472000e92e46d4fc6d596d Reviewed-on: https://go-review.googlesource.com/c/go/+/365395 Reviewed-by: Keith Randall <[email protected]> Run-TryBot: Keith Randall <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Trust: Heschi Kreinick <[email protected]>
Core 2 Duo/Quad (Conroe/Kentsfield) support SSE1/2/3/SSSE3, Core 2 Duo/Quad (Penryn) support SSE1/2/3/SSSE3/4.1. Thanks. |
I don't think we're going to be more granular, sorry. More granular means more complication, more testing, etc. |
Also I have Pentium 4 630 Prescott (https://ark.intel.com/content/www/us/en/ark/products/27478/intel-pentium-4-processor-630-supporting-ht-technology-2m-cache-3-00-ghz-800-mhz-fsb.html) probably from 1st model line of the Intel x86 CPU with support of 64bit. Maybe move CMPXCHG16B and SSE1/2/3 to v1? Or some old AMD CPUs doesn't support these instructions? |
One reason to have GOAMD64 is to maximise performance on new CPUs that are likely to be the biggest uses of Go in terms of total compute power while still having the option to compile GO in a way that works on every AMD64 compatible CPU. It thereby does not give much return to try to optimize a bit more for specific very old CPUs which are likely only a tiny fraction of Go use. I would also think that if performance is of essence the users of these CPUs should upgrade to more modern CPUs which are likely more efficient and faster. The levels chosen for GOAMD64 are made to match common definitions: As such we should not misalign them with GOs definition of levels which would cause confusion and does not allow GO to align with other compilers to compile e.g. both GO and C at the same target level. As for v1: it needs to support the minimum standard and the minimum standard dont support some of the features in v2 (e.g. SSE3 was not available on AMD Sledgehammer). We would thereby raise Go's minimum AMD64 requirements and not support them on some AMD64 CPus anymore. That seems a bigger loss and change then to skip some optimizations for slighly newer but in absolute terms still very old CPUs. |
Thanks. At least SSE2 is here. |
This proposal is to add a GOAMD64 environment variable, with the initial options of "baseline" (default), "v2", and "v3".
Most Go architectures support a GO[arch] environment variable to control architecture-specific options: GO386, GOARM, GOMIPS, GOMIPS64, GOPPC64, GOWASM. However, the AMD64 port (presumably the most common architecture Go is deployed on) still limits itself to the original, now-20-year-old instruction set, with some occasional runtime CPUID detection when the savings is significant enough to merit it. (For comparison, GOPPC64 supports optimizing for power9, which only became available in 2017.)
This is further complicated by x86-64 having accumulated many, many instruction set extensions, with each processor revision having a different set of supported extensions. Making users responsible for deciding what set of extensions to enable doesn't feel very Go-like.
However, in 2020, the x86-64 psABI added four named microarchitecture levels to help group the extensions: "x86-64 (baseline)", "x86-64-v2", "x86-64-v3", and "x86-64-v4". See https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels or https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level/ for further details.
The "baseline" corresponds to what Go already supports, while "v2" and "v3" each add some new instructions that could be useful for Go programs (e.g., POPCNT in v2, BMI1/BMI2 in v3).
v2 CPUs appear commonplace today. E.g., RHEL9 will only support v2, per the above blog post; all GCE CPUs support v2, and I believe all AWS and Azure CPUs too.
v3 CPUs are also increasingly common. E.g., only GCE's Ivy Bridge and Sandy Bridge CPUs are limited to v2; Haswell (launched 2013) and newer support v3.
On issue #25489, I reported results from two optimization attempts at using Haswell's BMI instructions (PEXT for varint decoding, LZCNT and a couple others for scanobject). These are optimizations that could benefit from targeting v3 CPUs specifically, but probably wouldn't be worthwhile if they needed to rely on runtime CPUID detection.
It's also been suggested that at process startup, the Go runtime should throw if it's been compiled to assume instruction set extensions that aren't available on the CPU. I think that's a good idea.
Questions:
Are "baseline", "v2", and "v3" the best names? "v1" would perhaps be better than "baseline", but the psABI doesn't formally name it that. We could suggest that though?
Should we add "v4" too? This only adds AVX512 instructions, which the Go compiler/runtime don't immediately have any use for, and which seem a bit contentious about whether to use them on current processors anyway.
The text was updated successfully, but these errors were encountered: