Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: slow "native" performance with Mac OS X 10.14.1 and 10.12.6 #28739

Closed
myitcv opened this issue Nov 12, 2018 · 19 comments
Closed

cmd/go: slow "native" performance with Mac OS X 10.14.1 and 10.12.6 #28739

myitcv opened this issue Nov 12, 2018 · 19 comments
Labels
FrozenDueToAge GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin Performance
Milestone

Comments

@myitcv
Copy link
Member

myitcv commented Nov 12, 2018

Following a debug session with @fatih

What version of Go are you using (go version)?

go version go1.11.2 darwin/amd64

# and

go version go1.11.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GOARCH="amd64"
GOBIN="/Users/fatih/go/bin"
GOCACHE="/Users/fatih/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/fatih/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.2/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/k_/87syx3r50m93m72hvqj2qqlw0000gn/T/go-build130848285=/tmp/go-build -gno-record-gcc-switches -fno-common"

and

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build416231211=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The symptom that @fatih and I were investigating was godef performing ~2x more slowly on his fast machine than on my slower machine.

@fatih's machine specs for reference:

 Model Name:    iMac Pro
  Model Identifier:    iMacPro1,1
  Processor Name:    Intel Xeon W
  Processor Speed:    3 GHz
  Number of Processors:    1
  Total Number of Cores:    10
  L2 Cache (per Core):    1 MB
  L3 Cache:    13.8 MB
  Memory:    64 GB

OS X 10.14.1 (18B75)

We think we have a smaller reproduction that demonstrates what appears to be an OS X 10.14.n issue that may or may not be related to Go. But go list appears to be a good way to demonstrate the problem and hopefully therefore a good place to start further investigation.

The following was run in a Terminal on @fatih's setup, and then within a Docker container on the same machine. Run-times of the go list commands were compared:

/bin/bash
(
set -eux
export GOPATH=$(mktemp -d)
command cd $(mktemp -d)
git clone https://github.com/digitalocean/csi-digitalocean
command cd csi-digitalocean/
git checkout b9ed6c2ea9ad85c5e4956bdfe418940bb5aa883a
rm go.sum
go mod tidy
command cd driver
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
) 2>&1 | tee output.txt

The "native" terminal run shows "average" speeds of ~300ms:

+ go list -export -deps -json -e .
real    0m1.524s
user    0m7.224s
sys 0m2.391s
+ go list -export -deps -json -e .
real    0m0.297s
user    0m0.420s
sys 0m0.898s
+ go list -export -deps -json -e .
real    0m0.300s
user    0m0.403s
sys 0m0.919s
+ go list -export -deps -json -e .
real    0m0.308s
user    0m0.402s
sys 0m0.968s

Whereas the Docker run shows "average" speeds of ~180ms:

+ go list -export -deps -json -e .

real    0m4.837s
user    0m12.070s
sys     0m5.030s
+ go list -export -deps -json -e .

real    0m0.186s
user    0m0.180s
sys     0m0.110s
+ go list -export -deps -json -e .

real    0m0.165s
user    0m0.160s
sys     0m0.120s
+ go list -export -deps -json -e .

real    0m0.187s
user    0m0.190s
sys     0m0.100s

What did you expect to see?

Similar run-times in each, possibly even faster times in the "native" OS X environment.

What did you see instead?

The "native" run-times are almost 70% longer.

cc @bcmills (and FYI @ianthehat for any go/packages side-effects)

@myitcv myitcv added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Nov 12, 2018
@marwan-at-work
Copy link
Contributor

marwan-at-work commented Nov 13, 2018

Here's what I got from running the above script on a 13' macOS (seirra 10.12.6)

Without Docker

+ godef -o 19358 -f controller.go
/var/folders/n4/35bgdnkd6vlgqrdzbyl0x1ycmhcndx/T/tmp.DVAXjerX/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m10.234s
user    0m8.298s
sys    0m1.270s
+ godef -o 19358 -f controller.go
/var/folders/n4/35bgdnkd6vlgqrdzbyl0x1ycmhcndx/T/tmp.DVAXjerX/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m1.276s
user    0m0.381s
sys    0m0.234s
+ godef -o 19358 -f controller.go
/var/folders/n4/35bgdnkd6vlgqrdzbyl0x1ycmhcndx/T/tmp.DVAXjerX/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m1.295s
user    0m0.399s
sys    0m0.240s

With Docker

+ godef -o 19358 -f controller.go
/tmp/tmp.U5kJ73H8kO/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m7.215s
user    0m10.540s
sys    0m2.270s
+ godef -o 19358 -f controller.go
/tmp/tmp.U5kJ73H8kO/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m0.304s
user    0m0.250s
sys    0m0.110s
+ godef -o 19358 -f controller.go
/tmp/tmp.U5kJ73H8kO/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m0.282s
user    0m0.230s
sys    0m0.110s

@myitcv myitcv changed the title cmd/go: slow native performance with Mac OS X 10.14.1 cmd/go: slow "native" performance with Mac OS X 10.14.1 and 10.12.6 Nov 13, 2018
@myitcv
Copy link
Member Author

myitcv commented Nov 13, 2018

Sorry, forgot to cc @rsc in the original description

@agnivade
Copy link
Contributor

Will a dtruss/strace log help here ? Just wondering if there is any OS level variable here.

@myitcv
Copy link
Member Author

myitcv commented Nov 27, 2018

cc @jayconrod

@jayconrod jayconrod self-assigned this Nov 27, 2018
@xmurobi
Copy link

xmurobi commented Dec 4, 2018

The same problem on my windows 10:

What version of Go are you using (go version)?
go version go1.11 windows/amd64

What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\shuye\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=G:\GO
set GOPROXY=
set GORACE=
set GOROOT=D:\Go
set GOTMPDIR=
set GOTOOLDIR=D:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=G:\GIT.HUB\marketstore\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\shuye\AppData\Local\Temp\go-build930085358=/tmp/go-build -gno-record-gcc-switches

@jayconrod
Copy link
Contributor

I don't have an answer to why macOS 10.12 and 10.14 performance would be different, but I think I know why Docker and macOS are different.

As far as I understand, Docker on macOS actually runs a Linux kernel and userspace under a hypervisor. That means overhead for I/O and system calls is more similar to native Linux overhead than it is to macOS overhead. "go list" does a lot of stat and read system calls on small files, so the system call overhead matters a lot.

Here are the results from a quick benchmark I ran. The Read and Write tests are reading and writing slices of various sizes to a file (the file is almost certainly in the kernel file cache, so disk speed doesn't matter). The Import test is measuring the time to call go/build.Context.Import on "fmt".

macOS native:

BenchmarkStat-8            	  300000	      4470 ns/op
BenchmarkRead/1K-8         	   50000	     27366 ns/op
BenchmarkRead/50K-8        	   50000	     30980 ns/op
BenchmarkRead/1024K-8      	   10000	    104866 ns/op
BenchmarkWrite/1K-8        	   10000	    151379 ns/op
BenchmarkWrite/50K-8       	    5000	    311861 ns/op
BenchmarkWrite/1024K-8     	    1000	   2020063 ns/op
BenchmarkImport-8          	    2000	    919822 ns/op

macOS Docker

BenchmarkStat-4            	 1000000	      1222 ns/op
BenchmarkRead/1K-4         	  300000	      4101 ns/op
BenchmarkRead/50K-4        	  200000	      7405 ns/op
BenchmarkRead/1024K-4      	   20000	     92996 ns/op
BenchmarkWrite/1K-4        	   10000	    181860 ns/op
BenchmarkWrite/50K-4       	   10000	    341137 ns/op
BenchmarkWrite/1024K-4     	    1000	   2404169 ns/op
BenchmarkImport-4          	    2000	    580594 ns/op

Notably, the Stat and Read/1K tests are much, much slower on native macOS. Windows is quite a bit slower than this.

@magiconair
Copy link
Contributor

magiconair commented Mar 7, 2020

We have a project with a mono repo with 40k LOC, 700k LOC dependencies and 30 binaries, code generation before build and so forth. Running time make without any changes and a warm cache takes 3.3 sec on a modern MacBook Pro and only 2 sec on a Fedora VM running on the same machine using Parallels. It seems other people have similar experiences with VMware/Ubuntu and macOS.

all:
	go build -o build/ ./gen/... # build code generators
	go generate ./...
	go test ./...
	go build -o build/$(GOOS)_$(GOARCH)/ ./cmd/...

The build cache contains only this one project. When making a simple change to the project build times change to 5.3 sec vs. 2.8 sec. Building a single binary the numbers are 1.35 sec vs 0.85 sec.

We are using modules, everything is downloaded, no CGO and we are not cross-compiling.

Using a RAM disk on macOS for GOCACHE and the output directory did not improve the build speed.

I'm testing this on a quite beefy machine but this is more noticeable on smaller laptops of my colleagues. I'll double check but the build times were more in the range of 5-10 seconds. This can be quite noticeable and I'm curious how we can debug this a bit further.

Update 1:

The project and dependencies are about 1700 Go files at this point.

@randall77
Copy link
Contributor

Could this possibly be due to #26550 (thus a dup of #30497 or #27415) ?
(Note that change occurred for 10.12, so it can't explain the OP's problem, but it may explain what some other posters are seeing.)

@magiconair
Copy link
Contributor

The speed difference was already noticeable with go1.13 since I've started playing with a Fedora laptop a while ago. Not sure about go1.12.

@magiconair
Copy link
Contributor

@randall77 you mean that i/o syscalls are slower on Darwin than on Linux?

@randall77
Copy link
Contributor

@magiconair, yes, syscalls will be somewhat slower because they go through libc now. But I was particularly thinking of fsync which is much more expensive on Darwin. The regular fsync is kinda broken, so we have to use ioctl with F_FULLFSYNC.

BTW, that #26650, not #26550, sorry.

@ngrilly
Copy link

ngrilly commented Mar 15, 2020

Are we sure that what we are seeing here is not just Docker for Mac virtualization doing some very aggressive in-memory caching?

We are not just comparing Darwin to Linux; we are comparing Darwin to Hyperkit+Linux.

In the numbers shared by @myitcv in the issue description, native macOS is actually faster on the first cold run:

+ go list -export -deps -json -e .
real    0m1.524s
user    0m7.224s
sys 0m2.391s

versus this for Docker:

+ go list -export -deps -json -e .
real    0m4.837s
user    0m12.070s
sys     0m5.030s

Update: I assumed everybody was using Docker for Mac, but maybe you use another solution to run Docker on your Mac?

@magiconair
Copy link
Contributor

@randall77 is there some indication that this situation will improve in the future? Is Apple addressing the fsync issue? I'm now at 15sec build times on Darwin vs 4.5 sec in a Fedora VM running on the same super-beefy hardware.

@randall77
Copy link
Contributor

@randall77 is there some indication that this situation will improve in the future?

I don't know of anyone working on it, or anyone who has plans in this area. So, no.

Is Apple addressing the fsync issue?

I have no idea.

I'm now at 15sec build times on Darwin vs 4.5 sec in a Fedora VM running on the same super-beefy hardware.

I think if we want to make progress here we need to understand why it is so much slower. For build times, I don't think a slow fsync would be the cause. Building Go code is mostly reading; it's probably stat or read call. But that's a guess; data would help.

@magiconair
Copy link
Contributor

Building Go code is mostly reading; it's probably stat or read call. But that's a guess; data would help.

Do you have a quick tip on how to collect this data? Otherwise, I'll try and ask the internet.

@randall77
Copy link
Contributor

Run with strace?
Run under a profiler?
Both might be kinda tricky, as the build system runs multiple processes.

@jayconrod
Copy link
Contributor

Closing as obsolete, since this has been open a while.

Comparing I/O performance between native macOS and Docker (virtualized Linux) is somewhat apples and oranges. For whatever reason, Linux seems to be a lot faster on the same hardware.

There is certainly room for better I/O optimization in cmd/go, and it's probably worth studying further whether I/O changes much by macOS version (holding hardware and Go version constant).

@marcan
Copy link
Contributor

marcan commented Feb 17, 2022

Just FWIW, this is a problem with Apple NVMe drives (T2 and M1 Macs alike). Their cache flush performance is abysmal. This affects both macOS and (native) Linux (presumably existing VM solutions fail to forward this as F_FULLSYNC if they run faster). You get about 46 IOPS of F_FULLSYNC on macOS on an M1 machine, and about the same on native Linux with fsync().

So there isn't much to be done on the software side other than deciding whether you care about data integrity on macOS or not; on Linux there is no way for user processes to control this as far as I know (i.e. flush to cache but not the cache), though it can be emulated by setting the drive write cache type to "write through" in sysfs (which tells the kernel to never issue flush commands, so fsync() behaves like it does on macOS).

@myitcv
Copy link
Member Author

myitcv commented Feb 17, 2022

For the benefit of others, @marcan also posted this fantastic thread: https://twitter.com/marcan42/status/1494213855387734019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin Performance
Projects
None yet
Development

No branches or pull requests