cmd/link: emitting DWARF doubles time spent in linker #26318

Closed
rsc opened this issue Jul 10, 2018 · 8 comments
Labels: early-in-cycle, FrozenDueToAge, NeedsInvestigation, release-blocker, ToolSpeed
Milestone: Go1.12

@rsc (Contributor) commented Jul 10, 2018

On Linux:

$ perflock bash -c 'for i in $(seq 10); do rm -f /tmp/x1 && 9 time go build -o /tmp/x1 cmd/go; done'
1.27u 0.14s 1.10r 	 go build -o /tmp/x1 cmd/go ...
1.23u 0.16s 1.09r 	 go build -o /tmp/x1 cmd/go ...
1.18u 0.17s 1.12r 	 go build -o /tmp/x1 cmd/go ...
1.20u 0.16s 1.11r 	 go build -o /tmp/x1 cmd/go ...
1.22u 0.18s 1.07r 	 go build -o /tmp/x1 cmd/go ...
1.23u 0.13s 1.09r 	 go build -o /tmp/x1 cmd/go ...
1.23u 0.15s 1.08r 	 go build -o /tmp/x1 cmd/go ...
1.25u 0.14s 1.09r 	 go build -o /tmp/x1 cmd/go ...
1.23u 0.14s 1.05r 	 go build -o /tmp/x1 cmd/go ...
1.21u 0.15s 1.08r 	 go build -o /tmp/x1 cmd/go ...
$ perflock bash -c 'for i in $(seq 10); do rm -f /tmp/x2 && 9 time go build -ldflags=-w -o /tmp/x2 cmd/go; done'
0.66u 0.16s 0.63r 	 go build -ldflags=-w -o /tmp/x2 ...
0.72u 0.12s 0.63r 	 go build -ldflags=-w -o /tmp/x2 ...
0.67u 0.13s 0.57r 	 go build -ldflags=-w -o /tmp/x2 ...
0.66u 0.17s 0.57r 	 go build -ldflags=-w -o /tmp/x2 ...
0.67u 0.13s 0.57r 	 go build -ldflags=-w -o /tmp/x2 ...
0.66u 0.16s 0.57r 	 go build -ldflags=-w -o /tmp/x2 ...
0.68u 0.14s 0.57r 	 go build -ldflags=-w -o /tmp/x2 ...
0.66u 0.13s 0.58r 	 go build -ldflags=-w -o /tmp/x2 ...
0.68u 0.13s 0.57r 	 go build -ldflags=-w -o /tmp/x2 ...
0.70u 0.10s 0.59r 	 go build -ldflags=-w -o /tmp/x2 ...
$ 

Just plain "go list -json cmd/go", which essentially does the prep for a build, takes about 0.12r, so the upper bound on time spent in link here is about 0.45r without DWARF and 0.93r with DWARF. That is, emitting DWARF doubles the amount of time it takes for the linker to execute. We don't pay this cost for "go test" or "go run", because we turn DWARF off for those already.

The Mac shows a similar ~2X slowdown. I thought maybe #12259 was related but given that Linux clearly is unaffected by that issue, it appears to be a general DWARF problem.

I've put the proposal to turn off DWARF by default (#26074) on hold pending better understanding of where all the time is going (this issue). If we decide that DWARF really is a 2X cost, then we need to go back to #26074 and decide whether that's worth paying on every build when most people make no use of the DWARF info.

/cc @dr2chase @cherrymui @thanm @ianlancetaylor @randall77 @aclements @RLH

@rsc added this to the Go1.12 milestone Jul 10, 2018
@rsc added the early-in-cycle and release-blocker labels Jul 10, 2018
@ianlancetaylor added the NeedsInvestigation label Jul 10, 2018
@aclements (Member) commented:

As a point of comparison, I ran both a build -a cmd/go and just the link step to get a sense of the spectrum between a fully-non-incremental build and a fully-incremental build:

perflock benchcmd -n 10 BuildCmdGoAll sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 -a cmd/go" > dwarf
perflock benchcmd -n 10 BuildCmdGoAll sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 -a -gcflags -dwarf=false -ldflags -w cmd/go" > nodwarf
perflock benchcmd -n 10 LinkCmdGo sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 cmd/go" >> dwarf
perflock benchcmd -n 10 LinkCmdGo sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 -ldflags -w cmd/go" >> nodwarf
benchsave nodwarf dwarf

From my linux/amd64 laptop:

name           old time/op         new time/op         delta
BuildCmdGoAll          10.0s ±11%          10.6s ± 5%   +6.41%  (p=0.004 n=9+10)
LinkCmdGo              626ms ± 5%         1096ms ± 2%  +75.17%  (p=0.000 n=10+10)

name           old peak-RSS-bytes  new peak-RSS-bytes  delta
BuildCmdGoAll           163M ± 2%           198M ± 8%  +21.33%  (p=0.000 n=10+10)
LinkCmdGo               164M ± 2%           192M ± 7%  +17.37%  (p=0.000 n=10+10)

(https://perf.golang.org/search?q=upload:20180711.1)

This at least probably explains how we missed this. We were exceedingly careful about compiler performance and kept an eye on overall build performance, but weren't looking at the effects on just the link step. Perhaps compilebench should include a linker benchmark.

Looking at the profiles, about half of the additional link time comes from compression. An easy thing to try would be to reduce the compression level to favor speed. About a quarter of the time comes from dwarfgeneratedebugsyms (half of that is in writelines, which we may be able to move more of into the compiler, or parallelize in the linker itself). And the other quarter comes from additional time spent in GC (corresponding to the ~20% increase in peak RSS).
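For illustration only (this is not the linker's actual code), here is a minimal Go sketch of the kind of change being suggested: compress/zlib lets the caller pick the compression level, and switching from zlib.DefaultCompression to zlib.BestSpeed trades a little output size for a large reduction in compression time. The payload below is a stand-in; in the linker the input would be DWARF section data.

package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compressedSize returns the zlib-compressed size of data at the given level.
func compressedSize(data []byte, level int) int {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevel(&buf, level)
	if err != nil {
		panic(err)
	}
	zw.Write(data)
	zw.Close()
	return buf.Len()
}

func main() {
	// Stand-in payload; the linker would be compressing DWARF debug sections.
	data := bytes.Repeat([]byte("type main.Reader struct { ... }\n"), 1<<15)
	fmt.Println("DefaultCompression:", compressedSize(data, zlib.DefaultCompression))
	fmt.Println("BestSpeed:         ", compressedSize(data, zlib.BestSpeed))
}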

@aclements (Member) commented:

“An easy thing to try would be to reduce the compression level to favor speed.”

This is pretty promising. Using zlib.BestSpeed wins back about half of the linker slow-down. Comparing no-DWARF ("old") to BestSpeed compressed DWARF ("new"):

name           old time/op         new time/op         delta
BuildCmdGoAll          10.0s ±11%          10.8s ± 5%   +8.03%  (p=0.003 n=9+9)
LinkCmdGo              626ms ± 5%          860ms ± 2%  +37.36%  (p=0.000 n=10+8)

And the impact on binary size is minimal.

             cmd/go bytes
nodwarf      10106953
dwarf        12159049 nodwarf+20%
dwarf-speed  12408905 nodwarf+23%
dwarf-nozlib 17766473 nodwarf+76%
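As a side note (not part of the measurements above), one way to see where the extra bytes live in a linux/amd64 binary is to list the DWARF sections with a standard ELF tool; in Go binaries of this era the compressed sections are named .zdebug_*:

$ readelf -S /tmp/x1 | grep debug    # lists the .zdebug_* (compressed DWARF) sections
$ readelf -S /tmp/x2 | grep debug    # prints nothing when DWARF is omitted via -ldflags=-w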

@gopherbot (Contributor) commented:

Change https://golang.org/cl/123356 mentions this issue: cmd/link: use BestSpeed for DWARF compression

gopherbot pushed a commit that referenced this issue Jul 11, 2018
DWARF compression accounts for roughly 30% of the linker's time. This
CL switches from DefaultCompression to BestSpeed, which virtually
eliminates this time. This roughly halves the overhead of handling
DWARF in the linker:

name \ time/op         nodwarf     dwarf        dwarf-speed
BuildCmdGoAll          10.0s ±11%   10.6s ± 5%   10.8s ± 5%
             nodwarf                  +6.41%       +8.03%
               dwarf                                 ~
LinkCmdGo              626ms ± 5%  1096ms ± 2%   860ms ± 2%
             nodwarf                 +75.17%      +37.36%
               dwarf                              -21.59%

Previously, enabling DWARF had a 75% overhead in link time for cmd/go.
This change reduces this overhead to 37% (a 22% reduction).

The effect on binary size is minimal compared to DefaultCompression,
and still substantially better than no compression:

             cmd/go bytes
nodwarf      10106953
dwarf        12159049 nodwarf+20%
dwarf-speed  12408905 nodwarf+23%
dwarf-nozlib 17766473 nodwarf+76%

Updates #26318.

Change-Id: I33bb7caa038a2753c29104501663daf4839e7054
Reviewed-on: https://go-review.googlesource.com/123356
Run-TryBot: Austin Clements <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
@cespare (Contributor) commented Jul 11, 2018

@aclements benchcmd looks really nice. Where does it come from? Some light Googling and GitHub searches didn't turn it up.

@aclements (Member) commented:

@cespare: https://godoc.org/github.com/aclements/go-misc/benchcmd

@aclements (Member) commented:

Here's a comparison of building vs just linking jujud between Go 1.10 and tip (including BestSpeed DWARF compression):

name           old time/op         new time/op         delta
BuildJujudAll          43.5s ± 0%          41.6s ± 2%   -4.29%   (p=0.000 n=8+10)
LinkJujud              9.28s ± 1%         11.73s ± 2%  +26.36%   (p=0.000 n=9+10)

So, we're winning on non-incremental builds, probably because of indexed export info, but the link step in isolation is 26% slower. Not great, though certainly better than 2X or the 40% reported in #26074 (comment).

@gopherbot (Contributor) commented:

Change https://golang.org/cl/126656 mentions this issue: cmd/go: add $GOFLAGS environment variable

gopherbot pushed a commit that referenced this issue Aug 1, 2018
People sometimes want to turn on a particular go command flag by default.
In Go 1.11 we have at least two different cases where users may need this.

1. Linking can be noticeably slower on underpowered systems
due to DWARF, and users may want to set -ldflags=-w by default.

2. For modules, some users or CI systems will want vendoring always,
so they want -getmode=vendor (soon to be -mod=vendor) by default.

This CL generalizes the problem to “set default flags for the go command.”

$GOFLAGS can be a space-separated list of flag settings, but each
space-separated entry in the list must be a standalone flag.
That is, you must do 'GOFLAGS=-ldflags=-w' not 'GOFLAGS=-ldflags -w'.
The latter would mean to pass -w to go commands that understand it
(if any do; if not, it's an error to mention it).

For #26074.
For #26318.
Fixes #26585.

Change-Id: I428f79c1fbfb9e41e54d199c68746405aed2319c
Reviewed-on: https://go-review.googlesource.com/126656
Run-TryBot: Russ Cox <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Rob Pike <[email protected]>
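As a usage sketch of the flag format described in the commit message above (the -ldflags=-w setting is the one discussed earlier in this thread):

$ GOFLAGS=-ldflags=-w go build -o /tmp/x2 cmd/go    # one-off build without DWARF
$ export GOFLAGS=-ldflags=-w                        # or make it the default for the shell session
$ go build -o /tmp/x2 cmd/go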
@ianlancetaylor (Member) commented:

Here is what I see today.

> perflock bash -c 'for i in $(seq 10); do rm -f /tmp/x1 && time go build -o /tmp/x1 cmd/go; done'

real	0m0.812s
user	0m0.882s
sys	0m0.204s

real	0m0.767s
user	0m0.862s
sys	0m0.168s

real	0m0.756s
user	0m0.906s
sys	0m0.144s

real	0m0.788s
user	0m0.909s
sys	0m0.153s

real	0m0.735s
user	0m0.854s
sys	0m0.124s

real	0m0.762s
user	0m0.883s
sys	0m0.163s

real	0m0.746s
user	0m0.870s
sys	0m0.115s

real	0m0.767s
user	0m0.832s
sys	0m0.189s

real	0m0.743s
user	0m0.902s
sys	0m0.143s

real	0m0.762s
user	0m0.888s
sys	0m0.145s

> perflock bash -c 'for i in $(seq 10); do rm -f /tmp/x2 && time go build -ldflags=-w -o /tmp/x2 cmd/go; done'

real	0m0.551s
user	0m0.636s
sys	0m0.179s

real	0m0.587s
user	0m0.645s
sys	0m0.152s

real	0m0.576s
user	0m0.747s
sys	0m0.162s

real	0m0.578s
user	0m0.652s
sys	0m0.162s

real	0m0.569s
user	0m0.714s
sys	0m0.133s

real	0m0.545s
user	0m0.666s
sys	0m0.173s

real	0m0.559s
user	0m0.664s
sys	0m0.158s

real	0m0.548s
user	0m0.659s
sys	0m0.200s

real	0m0.561s
user	0m0.668s
sys	0m0.148s

real	0m0.549s
user	0m0.719s
sys	0m0.168s

With DWARF, the averages are:

real 0.763s
user 0.878s
sys  0.154s

Without DWARF:

real 0.562s
user 0.677s
sys  0.163s

So today the slowdown from enabling DWARF is around 30%, which is a lot better than the roughly 2X originally reported.

Looking at the CPU profile when linking cmd/go I see 4.76% of CPU time in dwarfGenerateDebugInfo, 6.35% in dwarfGenerateDebugSyms, and 12.7% of time in compressSyms. By way of comparison, runtime.scanObject takes 15.87%.

These numbers seem plausible to me and I think we've answered the question posed by this issue, so closing.
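For anyone who wants to reproduce this kind of measurement, the linker accepts a -cpuprofile flag that can be passed through -ldflags (a sketch; the output path is illustrative):

$ go build -ldflags=-cpuprofile=/tmp/link.prof -o /tmp/x1 cmd/go
$ go tool pprof -top /tmp/link.prof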

@golang locked and limited conversation to collaborators Nov 28, 2019