Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: go env GOMOD is too slow for use in interactive commands #29382

Closed
heschi opened this issue Dec 21, 2018 · 27 comments
Closed

cmd/go: go env GOMOD is too slow for use in interactive commands #29382

heschi opened this issue Dec 21, 2018 · 27 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge GoCommand cmd/go ToolSpeed
Milestone

Comments

@heschi
Copy link
Contributor

heschi commented Dec 21, 2018

We're porting a lot of tools to support modules, and most of them need to know whether modules are enabled to turn on the newer, slower, potentially buggier code path. Running go env GOMOD takes about 10ms, which is not negligible for an interactive command like godef or guru. The algorithm for determining if modules are on is just simple enough to tempt people to roll their own. In fact, there's already a second implementation in go/build:

switch os.Getenv("GO111MODULE") {

This is unfortunate, especially since the algorithm changed in http://golang.org/cl/148517. (Maybe not in a way that matters for go/build? Not sure.)

I think it'd be helpful to have a standalone internal package that implemented this functionality that could be used for go env, go/build, and copied into x/tools for other tools to use.

@bcmills @jayconrod @ianthehat

@bcmills
Copy link
Contributor

bcmills commented Dec 21, 2018

The algorithm has changed before, and will likely change again (whenever the default behavior switches over). Probably it's better to make go env GOMOD fast than to duplicate it with version skew.

I'm a total novice with the Linux perf tool, but it seems to imply that we're spending a significant fraction of our latency initializing maps, regexps, and templates that aren't actually going to be used.

From

$ GOMAXPROCS=1 perf record --freq=max --call-graph fp -- go env GOMOD
$ perf report

I get the following:

-   76.18%     0.00%  go       go                  [.] runtime.main                                                                  ◆
   - runtime.main                                                                                                                    ▒
      - 71.40% main.init                                                                                                             ▒
         - 61.68% cmd/go/internal/bug.init                                                                                           ▒
            - cmd/go/internal/envcmd.init                                                                                            ▒
               - 48.18% cmd/go/internal/modload.init                                                                                 ▒
                  - 46.52% cmd/go/internal/modfetch.init                                                                             ▒
                     - 41.80% cmd/go/internal/get.init                                                                               ▒
                        - 23.85% cmd/go/internal/work.init                                                                           ▒
                           + 22.31% regexp.MustCompile                                                                               ▒
                           + 0.82% cmd/go/internal/work.init.0                                                                       ▒
                           + 0.72% debug/elf.init                                                                                    ▒
                        - 8.70% cmd/go/internal/get.init.1                                                                           ▒
                             regexp.MustCompile                                                                                      ▒
                           + regexp.compile                                                                                          ▒
                        - 7.08% cmd/go/internal/web.init                                                                             ▒
                           - 4.61% net/http.init                                                                                     ▒
                              - 1.91% internal/x/net/http/httpguts.init                                                              ▒
                                   internal/x/net/idna.init                                                                          ▒
                                 - internal/x/text/unicode/norm.init                                                                 ▒
                                      1.25% runtime.mapassign_fast32                                                                 ▒
                                      0.66% runtime.aeshash32                                                                        ▒
                              - 1.37% internal/x/net/http2/hpack.init                                                                ▒
                                   internal/x/net/http2/hpack.newStaticTable                                                         ▒
                                 + runtime.mapassign                                                                                 ▒
                                0.70% runtime.mapassign_fast64                                                                       ▒
                              + 0.63% mime/multipart.init                                                                            ▒
                           - 2.47% crypto/tls.init                                                                                   ▒
                              + 1.82% crypto/des.init                                                                                ▒
                              + 0.65% crypto/x509.init                                                                               ▒
                        + 0.81% regexp.MustCompile                                                                                   ▒
                        + 0.75% cmd/go/internal/get.init.0                                                                           ▒
                          0.61% crypto/tls.init                                                                                      ▒
                     + 2.73% cmd/go/internal/modfetch/codehost.init                                                                  ▒
                     + 1.15% archive/zip.init                                                                                        ▒
                     + 0.83% regexp.MustCompile                                                                                      ▒
                  + 0.84% cmd/go/internal/imports.init                                                                               ▒
                  + 0.83% regexp.MustCompile                                                                                         ▒
               - 12.96% cmd/go/internal/load.init                                                                                    ▒
                    text/template.(*Template).Parse                                                                                  ▒
                    text/template/parse.Parse                                                                                        ▒
                    text/template/parse.(*Tree).Parse                                                                                ▒
                  + text/template/parse.(*Tree).parse                                                                                ▒
         - 4.54% cmd/go/internal/base.init                                                                                           ▒
            - cmd/go/internal/cfg.init                                                                                               ▒
               - 3.83% go/build.init                                                                                                 ▒
                  - 1.96% go/doc.init                                                                                                ▒
                     + 1.38% regexp.MustCompile                                                                                      ▒
                  + 0.78% go/build.defaultContext                                                                                    ▒
         + 1.64% cmd/go/internal/test.init                                                                                           ▒
         + 1.58% flag.init                                                                                                           ▒
         + 0.83% cmd/go/internal/list.init                                                                                           ▒
         + 0.83% cmd/go/internal/clean.init                                                                                          ▒
      + 3.87% main.main                                                                                                              ▒

@heschi
Copy link
Contributor Author

heschi commented Dec 21, 2018

I agree that'd be better, and I know Brad was looking at lazy-loading this kind of stuff, but in general keeping init functions out of something as big as the main go command seems pretty difficult. Maybe a new command in $GOROOT/bin, goenv or something?

@mark-rushakoff
Copy link
Contributor

The lazy-loading issue was #26775.

@bcmills
Copy link
Contributor

bcmills commented Dec 21, 2018

Let's see how far we can get making go env faster. I'd really rather not introduce a whole binary just to shave off (optimistically) ≤8ms, especially given that the result of go env GOMOD for a given working directory should be pretty amenable to caching for long-running tools like editors.

@mvdan
Copy link
Member

mvdan commented Dec 21, 2018

I actually was thinking the same when I saw go env executions like go env GOARCH show up in the trace reports from a tool I've written that uses go/packages. I agree that shaving off a few milliseconds from go env should be a good start.

I agree with the sentiment that milliseconds matter for some tools, but given that go/packages is built around executing the go tool, I don't think it's a good idea to break that rule just for performance. Perhaps we could instead reduce the amount of times the go tool is executed as part of those interactive tools.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/155540 mentions this issue: cmd/go: avoid compiling many regexes at init

@mvdan
Copy link
Member

mvdan commented Dec 21, 2018

I had a stab at this and shaved a millisecond off on my machine (20%). The bigger offenders given by perf report are now a bit more tricky, like 5.20% net/http.init and 3.76% crypto/tls.init. A few percent still belong to regexp, but I've left them for later.

I wonder if packages like http and tls could reduce their init cost even further. If not, perhaps it does make sense to split go env into a much smaller binary with a tiny init function.

I'm also a bit puzzled by how time go env GOARCH on Bash seems to give numbers around 10ms, while the benchmark gives numbers around 4ms. I'm not sure what could explain the almost tripled run-time.

@mark-rushakoff
Copy link
Contributor

time go env GOARCH on Bash seems to give numbers around 10ms, while the benchmark gives numbers around 4ms.

echo 'package main; func main(){}' > c.go
go build ./c.go
time ./c

time reports 5-8 milliseconds to run a no-op go program on my machine, so it's probably overhead from loading/starting the process and whatever go runtime initialization is involved. time /usr/bin/true (an actual 17k compiled executable) takes 3-6 milliseconds on my machine so the time for the no-op go program seems reasonable to me.

@mvdan
Copy link
Member

mvdan commented Dec 22, 2018

Interesting - but the Go benchmark is also executing the go binary at each iteration. Perhaps it's faster since it's loading the same binary over and over again.

If anyone knows of a better way to write a benchmark for this, please let me know. Benchmarking a function directly, like the generated init function for the entire cmd/go main package, would be great - as we'd be able to get cpu and memory profiles directly from the benchmark. But I don't think that is possible at the moment.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/155961 mentions this issue: cmd/go: add benchmark that execs 'go env GOARCH'

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/155962 mentions this issue: cmd/go: delay parsing the testmain template

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/155963 mentions this issue: cmd/go: delay compiling some more regexes

@mvdan
Copy link
Member

mvdan commented Dec 30, 2018

I've found a few more ways to remove work from cmd/go, and now go env GOARCH went from ~5ms to ~2.3ms on the added benchmark. Similarly, time go env GOARCH has gone from ~15-20ms to ~10-12ms on my laptop, though those numbers jump around much more than the benchmark ones.

I think we should be able to fairly easily shave off some more from cmd/go's init for 1.13. On perf report, regexp.MustCompile still owns 7%, and go/build.init is an astonishing 10%, so I think we could realistically shave off another 20% or even more.

I'll stop with the CLs for now, to get some input on the benchmark and style of the CLs. Happy to work on more once these have been approved.

@bcmills
Copy link
Contributor

bcmills commented Jan 17, 2019

Retitling to reflect the current approach.

@bcmills bcmills changed the title cmd/go/internal: separate go env GOMOD into reusable package cmd/go: go env GOMOD is too slow for use in interactive commands Jan 17, 2019
@bcmills bcmills added this to the Go1.13 milestone Jan 17, 2019
@bcmills bcmills added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jan 18, 2019
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/164040 mentions this issue: internal/lazyregexp: add a lazy Regexp package

gopherbot pushed a commit that referenced this issue Feb 27, 2019
This was implemented as part of go/doc, but it's going to be useful in
other packages. In particular, many packages under cmd/go like web and
vcs make somewhat heavy use of global regexes, which add a non-trivial
amount of init work to the cmd/go program.

A lazy wrapper around regexp.Regexp will make it trivial to get rid of
the extra cost with a trivial refactor, so make it possible for other
packages in the repository to make use of it. While naming the package,
give the members better names, such as lazyregexp.New and
lazyregexp.Regexp.

We're also considering adding some form of a lazy API to the public
regexp package, so this internal package will allow us to get some
initial experience across std and cmd.

For #29382.

Change-Id: I30b0e72871d5267c309786f95f4cb15c68b2393d
Reviewed-on: https://go-review.googlesource.com/c/164040
Run-TryBot: Daniel Martí <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
Reviewed-by: Bryan C. Mills <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 27, 2019
'go env' is used for many quick operations, such as in go/packages to
query GOARCH and GOMOD. It often is a bottleneck; for example,
go/packages doesn't know whether or not to use Go modules until it has
queried GOMOD.

As such, this go command should be fast. Right now it's slower than it
should be. This commit adds a simple benchmark with os/exec, since we're
particularly interested in the cost of cmd/go's large init function.

Updates #29382.

Change-Id: Ifee6fb9997b9b89565fbfc2739a00c86117b1d37
Reviewed-on: https://go-review.googlesource.com/c/155961
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Bryan C. Mills <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 27, 2019
These regexes are all related to commands like get and build, so they're
unnecessary for simpler commands like env. In particular, we need env to
be fast, since libraries like go/packages call it early and often. Some
external Go tools are interactive, so milliseconds matter.

lazyregexp eagerly compiles the patterns when running from within a test
binary, so there's no longer any need to do that as part of non-test
binaries.

Picking up the low-hanging fruit spotted by 'perf record' shaves off
well over a full millisecond off the benchmark on my laptop:

name         old time/op    new time/op    delta
ExecGoEnv-8    4.92ms ± 1%    3.81ms ± 0%  -22.52%  (p=0.004 n=6+5)

This CL required adding a few more methods to the lazy regexp wrapper.

Updates #29382.

Change-Id: I22417ab6258f7437a2feea0d25ceb2bb4d735a15
Reviewed-on: https://go-review.googlesource.com/c/155540
Run-TryBot: Daniel Martí <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
Reviewed-by: Bryan C. Mills <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/164337 mentions this issue: internal/lazytemplate: add a lazy template wrapper

gopherbot pushed a commit that referenced this issue Feb 27, 2019
Similar to internal/lazyregexp, this will allow removing unnecessary
work from init functions with trivial refactors, thanks to sync.Once.

Copy the structure. The only major difference is that a template also
carries a name.

For #29382.

Change-Id: I65d096dc2e2072b310bf59a814cd62669856b5b5
Reviewed-on: https://go-review.googlesource.com/c/164337
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 28, 2019
The template is over a hundred lines and full of pipelines, and
text/template isn't optimised to parse quickly, so it's no wonder that
delaying the parsing to the first template use makes 'go env' much
faster.

Like in the previous patches to get rid of global regexp.MustCompile
vars, use the newly introduced lazytemplate package. Close to two full
milliseconds are shaved off of 'go env' runs.

name         old time/op    new time/op    delta
ExecGoEnv-8    4.27ms ± 0%    2.63ms ± 1%  -38.43%  (p=0.002 n=6+6)

Updates #29382.

Change-Id: I4e2569e51ddf2afe1b46eb1a9e9e5845f7a3b0bd
Reviewed-on: https://go-review.googlesource.com/c/155962
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
@mvdan
Copy link
Member

mvdan commented Feb 28, 2019

All previous changes merged; go env should now be about twice as fast.

A perf report now shows these remaining offenders:

  • errors.New at 21%, most of which is runtime.Callers at 20%. Why do global errors need stack traces?
  • archive/zip at 3%, which we might be able to improve slightly
  • runtime.mapassign_faststr at 7%, so there's quite a bit of map filling work happening somewhere
  • runtime.newobject at 6%, so there probably are quite a lot of global object allocs around

I think the only big low-hanging fruit is errors.New. There should definitely be a way to create global errors without incurring extra init cost beyond allocating the error. I presume the 1.13 error changes, which include stack frames, are partially to blame for this spike in init cost.

What should the solution be? Not to collect stack frames if errors.New is running as part of init? Using static error types, which would be a large refactor? Making runtime.Callers a no-op while init is running? /cc @jba @mpvl

@bcmills
Copy link
Contributor

bcmills commented Feb 28, 2019

runtime.Caller, at least, can theoretically be evaluated (and inlined) at link time.

I filed #30468 for the general regression. In the meantime perhaps we should substitute an internal package for errors: it might be as simple as sed -i 's/ "errors"/ "cmd/go/internal/errors"/' $(find cmd/go -name '*.go'.

@mvdan
Copy link
Member

mvdan commented Feb 28, 2019

Thanks! I don't think this is terribly urgent; we won ~50% and lost ~20%, so I'd wait another week or two to see if #30468 can have a good solution.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/166459 mentions this issue: cmd/go: further reduce init work

@mvdan
Copy link
Member

mvdan commented Mar 9, 2019

I've had a look around and managed to scrape another 3%. I don't think we're going to get much more without larger refactors, such as making cmd/go's init work be lazier, or having the compiler generate better code for static global maps.

If anyone is better at perf report than I am and manages to spot more low-hanging fruit, please do speak up :)

I'd say we should focus on #30468 before continuing; right now errors.New is just taking the top spot in too many generated package init functions.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/167401 mentions this issue: errors: improve performance of New

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/167400 mentions this issue: errors: record only single frame

gopherbot pushed a commit that referenced this issue Mar 14, 2019
See Issue #29382 and Issue #30468.

3 frames are no longer needed as of
https://go-review.googlesource.com/c/go/+/152537/

name                     old time/op  new time/op  delta
New-8                     475ns ± 3%   352ns ± 2%  -25.87%  (p=0.008 n=5+5)
Errorf/no_format-8        661ns ± 4%   558ns ± 2%  -15.63%  (p=0.008 n=5+5)
Errorf/with_format-8      729ns ± 6%   626ns ± 2%  -14.23%  (p=0.008 n=5+5)
Errorf/method:_mytype-8  1.00µs ± 9%  0.84µs ± 2%  -15.94%  (p=0.008 n=5+5)
Errorf/method:_number-8  1.25µs ± 7%  1.04µs ± 2%  -16.38%  (p=0.008 n=5+5)

Change-Id: I30377e769b3b3be623f63ecbe365f8950ca08dda
Reviewed-on: https://go-review.googlesource.com/c/go/+/167400
Run-TryBot: Marcel van Lohuizen <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
gopherbot pushed a commit that referenced this issue Mar 14, 2019
See Issue #29382 and Issue #30468.

Improvements in this CL:

name                     old time/op  new time/op  delta
New-8                     352ns ± 2%   225ns ± 5%  -36.04%  (p=0.008 n=5+5)

Improvements together with moving to 1 uintptr:

name                     old time/op  new time/op  delta
New-8                     475ns ± 3%   225ns ± 5%  -52.59%  (p=0.008 n=5+5)

Change-Id: I9d69a14e5e10a6498767defb7d5f26ceedcf9ba5
Reviewed-on: https://go-review.googlesource.com/c/go/+/167401
Run-TryBot: Marcel van Lohuizen <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
@rsc
Copy link
Contributor

rsc commented Mar 14, 2019

I'm seeing 20ms for 'go env GOMOD'. Is that really too slow?

@mvdan
Copy link
Member

mvdan commented Mar 14, 2019

It's gotten ~50% faster since the issue was opened :) At least as far as the benchmark tells us.

@mvdan
Copy link
Member

mvdan commented Mar 14, 2019

Now that the CLs to make errors.New faster have gone in, perf report shows errors.New at ~10%, and runtime.Callers at 8%, down from the previous 21% and 20%. #30820 would presumably let us get rid of that entire chunk.

@bcmills
Copy link
Contributor

bcmills commented Mar 14, 2019

I'm seeing 11ms at this point. That's well below typical keypress latency for a USB keyboard,¹ and can generally be cached for a given working directory anyway.

That seems good enough for interactive use.

¹https://danluu.com/keyboard-latency/

@bcmills bcmills closed this as completed Mar 14, 2019
gopherbot pushed a commit that referenced this issue Mar 31, 2019
The first biggest offender was crypto/des.init at ~1%. It's
cryptographically broken and the init function is relatively expensive,
which is unfortunate as both crypto/tls and crypto/x509 (and by
extension, cmd/go) import it. Hide the work behind sync.Once.

The second biggest offender was flag.sortFlags at just under 1%, used by
the Visit flagset methods. It allocated two slices, which made a
difference as cmd/go iterates over multiple flagsets during init.
Use a single slice with a direct sort.Interface implementation.

Another big offender is initializing global maps. Reducing this work in
cmd/go/internal/imports and net/textproto gives us close to another
whole 1% in saved work. The former can use map literals, and the latter
can hide the work behind sync.Once.

Finally, compress/flate used newHuffmanBitWriter as part of init, which
allocates many objects and slices. Yet it only used one of the slice
fields. Allocating just that slice saves a surprising ~0.3%, since we
generated a lot of unnecessary garbage.

All in all, these little pieces amount to just over 3% saved CPU time.

name         old time/op  new time/op  delta
ExecGoEnv-8  3.61ms ± 1%  3.50ms ± 0%  -3.02%  (p=0.000 n=10+10)

Updates #26775.
Updates #29382.

Change-Id: I915416e88a874c63235ba512617c8aef35c0ca8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/166459
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
@golang golang locked and limited conversation to collaborators Mar 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge GoCommand cmd/go ToolSpeed
Projects
None yet
Development

No branches or pull requests

6 participants