Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: occasional "failed to cache compiled Go files" when 'go list -compiled' runs in parallel #29667

Closed
cespare opened this issue Jan 10, 2019 · 18 comments
Labels
FrozenDueToAge GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@cespare
Copy link
Contributor

cespare commented Jan 10, 2019

What version of Go are you using (go version)?

Go 1.11.4

Does this issue reproduce with the latest release?

It does not reproduce with Go 1.12beta1. It reproduces at tip at a somewhat lower rate. See @dominikh's comment, below.

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

@dominikh wrote a small reproducer: https://play.golang.org/p/ghOtMwBnGu7

It may take a few runs to fail. Clean out your build cache or use GOCACHE=/tmp/empty to make it more reliably fail.

What did you expect to see?

Calling go list in parallel should succeed.

What did you see instead?

go build <some package>: failed to cache compiled Go files

I think this happens only when -compiled is provided to go list (which is new in Go 1.11?)

The context in which this comes up is multiple tools running golang.org/x/tools/go/packages.Load at the same time. I believe the issue is in go list but possibly the bug (or at least the workaround) should be in golang.org/x/tools/go/packages.

I also believe this is much more likely to happen with an empty GOCACHE. We've mainly been seeing this our CI environment. github.com/google/wire has also been seeing this in Travis: google/wire#66.

The effects of this issue tend to be pretty far away from the go list invocation. We've run into it via staticcheck (http://staticcheck.io/) as well as our own go/packages-using code.

I can't get this to reproduce in 1.12beta1. It would be helpful if someone can find the CL that fixed it. I think it would also be valuable if the fix could be backported to Go 1.11 in the meantime. (It does reproduce at tip.)

The ecosystem is rapidly converging on the new, go list-based way of writing tooling (typically via go/packages) so I expect this will arise often.

/cc @bcmills @dominikh @zombiezen @ianthehat

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 10, 2019
@julieqiu julieqiu added this to the Go1.12 milestone Jan 10, 2019
@cespare
Copy link
Contributor Author

cespare commented Jan 10, 2019

Note that while this bug resembles #26794, it occurs when modules aren't involved (GOPATH mode).

(I haven't tried to reproduce it in conjunction with modules.)

@bcmills bcmills modified the milestones: Go1.12, Go1.11.5 Jan 11, 2019
@bcmills
Copy link
Contributor

bcmills commented Jan 11, 2019

I can't get this to reproduce in 1.12beta1.

Nor can I, and I really have no idea how it would have been fixed.

If someone wants to bisect we can at least consider a backport, but given that we can't reproduce it in 1.12, and that it occurs only rarely (and under unusual load) in 1.11, I can't justify putting a lot of effort into tracking it down.

@dominikh
Copy link
Member

I'll bisect it.

@dominikh
Copy link
Member

Actually, there is no need to bisect it. The issue still occurs as of go version devel +a2bb68de4d Fri Jan 11 16:08:59 2019 +0000 linux/amd64. When GOCACHE is on a tmpfs, the error occurs at a rate of roughly 1% on my system. It is easier to hit the issue on slower media, at a rate of anywhere from 53% to 93% on an NFS mount, which isn't very surprising if we assume that this is a race condition. It seems that 1.11.4 has vastly more instances of this error at a rate of 80% when on NFS, versus tip at 52%

@bcmills bcmills modified the milestones: Go1.11.5, Go1.13 Jan 11, 2019
@bcmills
Copy link
Contributor

bcmills commented Jan 11, 2019

Good to know — thanks for the investigation!

CC @jayconrod

@jayconrod jayconrod added the GoCommand cmd/go label Jan 11, 2019
@jayconrod jayconrod self-assigned this Jan 11, 2019
@cespare
Copy link
Contributor Author

cespare commented Jan 11, 2019

@dominikh thanks for that info. I updated the original description too.

@fho
Copy link

fho commented Jan 15, 2019

We are using Go 1.11.2 in our CI system and running parallel jobs that are accessing the same GOCACHE dir by the command:

go list -e -json -compiled -test=false -export=false -deps=true -- ./...

Recently we implemented to boot additional CI workers on demand. The worker are started with an empty GOCACHE directory.
Since this change we run into the failed to cache compiled Go files multiples times a day.

mariash added a commit to cloudfoundry/diego-release that referenced this issue Jan 18, 2019
@cespare
Copy link
Contributor Author

cespare commented Feb 2, 2019

This bug has been biting us a lot as we run more and more tools in CI environments which use go/packages. In case it helps other people, the workaround we recently started using is a shim binary that passes its arguments to the go tool. If it sees that it is being called as go list, it uses a global lock file to prevent simultaneous go list executions. In CI and script environments where the workaround is needed, the fake go is put in a directory at the front of the $PATH.

I also looked into writing a GOPACKAGESDRIVER tool which would apply the same lockfile-based mutual exclusion around go list but it looked like that would require copy-pasting a lot of code out of go/packages.

@nmiyake
Copy link
Contributor

nmiyake commented Mar 8, 2019

I'm still seeing this failure even after upgrading to Go 1.12 -- we're encountering this error when running checks in parallel in a CI environment. The CI environment uses Go 1.12 and the checks are compiled with Go 1.12 as well. The projects use GOPATH mode (it is not a module project).

Is there anything that we can do to help with the investigation or fix for this issue?

@jayconrod
Copy link
Contributor

@nmiyake If you could describe your CI environment file system, operating system, architecture and any other platform information relevant to file locking, that would be a useful data point.

@nmiyake
Copy link
Contributor

nmiyake commented Mar 8, 2019

This is on an enterprise deployment of CircleCI 2. The Docker image used to run the CI is the official golang:1.12 image.

Output of go env:

root@1965526c4de7:/go# go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build212534544=/tmp/go-build -gno-record-gcc-switches"

@viveklak
Copy link

Also seeing this locally as well as on CircleCI (cloud hosted offering). Running on official Go 1.12.1 image.

@ConradKurth
Copy link

Having this same problem. Our CI/CD is running golangci-lint, which underlying is the go list. Strange part is when I ssh into the box, i can run it fine, but when the program runs on its normal workflow it works

env is

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/go/src/github.com/HeadspaceMeditation/go-service-template/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build389910482=/tmp/go-build -gno-record-gcc-switches"

@jwm
Copy link

jwm commented Jun 11, 2019

We're seeing this with go 1.11.5 and 1.12.5 (both the official Docker images).

After reading @cespare's comment about putting a mutex on go list, we added this go wrapper to our build process, which works around the problem nicely, letting us perform parallel builds.

#!/bin/sh

set -e

for i in "$@"; do
	if [ "$i" = list ]; then
		set -o noclobber
		while ! echo "$$" >/tmp/go-list-lock; do
			sleep 0.25
		done
		trap 'rm -f /tmp/go-list-lock' EXIT
	fi
done

/usr/local/go/bin/go "$@"

@bcmills
Copy link
Contributor

bcmills commented Jun 28, 2019

Here's one via the x/tools tests:
https://build.golang.org/log/bcf9572ff6f39f405e18be4d5f2dfea98e4764ed

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/188157 mentions this issue: cmd/go/internal/cache: avoid ioutil.WriteFile for writing cache entries

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/188020 mentions this issue: cmd/go/internal/cache: verify that timestamp is positive

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/188021 mentions this issue: cmd/go/internal/cache: include detail in errors from (*cache).Get

gopherbot pushed a commit that referenced this issue Jul 30, 2019
An apparent typo caused us to re-check size instead.

Updates #29667

Change-Id: Icc1fb9cef1ba77b8490c8fbe6c343be06ff48d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188020
Run-TryBot: Bryan C. Mills <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Jay Conrod <[email protected]>
gopherbot pushed a commit that referenced this issue Sep 12, 2019
Since the cache has been required since Go 1.12, also remove redundant
checks for a nil cache.

Updates #29127
Updates #29667

Change-Id: Ibc59b659306a4eef2d4f0e3d0b651986d6cf84ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/188021
Run-TryBot: Bryan C. Mills <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Jay Conrod <[email protected]>
@golang golang locked and limited conversation to collaborators Jul 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests