
runtime/pprof: under-samples all work when runtime creates threads #40963

Closed
rhysh opened this issue Aug 21, 2020 · 2 comments
@rhysh
Contributor

rhysh commented Aug 21, 2020

What version of Go are you using (go version)?

$ go1.15 version
go version go1.15 darwin/amd64

$ uname -a
Linux ip-172-31-18-196.us-west-2.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ cat /boot/config* | grep HIGH_RES_TIMERS
CONFIG_HIGH_RES_TIMERS=y

I'm cross-compiling to GOOS=linux GOARCH=amd64 for the test.

Does this issue reproduce with the latest release?

Yes, this is present in Go 1.15, which is currently the latest release.

What operating system and processor architecture are you using (go env)?

Here's the view from my laptop; the bug is on linux/amd64.

go env Output
$ go1.15 env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/Users/rhys/go/version/go1.15"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/Users/rhys/go/version/go1.15/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build016610480=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This program does CPU-bound work in a single goroutine and collects a CPU profile of itself. It periodically forces a new OS thread to be created to do a trivial amount of processing in a separate goroutine.

I varied the period between new thread creations: shorter than the CPU profile's sampling interval (100 Hz, so 10ms), somewhat longer than that interval, and much longer than it.

The code for the test is in the "details" immediately below:

bumps.go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	profileName := flag.String("profile", "./bumps.pb.gz", "File name for CPU profile")
	tickTime := flag.Duration("tick", 5*time.Millisecond, "How long to work before creating a new thread")
	duration := flag.Duration("duration", 5*time.Second, "Total duration of the test")
	flag.Parse()

	pf, err := os.Create(*profileName)
	if err != nil {
		log.Fatalf("Create; err = %v", err)
	}
	defer func() {
		err := pf.Close()
		if err != nil {
			log.Fatalf("Close; err = %v", err)
		}
	}()
	err = pprof.StartCPUProfile(pf)
	if err != nil {
		log.Fatalf("StartCPUProfile; err = %v", err)
	}
	defer pprof.StopCPUProfile()

	ctx, cancel := context.WithTimeout(context.Background(), *duration)
	defer cancel()

	tick := time.Tick(*tickTime)

	var p int64 = 1
	workUntilTick := func() {
		for {
			select {
			case <-tick:
				return
			default:
			}

			work(&p)
		}
	}

	// All work takes place on goroutine 1. Lock it to the current thread so
	// we're sure the work stays on a single OS thread for the whole run.
	runtime.LockOSThread()

	var wg sync.WaitGroup
	for ctx.Err() == nil {
		workUntilTick()

		wg.Add(1)
		go func() {
			defer wg.Done()

			// Have this code run on its own thread, and force the runtime to
			// terminate that thread when this goroutine completes (so the next
			// iteration will run on a different thread).
			runtime.LockOSThread()
			defer runtime.Goexit()

			// Yield for a moment, try to trigger a call to runtime·execute on
			// this thread.
			runtime.Gosched()
		}()
		wg.Wait()
	}

	fmt.Printf("%d\n", p)
}

func work(p *int64) {
	for i := 0; i < 1e5; i++ {
		*p *= 3
	}
}
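
For reference, I build and run it roughly like this (illustrative commands; exact paths don't matter, and the tick values match the runs shown further down):

$ GOOS=linux GOARCH=amd64 go1.15 build -o bumps bumps.go
$ time ./bumps -tick=1s        # on the linux/amd64 host; repeat with -tick=20ms and -tick=3ms
$ go tool pprof -top ./bumps.pb.gz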

What did you expect to see?

I expected the profile to show the CPU cost of the program's work, and to show it equally well as I varied the "tick" parameter, regardless of how often the program created new OS threads.

What did you see instead?

When the program creates threads infrequently, its CPU profile reports a number of samples (each representing 10ms of work) that corresponds to the OS's view of how much CPU time the program spent.

When the program creates threads every 20ms, the resulting CPU profile has about half as many 10ms samples as I'd expect.

When the program creates threads faster than it earns SIGPROF deliveries (tick shorter than 10ms), the CPU profile ends up with almost no samples.

I think the problem is that runtime.execute calls runtime.setThreadCPUProfiler whenever the current thread's profilehz differs from what the program requested, which leads to a setitimer syscall that adjusts the profiling rate for the entire process. The test program creates new threads while the profile is running, and because those threads have never run Go code their profilehz values are still 0, so each of them makes a setitimer call. The effect of that call is to reset the kernel's accounting of how much CPU time the process has accumulated toward its next SIGPROF delivery.
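
For context, the check I'm describing looks roughly like the sketch below. This is a paraphrase of the Go 1.15 runtime (runtime.execute plus the unix setThreadCPUProfiler), not the exact source, and it won't compile outside the runtime; it's only here to show why every brand-new thread ends up issuing its own setitimer:

// In runtime.execute, before a goroutine runs on the current M: if this
// thread's cached profiling rate disagrees with the requested rate,
// (re)arm the profiling timer.
hz := sched.profilehz
if getg().m.profilehz != hz {
	setThreadCPUProfiler(hz)
}

// On unix, setThreadCPUProfiler issues a process-wide setitimer call.
func setThreadCPUProfiler(hz int32) {
	getg().m.profilehz = hz
	var it itimerval
	if hz != 0 {
		it.it_interval.tv_sec = 0
		it.it_interval.set_usec(1000000 / hz)
		it.it_value = it.it_interval
	}
	// Re-arming ITIMER_PROF restarts the timer, discarding whatever CPU
	// time the process had already accumulated toward its next SIGPROF.
	setitimer(_ITIMER_PROF, &it, nil)
}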

$ time ./bumps -tick=1s
40153154464199553

real	0m5.118s
user	0m4.978s
sys	0m0.036s
$ go tool pprof -top /tmp/bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:05pm (PDT)
Duration: 5.11s, Total samples = 4.97s (97.19%)
Showing nodes accounting for 4.92s, 98.99% of 4.97s total
Dropped 14 nodes (cum <= 0.02s)
      flat  flat%   sum%        cum   cum%
     4.92s 98.99% 98.99%      4.93s 99.20%  main.work
         0     0% 98.99%      4.93s 99.20%  main.main
         0     0% 98.99%      4.93s 99.20%  main.main.func2
         0     0% 98.99%      4.93s 99.20%  runtime.main
         0     0% 98.99%      0.04s   0.8%  runtime.mstart
         0     0% 98.99%      0.04s   0.8%  runtime.mstart1
         0     0% 98.99%      0.03s   0.6%  runtime.sysmon

$ time ./bumps -tick=20ms
-5752685429589199231

real	0m5.135s
user	0m5.057s
sys	0m0.008s
$ go tool pprof -top ./bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:06pm (PDT)
Duration: 5.13s, Total samples = 2.68s (52.25%)
Showing nodes accounting for 2.66s, 99.25% of 2.68s total
Dropped 8 nodes (cum <= 0.01s)
      flat  flat%   sum%        cum   cum%
     2.65s 98.88% 98.88%      2.65s 98.88%  main.work
     0.01s  0.37% 99.25%      0.02s  0.75%  runtime.findrunnable
         0     0% 99.25%      2.65s 98.88%  main.main
         0     0% 99.25%      2.65s 98.88%  main.main.func2
         0     0% 99.25%      2.65s 98.88%  runtime.main
         0     0% 99.25%      0.02s  0.75%  runtime.mstart
         0     0% 99.25%      0.02s  0.75%  runtime.mstart1
         0     0% 99.25%      0.03s  1.12%  runtime.schedule

$ time ./bumps -tick=3ms
7665148222646782337

real	0m5.007s
user	0m4.977s
sys	0m0.344s
$ go tool pprof -top ./bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:08pm (PDT)
Duration: 5s, Total samples = 0 
Showing nodes accounting for 0, 0% of 0 total
      flat  flat%   sum%        cum   cum%
@ianlancetaylor
Member

Based on the description, this may be affected by https://golang.org/cl/240003 which was recently submitted. Does tip behave differently than 1.15 here?

@rhysh
Contributor Author

rhysh commented Aug 21, 2020

Yes, this works as I'd expect at tip. CL 240003 / commit bd519d0 looks like it's what fixed it. Thanks!

$ go-tip version ./bumps
./bumps: devel +9679b30733 Fri Aug 21 16:52:08 2020 +0000

$ time ./bumps -tick=3ms
-6275116732956957439

real	0m5.128s
user	0m5.149s
sys	0m0.209s

$ go tool pprof -top ./bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:56pm (PDT)
Duration: 5.12s, Total samples = 5.20s (101.49%)
Showing nodes accounting for 5.11s, 98.27% of 5.20s total
Dropped 26 nodes (cum <= 0.03s)
      flat  flat%   sum%        cum   cum%
     4.84s 93.08% 93.08%      4.84s 93.08%  main.work
     0.12s  2.31% 95.38%      0.12s  2.31%  runtime.futex
     0.07s  1.35% 96.73%      0.07s  1.35%  runtime.runqgrab
     0.02s  0.38% 97.12%      0.03s  0.58%  runtime.checkTimers
     0.02s  0.38% 97.50%      0.03s  0.58%  runtime.lock2
     0.02s  0.38% 97.88%      0.09s  1.73%  runtime.runqsteal
     0.01s  0.19% 98.08%      0.18s  3.46%  runtime.findrunnable
     0.01s  0.19% 98.27%      0.10s  1.92%  runtime.notesleep
         0     0% 98.27%      4.84s 93.08%  main.main
         0     0% 98.27%      4.84s 93.08%  main.main.func2
         0     0% 98.27%      0.09s  1.73%  runtime.futexsleep
         0     0% 98.27%      0.03s  0.58%  runtime.futexwakeup
         0     0% 98.27%      0.03s  0.58%  runtime.lock (inline)
         0     0% 98.27%      0.03s  0.58%  runtime.lockWithRank (inline)
         0     0% 98.27%      4.84s 93.08%  runtime.main
         0     0% 98.27%      0.11s  2.12%  runtime.mcall
         0     0% 98.27%      0.23s  4.42%  runtime.mstart
         0     0% 98.27%      0.23s  4.42%  runtime.mstart1
         0     0% 98.27%      0.03s  0.58%  runtime.notewakeup
         0     0% 98.27%      0.10s  1.92%  runtime.park_m
         0     0% 98.27%      0.33s  6.35%  runtime.schedule
         0     0% 98.27%      0.04s  0.77%  runtime.startlockedm
         0     0% 98.27%      0.10s  1.92%  runtime.stoplockedm
         0     0% 98.27%      0.03s  0.58%  runtime.stopm

@rhysh rhysh closed this as completed Aug 21, 2020
@golang golang locked and limited conversation to collaborators Aug 21, 2021