status: account for gomaxprocs in cpu utilization
When a CRDB process had GOMAXPROCS set lower than the number of CPUs
available, the normalized CPU utilization metric only accounted for
the number of CPUs. However, the process could never use more than
GOMAXPROCS processors in parallel, which caps its CPU capacity. As a
result, the normalized CPU utilization was underreported.

For example, when the number of CPUs available is 4, GOMAXPROCS is 2,
and the usage is 1 CPU, utilization would be reported as 25%, whilst
the real utilized capacity is 50%.
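
The arithmetic in the example above can be checked with a minimal Go sketch. The helper names `normalizedUtilization` and `effectiveCapacity` are hypothetical and exist only for illustration; they are not part of the CockroachDB code.

```go
package main

import "fmt"

// normalizedUtilization divides CPU usage (in cores) by a capacity (in cores).
// Hypothetical helper for illustration only.
func normalizedUtilization(usage, capacity float64) float64 {
	return usage / capacity
}

// effectiveCapacity returns the lesser of the available CPUs and GOMAXPROCS.
// Hypothetical helper for illustration only.
func effectiveCapacity(cpus, gomaxprocs float64) float64 {
	if gomaxprocs < cpus {
		return gomaxprocs
	}
	return cpus
}

func main() {
	// Values taken from the example in the commit message.
	const cpus, gomaxprocs, usage = 4.0, 2.0, 1.0

	before := normalizedUtilization(usage, cpus)
	after := normalizedUtilization(usage, effectiveCapacity(cpus, gomaxprocs))

	// Prints "before: 25%, after: 50%".
	fmt.Printf("before: %.0f%%, after: %.0f%%\n", before*100, after*100)
}
```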

Update the normalized CPU calculation to take GOMAXPROCS into
account, using the smaller of the two capacities for the utilization
calculation. This affects the `sys.cpu.combined.percent-normalized` metric.
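
As a rough standalone sketch of the capacity selection described above, the snippet below reads GOMAXPROCS without changing it and caps it by a CPU limit passed in by the caller. The function `cpuCapacity` and its `cpuLimit` parameter are assumptions made for illustration: the real change reads the limit from CockroachDB's internal cgroups package, which is replaced here by a plain argument.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuCapacity returns the lesser of cpuLimit (for example a cgroup CPU
// allowance, possibly fractional) and the current GOMAXPROCS setting.
// Hypothetical helper; cpuLimit stands in for the cgroup lookup.
func cpuCapacity(cpuLimit float64) float64 {
	// GOMAXPROCS(0) only queries the current value; it does not modify it.
	numProcs := float64(runtime.GOMAXPROCS(0))
	if cpuLimit > numProcs {
		return numProcs
	}
	return cpuLimit
}

func main() {
	// Assume the host CPU count as the limit when no cgroup limit is known.
	limit := float64(runtime.NumCPU())
	fmt.Printf("GOMAXPROCS=%d limit=%.1f capacity=%.1f\n",
		runtime.GOMAXPROCS(0), limit, cpuCapacity(limit))
}
```

A fractional limit below GOMAXPROCS, such as 0.5, is returned unchanged, while a limit larger than GOMAXPROCS is capped to GOMAXPROCS.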

Fixes: cockroachdb#101633
Fixes: cockroachdb#103472

Release note (bug fix): `sys.cpu.combined.percent-normalized` now
uses `GOMAXPROCS` when calculating CPU utilization, if it is lower
than the number of CPU shares.
kvoli committed Jun 7, 2023
1 parent eb2447d commit 9a821c8
Showing 1 changed file with 25 additions and 3 deletions.
28 changes: 25 additions & 3 deletions pkg/server/status/runtime.go
@@ -457,8 +457,10 @@ func (rsr *RuntimeStatSampler) SampleEnvironment(
 	if err != nil {
 		log.Ops.Errorf(ctx, "unable to get cpu usage: %v", err)
 	}
-	cgroupCPU, _ := cgroups.GetCgroupCPU()
-	cpuShare := cgroupCPU.CPUShares()
+	cpuCapacity, err := getCPUCapacity()
+	if err != nil {
+		log.Ops.Errorf(ctx, "unable to get CPU capacity: %v", err)
+	}
 
 	fds := gosigar.ProcFDUsage{}
 	if err := fds.Get(pid); err != nil {
@@ -519,7 +521,7 @@ func (rsr *RuntimeStatSampler) SampleEnvironment(
 	stime := sysTimeMillis * 1e6
 	urate := float64(utime-rsr.last.utime) / dur
 	srate := float64(stime-rsr.last.stime) / dur
-	combinedNormalizedPerc := (srate + urate) / cpuShare
+	combinedNormalizedPerc := (srate + urate) / cpuCapacity
 	gcPauseRatio := float64(uint64(gc.PauseTotal)-rsr.last.gcPauseTime) / dur
 	runnableSum := goschedstats.CumulativeNormalizedRunnableGoroutines()
 	// The number of runnable goroutines per CPU is a count, but it can vary
@@ -712,3 +714,23 @@ func GetCPUTime(ctx context.Context) (userTimeMillis, sysTimeMillis int64, err e
 	}
 	return int64(cpuTime.User), int64(cpuTime.Sys), nil
 }
+
+// getCPUCapacity returns the number of logical CPU processors available for
+// use by the process. The capacity accounts for cgroup constraints, GOMAXPROCS
+// and the number of host processors.
+func getCPUCapacity() (float64, error) {
+	numProcs := float64(runtime.GOMAXPROCS(0 /* read only */))
+	cgroupCPU, err := cgroups.GetCgroupCPU()
+	if err != nil {
+		// Return the GOMAXPROCS value if unable to read the cgroup settings, in
+		// practice this is not likely to occur.
+		return numProcs, err
+	}
+	cpuShare := cgroupCPU.CPUShares()
+	// Take the minimum of the CPU shares and the GOMAXPROCS value. The most CPU
+	// the process could use is the lesser of the two.
+	if cpuShare > numProcs {
+		return numProcs, nil
+	}
+	return cpuShare, nil
+}
