-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util, server: Add GetCgroupCPU, use it to reflect cpu usage #56461
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)
pkg/server/status/runtime.go, line 445 at r1 (raw file):
} cpuShare := float64(runtime.NumCPU()) cgroupCpu, err := cgroups.GetCgroupCPU()
developer UX nit: would be clearer if this code (and any code using the new cgroups
) package did not have to interact with Go's runtime
at all
Ideally, I'd see the number of CPUs as a field in the returned value here, and maybe have the computation below as methods on this struct.
pkg/util/cgroups/cgroups.go, line 76 at r1 (raw file):
func cgroupFileToUint64(filepath, desc string) (uint64, error) { f, err := os.Open(filepath)
you can share the open + read code between the various functions. Makes the whole easier to read and maintain.
pkg/util/cgroups/cgroups.go, line 81 at r1 (raw file):
} defer func() { _ = f.Close()
nit - FYI: err = errors.CombineErrors(err, f.Close())
(assuming err
is named in the return argument list and you don't shadow it in the body)
ditto below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one nuance here which I think is worth calling out. Sometimes people will set cpu requests but not limits. Nothing requires limits be set, just requests. In fact, I suspect we, when running free tier, will do just this as it allows for bursting when there are excess resources. My understanding is that this is configured with cpu.shares
. I suspect that we'll need to read that value too and incorporate it somehow. I think we may want to ultimately be setting GOMAXPROCS to max(truncate_to_integer_core(min(cpu.shares, cpu.max)), 1)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)
53ec764
to
9e60a0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajwerner So the passed in cpu "shares" are already accounted for in this implementation, as long as the v1 controller is mounted. The cfs_{quota,period}_us
variables are adjusted to match the shares. I passed in a cpu shares value and confirmed that this was the case.
On V2 it gets a bit trickier. The spec allows for both weight-based and limit-based resource, and it seems like there's no short cut here, we need to read both and make a conclusion there. We already read cpu.max
in the existing code, but we don't read cpu.weight
. And it looks like cpu.weight
must need to be interpreted as numcpu / (cpu.weight / 100)
.
I guess that means the two big TODOs in this PR are 1) account for v2 cpu.weight
, and 2) call runtime.GOMAXPROCS(numCpuAdjustedByShares)
at start.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/status/runtime.go, line 445 at r1 (raw file):
Previously, knz (kena) wrote…
developer UX nit: would be clearer if this code (and any code using the new
cgroups
) package did not have to interact with Go'sruntime
at allIdeally, I'd see the number of CPUs as a field in the returned value here, and maybe have the computation below as methods on this struct.
Done.
pkg/util/cgroups/cgroups.go, line 76 at r1 (raw file):
Previously, knz (kena) wrote…
you can share the open + read code between the various functions. Makes the whole easier to read and maintain.
Done.
pkg/util/cgroups/cgroups.go, line 81 at r1 (raw file):
Previously, knz (kena) wrote…
nit - FYI:
err = errors.CombineErrors(err, f.Close())
(assuming
err
is named in the return argument list and you don't shadow it in the body)ditto below
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but CI is trying to tell you something.
Reviewed 4 of 4 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/server/status/runtime.go, line 445 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Done.
How do you feel about a .cpuShare()
method on the value returned by GetCgroupCPU()
?
9e60a0f
to
50239c1
Compare
Enhances the util/cgroups package to also support getting CPU limits from the process' current cgroup. The cpu limit / share for the current cgroup is denoted in two separate variables: the cpu period, and the cpu quota. When (quota / period) = numCPUs, this cgroup has access to all the CPUs on the system. This PR also updates SampleEnvironment to call this method and adjust the cpu usage % accordingly. Release note (bug fix): Improve accuracy of reported CPU usage when running in containers.
50239c1
to
3a5c37c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
I'll address the two above TODOs in a follow-on PR. This is enough of an improvement to get us started for now.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)
pkg/server/status/runtime.go, line 445 at r1 (raw file):
Previously, knz (kena) wrote…
How do you feel about a
.cpuShare()
method on the value returned byGetCgroupCPU()
?
Good point. Added that!
bors r=knz |
Build succeeded: |
Enhances the util/cgroups package to also support getting
CPU limits from the process' current cgroup. The cpu limit / share
for the current cgroup is denoted in two separate variables:
the cpu period, and the cpu quota. When (quota / period) = numCPUs,
this cgroup has access to all the CPUs on the system.
This PR also updates SampleEnvironment to call this method
and adjust the cpu usage % accordingly.
Release note (bug fix): Improve accuracy of reported CPU usage
when running in containers.