Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util, server: Add GetCgroupCPU, use it to reflect cpu usage #56461

Merged
merged 1 commit into from
Nov 17, 2020

Conversation

itsbilal
Copy link
Contributor

@itsbilal itsbilal commented Nov 9, 2020

Enhances the util/cgroups package to also support getting
CPU limits from the process' current cgroup. The cpu limit / share
for the current cgroup is denoted in two separate variables:
the cpu period, and the cpu quota. When (quota / period) = numCPUs,
this cgroup has access to all the CPUs on the system.

This PR also updates SampleEnvironment to call this method
and adjust the cpu usage % accordingly.

Release note (bug fix): Improve accuracy of reported CPU usage
when running in containers.

@itsbilal itsbilal requested review from knz and ajwerner November 9, 2020 21:28
@itsbilal itsbilal requested a review from a team as a code owner November 9, 2020 21:28
@itsbilal itsbilal self-assigned this Nov 9, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)


pkg/server/status/runtime.go, line 445 at r1 (raw file):

	}
	cpuShare := float64(runtime.NumCPU())
	cgroupCpu, err := cgroups.GetCgroupCPU()

developer UX nit: would be clearer if this code (and any code using the new cgroups) package did not have to interact with Go's runtime at all

Ideally, I'd see the number of CPUs as a field in the returned value here, and maybe have the computation below as methods on this struct.


pkg/util/cgroups/cgroups.go, line 76 at r1 (raw file):

func cgroupFileToUint64(filepath, desc string) (uint64, error) {
	f, err := os.Open(filepath)

you can share the open + read code between the various functions. Makes the whole easier to read and maintain.


pkg/util/cgroups/cgroups.go, line 81 at r1 (raw file):

	}
	defer func() {
		_ = f.Close()

nit - FYI: err = errors.CombineErrors(err, f.Close())

(assuming err is named in the return argument list and you don't shadow it in the body)

ditto below

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one nuance here which I think is worth calling out. Sometimes people will set cpu requests but not limits. Nothing requires limits be set, just requests. In fact, I suspect we, when running free tier, will do just this as it allows for bursting when there are excess resources. My understanding is that this is configured with cpu.shares. I suspect that we'll need to read that value too and incorporate it somehow. I think we may want to ultimately be setting GOMAXPROCS to max(truncate_to_integer_core(min(cpu.shares, cpu.max)), 1)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)

@itsbilal itsbilal force-pushed the cgroup-cpu-calculation branch from 53ec764 to 9e60a0f Compare November 12, 2020 21:54
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajwerner So the passed in cpu "shares" are already accounted for in this implementation, as long as the v1 controller is mounted. The cfs_{quota,period}_us variables are adjusted to match the shares. I passed in a cpu shares value and confirmed that this was the case.

On V2 it gets a bit trickier. The spec allows for both weight-based and limit-based resource, and it seems like there's no short cut here, we need to read both and make a conclusion there. We already read cpu.max in the existing code, but we don't read cpu.weight. And it looks like cpu.weight must need to be interpreted as numcpu / (cpu.weight / 100).

I guess that means the two big TODOs in this PR are 1) account for v2 cpu.weight, and 2) call runtime.GOMAXPROCS(numCpuAdjustedByShares) at start.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/status/runtime.go, line 445 at r1 (raw file):

Previously, knz (kena) wrote…

developer UX nit: would be clearer if this code (and any code using the new cgroups) package did not have to interact with Go's runtime at all

Ideally, I'd see the number of CPUs as a field in the returned value here, and maybe have the computation below as methods on this struct.

Done.


pkg/util/cgroups/cgroups.go, line 76 at r1 (raw file):

Previously, knz (kena) wrote…

you can share the open + read code between the various functions. Makes the whole easier to read and maintain.

Done.


pkg/util/cgroups/cgroups.go, line 81 at r1 (raw file):

Previously, knz (kena) wrote…

nit - FYI: err = errors.CombineErrors(err, f.Close())

(assuming err is named in the return argument list and you don't shadow it in the body)

ditto below

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but CI is trying to tell you something.

Reviewed 4 of 4 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/server/status/runtime.go, line 445 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Done.

How do you feel about a .cpuShare() method on the value returned by GetCgroupCPU()?

@itsbilal itsbilal force-pushed the cgroup-cpu-calculation branch from 9e60a0f to 50239c1 Compare November 17, 2020 15:57
Enhances the util/cgroups package to also support getting
CPU limits from the process' current cgroup. The cpu limit / share
for the current cgroup is denoted in two separate variables:
the cpu period, and the cpu quota. When (quota / period) = numCPUs,
this cgroup has access to all the CPUs on the system.

This PR also updates SampleEnvironment to call this method
and adjust the cpu usage % accordingly.

Release note (bug fix): Improve accuracy of reported CPU usage
when running in containers.
@itsbilal itsbilal force-pushed the cgroup-cpu-calculation branch from 50239c1 to 3a5c37c Compare November 17, 2020 17:59
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

I'll address the two above TODOs in a follow-on PR. This is enough of an improvement to get us started for now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)


pkg/server/status/runtime.go, line 445 at r1 (raw file):

Previously, knz (kena) wrote…

How do you feel about a .cpuShare() method on the value returned by GetCgroupCPU()?

Good point. Added that!

@itsbilal
Copy link
Contributor Author

bors r=knz

@craig
Copy link
Contributor

craig bot commented Nov 17, 2020

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants