Add hugepage info to v1 node structure #2304

ohsewon · 2019-09-03T04:53:13Z

Purpose of this PR: add a new field to the v1 node structure that describes the number of pre-allocated huge pages per NUMA node.

The node structure, nested inside the MachineInfo structure, describes the resources of a NUMA node.
The new field will describe the number of pre-allocated huge pages on each NUMA node.
Its purpose is to expose this additional information for future use:
Kubelet components such as the Topology Manager could then guarantee NUMA alignment of node resources such as CPUs, GPUs, NICs, and huge pages.
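A minimal sketch of what the extended node structure could look like. It is simplified (the existing Cores and Caches fields are omitted), and the field layout and JSON tags are assumptions modeled on the discussion in this PR, not a copy of the merged info/v1 code.

```go
package main

import "encoding/json"

// HugePagesInfo describes one pool of pre-allocated huge pages.
// Sketch only: field names and tags are assumptions based on this PR.
type HugePagesInfo struct {
	PageSize uint64 `json:"page_size"` // page size in kB, e.g. 2048 for 2 MiB pages
	NumPages uint64 `json:"num_pages"` // number of pre-allocated pages of that size
}

// Node is a simplified NUMA node description (Cores and Caches omitted).
type Node struct {
	Id        int             `json:"node_id"`
	Memory    uint64          `json:"memory"`
	HugePages []HugePagesInfo `json:"hugepages"` // new: per-node huge page pools
}

// nodeJSON serializes a node so the new field can be inspected.
func nodeJSON(n Node) (string, error) {
	b, err := json.Marshal(n)
	return string(b), err
}
```

On a machine with two NUMA nodes, each Node entry would carry its own HugePages slice, so a consumer could tell which node the pages are allocated on.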

Signed-off-by: sewon.oh <[email protected]>

googlebot · 2019-09-03T04:53:16Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

k8s-ci-robot · 2019-09-03T04:53:33Z

Hi @ohsewon. Thanks for your PR.

I'm waiting for a google or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bg-chun · 2019-09-03T04:54:00Z

/ok-to-test

k8s-ci-robot · 2019-09-03T04:54:39Z

@bg-chun: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ohsewon · 2019-09-03T04:55:47Z

@googlebot I signed it!

googlebot · 2019-09-03T04:55:50Z

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

bg-chun · 2019-09-03T07:15:33Z

This PR is related to the below issue and KEP.
Issue: kubernetes/kubernetes#80716
KEP: kubernetes/enhancements#1199

And also mentioned in the below slack thread.
https://kubernetes.slack.com/archives/C0BP8PW9G/p1567484868027900

bg-chun · 2019-09-03T07:16:08Z

/assign dashpole

dashpole · 2019-09-03T18:03:30Z

/ok-to-test

dashpole · 2019-09-03T18:05:09Z

Let me know once the KEP is approved, and i'll take a look at this.

bg-chun · 2019-09-04T15:51:23Z

@ohsewon
please check it out.
It looks like the unit tests no longer compile:

W0904 02:58:03.092] # github.com/google/cadvisor/machine [github.com/google/cadvisor/machine.test]
W0904 02:58:03.093] machine/topology_test.go:63:7: node.HugePagesInfo undefined (type v1.Node has no field or method HugePagesInfo)
W0904 02:58:03.094] machine/topology_test.go:63:35: topology[i].HugePagesInfo undefined (type v1.Node has no field or method HugePagesInfo)
I0904 02:58:04.368] Makefile:38: recipe for target 'vet' failed
W0904 02:58:04.468] make: *** [vet] Error 2

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/google_cadvisor/2304/pull-cadvisor-e2e/1169081552542371840/build-log.txt

Signed-off-by: sewon.oh <[email protected]>

ohsewon · 2019-09-06T02:01:17Z

/test pull-cadvisor-e2e

odinuge

Just some small comments, but the overall implementation looks good to me.

Nice work. 👍

A few tests wouldn't hurt, though that may be hard as long as we read the kernel files inside the same functions as the logic.
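One way to make this testable is to keep the kernel-file I/O thin and move the parsing into a pure function. This is a sketch under assumptions, not code from the PR: parseNrHugepages and readNrHugepages are hypothetical helpers for the contents of an nr_hugepages file, which the kernel writes as a decimal number followed by a newline.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNrHugepages (hypothetical) parses the contents of an nr_hugepages
// file. It does no I/O, so it can be unit-tested without touching sysfs.
func parseNrHugepages(contents string) (uint64, error) {
	n, err := strconv.ParseUint(strings.TrimSpace(contents), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("could not parse nr_hugepages contents %q: %v", contents, err)
	}
	return n, nil
}

// readNrHugepages (hypothetical) is the thin I/O wrapper around the parser.
func readNrHugepages(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseNrHugepages(string(b))
}
```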

odinuge · 2019-10-02T08:43:18Z

machine/machine.go

+		}
+
+		hugePagesInfo = append(hugePagesInfo, info.HugePagesInfo{
+			PageSize: pageSize / 1024, // Convert to kB.


Isn't this size already in kB?

A test case would be suitable I guess.

In the normal hugepage variant they use the pageSize directly, and since it is parsed from hugepages-<xyz>kB that should be correct.
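Because the directory name already encodes the size in kB (for example hugepages-2048kB for 2 MiB pages), the parsed value can be stored as-is, without dividing by 1024. A sketch, with pageSizeFromDirName as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pageSizeFromDirName (hypothetical) extracts the page size in kB from a
// sysfs directory name such as "hugepages-2048kB". No unit conversion is
// applied: the kernel already reports the size in kB.
func pageSizeFromDirName(name string) (uint64, error) {
	const prefix, suffix = "hugepages-", "kB"
	if !strings.HasPrefix(name, prefix) || !strings.HasSuffix(name, suffix) {
		return 0, fmt.Errorf("unexpected hugepages directory name %q", name)
	}
	size := strings.TrimSuffix(strings.TrimPrefix(name, prefix), suffix)
	return strconv.ParseUint(size, 10, 64)
}
```

For 1 GiB pages the directory is hugepages-1048576kB, which likewise yields 1048576 directly.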

cadvisor/machine/info.go

Line 90 in f5e7ddf

PageSize: pageSize,

It seems we should be able to reuse the common logic for how we parse /sys/kernel/mm/hugepages with what we parse here, so we don't do it differently. Can we consolidate on a single approach and reuse what can be reused?

Yes, I agree that we should reuse as much as possible. The kernel exposes both in the same format, so IMO it makes perfect sense to implement this only once in cAdvisor. The only things that differ between them are the file path and maybe some error handling; the implementation should be the same.
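To illustrate that only the directory differs: machine-wide pools live under /sys/kernel/mm/hugepages, while per-NUMA-node pools live under /sys/devices/system/node/node<N>/hugepages. The hypothetical helpers below only build those two paths; a single parser can then be pointed at either one.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// machineHugePagesDir (hypothetical) returns the machine-wide hugepages directory.
func machineHugePagesDir(sysfsRoot string) string {
	return filepath.Join(sysfsRoot, "kernel", "mm", "hugepages")
}

// nodeHugePagesDir (hypothetical) returns the hugepages directory of one NUMA node.
func nodeHugePagesDir(sysfsRoot string, nodeID int) string {
	return filepath.Join(sysfsRoot, "devices", "system", "node", fmt.Sprintf("node%d", nodeID), "hugepages")
}
```

Taking the sysfs root as a parameter (rather than hard-coding /sys) is a design choice that lets tests substitute a temporary directory.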

@odinuge
In the normal hugepage variant they use the pageSize directly, and since it is parsed from hugepages-<xyz>kB that should be correct.
=> It makes sense!

@derekwaynecarr @odinuge
it seems we should be able to reuse the common logic
can we consolidate on a single approach
Yes, I agree that we should reuse as much as possible.
=> I totally agree with it. :)

IMO, the strategy below is the best option for us.

Move GetHugePagesInfo() from info.go to machine.go.
(This means we will keep the existing way of parsing hugepage capacity instead of a regex.)

Then change it (machine.GetHugePagesInfo()) to take a directory path and parse capacity from the given directory, so it can be reused.
(Its signature becomes machine.GetHugePagesInfo(path string).)

Now we can reuse machine.GetHugePagesInfo(path string) in both info.go and machine.go.

In info.go, machine.GetHugePagesInfo(path string) will be used to parse capacity at the machine level.

In machine.go, machine.GetHugePagesInfo(path string) will be used to parse capacity at the NUMA node level.

How about it?

I removed duplicated logic.

odinuge · 2019-10-02T08:44:53Z

machine/machine.go

+		return nil, nil
+	}
+
+	for _, file := range files {


Is there a reason we cannot reuse some of the code from

cadvisor/machine/info.go

Line 66 in f5e7ddf

for _, st := range files {

? The concept is the same here as it is there.

can we consolidate how we parse capacity in GetHugePagesInfo in info.go with how we would parse here? I am fine with the regex approach, I just want us to have a common approach.

I left the comment for this below.

odinuge · 2019-10-02T08:46:54Z

machine/machine.go

@@ -191,6 +194,45 @@ func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {
 	return nodeId, nil
 }

+/* Look for per-node hugepages info using node id */


Can you switch to normal // comments here to keep the consistency?

This is intentional, for consistency.
The author of this PR and I followed the functions below in this file.
Those functions have a similar purpose (parsing info from a given path), and they use the comment style below.

/* Look for sysfs cpu path containing core_id */
/* Such as: sys/bus/cpu/devices/cpu0/topology/core_id */

Take a look at the functions below.

cadvisor/machine/machine.go

Line 135 in f5e7ddf

func getCoreIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {

cadvisor/machine/machine.go

Line 159 in f5e7ddf

func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {

derekwaynecarr

This makes total sense to get into cAdvisor. I would just like us to avoid duplicating the parsing logic for machine versus node info. Can you update it so we have common logic used in both cases, and change the serialization tag to match the one used for MachineInfo?

derekwaynecarr · 2019-10-02T14:34:23Z

info/v1/machine.go

-	Cores  []Core  `json:"cores"`
-	Caches []Cache `json:"caches"`
+	Memory    uint64          `json:"memory"`
+	HugePages []HugePagesInfo `json:"huge_pages"`


this should be hugepages so it's consistent with the serialization in MachineInfo

derekwaynecarr · 2019-10-02T14:37:00Z

machine/machine.go

@@ -191,6 +194,45 @@ func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {
 	return nodeId, nil
 }

+/* Look for per-node hugepages info using node id */


derekwaynecarr · 2019-10-02T14:39:57Z

machine/machine.go

+		return nil, nil
+	}
+
+	for _, file := range files {


can we consolidate how we parse capacity in GetHugePagesInfo in info.go with how we would parse here? I am fine with the regex approach, I just want us to have a common approach.

derekwaynecarr · 2019-10-02T14:45:49Z

machine/machine.go

+		}
+
+		hugePagesInfo = append(hugePagesInfo, info.HugePagesInfo{
+			PageSize: pageSize / 1024, // Convert to kB.


it seems we should be able to reuse the common logic for how we parse /sys/kernel/mm/hugepages with what we parse here, so we don't do it differently. can we consolidate on a single approach and reuse what can be reused?

Signed-off-by: sewon.oh <[email protected]>

odinuge

Thanks for updating. Some nits, but otherwise LGTM. It is possible to write some tests (and that would be nice), but I'll defer to the Google folks on whether they require them.

/lgtm

machine/machine.go

odinuge · 2019-10-04T06:11:58Z

machine/machine.go

@@ -191,6 +193,45 @@ func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {
 	return nodeId, nil
 }

+// GetHugePagesInfo returns information about pre-allocated huge pages
+func GetHugePagesInfo(hugepagesDirectory string) ([]info.HugePagesInfo, error) {


Maybe add a comment about what directories this function wants.

nit: This makes it easy to write some simple tests.

There are already some tests here: https://github.com/google/cadvisor/blob/master/machine/topology_test.go. Unit test coverage is low, but some tests never hurt, I guess. 😄

Thanks for your review. I've updated it.

Co-Authored-By: Odin Ugedal <[email protected]>

ohsewon · 2019-10-04T07:38:37Z

/retest

Signed-off-by: sewon.oh <[email protected]>

derekwaynecarr · 2019-10-07T18:03:28Z

Thank you for making the update and adding tests!

/lgtm
/approve

derekwaynecarr · 2019-10-07T18:05:41Z

@dashpole are you able to help merge this?

dashpole · 2019-10-07T18:09:35Z

Done.

k8s-ci-robot added the needs-ok-to-test label Sep 3, 2019

ohsewon changed the title from "[WIP]Add huge page info per node" to "Add hugepage info to v1 node structure" Sep 3, 2019

ohsewon force-pushed the huge_page_info_per_node branch 2 times, most recently from 92b6dfa to d474239 on September 3, 2019 05:36

bg-chun mentioned this pull request Sep 3, 2019

Update huge pages KEP for container isolation of huge pages kubernetes/enhancements#1199

Merged

k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Sep 3, 2019

ohsewon force-pushed the huge_page_info_per_node branch 3 times, most recently from 65ea93f to d144da2 on September 4, 2019 02:54

ohsewon force-pushed the huge_page_info_per_node branch from d144da2 to 07b678e on September 5, 2019 05:48

Add hugepage info to v1 node structure

d138b59

Signed-off-by: sewon.oh <[email protected]>

ohsewon force-pushed the huge_page_info_per_node branch from 07b678e to d138b59 on September 5, 2019 06:13

bg-chun mentioned this pull request Sep 18, 2019

[WIP]Update huge pages KEP for NUMA support of huge pages kubernetes/enhancements#1245

Closed

odinuge reviewed Oct 2, 2019


derekwaynecarr requested changes Oct 2, 2019


ohsewon force-pushed the huge_page_info_per_node branch 2 times, most recently from 64be0d1 to ae53af9 on October 4, 2019 05:25

Remove duplicated logic

cb3a2be

Signed-off-by: sewon.oh <[email protected]>

ohsewon force-pushed the huge_page_info_per_node branch from ae53af9 to cb3a2be on October 4, 2019 05:31

odinuge approved these changes Oct 4, 2019


Update machine/machine.go

95453fe

Co-Authored-By: Odin Ugedal <[email protected]>

ohsewon force-pushed the huge_page_info_per_node branch from 31199d8 to 455eaa2 on October 4, 2019 07:28

ohsewon force-pushed the huge_page_info_per_node branch from 455eaa2 to 3814c69 on October 4, 2019 07:50

Add comments and unit tests for GetHugePagesInfo()

255729d

Signed-off-by: sewon.oh <[email protected]>

ohsewon force-pushed the huge_page_info_per_node branch from 3814c69 to 255729d on October 7, 2019 02:15

derekwaynecarr approved these changes Oct 7, 2019


dashpole merged commit 5d7c0c5 into google:master Oct 7, 2019
