Add hugepage info to v1 node structure #2304

Merged
merged 4 commits into from
Oct 7, 2019

Contributor

@ohsewon ohsewon commented Sep 3, 2019

Purpose of this PR: add a new field to the v1 node structure that describes the number of pre-allocated hugepages per NUMA node.

The node structure, which lives inside the MachineInfo structure, describes the resources of a NUMA node.
The new field reports the number of pre-allocated hugepages on each NUMA node.
It is added to provide information for future use: letting a Kubelet component such as the Topology Manager guarantee alignment of node resources such as CPU, GPU, NIC, and hugepages.

Signed-off-by: sewon.oh [email protected]

@googlebot
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. @googlebot I signed it!) and we'll verify it.

ℹ️ Googlers: Go here for more info.

@k8s-ci-robot
Collaborator

Hi @ohsewon. Thanks for your PR.

I'm waiting for a google or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bg-chun

bg-chun commented Sep 3, 2019

/ok-to-test

@k8s-ci-robot
Collaborator

@bg-chun: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

@ohsewon
Contributor Author

ohsewon commented Sep 3, 2019

@googlebot I signed it!

@googlebot
Collaborator

CLAs look good, thanks!

@ohsewon ohsewon changed the title from "[WIP]Add huge page info per node" to "Add hugepage info to v1 node structure" Sep 3, 2019
@ohsewon ohsewon force-pushed the huge_page_info_per_node branch 2 times, most recently from 92b6dfa to d474239 Compare September 3, 2019 05:36
@bg-chun

bg-chun commented Sep 3, 2019

This PR is related to the issue and KEP below.
Issue: kubernetes/kubernetes#80716
KEP: kubernetes/enhancements#1199

It is also mentioned in the Slack thread below.
https://kubernetes.slack.com/archives/C0BP8PW9G/p1567484868027900

@bg-chun

bg-chun commented Sep 3, 2019

/assign dashpole

@dashpole
Collaborator

dashpole commented Sep 3, 2019

/ok-to-test

@dashpole
Collaborator

dashpole commented Sep 3, 2019

Let me know once the KEP is approved, and I'll take a look at this.

@ohsewon ohsewon force-pushed the huge_page_info_per_node branch 3 times, most recently from 65ea93f to d144da2 Compare September 4, 2019 02:54
@bg-chun

bg-chun commented Sep 4, 2019

@ohsewon
Please check this out.
It looks like a unit test problem.

W0904 02:58:03.092] # github.com/google/cadvisor/machine [github.com/google/cadvisor/machine.test]
W0904 02:58:03.093] machine/topology_test.go:63:7: node.HugePagesInfo undefined (type v1.Node has no field or method HugePagesInfo)
W0904 02:58:03.094] machine/topology_test.go:63:35: topology[i].HugePagesInfo undefined (type v1.Node has no field or method HugePagesInfo)
I0904 02:58:04.368] Makefile:38: recipe for target 'vet' failed
W0904 02:58:04.468] make: *** [vet] Error 2

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/google_cadvisor/2304/pull-cadvisor-e2e/1169081552542371840/build-log.txt

@ohsewon ohsewon force-pushed the huge_page_info_per_node branch from d144da2 to 07b678e Compare September 5, 2019 05:48
@ohsewon ohsewon force-pushed the huge_page_info_per_node branch from 07b678e to d138b59 Compare September 5, 2019 06:13
@ohsewon
Contributor Author

ohsewon commented Sep 6, 2019

/test pull-cadvisor-e2e

Contributor

@odinuge odinuge left a comment

Just some small comments, but the overall implementation looks good to me.

Nice work. 👍

A few tests wouldn't hurt, though that may be hard while we read kernel files inside the same functions as the logic.

}

hugePagesInfo = append(hugePagesInfo, info.HugePagesInfo{
PageSize: pageSize / 1024, // Convert to kB.
Contributor

Isn't this size already in kB?

A test case would be suitable I guess.

Contributor

In the normal hugepage variant they use the pageSize directly, and since it is parsed from hugepages-<xyz>kB that should be correct.

PageSize: pageSize,
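
To illustrate the point above: the sysfs entry name already encodes the page size in kB, so a parser can use the number as-is without dividing by 1024. The sketch below is not cAdvisor's code; the helper name pageSizeKB is hypothetical and only the hugepages-<xyz>kB naming comes from this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pageSizeKB (hypothetical helper) extracts the page size, in kB, from a
// sysfs entry name such as "hugepages-2048kB".
func pageSizeKB(dirName string) (uint64, error) {
	const prefix, suffix = "hugepages-", "kB"
	if !strings.HasPrefix(dirName, prefix) || !strings.HasSuffix(dirName, suffix) {
		return 0, fmt.Errorf("unexpected hugepages entry name %q", dirName)
	}
	digits := strings.TrimSuffix(strings.TrimPrefix(dirName, prefix), suffix)
	// The number in the name is already expressed in kB, so it is returned
	// unchanged rather than divided by 1024.
	return strconv.ParseUint(digits, 10, 64)
}
```

With this helper, "hugepages-2048kB" yields 2048 (a 2 MiB page) and "hugepages-1048576kB" yields 1048576 (a 1 GiB page).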

Collaborator

It seems we should be able to reuse the common logic for how we parse /sys/kernel/mm/hugepages with what we parse here, so we don't do it differently. Can we consolidate on a single approach and reuse what can be reused?

Contributor

Yes, I agree that we should reuse as much as possible. This is handled together in the kernel, so in my opinion it makes perfect sense to implement it just once in cAdvisor. The only things that differ between the two are the file path and maybe slightly different error handling, but the implementation should be similar.

@odinuge
"In the normal hugepage variant they use the pageSize directly, and since it is parsed from hugepages-<xyz>kB that should be correct."
=> It makes sense!

@derekwaynecarr @odinuge
"it seems we should be able to reuse the common logic"
"can we consolidate on a single approach"
"Yes, I agree that we should reuse as much as possible."
=> I totally agree. :)

I think the strategy below is the best option for us.

  1. Move GetHugePagesInfo() from info.go to machine.go.
    (This means we keep the existing way of parsing hugepage capacity instead of using a regex.)
  2. Make machine.GetHugePagesInfo() take a file path and parse the capacity from that path, for reusability.
    (It becomes machine.GetHugePagesInfo(path string).)
  3. Reuse machine.GetHugePagesInfo(path string) in both info.go and machine.go.
  4. In info.go, machine.GetHugePagesInfo(path string) parses capacity at the machine level.
  5. In machine.go, machine.GetHugePagesInfo(path string) parses capacity at the NUMA node level.

How about it?
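
The proposal above can be sketched as one directory-driven parser. This is an illustration under stated assumptions, not the code merged in this PR: the lowercase getHugePagesInfo, the local HugePagesInfo type, and the writeFakeHugepagesDir helper are hypothetical stand-ins. On Linux, machine-level entries live under /sys/kernel/mm/hugepages and per-NUMA-node entries under /sys/devices/system/node/node<N>/hugepages; both hold hugepages-<size>kB/nr_hugepages files, which is why a single function taking the directory can serve both callers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// HugePagesInfo is a local stand-in for the type discussed in this thread.
type HugePagesInfo struct {
	PageSize uint64 // page size in kB
	NumPages uint64 // number of pre-allocated pages of that size
}

// getHugePagesInfo (hypothetical name) reads every hugepages-<size>kB entry
// under hugepagesDir and returns its page size and nr_hugepages count.
func getHugePagesInfo(hugepagesDir string) ([]HugePagesInfo, error) {
	entries, err := os.ReadDir(hugepagesDir) // entries come back sorted by name
	if err != nil {
		return nil, err
	}
	var result []HugePagesInfo
	for _, entry := range entries {
		name := entry.Name()
		if !strings.HasPrefix(name, "hugepages-") || !strings.HasSuffix(name, "kB") {
			continue // not a hugepages-<size>kB entry
		}
		sizeText := strings.TrimSuffix(strings.TrimPrefix(name, "hugepages-"), "kB")
		pageSize, err := strconv.ParseUint(sizeText, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing page size from %q: %w", name, err)
		}
		raw, err := os.ReadFile(filepath.Join(hugepagesDir, name, "nr_hugepages"))
		if err != nil {
			return nil, err
		}
		numPages, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing nr_hugepages for %q: %w", name, err)
		}
		result = append(result, HugePagesInfo{PageSize: pageSize, NumPages: numPages})
	}
	return result, nil
}

// writeFakeHugepagesDir builds a temporary tree shaped like the sysfs layout
// (entry name -> nr_hugepages contents) so the parser can be exercised
// without touching real kernel files.
func writeFakeHugepagesDir(pages map[string]string) (string, error) {
	root, err := os.MkdirTemp("", "hugepages")
	if err != nil {
		return "", err
	}
	for name, count := range pages {
		dir := filepath.Join(root, name)
		if err := os.Mkdir(dir, 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(filepath.Join(dir, "nr_hugepages"), []byte(count+"\n"), 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}
```

Under this sketch, the machine-level caller would pass the machine directory and the topology code would pass one node directory per NUMA node, so only the path differs between the two call sites.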

Contributor Author

I removed duplicated logic.

return nil, nil
}

for _, file := range files {
Contributor

Is there a reason we cannot reuse some of the code from

for _, st := range files {
? The concept is the same here as it is there.

Collaborator

can we consolidate how we parse capacity in GetHugePagesInfo in info.go with how we would parse here? I am fine with the regex approach, I just want us to have a common approach.

I left the comment for this below.

@@ -191,6 +194,45 @@ func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {
return nodeId, nil
}

/* Look for per-node hugepages info using node id */
Contributor

Can you switch to normal // comments here to keep the consistency?

Collaborator

Agreed.

This was intentional, for consistency.
The author of this PR and I followed the functions below in this file.
They serve a similar purpose (parsing info from a given path) and use this comment style:

/* Look for sysfs cpu path containing core_id */
/* Such as: sys/bus/cpu/devices/cpu0/topology/core_id */

See the functions below.

func getCoreIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {

func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {

Collaborator

@derekwaynecarr derekwaynecarr left a comment

This makes total sense to get in cAdvisor, I would like us to just avoid duplicating parsing logic for machine versus node info. Can you update so we have common logic used in both cases and address the serialization token used to map what is used for machine?

Cores []Core `json:"cores"`
Caches []Cache `json:"caches"`
Memory uint64 `json:"memory"`
HugePages []HugePagesInfo `json:"huge_pages"`
Collaborator

This should be hugepages so it's consistent with the serialization in MachineInfo.

@bg-chun bg-chun Oct 3, 2019

Agreed!

Contributor Author

fixed it.
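
For readers following the serialization point in this thread, here is a minimal sketch of how the struct tag determines the JSON key. It is not the actual cAdvisor definition: the Node type is trimmed to the Memory and HugePages fields shown in the diff above, the page_size and num_pages tags on HugePagesInfo are assumptions, and marshalNode is a hypothetical helper.

```go
package main

import "encoding/json"

// HugePagesInfo is a local stand-in; its JSON tags are assumed for illustration.
type HugePagesInfo struct {
	PageSize uint64 `json:"page_size"`
	NumPages uint64 `json:"num_pages"`
}

// Node is trimmed to two of the fields quoted in the review diff.
type Node struct {
	Memory uint64 `json:"memory"`
	// The tag requested in review: "hugepages" rather than "huge_pages",
	// so the key is spelled the same way at machine and node level.
	HugePages []HugePagesInfo `json:"hugepages"`
}

// marshalNode (hypothetical helper) returns the JSON encoding of a Node.
func marshalNode(n Node) (string, error) {
	b, err := json.Marshal(n)
	return string(b), err
}
```

Encoding a node with 1024 bytes of memory and four 2048 kB pages gives {"memory":1024,"hugepages":[{"page_size":2048,"num_pages":4}]}, so API consumers can read the field with one key name.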

@ohsewon ohsewon force-pushed the huge_page_info_per_node branch 2 times, most recently from 64be0d1 to ae53af9 Compare October 4, 2019 05:25
Signed-off-by: sewon.oh <[email protected]>
@ohsewon ohsewon force-pushed the huge_page_info_per_node branch from ae53af9 to cb3a2be Compare October 4, 2019 05:31
Contributor

@odinuge odinuge left a comment

Thanks for updating. Some nits, but otherwise lgtm. It is possible to write some tests (and that would be nice), but I'll defer to the Google folks on whether they require them.

/lgtm

machine/machine.go
@@ -191,6 +193,45 @@ func getNodeIdFromCpuBus(cpuBusPath string, threadId int) (int, error) {
return nodeId, nil
}

// GetHugePagesInfo returns information about pre-allocated huge pages
func GetHugePagesInfo(hugepagesDirectory string) ([]info.HugePagesInfo, error) {
Contributor

Maybe add a comment about what directories this function wants.

Contributor

nit: This makes it easy to write some simple tests.

There are already some tests here: https://github.com/google/cadvisor/blob/master/machine/topology_test.go. Unit test coverage is low, but some tests never hurt I guess. 😄

Contributor Author

@ohsewon ohsewon Oct 4, 2019

Thanks for your review. I updated.

Co-Authored-By: Odin Ugedal <[email protected]>
@ohsewon ohsewon force-pushed the huge_page_info_per_node branch from 31199d8 to 455eaa2 Compare October 4, 2019 07:28
@ohsewon
Contributor Author

ohsewon commented Oct 4, 2019

/retest

@ohsewon ohsewon force-pushed the huge_page_info_per_node branch from 455eaa2 to 3814c69 Compare October 4, 2019 07:50
@ohsewon ohsewon force-pushed the huge_page_info_per_node branch from 3814c69 to 255729d Compare October 7, 2019 02:15
@derekwaynecarr
Collaborator

Thank you for making the update and adding tests!

/lgtm
/approve

@derekwaynecarr
Collaborator

@dashpole are you able to help merge this?

@dashpole dashpole merged commit 5d7c0c5 into google:master Oct 7, 2019
@dashpole
Collaborator

dashpole commented Oct 7, 2019

Done.

7 participants