Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update hugepages KEP for 1.18 rel #1540

Merged
merged 4 commits into from
Feb 7, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 36 additions & 1 deletion keps/sig-node/20190129-hugepages.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,10 +44,14 @@ superseded-by:
- [Huge pages as shared memory](#huge-pages-as-shared-memory)
- [NUMA](#numa)
- [Graduation Criteria](#graduation-criteria)
- [Test Plan](#test-plan)
- [Implementation History](#implementation-history)
- [Version 1.8](#version-18)
- [Version 1.9](#version-19)
- [Version 1.14](#version-114)
- [Version 1.18](#version-118)
- [Version 1.19[TBD]](#version-119tbd)
- [Release Signoff Checklist](#release-signoff-checklist)
<!-- /toc -->

## Summary
Expand Down Expand Up @@ -552,6 +556,18 @@ locality guarantees as a feature of QoS. In particular, pods in the
- E2E testing validating its usage.
-- https://k8s-testgrid.appspot.com/sig-node-kubelet#node-kubelet-serial&include-filter-by-regex=Feature%3AHugePages

## Test Plan

- A test plan will consist of the following tests
- Unit tests
- Each unit test for enhancement will be implemented.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you already have unit tests in place or will these need to be built ?
If you do have unit tests in place can you give us an idea of what they test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have unit tests.
Here are major PRs for changes and all include unittests.
It's hard to explain how unittests works just right now.
(I also need a time to take a look, I reviewed unittest of the first PR alost months ago...)

  1. Implement support for setting hugepages limit on container cgroup sandbox. kubernetes#84154 //merged
  2. Add support for pre-allocated hugepages with 2+ sizes kubernetes#82820 //merged
  3. Implement support for multiple sizes huge pages kubernetes#84051 //derek will merge after got approval of this PR from sig-rel

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Container Isolation of hugepages
    A unit test validates size(in a string) whether it has been transrated to right format e.g., 1024(pod spec) ->1KB(CRI)
    Also, another unit test validates transformation results from v1 resource list to the list of CRI data structures.

  2. Multiple sizes of hugepage support
    A unit test checks node has the right amount of hugepage resources for each hugepage size as given.
    Other unit test requests multiple sizes of hugeapges for a container, then checks enabled number of hugepages sizes on a container.
    Also, another unit test validates whether requested medium HugePages-<hugepagesize> has the right format.

- E2E tests
- There is a test suit for huge pages in e2e test, it will be extended to validate enhancements.
- here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/hugepages_test.go
Comment on lines +565 to +566
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will it be extended?
How much work do you think will have to be done to get those e2e test to validate the changes in this kep?
Are there any open PRs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My team member opened PR to update e2etest only for container isolation of hugepages.
kubernetes/kubernetes#87118
(In the PR, we added a new test case on hugepages test suit.)

So, I think another PR should be opened for supporting multiple size of hugepages by these guys(@bart0sh @odinuge)

That(need for another PR) is the reason why I added e2etest ont the future plan:

### Version 1.19[TBD]

Extending of huge pages test suit of E2E tests and cri-tools for enhancements after GA.

Copy link
Member Author

@bg-chun bg-chun Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubernetes/kubernetes#87118
In the e2e test, a new test case sets a hugepage limit before creating a container, and then when container created, it checks container has the limit as given.
(test case takes given hugepages limit numbers from CRI runtime logs by using regexp. It is the same mechanism as to how CPU Manager e2e test works.)

- cri-tools
- Test case will be added to cri-tools to be used in CRI runtime' test(CI).
- here: https://github.com/kubernetes-sigs/cri-tools
Comment on lines +568 to +569
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much work will this be?
How much do you think needs to be built to test this kep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works will be done in the almost same manner of extending e2e-node testsuit of hugepages.
In the cri-tools, It already has testsuit for hugepages.
So, I will add additional test cases on the test suite for both of enhancements.

The mechanism will be:

  1. build container cgroup configuration with hugepages config.
    (Now hugepages config is set on only pod config with only one size of hugepages)
  2. send send container cgroup configuration to CRI runtimes
  3. get container creation result from CRI runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be done in 1.19

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for more detail: containerd project uses cri-tools in it's CI.


## Implementation History

### Version 1.8
Expand All @@ -565,4 +581,23 @@ Beta support for huge pages
### Version 1.14

GA support for huge pages proposed based on feedback from user community
using the feature without issue.
using the feature without issue.

### Version 1.18

Extending of huge pages feature to support container isolation of huge pages and multiple sizes of huge pages.

### Version 1.19[TBD]

Extending of huge pages test suit of E2E tests and cri-tools for enhancements after GA.

## Release Signoff Checklist
- \[x] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
- \[ ] KEP approvers have set the KEP status to `implementable`
- The KEP is already implemented/GA.
- \[x] Design details are appropriately documented
- \[x] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- \[x] Graduation criteria is in place
- \[x] "Implementation History" section is up-to-date for milestone
- \[x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- \[x] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes