update existing hugepages kep
Put Phase 2 and 3 for hugepage KEP
bg-chun authored Aug 5, 2019
1 parent 59798cf commit aa43ea9
Showing 1 changed file with 62 additions and 29 deletions.
91 changes: 62 additions & 29 deletions keps/sig-node/20190129-hugepages.md
@@ -32,22 +32,29 @@ superseded-by:
- [Proposal](#proposal)
- [User Stories [optional]](#user-stories-optional)
- [Implementation Details/Notes/Constraints [optional]](#implementation-detailsnotesconstraints-optional)
- [Feature Gate](#feature-gate)
- [Node Specification](#node-specification)
- [Pod Specification](#pod-specification)
- [CRI Updates](#cri-updates)
- [Cgroup Enforcement](#cgroup-enforcement)
- [Limits and Quota](#limits-and-quota)
- [Scheduler changes](#scheduler-changes)
- [cAdvisor changes](#cadvisor-changes)
- [Phase 1](#Phase-1)
- [Feature Gate](#feature-gate)
- [Node Specification](#node-specification)
- [Pod Specification](#pod-specification)
- [CRI Updates](#cri-updates)
- [Cgroup Enforcement](#cgroup-enforcement)
- [Limits and Quota](#limits-and-quota)
- [Scheduler changes](#scheduler-changes)
- [cAdvisor changes(Phase 1)](#cadvisor-changes(Phase-1))
- [Phase 2](#Phase-2)
- [Support container isolation of huge pages](#support-container-isolation-of-huge-pages)
- [Support reserve huge pages for system on Node Allocatable feature](#support-reserve-huge-pages-for-system-on-node-allocatable-feature)
- [cAdvisor changes(Phase 2)](#cAdvisor-changes(Phase-2))
- [Phase 3](#Phase-3)
- [Update LinuxContainerResources(CRI) to support specifying huge page limits](#update-linuxcontainerresources(cri)-to-support-specifying-huge-page-limits)
- [Risks and Mitigations](#risks-and-mitigations)
- [Huge pages as shared memory](#huge-pages-as-shared-memory)
- [NUMA](#numa)
- [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Version 1.8](#version-18)
- [Version 1.9](#version-19)
- [Version 1.14](#version-114)
- [Implementation History and Roadmap](#implementation-history-and-roadmap)
- [Phase 1: [DONE]](#phase-1-done)
- [Phase 2: [TARGET: Kubernetes v1.17]](#phase-2-target-kubernetes-v117)
- [Phase 3: [TARGET: TBD]](#phase-3-target-tbd)
<!-- /toc -->

## Summary
@@ -130,14 +137,16 @@ Applications can generally use huge pages by calling

### Implementation Details/Notes/Constraints [optional]

#### Feature Gate
#### Phase 1

##### Feature Gate

The proposal introduces huge pages as an Alpha feature.

It must be enabled via the `--feature-gates=HugePages=true` flag on pertinent
components pending graduation to Beta.

#### Node Specification
##### Node Specification

Huge pages cannot be overcommitted on a node.

@@ -196,7 +205,7 @@ status:
...
```

#### Pod Specification
##### Pod Specification

A pod must make a request to consume pre-allocated huge pages using the resource
`hugepages-<hugepagesize>` whose quantity is a positive amount of memory in
@@ -270,7 +279,7 @@ spec:
medium: HugePages
```

#### CRI Updates
##### CRI Updates

The `LinuxContainerResources` message should be extended to support specifying
huge page limits per size. The specification for huge pages should align with
@@ -281,7 +290,7 @@ https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#huge-

The CRI changes are required before promoting this feature to Beta.

#### Cgroup Enforcement
##### Cgroup Enforcement

To use this feature, the `--cgroups-per-qos` flag must be enabled. In addition, the
`hugetlb` cgroup must be mounted.
@@ -304,24 +313,43 @@ container cgroup sandbox will be configured with the specified limit.
The `kubelet` will ensure the `hugetlb` has no usage charged to the pod level
cgroup sandbox prior to deleting the pod to ensure all resources are reclaimed.

#### Limits and Quota
##### Limits and Quota

The `ResourceQuota` resource will be extended to support accounting for
`hugepages-<hugepagesize>` similar to `cpu` and `memory`. The `LimitRange`
resource will be extended to define min and max constraints for `hugepages`
similar to `cpu` and `memory`.

#### Scheduler changes
##### Scheduler changes

The scheduler will need to ensure any huge page request defined in the pod spec
can be fulfilled by a candidate node.
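
The fit check described above can be sketched as follows. This is only an illustration, not the scheduler's actual predicate: the function name `fits_huge_pages` and the plain-dictionary inputs are assumptions made for the example. Because huge pages cannot be overcommitted, a request fits only if it is no larger than what is still unrequested on the node.

```python
def fits_huge_pages(allocatable, already_requested, pod_request):
    """Return True if every huge page request in pod_request fits on the node.

    All three arguments map a resource name such as "hugepages-2Mi" to a
    quantity in bytes. The names and shapes are hypothetical.
    """
    for resource, wanted in pod_request.items():
        if not resource.startswith("hugepages-"):
            continue  # cpu, memory, etc. are out of scope for this sketch
        free = allocatable.get(resource, 0) - already_requested.get(resource, 0)
        if wanted > free:
            return False  # no overcommit: the node is not a candidate
    return True
```

A node that does not advertise a given page size has zero allocatable for it, so any request for that size is rejected.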

#### cAdvisor changes
#### cAdvisor changes(Phase 1)

cAdvisor will need to be modified to return the number of pre-allocated huge
pages per page size on the node. It will be used to determine capacity and
calculate allocatable values on the node.
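
As a rough illustration of how per-size capacity could be derived, the sketch below turns pre-allocated page counts into byte quantities keyed by resource name. It assumes the Linux sysfs layout `/sys/kernel/mm/hugepages/hugepages-<size>kB/nr_hugepages`; the function names are hypothetical and this is not cAdvisor's code.

```python
import re
from pathlib import Path

# Matches sysfs directory names such as "hugepages-2048kB".
_DIR_PATTERN = re.compile(r"^hugepages-(\d+)kB$")


def resource_name(page_size_kb):
    """Map a page size in kB to a name like "hugepages-2Mi"."""
    if page_size_kb % (1024 * 1024) == 0:
        return f"hugepages-{page_size_kb // (1024 * 1024)}Gi"
    if page_size_kb % 1024 == 0:
        return f"hugepages-{page_size_kb // 1024}Mi"
    return f"hugepages-{page_size_kb}Ki"


def capacity_from_counts(counts):
    """Convert {sysfs dir name: nr_hugepages} to {resource name: bytes}."""
    capacity = {}
    for dirname, nr_pages in counts.items():
        match = _DIR_PATTERN.match(dirname)
        if match is None:
            continue  # ignore entries that are not huge page directories
        size_kb = int(match.group(1))
        capacity[resource_name(size_kb)] = nr_pages * size_kb * 1024
    return capacity


def read_counts(root="/sys/kernel/mm/hugepages"):
    """Read nr_hugepages for every page size directory under root."""
    counts = {}
    for entry in Path(root).iterdir():
        nr_file = entry / "nr_hugepages"
        if nr_file.is_file():
            counts[entry.name] = int(nr_file.read_text().strip())
    return counts
```

On a Linux node, `capacity_from_counts(read_counts())` would yield the capacity figures; keeping the parsing separate from the file reads makes the conversion easy to check in isolation.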

### Phase 2

#### Support container isolation of huge pages

Container-level isolation of huge pages should be supported so that containers do not compete with each other for huge pages. Currently, the `kubelet` sets the aggregated huge page limits on the pod's cgroup in the `hugetlb` subsystem. This should be extended to set limits on each container's cgroup.
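
The difference between the pod-level aggregate and container-level limits can be sketched as below. The helper names and the input shape are assumptions for illustration only; the control file naming (`hugetlb.2MB.limit_in_bytes`, `hugetlb.1GB.limit_in_bytes`) follows the cgroup v1 `hugetlb` controller.

```python
def hugetlb_limit_file(page_size_bytes):
    """Return the cgroup v1 hugetlb limit file name for a page size."""
    if page_size_bytes >= 1 << 30:
        label = f"{page_size_bytes >> 30}GB"
    elif page_size_bytes >= 1 << 20:
        label = f"{page_size_bytes >> 20}MB"
    else:
        label = f"{page_size_bytes >> 10}KB"
    return f"hugetlb.{label}.limit_in_bytes"


def pod_level_limits(containers):
    """Aggregated limits: sum each page size across all containers."""
    totals = {}
    for limits in containers.values():
        for page_size, limit in limits.items():
            totals[page_size] = totals.get(page_size, 0) + limit
    return totals


def container_level_writes(containers):
    """Per-container limits: one (container, file, bytes) entry each."""
    writes = []
    for name, limits in containers.items():
        for page_size, limit in limits.items():
            writes.append((name, hugetlb_limit_file(page_size), limit))
    return writes
```

Here `containers` is a hypothetical mapping from container name to `{page size in bytes: limit in bytes}`. With only the aggregate in place, one container can consume the whole pod budget; per-container entries bound each container separately.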

#### cAdvisor changes(Phase 2)
To support NUMA, `cAdvisor` should collect and store the number of pre-allocated huge pages per NUMA node.


#### Support reserve huge pages for system on Node Allocatable feature
Some system services, such as `OVS-DPDK`, consume huge pages per NUMA node. To determine the allocatable number of huge pages in the `kubelet`, the Node Allocatable feature should support reserving huge pages per NUMA node.
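
One way to picture the per-NUMA-node calculation is the sketch below. The function name and the nested-dictionary inputs are assumptions for illustration; the KEP does not specify a data model or a configuration format for the reservation.

```python
def allocatable_per_numa(capacity, reserved):
    """Subtract system reservations from capacity, per NUMA node.

    Both arguments map a NUMA node id to {page size in bytes: page count}.
    These shapes are hypothetical and chosen only for this sketch.
    """
    allocatable = {}
    for node, per_size in capacity.items():
        allocatable[node] = {}
        for page_size, total in per_size.items():
            held_back = reserved.get(node, {}).get(page_size, 0)
            if held_back > total:
                raise ValueError(
                    f"NUMA node {node}: reserved {held_back} pages exceeds "
                    f"capacity {total}"
                )
            allocatable[node][page_size] = total - held_back
    return allocatable
```

For example, reserving 128 of 512 2 MiB pages on node 0 for a system service leaves 384 allocatable on node 0, while node 1 keeps its full 512.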

### Phase 3

#### Update LinuxContainerResources(CRI) to support specifying huge page limits

To set huge page limits through systemd via the CRI, the `LinuxContainerResources` message should be updated to specify huge page limits. `systemd` should also be updated to handle huge pages.

### Risks and Mitigations

#### Huge pages as shared memory
@@ -350,17 +378,22 @@ locality guarantees as a feature of QoS. In particular, pods in the
- E2E testing validating its usage.
-- https://k8s-testgrid.appspot.com/sig-node-kubelet#node-kubelet-serial&include-filter-by-regex=Feature%3AHugePages

## Implementation History

### Version 1.8
## Implementation History and Roadmap

Initial alpha support for huge pages usage by pods.
### Phase 1: [DONE]
#### Version 1.8
- Initial alpha support for huge pages usage by pods.

### Version 1.9
#### Version 1.9
- Beta support for huge pages

Beta support for huge pages
#### Version 1.14
- GA support for huge pages proposed based on feedback from user community
using the feature without issue.

### Version 1.14
### Phase 2: [TARGET: Kubernetes v1.17]
- Support container isolation of huge pages
- Support reserve huge pages for system on Node Allocatable feature

GA support for huge pages proposed based on feedback from user community
using the feature without issue.
### Phase 3: [TARGET: TBD]
- Update LinuxContainerResources(CRI) to support specifying huge page limits
