Currently the downward API (via environment variables and volume plugin) only supports exposing a Pod's name, namespace, annotations, labels and its IP (see details). This document explains the need and design to extend it to expose resource limits and requests (e.g. cpu, memory).
Software applications require configuration to work optimally with the resources they're allowed to use. Exposing the requested and limited amounts of available resources inside containers will allow these applications to be configured more easily. Although docker already exposes some of this information inside containers, the downward API helps expose this information in a runtime-agnostic manner in Kubernetes.
As an application author, I want to be able to use cpu or memory requests and
limits to configure the operational requirements of my applications inside containers.
For example, Java applications expect to be made aware of the available heap size via
a command line argument to the JVM, such as `java -Xmx<heap-size>`. Similarly, an
application may want to configure its thread pool based on available cpu resources and
the exported value of GOMAXPROCS.
This is mostly driven by the discussion in this issue. There are three approaches discussed in this document to obtain resources limits and requests to be exposed as environment variables and volumes inside containers:
- The first approach requires users to specify full json path selectors in which selectors are relative to the pod spec. The benefit of this approach is that it can specify pod-level resources, and since containers are also part of a pod spec, it can be used to specify container-level resources too.
- The second approach requires specifying partial json path selectors which are relative to the container spec. This approach helps in retrieving container-specific resource limits and requests, and at the same time, it is simpler to specify than full json path selectors.
- In the third approach, users specify fixed strings (magic keys) to retrieve resources limits and requests and do not specify any json path selectors. This approach is similar to the existing downward API implementation approach. The advantages of this approach are that it is simpler to specify than the first two, and does not require any type of conversion between internal and versioned objects or json selectors as discussed below.
Before discussing a bit more about merits of each approach, here is a brief discussion about json path selectors and some implications related to their use.
Versioned objects in kubernetes have json tags as part of their golang fields. Currently, objects in the internal API have json tags, but it is planned that these will eventually be removed (see 3933 for discussion). So for discussion in this proposal, we assume that internal objects do not have json tags. In the first two approaches (full and partial json selectors), when a user creates a pod and its containers, the user specifies a json path selector in the pod's spec to retrieve values of its limits and requests. The selector is composed of json tags similar to json paths used with kubectl (json). This proposal uses kubernetes' json path library to process the selectors to retrieve the values. As kubelet operates on internal objects (without json tags), and the selectors are part of versioned objects, retrieving values of the limits and requests can be handled using these two solutions:
- By converting an internal object to a versioned object, and then using the json path library to retrieve the values from the versioned object by processing the selector.
- By converting a json selector of the versioned objects to the internal object's golang expression and then using the json path library to retrieve the values from the internal object by processing the golang expression. However, converting a json selector of the versioned objects to the internal object's golang expression will still require an instance of the versioned object, so it seems to be more work than the first solution unless there is another way that does not require the versioned object.
So there is a one-time conversion cost associated with the first (full path) and second (partial path) approaches, whereas the third approach (magic keys) does not require any such conversion and can work directly on internal objects. If we want to avoid the conversion cost and keep the implementation simple, my opinion is that the magic keys approach is the easiest to implement and exposes limits and requests with the least impact on existing functionality.
To summarize merits/demerits of each approach:
Approach | Scope | Conversion cost | JSON selectors | Future extension |
---|---|---|---|---|
Full selectors | Pod/Container | Yes | Yes | Possible |
Partial selectors | Container | Yes | Yes | Possible |
Magic keys | Container | No | No | Possible |
Note: Pod resources can always be accessed using the existing `ObjectFieldSelector` type in conjunction with the partial selectors and magic keys approaches.
Full json path selectors specify the complete path to the resources limits and requests relative to pod spec.
This table shows how selectors can be used for various requests and limits to be exposed as environment variables. Environment variable names are examples only and not necessarily as specified, and the selectors do not have to start with dot.
Env Var Name | Selector |
---|---|
CPU_LIMIT | spec.containers[?(@.name=="container-name")].resources.limits.cpu |
MEMORY_LIMIT | spec.containers[?(@.name=="container-name")].resources.limits.memory |
CPU_REQUEST | spec.containers[?(@.name=="container-name")].resources.requests.cpu |
MEMORY_REQUEST | spec.containers[?(@.name=="container-name")].resources.requests.memory |
This table shows how selectors can be used for various requests and limits to be exposed as volumes. The path names are examples only and not necessarily as specified, and the selectors do not have to start with dot.
Path | Selector |
---|---|
cpu_limit | spec.containers[?(@.name=="container-name")].resources.limits.cpu |
memory_limit | spec.containers[?(@.name=="container-name")].resources.limits.memory |
cpu_request | spec.containers[?(@.name=="container-name")].resources.requests.cpu |
memory_request | spec.containers[?(@.name=="container-name")].resources.requests.memory |
Volumes are pod scoped, so a selector must be specified with a container name.
Full json path selectors will use the existing `ObjectFieldSelector` type to extend the current implementation for resources requests and limits.

```go
// ObjectFieldSelector selects an APIVersioned field of an object.
type ObjectFieldSelector struct {
	APIVersion string `json:"apiVersion"`
	// Required: Path of the field to select in the specified API version
	FieldPath string `json:"fieldPath"`
}
```
These examples show how to use full selectors with environment variables and volume plugin.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh","-c", "env" ]
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
      env:
        - name: CPU_LIMIT
          valueFrom:
            fieldRef:
              fieldPath: spec.containers[?(@.name=="test-container")].resources.limits.cpu
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
spec:
  containers:
    - name: client-container
      image: gcr.io/google_containers/busybox
      command: ["sh", "-c", "while true; do if [[ -e /etc/labels ]]; then cat /etc/labels; fi; if [[ -e /etc/annotations ]]; then cat /etc/annotations; fi;sleep 5; done"]
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
      volumeMounts:
        - name: podinfo
          mountPath: /etc
          readOnly: false
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: "cpu_limit"
            fieldRef:
              fieldPath: spec.containers[?(@.name=="client-container")].resources.limits.cpu
```
For APIs with full json path selectors, verify that selectors are valid relative to pod spec.
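As a rough illustration of what validating a full selector involves, the sketch below splits a selector into its container name, resource kind and resource name. It uses a regular expression from the Go standard library as a stand-in; the proposal itself relies on the Kubernetes json path library, and the names `fullSelector` and `parseFullSelector` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullSelector matches only the selector shape shown in the tables above,
// with an optional leading dot. It is an assumption of this sketch, not the
// grammar accepted by the real json path library.
var fullSelector = regexp.MustCompile(
	`^\.?spec\.containers\[\?\(@\.name=="([^"]+)"\)\]\.resources\.(limits|requests)\.(cpu|memory)$`)

// parseFullSelector (hypothetical helper) returns the container name, the
// kind ("limits" or "requests") and the resource ("cpu" or "memory").
func parseFullSelector(path string) (container, kind, resource string, err error) {
	m := fullSelector.FindStringSubmatch(path)
	if m == nil {
		return "", "", "", fmt.Errorf("invalid selector %q", path)
	}
	return m[1], m[2], m[3], nil
}

func main() {
	c, k, r, err := parseFullSelector(
		`spec.containers[?(@.name=="test-container")].resources.limits.cpu`)
	fmt.Println(c, k, r, err)
}
```

A selector that omits the container filter, or names an unsupported resource, is rejected with an error in this sketch.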
Partial json path selectors specify paths to resources limits and requests
relative to the container spec. These will be implemented by introducing a
`ContainerSpecFieldSelector` (json: `containerSpecFieldRef`) to extend the current
implementation for `type DownwardAPIVolumeFile struct` and `type EnvVarSource struct`.
```go
// ContainerSpecFieldSelector selects an APIVersioned field of an object.
type ContainerSpecFieldSelector struct {
	APIVersion string `json:"apiVersion"`
	// Container name
	ContainerName string `json:"containerName,omitempty"`
	// Required: Path of the field to select in the specified API version
	FieldPath string `json:"fieldPath"`
}

// Represents a single file containing information from the downward API
type DownwardAPIVolumeFile struct {
	// Required: Path is the relative path name of the file to be created.
	Path string `json:"path"`
	// Selects a field of the pod: only annotations, labels, name and
	// namespace are supported.
	FieldRef *ObjectFieldSelector `json:"fieldRef,omitempty"`
	// Selects a field of the container: only resources limits and requests
	// (resources.limits.cpu, resources.limits.memory, resources.requests.cpu,
	// resources.requests.memory) are currently supported.
	ContainerSpecFieldRef *ContainerSpecFieldSelector `json:"containerSpecFieldRef,omitempty"`
}

// EnvVarSource represents a source for the value of an EnvVar.
// Only one of its fields may be set.
type EnvVarSource struct {
	// Selects a field of the container: only resources limits and requests
	// (resources.limits.cpu, resources.limits.memory, resources.requests.cpu,
	// resources.requests.memory) are currently supported.
	ContainerSpecFieldRef *ContainerSpecFieldSelector `json:"containerSpecFieldRef,omitempty"`
	// Selects a field of the pod; only name and namespace are supported.
	FieldRef *ObjectFieldSelector `json:"fieldRef,omitempty"`
	// Selects a key of a ConfigMap.
	ConfigMapKeyRef *ConfigMapKeySelector `json:"configMapKeyRef,omitempty"`
	// Selects a key of a secret in the pod's namespace.
	SecretKeyRef *SecretKeySelector `json:"secretKeyRef,omitempty"`
}
```
This table shows how partial selectors can be used for various requests and limits to be exposed as environment variables. Environment variable names are examples only and not necessarily as specified, and the selectors do not have to start with dot.
Env Var Name | Selector |
---|---|
CPU_LIMIT | resources.limits.cpu |
MEMORY_LIMIT | resources.limits.memory |
CPU_REQUEST | resources.requests.cpu |
MEMORY_REQUEST | resources.requests.memory |
Since environment variables are container scoped, it is optional to specify container name as part of the partial selectors as they are relative to container spec. If container name is not specified, then it defaults to current container. However, container name could be specified to expose variables from other containers.
This table shows volume paths and partial selectors used for resources cpu and memory. Volume path names are examples only and not necessarily as specified, and the selectors do not have to start with dot.
Path | Selector |
---|---|
cpu_limit | resources.limits.cpu |
memory_limit | resources.limits.memory |
cpu_request | resources.requests.cpu |
memory_request | resources.requests.memory |
Since volumes are pod scoped, the container name must be specified as part of `containerSpecFieldRef` with them.
These examples show how to use partial selectors with environment variables and volume plugin.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh","-c", "env" ]
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
      env:
        - name: CPU_LIMIT
          valueFrom:
            containerSpecFieldRef:
              fieldPath: resources.limits.cpu
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
spec:
  containers:
    - name: client-container
      image: gcr.io/google_containers/busybox
      command: ["sh", "-c", "while true; do if [[ -e /etc/labels ]]; then cat /etc/labels; fi; if [[ -e /etc/annotations ]]; then cat /etc/annotations; fi; sleep 5; done"]
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
      volumeMounts:
        - name: podinfo
          mountPath: /etc
          readOnly: false
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: "cpu_limit"
            containerSpecFieldRef:
              containerName: "client-container"
              fieldPath: resources.limits.cpu
```
For APIs with partial json path selectors, verify that selectors are valid relative to container spec. Also verify that container name is provided with volumes.
In this approach, users specify fixed strings (or magic keys) to retrieve resources
limits and requests. This approach is similar to the existing downward
API implementation approach. The fixed strings used for resources limits and requests
for cpu and memory are `limits.cpu`, `limits.memory`, `requests.cpu` and `requests.memory`.
Although these strings look the same as json path selectors, they are processed as fixed strings.
These will be implemented by introducing a `ResourceFieldSelector` (json: `resourceFieldRef`)
to extend the current implementation for `type DownwardAPIVolumeFile struct` and
`type EnvVarSource struct`.
The fields in `ResourceFieldSelector` are `containerName` to specify the name of a
container, `resource` to specify the type of a resource (cpu or memory), and `divisor`
to specify the output format of values of exposed resources. The default value of `divisor`
is `1`, which means cores for cpu and bytes for memory. For cpu, the valid values of
`divisor` are `1m` (millicores) and `1` (cores). For memory, the valid values in fixed point
integer (decimal) are `1` (bytes), `1k` (kilobytes), `1M` (megabytes), `1G` (gigabytes),
`1T` (terabytes), `1P` (petabytes), `1E` (exabytes), and in their power-of-two equivalents
`1Ki` (kibibytes), `1Mi` (mebibytes), `1Gi` (gibibytes), `1Ti` (tebibytes), `1Pi` (pebibytes),
`1Ei` (exbibytes).
For more information about these resource formats, see details.
Also, the exposed values will be the ceiling of the actual values in the format requested by `divisor`.
For example, if `requests.cpu` is `250m` (250 millicores) and the divisor is the default `1`, then
the exposed value will be `1` core. This is because 250 millicores converted to cores is 0.25, and
the ceiling of 0.25 is 1.
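The ceiling rule can be sketched with integer arithmetic, as below. This is a minimal illustration, not the proposal's implementation: it assumes the value and the divisor have already been converted to the same base unit (millicores for cpu, bytes for memory), and the helper name `exposedValue` is hypothetical.

```go
package main

import "fmt"

// exposedValue returns ceil(value / divisor) for positive integers.
// Both arguments must be in the same base unit; that convention is an
// assumption of this sketch.
func exposedValue(value, divisor int64) int64 {
	return (value + divisor - 1) / divisor
}

func main() {
	const core = 1000  // divisor "1" for cpu, expressed in millicores
	const milli = 1    // divisor "1m" for cpu, expressed in millicores
	const mi = 1 << 20 // divisor "1Mi" for memory, expressed in bytes

	fmt.Println(exposedValue(250, core))  // requests.cpu 250m, divisor 1
	fmt.Println(exposedValue(500, milli)) // limits.cpu 500m, divisor 1m
	fmt.Println(exposedValue(128*mi, mi)) // limits.memory 128Mi, divisor 1Mi
}
```

Rounding up rather than down means a fractional cpu value is never exposed as zero cores.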
```go
type ResourceFieldSelector struct {
	// Container name
	ContainerName string `json:"containerName,omitempty"`
	// Required: Resource to select
	Resource string `json:"resource"`
	// Specifies the output format of the exposed resources
	Divisor resource.Quantity `json:"divisor,omitempty"`
}

// Represents a single file containing information from the downward API
type DownwardAPIVolumeFile struct {
	// Required: Path is the relative path name of the file to be created.
	Path string `json:"path"`
	// Selects a field of the pod: only annotations, labels, name and
	// namespace are supported.
	FieldRef *ObjectFieldSelector `json:"fieldRef,omitempty"`
	// Selects a resource of the container: only resources limits and requests
	// (limits.cpu, limits.memory, requests.cpu and requests.memory) are currently supported.
	ResourceFieldRef *ResourceFieldSelector `json:"resourceFieldRef,omitempty"`
}

// EnvVarSource represents a source for the value of an EnvVar.
// Only one of its fields may be set.
type EnvVarSource struct {
	// Selects a resource of the container: only resources limits and requests
	// (limits.cpu, limits.memory, requests.cpu and requests.memory) are currently supported.
	ResourceFieldRef *ResourceFieldSelector `json:"resourceFieldRef,omitempty"`
	// Selects a field of the pod; only name and namespace are supported.
	FieldRef *ObjectFieldSelector `json:"fieldRef,omitempty"`
	// Selects a key of a ConfigMap.
	ConfigMapKeyRef *ConfigMapKeySelector `json:"configMapKeyRef,omitempty"`
	// Selects a key of a secret in the pod's namespace.
	SecretKeyRef *SecretKeySelector `json:"secretKeyRef,omitempty"`
}
```
This table shows environment variable names and strings used for resources cpu and memory. The variable names are examples only and not necessarily as specified.
Env Var Name | Resource |
---|---|
CPU_LIMIT | limits.cpu |
MEMORY_LIMIT | limits.memory |
CPU_REQUEST | requests.cpu |
MEMORY_REQUEST | requests.memory |
Since environment variables are container scoped, specifying the container name as part of `resourceFieldRef` is optional. If the container name is not specified, it defaults to the current container. However, a container name can be specified to expose variables from other containers.
This table shows volume paths and strings used for resources cpu and memory. Volume path names are examples only and not necessarily as specified.
Path | Resource |
---|---|
cpu_limit | limits.cpu |
memory_limit | limits.memory |
cpu_request | requests.cpu |
memory_request | requests.memory |
Since volumes are pod scoped, the container name must be specified as part of `resourceFieldRef` with them.
These examples show how to use magic keys approach with environment variables and volume plugin.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh","-c", "env" ]
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
      env:
        - name: CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: MEMORY_LIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
              divisor: "1Mi"
```
In the above example, the exposed values of CPU_LIMIT and MEMORY_LIMIT will be 1 (in cores) and 128 (in Mi), respectively.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
spec:
  containers:
    - name: client-container
      image: gcr.io/google_containers/busybox
      command: ["sh", "-c","while true; do if [[ -e /etc/labels ]]; then cat /etc/labels; fi; if [[ -e /etc/annotations ]]; then cat /etc/annotations; fi; sleep 5; done"]
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
      volumeMounts:
        - name: podinfo
          mountPath: /etc
          readOnly: false
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: "cpu_limit"
            resourceFieldRef:
              containerName: client-container
              resource: limits.cpu
              divisor: "1m"
          - path: "memory_limit"
            resourceFieldRef:
              containerName: client-container
              resource: limits.memory
```
In the above example, the values exposed in the `cpu_limit` and `memory_limit` files will be 500 (in millicores) and 134217728 (in bytes), respectively.
For APIs with magic keys, verify that each resource string is valid, i.e. that it is one
of `limits.cpu`, `limits.memory`, `requests.cpu` and `requests.memory`.
Also verify that container name is provided with volumes.
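The two validation rules for magic keys can be sketched as a single check. The helper name `validateResourceRef` and its `forVolume` flag are hypothetical and only illustrate the rules stated above; they are not the API's actual validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// validResources lists the four magic keys named in this proposal.
var validResources = map[string]bool{
	"limits.cpu":      true,
	"limits.memory":   true,
	"requests.cpu":    true,
	"requests.memory": true,
}

// validateResourceRef (hypothetical helper) checks a resourceFieldRef.
// forVolume is true when the reference appears in a downward API volume
// item, where the container name is mandatory because volumes are pod scoped.
func validateResourceRef(containerName, resource string, forVolume bool) error {
	if !validResources[resource] {
		return fmt.Errorf("unsupported resource %q", resource)
	}
	if forVolume && containerName == "" {
		return errors.New("containerName is required for volume items")
	}
	return nil
}

func main() {
	fmt.Println(validateResourceRef("", "limits.cpu", false))
	fmt.Println(validateResourceRef("", "limits.cpu", true))
	fmt.Println(validateResourceRef("client-container", "limits.disk", true))
}
```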
Pod-level resources (like `metadata.name`, `status.podIP`) will always be accessed with the
`ObjectFieldSelector` type in all approaches. Container-level resources will be accessed with
`ObjectFieldSelector` in the full selector approach, and with `ContainerSpecFieldSelector`
(`containerSpecFieldRef`) and `ResourceFieldSelector` (`resourceFieldRef`) in the partial
selectors and magic keys approaches, respectively. The following table
summarizes resource access with these approaches.
Approach | Pod resources | Container resources |
---|---|---|
Full selectors | `ObjectFieldSelector` | `ObjectFieldSelector` |
Partial selectors | `ObjectFieldSelector` | `ContainerSpecFieldSelector` |
Magic keys | `ObjectFieldSelector` | `ResourceFieldSelector` |
The output format for resources limits and requests will be the same as the
cgroups output format, i.e. cpu in cpu shares (cores multiplied by 1024
and rounded to integer) and memory in bytes. For example, a memory request
or limit of `64Mi` in the container spec will be output as `67108864`
bytes, and a cpu request or limit of `250m` (millicores) will be output as
`256` cpu shares.
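The arithmetic behind these two examples can be sketched as follows. The helper name `milliCPUToShares` is hypothetical, and the sketch truncates toward zero when the result is not a whole number; the text above only says "rounded to integer", so the rounding direction here is an assumption.

```go
package main

import "fmt"

// milliCPUToShares converts millicores to cgroup cpu shares, i.e.
// cores * 1024. Integer division truncates; the rounding direction is an
// assumption of this sketch.
func milliCPUToShares(milliCPU int64) int64 {
	return milliCPU * 1024 / 1000
}

func main() {
	const mi = 1 << 20 // bytes in one mebibyte

	fmt.Println(milliCPUToShares(250)) // cpu request or limit of 250m
	fmt.Println(int64(64 * mi))        // memory request or limit of 64Mi, in bytes
}
```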
The current implementation of this proposal will focus on the API with magic keys approach. The main reason for selecting this approach is that it might be easier to incorporate and extend resource specific functionality.
Here we discuss how to use exposed resource values to set, for example, Java
memory size or GOMAXPROCS for your applications. Let's say you expose the requested
memory of a container (running an application like tomcat, for example) as the `HEAP_SIZE`
environment variable and its requested cpu as `CPU_LIMIT` (or directly as GOMAXPROCS).
One way to set the heap size or cpu for this application would be to wrap the binary
in a shell script, and then export the `JAVA_OPTS` (assuming your container image supports it)
and GOMAXPROCS environment variables inside the container image. The spec file for the
application pod could look like:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh","-c", "env" ]
      resources:
        requests:
          memory: "64M"
          cpu: "250m"
        limits:
          memory: "128M"
          cpu: "500m"
      env:
        - name: HEAP_SIZE
          valueFrom:
            resourceFieldRef:
              resource: requests.memory
        - name: CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              resource: requests.cpu
```
Note that the value of `divisor` by default is `1`. Now inside the container,
the HEAP_SIZE (in bytes) and GOMAXPROCS (in cores) could be exported as:

```sh
export JAVA_OPTS="$JAVA_OPTS -Xmx${HEAP_SIZE}"
```

and

```sh
export GOMAXPROCS="${CPU_LIMIT}"
```