diff --git a/docs/admin/disruptions.md b/docs/admin/disruptions.md new file mode 100644 index 0000000000000..96157a50046cd --- /dev/null +++ b/docs/admin/disruptions.md @@ -0,0 +1,93 @@ +--- +assignees: +- mml + +--- +This guide is for anyone wishing to specify safety constraints on pods or anyone +wishing to write software (typically automation software) that respects those +constraints. + +* TOC +{:toc} + +## Rationale + +Various cluster management operations may voluntarily evict pods. "Voluntary" +means an eviction can be safely delayed for a reasonable period of time. The +principal examples today are draining a node for maintenance or upgrade +(`kubectl drain`), and cluster autoscaling down. In the future the +[rescheduler](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/rescheduling.md) +may also perform voluntary evictions. By contrast, something like evicting pods +because a node has become unreachable or reports NotReady, is not "voluntary." + +For voluntary evictions, it can be useful for applications to be able to limit +the number of pods that are down. For example, a quorum-based application would +like to ensure that the number of replicas running is never brought below the +number needed for a quorum, even temporarily. Or a web front end might want to +ensure that the number of replicas serving load never falls below a certain +percentage of the total, even briefly. `PodDisruptionBudget` is an API object +that specifies the minimum number or percentage of replicas of a collection that +must be up at a time. Components that wish to evict a pod subject to disruption +budget use the `/eviction` subresource; unlike a regular pod deletion, this +operation may be rejected by the API server if the eviction would cause a +disruption budget to be violated. + +## Specifying a PodDisruptionBudget + +A `PodDisruptionBudget` has two components: a selector to specify the set of +pods, and a description of the minimum number of available pods for a disruption +to be allowed. The latter can be either an absolute number or a percentage. In +typical usage, a single budget would be used for a collection of pods managed by +a controller—for example, the pods in a single ReplicaSet. + +Note that a disruption budget does not truly guarantee that the specified +number/percentage of pods will always be up. For example, a node that hosts a +pod from the collection may fail when the collection is at the minimum size +specified in the budget, thus bringing the number of available pods from the +collection below the specified size. The budget can only protect against +voluntary evictions, not all causes of unavailability. + +## Requesting an eviction + +If you are writing infrastructure software that wants to produce these voluntary +evictions, you will need to use the eviction API. The eviction subresource of a +pod can be thought of as a kind of policy-controlled DELETE operation on the pod +itself. To attempt an eviction (perhaps more REST-precisely, to attempt to +*create* an eviction), you POST an attempted operation. Here's an example: + +```json +{ + "apiVersion": "policy/v1alpha1", + "kind": "Eviction", + "name": "quux", + "namespace": "default" +} +``` + +and here is how you would attempt this with `curl` + +```bash +curl -v -X POST -H 'Content-type: application/json' +http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction -d +eviction.json +``` + +The API can respond in one of three ways. + + 1. If the eviction is granted, then the pod is deleted just as if you had sent + a `DELETE` request to the pod's URL and you get back `200 OK`. + 2. If the current state of affairs wouldn't allow an eviction by the rules set + forth in the budget, you get back `429 Too Many Requests`. This is + typically used for generic rate limiting of *any* requests, but here we mean + that this request isn't allowed *right now*, but it may be allowed later. + Currently, callers do not get any `Retry-After` advice, but they may in + future versions. + 3. If there is some kind of misconfiguration, like multiple budgets pointing at + the same pod, you will get `500 Internal Server Error`. + +For a given eviction request, there are two cases. + + 1. There is no budget that matches this pod. In this case, the server always + returns `200 OK`. + 2. There is at least one budget. In this case, any of the three responses may + apply.