Add disruption admin guide. #1244
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
--- | ||
assignees: | ||
- mml | ||
|
||
--- | ||
This guide is for anyone wishing to specify safety constraints on pods or anyone | ||
wishing to write software (typically automation software) that respects those | ||
constraints. | ||
|
||
* TOC | ||
{:toc} | ||
|
||
## Rationale | ||
|
||
Various cluster management operations may voluntarily evict pods. "Voluntary" | ||
means an eviction can be safely delayed for a reasonable period of time. The | ||
principal examples today are draining a node for maintenance or upgrade | ||
(`kubectl drain`) and scaling a cluster down via autoscaling. In the future the | ||
[rescheduler](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/rescheduling.md) | ||
may also perform voluntary evictions. By contrast, evicting pods because a node | ||
has become unreachable or reports NotReady is not "voluntary." | ||
|
||
For voluntary evictions, it can be useful for applications to be able to limit | ||
the number of pods that are down. For example, a quorum-based application would | ||
like to ensure that the number of replicas running is never brought below the | ||
number needed for a quorum, even temporarily. Or a web front end might want to | ||
ensure that the number of replicas serving load never falls below a certain | ||
percentage of the total, even briefly. `PodDisruptionBudget` is an API object | ||
that specifies the minimum number or percentage of replicas of a collection that | ||
must be up at any given time. Components that wish to evict a pod subject to a | ||
disruption budget use the `/eviction` subresource; unlike a regular pod deletion, | ||
this operation may be rejected by the API server if the eviction would cause a | ||
disruption budget to be violated. | ||
|
||
## Specifying a PodDisruptionBudget | ||
|
||
A `PodDisruptionBudget` has two components: a selector to specify the set of | ||
pods, and a description of the minimum number of available pods for a disruption | ||
to be allowed. The latter can be either an absolute number or a percentage. In | ||
typical usage, a single budget would be used for a collection of pods managed by | ||
a controller—for example, the pods in a single ReplicaSet. | ||
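As a rough illustration of the two forms the minimum can take, the sketch below resolves an absolute number or a percentage string into a pod count. The function name `resolve_min_available` and the choice to round percentages up are assumptions made for this example; this guide does not specify the rounding rule.

```python
def resolve_min_available(min_available, total_pods):
    """Turn an absolute count or a percentage string into a pod count.

    Assumption for this sketch: percentages are rounded up, so "50%" of
    5 pods requires 3 available pods.
    """
    if isinstance(min_available, int):
        return min_available
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling division avoids floating-point rounding issues.
        return (total_pods * percent + 99) // 100
    raise ValueError(f"unsupported value: {min_available!r}")


print(resolve_min_available(2, 5))
print(resolve_min_available("50%", 5))
print(resolve_min_available("100%", 3))
```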
|
||
Note that a disruption budget does not truly guarantee that the specified | ||
number/percentage of pods will always be up. For example, a node that hosts a | ||
pod from the collection may fail when the collection is at the minimum size | ||
specified in the budget, thus bringing the number of available pods from the | ||
collection below the specified size. The budget can only protect against | ||
voluntary evictions, not all causes of unavailability. | ||
|
||
I would add something at the end that says "Note that DisruptionBudget does not truly guarantee that the specified number/percentage of pods will always be up -- for example, a node that hosts a pod from the collection may fail when the collection is at the minimum size specified in the PodDisruptionBudget, thus bringing the number of available pods from the collection below the specified size. DisruptionBudget can only protect against voluntary evictions, not all causes of unavailability."
Done. |
||
## Requesting an eviction | ||
|
||
If you are writing infrastructure software that wants to produce these voluntary | ||
evictions, you will need to use the eviction API. The eviction subresource of a | ||
pod can be thought of as a kind of policy-controlled DELETE operation on the pod | ||
itself. To attempt an eviction (perhaps more REST-precisely, to attempt to | ||
*create* an eviction), you POST an attempted operation. Here's an example: | ||
|
||
```json | ||
{ | ||
  "apiVersion": "policy/v1alpha1", | ||
  "kind": "Eviction", | ||
  "metadata": { | ||
    "name": "quux", | ||
    "namespace": "default" | ||
  } | ||
} | ||
``` | ||
|
||
and here is how you would attempt this with `curl`, assuming the JSON above is saved as `eviction.json`: | ||
|
||
```bash | ||
curl -v -X POST -H 'Content-type: application/json' http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json | ||
``` | ||
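For callers that prefer a script to `curl`, the same request can be sketched with Python's standard library. The helper name `build_eviction_request` is hypothetical, and the body places `name` and `namespace` under `metadata`, as is standard for Kubernetes objects. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_eviction_request(pod, namespace="default",
                           api_server="http://127.0.0.1:8080"):
    """Build (but do not send) the POST for a pod's eviction subresource."""
    body = {
        "apiVersion": "policy/v1alpha1",
        "kind": "Eviction",
        "metadata": {"name": pod, "namespace": namespace},
    }
    url = f"{api_server}/api/v1/namespaces/{namespace}/pods/{pod}/eviction"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_eviction_request("quux")
print(request.get_method(), request.full_url)
# To actually send it against a running API server:
#   urllib.request.urlopen(request)
```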
|
||
The API can respond in one of three ways. | ||
|
||
1. If the eviction is granted, then the pod is deleted just as if you had sent | ||
a `DELETE` request to the pod's URL and you get back `200 OK`. | ||
2. If the current state of affairs wouldn't allow an eviction by the rules set | ||
forth in the budget, you get back `429 Too Many Requests`. This status is | ||
typically used for generic rate limiting of *any* request, but here it means | ||
that this request isn't allowed *right now* but may be allowed later. | ||
Currently, callers do not get any `Retry-After` advice, but they may in | ||
future versions. | ||
3. If there is some kind of misconfiguration, like multiple budgets pointing at | ||
the same pod, you will get `500 Internal Server Error`. | ||
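A caller might handle the three responses above roughly as follows. This is a minimal sketch: `attempt_eviction` is a hypothetical callable that performs the POST and returns the HTTP status code, and the fixed retry delay is an arbitrary choice, since the server currently gives no `Retry-After` advice.

```python
import time


def evict_with_retry(attempt_eviction, max_attempts=5, delay_seconds=1.0,
                     sleep=time.sleep):
    """Call attempt_eviction() until it succeeds or fails permanently.

    attempt_eviction is a hypothetical callable that performs the POST
    and returns the HTTP status code as an int.
    """
    for attempt in range(1, max_attempts + 1):
        status = attempt_eviction()
        if status == 200:
            return "evicted"
        if status == 500:
            # Misconfiguration; retrying will not help.
            return "misconfigured"
        if status != 429:
            raise RuntimeError(f"unexpected status {status}")
        # 429: not allowed right now, but possibly later. Wait a fixed
        # interval before the next attempt.
        if attempt < max_attempts:
            sleep(delay_seconds)
    return "still blocked"


# Simulated server: two refusals, then the eviction is granted.
responses = iter([429, 429, 200])
print(evict_with_retry(lambda: next(responses), sleep=lambda s: None))
```

Giving up after a bounded number of attempts keeps automation such as a node drain from blocking forever on a budget that never frees up.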
|
||
For a given eviction request, there are two cases. | ||
|
||
1. There is no budget that matches this pod. In this case, the server always | ||
returns `200 OK`. | ||
2. There is at least one budget. In this case, any of the three responses may | ||
apply. |
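The two cases, combined with the three responses, can be summarized in a small decision function. This is only an illustration of the rules as described here, not the API server's code; representing each matching budget as a boolean ("does it currently permit a disruption?") is an assumption of the sketch.

```python
def eviction_status(matching_budgets):
    """Return the HTTP status for an eviction request.

    matching_budgets holds one bool per budget that selects the pod:
    True if that budget currently permits a disruption.
    """
    if not matching_budgets:
        return 200  # no budget matches the pod: always granted
    if len(matching_budgets) > 1:
        return 500  # misconfiguration: several budgets point at the pod
    return 200 if matching_budgets[0] else 429


print(eviction_status([]))
print(eviction_status([False]))
print(eviction_status([True, True]))
```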
I would rewrite this as
Currently a `PodDisruptionBudget` has two components: a selector to specify a set of pods, and a description of the minimum number of pods from that set that must be available (i.e. an eviction will not be allowed if it will cause the number of available pods to fall below this threshold).
The "i.e." is not how it's implemented. This may be a simple off-by-one error, or it may be a place where we didn't think carefully enough about the spec.
As it stands, if you say "100%", we will allow evictions if all the pods are ready. If we want "100%" to mean "never permit a 'voluntary' eviction", I will clarify the docs, and changing the code and tests will be easy.
Or maybe the right thing is to open an issue to discuss and track this, and for now document the way it works?
Yes sorry I didn't catch this sooner but I think the semantics you described are not what people are going to expect. I now see that the comment in the proto, "the minimum number of pods that must be available simultaneously" is ambiguous. And I guess I wasn't thinking about this when I was looking at the code and tests. In any event, I do think we want people to be able to say "don't allow any voluntary evictions". Can you send a PR to change the behavior and tests, and change the documentation to say something like what I suggested? Very sorry about that. Thanks for noting the discrepancy with my suggestion!
Also can you update the comment on the proto to describe the new behavior.
Yes to all of these. Alright if I address the semantic change in a followup when I actually change the code? It'll be sent today, but I would like to get this merged in some form, and it is correct as-is.
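To make the discrepancy in this thread concrete, here is a small sketch of the two readings. Both function names are hypothetical, and the "as implemented" check is only an interpretation of the behavior described above (with "100%", evictions are allowed while all pods are ready), not code taken from the implementation.

```python
def allowed_as_implemented(available, min_available):
    # Reading of the behavior described in the thread: the eviction is
    # allowed while the current available count meets the minimum.
    return available >= min_available


def allowed_as_proposed(available, min_available):
    # Suggested reading: the eviction must not cause the number of
    # available pods to fall below the minimum.
    return available - 1 >= min_available


# Three pods, all ready, with a budget of "100%" (a minimum of 3).
print(allowed_as_implemented(3, 3))
print(allowed_as_proposed(3, 3))
```

Under the proposed reading, a "100%" budget never permits a voluntary eviction, which is the "don't allow any voluntary evictions" case requested above.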