Add disruption admin guide. #1244

Merged Sep 20, 2016 (1 commit)

File: `docs/admin/disruptions.md` (93 additions, 0 deletions)

---
assignees:
- mml

---
This guide is for anyone wishing to specify safety constraints on pods or anyone
wishing to write software (typically automation software) that respects those
constraints.

* TOC
{:toc}

## Rationale

Various cluster management operations may voluntarily evict pods. "Voluntary"
means an eviction can be safely delayed for a reasonable period of time. The
principal examples today are draining a node for maintenance or upgrade
(`kubectl drain`) and the cluster autoscaler scaling a cluster down. In the
future, the
[rescheduler](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/rescheduling.md)
may also perform voluntary evictions. By contrast, evicting pods because a node
has become unreachable or reports NotReady is not "voluntary."

For voluntary evictions, it can be useful for applications to be able to limit
the number of pods that are down. For example, a quorum-based application would
like to ensure that the number of replicas running is never brought below the
number needed for a quorum, even temporarily. Or a web front end might want to
ensure that the number of replicas serving load never falls below a certain
percentage of the total, even briefly. `PodDisruptionBudget` is an API object
that specifies the minimum number or percentage of replicas of a collection that
must be up at a time. Components that wish to evict a pod subject to a disruption
budget use the `/eviction` subresource; unlike a regular pod deletion, this
operation may be rejected by the API server if the eviction would cause a
disruption budget to be violated.

## Specifying a PodDisruptionBudget

A `PodDisruptionBudget` has two components: a selector to specify the set of
pods, and a description of the minimum number of available pods for a disruption
to be allowed. The latter can be either an absolute number or a percentage. In
typical usage, a single budget would be used for a collection of pods managed by
a controller—for example, the pods in a single ReplicaSet.
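As an illustration, here is a minimal bash sketch that prints a budget manifest as JSON. It assumes the `policy/v1alpha1` API version that this guide's eviction example uses, with the budget's two components in the `spec.selector` (a label selector) and `spec.minAvailable` fields. The function name `pdb_manifest` and the example labels are hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a PodDisruptionBudget manifest as JSON.
# Arguments: budget name, label key, label value, and minAvailable given
# either as an absolute number ("2") or as a percentage ("80%").
pdb_manifest() {
  local name="$1" key="$2" value="$3" min="$4" min_json
  # minAvailable holds either an integer or a percentage string:
  # integers are emitted bare, percentages as quoted JSON strings.
  if [[ "$min" == *% ]]; then
    min_json="\"$min\""
  else
    min_json="$min"
  fi
  cat <<EOF
{
  "apiVersion": "policy/v1alpha1",
  "kind": "PodDisruptionBudget",
  "metadata": { "name": "$name" },
  "spec": {
    "minAvailable": $min_json,
    "selector": { "matchLabels": { "$key": "$value" } }
  }
}
EOF
}
```

For example, `pdb_manifest zk-pdb app zookeeper 2 | kubectl create -f -` would ask that at least two pods labeled `app=zookeeper` stay available, while `pdb_manifest web-pdb tier frontend 80%` expresses the minimum as a percentage instead.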

Note that a disruption budget does not truly guarantee that the specified
number/percentage of pods will always be up. For example, a node that hosts a
pod from the collection may fail when the collection is at the minimum size
specified in the budget, thus bringing the number of available pods from the
collection below the specified size. The budget can only protect against
voluntary evictions, not all causes of unavailability.

## Requesting an eviction

If you are writing infrastructure software that performs voluntary evictions,
you will need to use the eviction API. The eviction subresource of a pod can be
thought of as a kind of policy-controlled DELETE operation on the pod itself. To
attempt an eviction (more precisely in REST terms, to attempt to *create* an
eviction), you POST an `Eviction` object to the pod's `eviction` subresource.
Here's an example:

```json
{
  "apiVersion": "policy/v1alpha1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}
```

and here is how you would attempt this with `curl`, reading the body from
`eviction.json`:

```bash
curl -v -X POST -H 'Content-type: application/json' \
  http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction \
  -d @eviction.json
```

The API can respond in one of three ways.

1. If the eviction is granted, the pod is deleted just as if you had sent a
   `DELETE` request to the pod's URL, and you get back `200 OK`.
2. If the current state of affairs wouldn't allow an eviction by the rules set
   forth in the budget, you get back `429 Too Many Requests`. This status is
   typically used for generic rate limiting of *any* requests; here it means
   that this request isn't allowed *right now* but may be allowed later.
   Currently, callers do not get any `Retry-After` advice, but they may in
   future versions.
3. If there is some kind of misconfiguration, like multiple budgets pointing at
   the same pod, you get back `500 Internal Server Error`.
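Software that performs evictions needs to treat these responses differently: proceed on success, wait and retry on `429`, and stop on anything else. The following is a minimal bash sketch of that loop. It assumes an unauthenticated API server reachable at a URL you pass in (as in the `curl` example) and a fixed retry delay, since no `Retry-After` advice is returned. The function names, their arguments, and their return codes are hypothetical, and treating any `2xx` status as success is an assumption; the guide itself documents `200 OK`.

```bash
#!/usr/bin/env bash

# Map the HTTP status of an eviction POST to what the caller should do.
eviction_action() {
  case "$1" in
    2??) echo "evicted" ;;        # eviction granted; the pod is deleted
    429) echo "retry" ;;          # a budget would be violated right now
    500) echo "misconfigured" ;;  # e.g. more than one budget matches the pod
    *)   echo "unexpected" ;;     # anything else, including 000 (no connection)
  esac
}

# Hypothetical helper: try to evict a pod, retrying on 429 with a fixed delay.
# Arguments: server URL, namespace, pod name, max attempts, delay in seconds.
# Returns 0 if evicted, 1 if attempts ran out, 2 on any other response.
evict_with_retry() {
  local server="$1" ns="$2" pod="$3" attempts="${4:-10}" delay="${5:-5}"
  local body code i
  body=$(printf '{"apiVersion":"policy/v1alpha1","kind":"Eviction","metadata":{"name":"%s","namespace":"%s"}}' "$pod" "$ns")
  for (( i = 1; i <= attempts; i++ )); do
    # -w prints only the status code; the response body is discarded.
    code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
      -H 'Content-Type: application/json' -d "$body" \
      "$server/api/v1/namespaces/$ns/pods/$pod/eviction")
    case "$(eviction_action "$code")" in
      evicted) return 0 ;;
      retry)   sleep "$delay" ;;  # no Retry-After advice, so use a fixed delay
      *)       echo "eviction of $ns/$pod failed: HTTP $code" >&2; return 2 ;;
    esac
  done
  return 1
}
```

For example, `evict_with_retry http://127.0.0.1:8080 default quux` retries the eviction of `quux` up to ten times, five seconds apart. A fixed delay keeps the sketch short; real automation might prefer exponential backoff.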

For a given eviction request, there are two cases.

1. There is no budget that matches this pod. In this case, the server always
returns `200 OK`.
2. There is at least one budget. In this case, any of the three responses may
apply.
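Putting the two cases together with the three responses gives a small decision table. This bash sketch models it for illustration only; it is not the API server's implementation. It assumes the caller already knows how many budgets match the pod and, when exactly one does, whether that budget currently permits a disruption. It models only the multiple-budget misconfiguration, not other causes of `500`. The function name and arguments are hypothetical.

```bash
#!/usr/bin/env bash

# Simplified model of the response an eviction request receives.
# Arguments:
#   $1 - number of budgets whose selector matches the pod
#   $2 - "yes" if the single matching budget currently permits a disruption,
#        otherwise "no" (ignored when zero or several budgets match)
expected_eviction_status() {
  local matching="$1" allowed="$2"
  if (( matching == 0 )); then
    echo 200   # no budget matches: the eviction is always granted
  elif (( matching > 1 )); then
    echo 500   # misconfiguration: several budgets point at the same pod
  elif [[ "$allowed" == "yes" ]]; then
    echo 200   # the budget permits the disruption: the pod is deleted
  else
    echo 429   # the budget would be violated right now: retry later
  fi
}
```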