Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add disruption admin guide. #1244

Merged
merged 1 commit into from
Sep 20, 2016
Merged

Conversation

mml
Copy link
Contributor

@mml mml commented Sep 16, 2016

This change is Reviewable

@devin-donnelly
Copy link
Contributor

Can you please retarget this PR to the release-1.4 branch?

@mml mml changed the base branch from master to release-1.4 September 16, 2016 18:00
@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.

@mml
Copy link
Contributor Author

mml commented Sep 16, 2016

Can you please retarget this PR to the release-1.4 branch?

Done.

@mml
Copy link
Contributor Author

mml commented Sep 16, 2016

No idea why the bot freaked out.

% git log upstream/release-1.4..disruption
commit e5c144b7cde8b92308a59fc3d7cd4b154ea2ad6a
Author: Matt Liggett <[email protected]>
Date:   Fri Sep 16 10:21:21 2016 -0700

    Add disruption admin guide.

@mml
Copy link
Contributor Author

mml commented Sep 16, 2016

We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.

@googlebot I authored all these commits.

@devin-donnelly devin-donnelly added this to the 1.4 milestone Sep 16, 2016

## Rationale

Various cluster management operations may "voluntarily" evict pods. By
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using "we" as a general principle. Suggested rephrase:

"In a "voluntary" eviction , a pod is removed from a node in a controlled manner, such as draining the node for maintenance (kubectl drain) or autoscaling a cluster downward. In the future, the [rescheduler] may also perform voluntary evictions. By contrast, a non-voluntary eviction removes a pod because a node has become unreachable or reports NotReady."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went more directly definitional for the first sentence.

For the second one, the given example is just that, an example. It's not the definition of non-voluntary and only illustrates the principle.

have the
[rescheduler](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/rescheduling.md)
and possibly other examples. (In contrast, something like evicting pods because
a node has become unreachable or reports NotReady, is not "voluntary.") For
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd start a new paragraph here.

"For voluntary evictions, it can be useful..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@googlebot
Copy link

CLAs look good, thanks!

@mml
Copy link
Contributor Author

mml commented Sep 19, 2016

@davidopp think I just need a technical LG on this.

because a node has become unreachable or reports NotReady, is not "voluntary."

For voluntary evictions, it can be useful for applications to be able to limit
the number of pods that are down, or the rate with which pods are evicted. For
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we didn't implement it yet, I would just remove the "or the rate with which pods are evicted"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

temporarily. Or a web front end might want to ensure that the number of replicas
serving load never falls below a certain percentage of the total, even briefly.
`PodDisruptionBudget` is an API object that specifies the minimum number or
percentage of replicas of a collection that must be up at a time, and/or the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove "and/or the maximum eviction rate across the collection."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

serving load never falls below a certain percentage of the total, even briefly.
`PodDisruptionBudget` is an API object that specifies the minimum number or
percentage of replicas of a collection that must be up at a time, and/or the
maximum eviction rate across the collection. Components that wish to eviction a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/eviction/evict/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


## Specifying a PodDisruptionBudget

A `PodDisruptionBudget` has two components: a selector to specify the set of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rewrite this as

Currently a PodDisruptionBudget has two components: a selector to specify a set of pods, and a description of the minimum number of pods from that set that must be available (i.e. an eviction will not be allowed if it will cause the number of available pods to fall below this threshold).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "i.e." is not how it's implemented. This may be a simple off-by-one error, or it may be a place where we didn't think carefully enough about the spec.

As it stands, if you say "100%", we will allow evictions if all the pods are ready. If we want "100%" to mean "never permit a 'voluntary' eviction", I will clarify the docs, and changing the code and tests will be easy.

Or maybe the right thing is to open an issue to discuss and track this, and for now document the way it works?

Copy link
Member

@davidopp davidopp Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sorry I didn't catch this sooner but I think the semantics you described are not what people are going to expect. I now see that the comment in the proto, "the minimum number of pods that must be available simultaneously" is ambiguous. And I guess I wasn't thinking about this when I was looking at the code and tests. In any event, I do think we want people to be able to say "don't allow any voluntary evictions". Can you send a PR to change the behavior and tests, and change the documentation to say something like what I suggested? Very sorry about that. Thanks for noting the discrepancy with my suggestion!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you update the comment on the proto to describe the new behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to all of these. Alright if I address the semantic change in a followup when I actually change the code? It'll be sent today, but I would like to get this merged in some form, and it is correct as-is.

pods, and a description of the minimum number of available pods for a disruption
to be allowed. The latter can be either an absolute number or a percentage. In
typical usage, a single budget would be used for a collection of pods managed by
a controller.
Copy link
Member

@davidopp davidopp Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add ", for example the pods in a single ReplicaSet"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

3. If there is some kind of misconfiguration, like multiple budgets pointing at
the same pod, you will get `500 Internal Server Error`.

For a given eviction, there are two cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/eviction/eviction request/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

to be allowed. The latter can be either an absolute number or a percentage. In
typical usage, a single budget would be used for a collection of pods managed by
a controller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add something at the end that says "Note that DisruptionBudget does not truly guarantee that the specified number/percentage of pods will always be up -- for example, a node that hosts a pod from the collection may fail when the collection is at the minimum size specified in the PodDisruptionBudget, thus bringing the number of available pods from the collection below the specified size. DisruptionBudget can only protect against voluntary evictions, not all causes of unavailability."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mml
Copy link
Contributor Author

mml commented Sep 19, 2016

@davidopp PTAL

@davidopp
Copy link
Member

Sure, LGTM.

@devin-donnelly devin-donnelly merged commit 67fb745 into kubernetes:release-1.4 Sep 20, 2016
mikutas pushed a commit to mikutas/k8s-website that referenced this pull request Sep 22, 2022
…tes#1290)

This back-ports the install instructions from the 2.12 branch (kubernetes#1244)
into the 2.11 branch, specifically for the edge helm charts.

Co-authored-by: cpretzer <[email protected]>
mikutas pushed a commit to mikutas/k8s-website that referenced this pull request Sep 22, 2022
Fixes linkerd/linkerd2#7745

Copied `grafana.md` doc from kubernetes#1244 from the 2.12 docs into 2.11, for
folks that are on edge releases, that already have stripped-off the
Grafana instance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants