
Avoid sending events for every non-conformant pod in disruption controller #98128

Merged 1 commit into kubernetes:master on Mar 4, 2021

Conversation

@mortent (Member) commented Jan 18, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
The disruption controller currently sends an event for every pod covered by a PDB selector that doesn't conform (it either has no controller, or its controller doesn't implement scale) whenever scale is needed, i.e. for every PDB configuration except minAvailable specified as a number. This can have scalability implications.
With this PR, we will instead send only a single event (CalculateExpectedPodCountFailed) per PDB when this happens.
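To make the before/after concrete, here is a minimal, self-contained Go sketch of the behavioral change. The pod type, the emitEvent helper, and the PDB name are illustrative stand-ins, not the controller's actual code; CalculateExpectedPodCountFailed and NoControllerRef are the event reasons discussed in this PR:

    package main

    import "fmt"

    // Toy stand-in for a pod and its ownership; a sketch of the
    // behavioral change, not the actual disruption controller code.
    type pod struct {
        name     string
        hasScale bool // does the owning controller implement scale?
    }

    // emitEvent is a hypothetical stand-in for recorder.Event.
    func emitEvent(object, reason, msg string) {
        fmt.Printf("event on %s: %s: %s\n", object, reason, msg)
    }

    func main() {
        pods := []pod{{"pod-a", false}, {"pod-b", false}, {"pod-c", true}}

        // Before this PR: one warning event per non-conformant pod,
        // repeated on every reconcile.
        for _, p := range pods {
            if !p.hasScale {
                emitEvent("pod/"+p.name, "NoControllerRef", "no scale controller")
            }
        }

        // After this PR: a single event on the PDB summarizing the failure.
        var bad []string
        for _, p := range pods {
            if !p.hasScale {
                bad = append(bad, p.name)
            }
        }
        if len(bad) > 0 {
            emitEvent("pdb/example-pdb", "CalculateExpectedPodCountFailed",
                fmt.Sprintf("%d pods lack a scale controller: %v", len(bad), bad))
        }
    }

With many non-conformant pods, the event count per reconcile drops from O(pods) to O(1) per PDB, which is the scalability point made above.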

We can also consider eventually using information from the PDB condition introduced in #98127 to avoid sending an event on every reconcile. But this change should address the scalability concerns.

This change doesn't fully follow the plan as described in the PDB to GA KEP: kubernetes/enhancements#2114. The KEP will be updated to reflect this change in plans.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@kubernetes/sig-apps-pr-reviews

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 18, 2021
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 18, 2021
@mortent force-pushed the AvoidErrOnNonScale branch from 464073d to e382cda on January 18, 2021 02:50
@mortent (Member, Author) commented Jan 18, 2021

This PR is similar to #85553. That PR was never merged, but the discussion there mostly applies to this PR too.

@mortent (Member, Author) commented Jan 31, 2021

/assign @soltysh

@mortent (Member, Author) commented Feb 15, 2021

/test pull-kubernetes-e2e-gce-100-performance

Review comment from @soltysh (Contributor) on the following code:

    dc.recorder.Event(pdb, v1.EventTypeWarning, "NoControllerRef", err.Error())
    return
    // Only create event if one hasn't been sent already.
    if !slice.ContainsString(knownNonScalePods, pod.Name, nil) {

The problem with this approach is that for controllers that re-create pods, this list might grow indefinitely, and you're not cleaning it anywhere other than when a PDB is removed. In the case described in #77383, the cronjob will end up creating jobs, say, every minute. If the cronjob defines a sufficiently big task, with hundreds of pods, you can easily fill that cache with pod names within a few minutes.

Reply from @mortent (Member, Author):

The list of pods is "recreated" on every reconcile, so it will never contain more entries than the number of pods in the cluster covered by the PDB. If we don't want to keep this list, then it seems we either need to accept that we will generate events for all non-conforming pods on every reconcile (which has scalability concerns), or change the way we handle events for PDBs to no longer send an event per pod, but instead create just one event per PDB (in which case sending an event per reconcile is also less of an issue).

@soltysh (Contributor) commented Feb 22, 2021

> • It doesn't add a condition for this situation, but instead relies on events. I haven't found a way to make conditions work well for this, in particular making the information in a condition useful when multiple pods might lack a scale controller.

The benefit of a condition over an event is that events eventually disappear (based on the cluster's --event-ttl setting, which is 1h by default IIRC), while conditions remain visible for the entire lifetime of a PDB. The problem I'm seeing is that PDBs don't have a conditions field, so we'd need to add one.
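For illustration, a hedged Go sketch of what a condition-based approach could look like, using the generic condition helpers from k8s.io/apimachinery; the condition type and reason strings here are assumptions for the sketch, not the API adopted in #98127:

    package pdbsketch

    import (
        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // setSyncFailedCondition records a persistent condition on the PDB's
    // status instead of a transient event. Unlike an event, it survives
    // until the PDB is updated or deleted, so users can always see it.
    func setSyncFailedCondition(conditions *[]metav1.Condition, generation int64, msg string) {
        meta.SetStatusCondition(conditions, metav1.Condition{
            Type:               "DisruptionAllowed", // illustrative type name
            Status:             metav1.ConditionFalse,
            Reason:             "SyncFailed", // illustrative reason
            Message:            msg,
            ObservedGeneration: generation,
        })
    }

meta.SetStatusCondition only bumps LastTransitionTime when the status actually changes, which is also relevant to the emit-on-transition idea discussed later in this thread.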

@michaelgugino (Contributor) commented

I agree with the condition comment. That also seems relevant to the other PR.

It seems to me the current behavior of "giving up" is possibly the right course of action. Rather than attempting to work around this issue, a status message explaining the situation would be helpful, e.g. "Some matching pods do not have a controller that implements scale. Percentage MinAvailable/MaxUnavailable is invalid."

Was there a particular place in the KEP this was discussed in more detail?

@mortent (Member, Author) commented Feb 24, 2021

@michaelgugino I think that is a good point. With the changes in #98346 and #98127 we do improve the visibility of any issues with the pods covered by a PDB, so maybe it is better to "fail hard" by blocking disruptions in this situation rather than "limping along" with the solution in this PR. I'm open to just closing this (and updating the KEP) if we decide that is the better solution.
This is discussed in the KEP

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2021
@michaelgugino (Contributor) commented

@mortent

So, I reviewed the following section: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/85-Graduate-PDB-to-Stable#make-the-disruption-controller-more-lenient-for-pods-belonging-to-non-scale-controllers

The issue referenced is this: #77383

I think we came to the conclusion in that issue that PDBs covering cron jobs are desirable, and that the core issue was a label overlap between the deployment and the cronjob.

Reading the suggested solution, I disagree with the proposal. Ignoring some pods while applying the PDB to others is probably not what the user wanted; they probably expect all the pods to be covered by PDBs in some form or another. In the issue, that particular user didn't need this, but other users do. Personal experience tells me that the average user isn't watching events, so they won't know something is off until their pods start getting evicted unexpectedly.

In one use case I'm familiar with, PDBs are used on batch-type jobs (I'm unsure of the exact scheduling controller, but they are not from a scaling controller). The intent is to block all evictions to let the jobs run to completion. They might have set 0, or they might have set 0%; I'm not sure. In the latter case, we'll regress for anyone who happens to be using %.

I'm also concerned about the eviction side of this story. Computing DisruptionsAllowed is now only one component of eviction. The other components are CurrentHealthy and DesiredHealthy: https://github.com/kubernetes/kubernetes/pull/94381/files#diff-6105105bc4d40e3d45b9cc38165516be0205f0329929421fd7df65df7abaeb2cR208

For me, I think the approach here would work, but it adds undesired complexity to the UX that will require a detailed explanation of which pods do and don't apply to a PDB under different conditions, and I'm not sure anybody really needs this feature.

@mortent (Member, Author) commented Feb 24, 2021

@michaelgugino Having PDBs that cover CronJobs can be useful, but in those cases users will have to set the PDB up with minAvailable as a number of pods (rather than a percentage).
So if I understand correctly, you are suggesting that the current behavior when we find non-scale pods while computing scale is the best option? It avoids the more complicated issues with trying to ignore some pods. And as I mentioned above, if we add conditions through #98127, it will be easier for users to understand the cause of blocked evictions.

@michaelgugino (Contributor) commented

> @michaelgugino Having PDBs that cover CronJobs can be useful, but in those cases users will have to set the PDB up with minAvailable as a number of pods (rather than a percentage).
> So if I understand correctly, you are suggesting that the current behavior when we find non-scale pods while computing scale is the best option? It avoids the more complicated issues with trying to ignore some pods. And as I mentioned above, if we add conditions through #98127, it will be easier for users to understand the cause of blocked evictions.

Yes, I think we are saying the same thing (see the sketch below):

  • minAvailable as an integer: any kind of pod is okay.
  • minAvailable as a percentage: if pods without scale controllers are present, give an error in conditions.

If this is the current behavior (except the conditions, which we are adding), then yes, current behavior :)
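To make the two cases concrete, a hedged sketch using the policy/v1 Go types; the PDB names and values are made up for illustration:

    package main

    import (
        "fmt"

        policyv1 "k8s.io/api/policy/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/intstr"
    )

    func main() {
        // minAvailable as an integer: the controller never needs to
        // resolve an expected pod count via scale, so pods from any
        // kind of controller (or none at all) are fine.
        minInt := intstr.FromInt(3)
        batchPDB := policyv1.PodDisruptionBudget{
            ObjectMeta: metav1.ObjectMeta{Name: "batch-pdb"},
            Spec:       policyv1.PodDisruptionBudgetSpec{MinAvailable: &minInt},
        }

        // minAvailable as a percentage: the expected pod count must be
        // computed from the pods' scale controllers, so pods without one
        // cause an error (surfaced as an event today, and as a condition
        // once #98127 lands).
        minPct := intstr.FromString("50%")
        webPDB := policyv1.PodDisruptionBudget{
            ObjectMeta: metav1.ObjectMeta{Name: "web-pdb"},
            Spec:       policyv1.PodDisruptionBudgetSpec{MinAvailable: &minPct},
        }

        fmt.Println(batchPDB.Name, webPDB.Name)
    }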

@mortent (Member, Author) commented Feb 25, 2021

If @soltysh also agrees with that approach, I can go back and create an update to the KEP.

We should still figure out how to handle the events. We are currently generating events for non-conforming pods on every reconcile, which @wojtek-t has already flagged as a potential scalability issue. So I think we have two options here:

  • Continue to send events for every pod, but use a local cache to avoid sending on every reconcile. As @soltysh has highlighted in a review comment on this PR, this might lead to a large cache if the PDB targets many pods.
  • Send just a single event for the PDB. We actually already create an event for the PDB whenever a reconcile fails, so maybe that is sufficient. This would reduce the concern about a high number of events, but we can consider using a cache even for this situation.

@michaelgugino (Contributor) commented

I've seen some use cases where one pod gets one PDB. If we're going to use conditions, perhaps we should emit an event when a particular condition transitions. I prefer the per-PDB approach, and emitting an event when the condition transitions keeps the noise down and also lets us track state without a local cache or some kind of tracking field.
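A hedged sketch of that emit-on-transition idea (illustrative, not the behavior this PR merges), again using the generic condition helpers: apply the condition, and report whether it actually transitioned so the caller can emit a single event only then.

    package pdbsketch

    import (
        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // applyCondition sets newCond on the condition list and reports
    // whether the condition transitioned (status or reason changed),
    // so the caller emits an event only on transitions.
    func applyCondition(conditions *[]metav1.Condition, newCond metav1.Condition) bool {
        var before *metav1.Condition
        if c := meta.FindStatusCondition(*conditions, newCond.Type); c != nil {
            // Copy first: SetStatusCondition mutates the slice element
            // in place, which would invalidate the comparison below.
            copied := *c
            before = &copied
        }
        meta.SetStatusCondition(conditions, newCond)
        return before == nil || before.Status != newCond.Status || before.Reason != newCond.Reason
    }

This tracks state in the object itself, which is the "no local cache or tracking field" property the comment above is after.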

@soltysh (Contributor) commented Feb 25, 2021

👍 for per-pdb approach, that will solve it nicely.

@mortent force-pushed the AvoidErrOnNonScale branch from 9d675a2 to 5ad4280 on March 1, 2021 03:45
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 1, 2021
@mortent mortent changed the title Ignore non-scale pods in disruption controller instead of error Avoid sending events for every non-conformant pod in disruption controller Mar 1, 2021
@mortent (Member, Author) commented Mar 1, 2021

@soltysh @michaelgugino
I have updated the PR to simply remove the code that sends an event on every reconcile of a PDB. I'm not sure we should try to optimize it further: since the error included in the event might change (if the user fixes some but not all of the non-conforming pods), we would need to compare not only the type of error but the error messages themselves.

@wojtek-t (Member) commented Mar 1, 2021

> +1 for per-pdb approach, that will solve it nicely.

+1

@mortent (Member, Author) commented Mar 4, 2021

@soltysh I have updated the PR to just avoid creating events for every non-conformant pod. It will still create an event for the PDB on every reconcile that fails. Once #98127 is merged, the error message will be included in the condition when sync fails.

@michaelgugino (Contributor) commented

I think we should only emit events when a condition transitions from one value to another, or if the reason changes. One event per PDB per reconcile is a lot of useless events.

@soltysh (Contributor) commented Mar 4, 2021

> I think we should only emit events when a condition transitions from one value to another, or if the reason changes. One event per PDB per reconcile is a lot of useless events.

Controllers shouldn't count on seeing a change, but react to state (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md)

Review from @soltysh (Contributor):

/lgtm
/approve
/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 4, 2021
@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mortent, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@k8s-ci-robot k8s-ci-robot merged commit de9821c into kubernetes:master Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 29, 2021