Avoid sending events for every non-conformant pod in disruption controller #98128
Conversation
Force-pushed from 464073d to e382cda
This PR is similar to #85553. It was never merged, but the discussion mostly applies to this PR too.
/assign @soltysh
Force-pushed from e382cda to 9d675a2
/test pull-kubernetes-e2e-gce-100-performance
dc.recorder.Event(pdb, v1.EventTypeWarning, "NoControllerRef", err.Error())
	return
// Only create event if one hasn't been sent already.
if !slice.ContainsString(knownNonScalePods, pod.Name, nil) {
The problem with this approach is that for controllers re-creating pods this might grow indefinitely, and you're not cleaning that list anywhere other than when a PDB is removed. In the case described in #77383, the cronjob will end up creating jobs, let's say every minute. You can easily fill that cache with the names of those pods within a few minutes if the cronjob defines a sufficiently big task with hundreds of pods.
The list of pods is "recreated" on every reconcile, so the list will never contain more entries than the number of pods in the cluster that are covered by the PDB. If we don't want to keep this list, then it seems like we either need to accept that we will generate events for all non-conforming pods on every reconcile (which has scalability concerns), or change the way we handle events for PDBs to no longer send an event per pod, but instead just create one event per PDB (in which case maybe sending an event per reconcile is also less of an issue).
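For illustration, a minimal sketch of that per-reconcile rebuild, assuming a hypothetical `hasScaleController` helper (the names and call site are not the controller's actual code):

```go
import v1 "k8s.io/api/core/v1"

// nonScalePodNames is rebuilt from scratch on every reconcile, so it never
// holds more entries than the pods currently selected by the PDB.
func nonScalePodNames(pods []*v1.Pod, hasScaleController func(*v1.Pod) bool) []string {
	names := make([]string, 0, len(pods))
	for _, pod := range pods {
		if !hasScaleController(pod) {
			names = append(names, pod.Name)
		}
	}
	return names
}
```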
The benefit of a condition over an event is that events will eventually disappear (based on the cluster's event TTL).
I agree with the condition comment. That also seems relevant to the other PR. It seems to me the current behavior of 'giving up' is possibly the right course of action. Rather than attempting to work around this issue, a status message explaining the situation would be helpful, e.g. "Some matching pods do not have a controller that implements scale. Percentage MinAvailable/MaxUnavailable is invalid." Was there a particular place in the KEP where this was discussed in more detail?
@michaelgugino I think that is a good point. With the changes in #98346 and #98127 we do improve the visibility of any issues with the pods covered by a PDB, so maybe it is better to "fail hard" by blocking disruptions in this situation rather than "limping along" with the solution in this PR. I'm open to just closing this (and updating the KEP) if we decide that is the better solution.
So, I reviewed the following section: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/85-Graduate-PDB-to-Stable#make-the-disruption-controller-more-lenient-for-pods-belonging-to-non-scale-controllers

The issue referenced is this: #77383. I think we came to the conclusion in that issue that PDBs covering cron jobs are desirable, and the core issue was a label overlap between the deployment and the cronjob.

Reading the suggested solution, I disagree with the proposal. Ignoring some pods while applying the PDB to others is probably not what the user wanted; they probably expect that all the pods are covered by PDBs in some form or another. In the issue, that particular user didn't need this, but other users do. Personal experience tells me that the average user isn't watching events, so they won't know something is off until their pods start getting evicted unexpectedly.

In one use case I'm familiar with, PDBs are used on batch-type jobs (I'm unsure of the exact scheduling controller, but they are not from a scaling controller). The intent is to block all evictions to let the jobs run to completion. They might have set 0, or they might have set 0%, I'm not sure. In the latter case, we'll regress for anyone that happens to be using %.

I'm also concerned about the eviction side of this story. Computing

For me, I think the approach here would work, but it adds undesired complexity to the UX that will require a detailed explanation of which pods do and don't apply to a PDB under different conditions, and I'm not sure anybody really needs this feature.
@michaelgugino Having PDBs that cover CronJobs can be useful, but in those cases users will have to set the PDB up with minAvailable as an integer.
Yes, I think we are saying the same thing.

minAvailable as integer: Any kind of pod is okay

If this is the current behavior (except the conditions which we are adding), then yes, current behavior :)
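To make that distinction concrete, here is a hedged sketch of the two spellings using the policy/v1 and intstr types (the variable names are illustrative, not from the PR):

```go
import (
	policyv1 "k8s.io/api/policy/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// minAvailable as a plain integer: the controller never needs the scale
// subresource, so any kind of pod (e.g. one created by a CronJob) is fine.
var intMin = intstr.FromInt(3)
var intSpec = policyv1.PodDisruptionBudgetSpec{MinAvailable: &intMin}

// minAvailable as a percentage: the expected pod count has to be resolved via
// each pod's controller, so pods need a controller that implements scale.
var pctMin = intstr.FromString("50%")
var pctSpec = policyv1.PodDisruptionBudgetSpec{MinAvailable: &pctMin}
```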
If @soltysh also agrees with that approach, I can go back and create an update to the KEP. We should still figure out how to handle the events. We are currently generating events for non-conforming pods on every reconcile, which @wojtek-t has already flagged can lead to scalability issues. So I think we have two options here:
I've seen some use cases where 1 pod gets 1 PDB. If we're going to use conditions, perhaps we should emit an event when a particular condition transitions. I prefer the per-pdb approach, and emitting an event when the condition transitions keeps the noise down and also allows us to track state without having a local cache or adding some kind of tracking field.
👍 for per-pdb approach, that will solve it nicely.
Force-pushed from 9d675a2 to 5ad4280
@soltysh @michaelgugino
+1
I think we should only emit events when a condition transitions from one value to another, or if the reason changes. One event per PDB per reconcile is a lot of useless events.
Controllers shouldn't count on seeing a change, but react to state (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md)
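As a rough sketch of the transition-gated emission discussed above, assuming the PDB status carries metav1 conditions as introduced in #98127 (the function and its call site are hypothetical, not the controller's actual code):

```go
import (
	corev1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/record"
)

// emitOnTransition records a warning on the PDB only when the named
// condition's status changed during this reconcile, so steady-state
// reconciles stay silent.
func emitOnTransition(recorder record.EventRecorder, old, updated *policyv1.PodDisruptionBudget, condType, reason, msg string) {
	if conditionStatus(old.Status.Conditions, condType) != conditionStatus(updated.Status.Conditions, condType) {
		recorder.Event(updated, corev1.EventTypeWarning, reason, msg)
	}
}

// conditionStatus returns the status of the named condition, or Unknown if
// the condition is not present yet.
func conditionStatus(conds []metav1.Condition, condType string) metav1.ConditionStatus {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return metav1.ConditionUnknown
}
```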
/lgtm
/approve
/triage accepted
/priority backlog
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mortent, soltysh

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
The disruption controller currently sends an event for every pod covered by a PDB selector that doesn't conform (either it doesn't have a controller or the controller doesn't implement scale) when scale is needed (every PDB configuration except when minAvailable is a number). This can have scalability implications. With this PR, we will instead only send a single event (CalculateExpectedPodCountFailed) per PDB when this happens.

We can also consider eventually using information from the PDB condition introduced in #98127 to avoid sending an event on every reconcile, but this change should address the scalability concerns.
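A minimal sketch of that per-PDB event, with the helper name and message text as assumptions rather than the exact code in the diff (only the CalculateExpectedPodCountFailed reason is taken from this PR):

```go
import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	"k8s.io/client-go/tools/record"
)

// recordExpectedCountFailure emits one warning event on the PDB object itself
// instead of one event per non-conforming pod.
func recordExpectedCountFailure(recorder record.EventRecorder, pdb *policyv1.PodDisruptionBudget, err error) {
	recorder.Event(pdb, corev1.EventTypeWarning, "CalculateExpectedPodCountFailed",
		fmt.Sprintf("Failed to calculate the number of expected pods: %v", err))
}
```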
This change doesn't fully follow the plan as described in the PDB to GA KEP: kubernetes/enhancements#2114. The KEP will be updated to reflect this change in plans.
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
@kubernetes/sig-apps-pr-reviews