
Avoid sending events for every non-conformant pod in disruption controller #98128

Merged 1 commit into kubernetes:master on Mar 4, 2021

Conversation

@mortent (Member) commented Jan 18, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
The disruption controller currently sends an event for every pod covered by a PDB selector that doesn't conform (it either has no controller, or its controller doesn't implement scale) whenever scale is needed, i.e. for every PDB configuration except minAvailable specified as a number. This can have scalability implications.
With this PR, we will instead send only a single event (CalculateExpectedPodCountFailed) per PDB when this happens.
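To make the before/after concrete, here is a minimal, self-contained Go sketch of the behavioral change. The pod type, the emitEvent helper, and the PDB name are illustrative stand-ins, not the controller's actual code; CalculateExpectedPodCountFailed and NoControllerRef are the event reasons discussed in this PR:

    package main

    import "fmt"

    // Toy stand-in for a pod and its ownership; a sketch of the
    // behavioral change, not the actual disruption controller code.
    type pod struct {
        name     string
        hasScale bool // does the owning controller implement scale?
    }

    // emitEvent is a hypothetical stand-in for recorder.Event.
    func emitEvent(object, reason, msg string) {
        fmt.Printf("event on %s: %s: %s\n", object, reason, msg)
    }

    func main() {
        pods := []pod{{"pod-a", false}, {"pod-b", false}, {"pod-c", true}}

        // Before this PR: one warning event per non-conformant pod,
        // repeated on every reconcile.
        for _, p := range pods {
            if !p.hasScale {
                emitEvent("pod/"+p.name, "NoControllerRef", "no scale controller")
            }
        }

        // After this PR: a single event on the PDB summarizing the failure.
        var bad []string
        for _, p := range pods {
            if !p.hasScale {
                bad = append(bad, p.name)
            }
        }
        if len(bad) > 0 {
            emitEvent("pdb/example-pdb", "CalculateExpectedPodCountFailed",
                fmt.Sprintf("%d pods lack a scale controller: %v", len(bad), bad))
        }
    }

With many non-conformant pods, the event count per reconcile drops from O(pods) to O(1) per PDB, which is the scalability point made above.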

We can also consider eventually using information from the PDB condition introduced in #98127 to avoid sending an event on every reconcile. But this change should address the scalability concerns.

This change doesn't fully follow the plan as described in the PDB to GA KEP: kubernetes/enhancements#2114. The KEP will be updated to reflect this change in plans.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@kubernetes/sig-apps-pr-reviews

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 18, 2021
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 18, 2021
@mortent force-pushed the AvoidErrOnNonScale branch from 464073d to e382cda on January 18, 2021 02:50
@mortent (Member, Author) commented Jan 18, 2021

This PR is similar to #85553. That PR was never merged, but the discussion there mostly applies to this PR too.

@mortent (Member, Author) commented Jan 31, 2021

/assign @soltysh

@mortent (Member, Author) commented Feb 15, 2021

/test pull-kubernetes-e2e-gce-100-performance

Review comment from @soltysh (Contributor) on the following code:

    dc.recorder.Event(pdb, v1.EventTypeWarning, "NoControllerRef", err.Error())
    return
    // Only create event if one hasn't been sent already.
    if !slice.ContainsString(knownNonScalePods, pod.Name, nil) {

The problem with this approach is that for controllers that re-create pods, this list might grow indefinitely, and you're not cleaning it anywhere other than when a PDB is removed. In the case described in #77383, the cronjob will end up creating jobs, say, every minute. If the cronjob defines a sufficiently big task, with hundreds of pods, you can easily fill that cache with pod names within a few minutes.

Reply from @mortent (Member, Author):

The list of pods is "recreated" on every reconcile, so it will never contain more entries than the number of pods in the cluster covered by the PDB. If we don't want to keep this list, then it seems we either need to accept that we will generate events for all non-conforming pods on every reconcile (which has scalability concerns), or change the way we handle events for PDBs to no longer send an event per pod, but instead create just one event per PDB (in which case sending an event per reconcile is also less of an issue).

@soltysh (Contributor) commented Feb 22, 2021

> • It doesn't add a condition for this situation, but instead relies on events. I haven't found a way to make conditions work well for this, in particular making the information in a condition useful when multiple pods might lack a scale controller.

The benefit of a condition over an event is that events eventually disappear (based on the cluster's --event-ttl setting, which is 1h by default IIRC), while conditions remain visible for the entire lifetime of a PDB. The problem I'm seeing is that PDBs don't have a conditions field, so we'd need to add one.
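For illustration, a hedged Go sketch of what a condition-based approach could look like, using the generic condition helpers from k8s.io/apimachinery; the condition type and reason strings here are assumptions for the sketch, not the API adopted in #98127:

    package pdbsketch

    import (
        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // setSyncFailedCondition records a persistent condition on the PDB's
    // status instead of a transient event. Unlike an event, it survives
    // until the PDB is updated or deleted, so users can always see it.
    func setSyncFailedCondition(conditions *[]metav1.Condition, generation int64, msg string) {
        meta.SetStatusCondition(conditions, metav1.Condition{
            Type:               "DisruptionAllowed", // illustrative type name
            Status:             metav1.ConditionFalse,
            Reason:             "SyncFailed", // illustrative reason
            Message:            msg,
            ObservedGeneration: generation,
        })
    }

meta.SetStatusCondition only bumps LastTransitionTime when the status actually changes, which is also relevant to the emit-on-transition idea discussed later in this thread.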

@michaelgugino (Contributor) commented

I agree with the condition comment. That also seems relevant to the other PR.

It seems to me the current behavior of "giving up" is possibly the right course of action. Rather than attempting to work around this issue, a status message explaining the situation would be helpful, e.g. "Some matching pods do not have a controller that implements scale. Percentage MinAvailable/MaxUnavailable is invalid."

Was there a particular place in the KEP this was discussed in more detail?

@mortent (Member, Author) commented Feb 24, 2021

@michaelgugino I think that is a good point. With the changes in #98346 and #98127 we do improve the visibility of any issues with the pods covered by a PDB, so maybe it is better to "fail hard" by blocking disruptions in this situation rather than "limping along" with the solution in this PR. I'm open to just closing this (and updating the KEP) if we decide that is the better solution.
This is discussed in the KEP

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2021
@michaelgugino (Contributor) commented

@mortent

So, I reviewed the following section: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/85-Graduate-PDB-to-Stable#make-the-disruption-controller-more-lenient-for-pods-belonging-to-non-scale-controllers

The issue referenced is this: #77383

I think we came to the conclusion in that issue that PDBs covering cron jobs are desirable, and that the core issue was a label overlap between the deployment and the cronjob.

Reading the suggested solution, I disagree with the proposal. Ignoring some pods while applying the PDB to others is probably not what the user wanted; they probably expect all the pods to be covered by PDBs in some form or another. In the issue, that particular user didn't need this, but other users do. Personal experience tells me that the average user isn't watching events, so they won't know something is off until their pods start getting evicted unexpectedly.

In one use case I'm familiar with, PDBs are used on batch-type jobs (I'm unsure of the exact scheduling controller, but they are not from a scaling controller). The intent is to block all evictions to let the jobs run to completion. They might have set 0, or they might have set 0%; I'm not sure. In the latter case, we'll regress for anyone who happens to be using %.

I'm also concerned about the eviction side of this story. Computing DisruptionsAllowed is now only one component of eviction. The other components are CurrentHealthy and DesiredHealthy: https://github.com/kubernetes/kubernetes/pull/94381/files#diff-6105105bc4d40e3d45b9cc38165516be0205f0329929421fd7df65df7abaeb2cR208

For me, I think the approach here would work, but it adds undesired complexity to the UX that will require a detailed explanation of which pods do and don't apply to a PDB under different conditions, and I'm not sure anybody really needs this feature.

@mortent (Member, Author) commented Feb 24, 2021

@michaelgugino Having PDBs that cover CronJobs can be useful, but in those cases users will have to set the PDB up with minAvailable as a number of pods (rather than a percentage).
So if I understand correctly, you are suggesting that the current behavior when we find non-scale pods while computing scale is the best option? It avoids the more complicated issues with trying to ignore some pods. And as I mentioned above, if we add conditions through #98127, it will be easier for users to understand the cause of blocked evictions.

@michaelgugino (Contributor) commented

> @michaelgugino Having PDBs that cover CronJobs can be useful, but in those cases users will have to set the PDB up with minAvailable as a number of pods (rather than a percentage).
> So if I understand correctly, you are suggesting that the current behavior when we find non-scale pods while computing scale is the best option? It avoids the more complicated issues with trying to ignore some pods. And as I mentioned above, if we add conditions through #98127, it will be easier for users to understand the cause of blocked evictions.

Yes, I think we are saying the same thing (see the sketch below):

  • minAvailable as an integer: any kind of pod is okay.
  • minAvailable as a percentage: if pods without scale controllers are present, give an error in conditions.

If this is the current behavior (except the conditions, which we are adding), then yes, current behavior :)
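To make the two cases concrete, a hedged sketch using the policy/v1 Go types; the PDB names and values are made up for illustration:

    package main

    import (
        "fmt"

        policyv1 "k8s.io/api/policy/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/intstr"
    )

    func main() {
        // minAvailable as an integer: the controller never needs to
        // resolve an expected pod count via scale, so pods from any
        // kind of controller (or none at all) are fine.
        minInt := intstr.FromInt(3)
        batchPDB := policyv1.PodDisruptionBudget{
            ObjectMeta: metav1.ObjectMeta{Name: "batch-pdb"},
            Spec:       policyv1.PodDisruptionBudgetSpec{MinAvailable: &minInt},
        }

        // minAvailable as a percentage: the expected pod count must be
        // computed from the pods' scale controllers, so pods without one
        // cause an error (surfaced as an event today, and as a condition
        // once #98127 lands).
        minPct := intstr.FromString("50%")
        webPDB := policyv1.PodDisruptionBudget{
            ObjectMeta: metav1.ObjectMeta{Name: "web-pdb"},
            Spec:       policyv1.PodDisruptionBudgetSpec{MinAvailable: &minPct},
        }

        fmt.Println(batchPDB.Name, webPDB.Name)
    }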

@mortent (Member, Author) commented Feb 25, 2021

If @soltysh also agrees with that approach, I can go back and create an update to the KEP.

We should still figure out how to handle the events. We are currently generating events for non-conforming pods on every reconcile, which @wojtek-t has already flagged as a potential scalability issue. So I think we have two options here:

  • Continue to send events for every pod, but use a local cache to avoid sending on every reconcile. As @soltysh has highlighted in a review comment on this PR, this might lead to a large cache if the PDB targets many pods.
  • Send just a single event for the PDB. We actually already create an event for the PDB whenever a reconcile fails, so maybe that is sufficient. This would reduce the concern about a high number of events, but we can consider using a cache even for this situation.

@michaelgugino (Contributor) commented

I've seen some use cases where one pod gets one PDB. If we're going to use conditions, perhaps we should emit an event when a particular condition transitions. I prefer the per-PDB approach, and emitting an event when the condition transitions keeps the noise down and also lets us track state without a local cache or some kind of tracking field.
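A hedged sketch of that emit-on-transition idea (illustrative, not the behavior this PR merges), again using the generic condition helpers: apply the condition, and report whether it actually transitioned so the caller can emit a single event only then.

    package pdbsketch

    import (
        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // applyCondition sets newCond on the condition list and reports
    // whether the condition transitioned (status or reason changed),
    // so the caller emits an event only on transitions.
    func applyCondition(conditions *[]metav1.Condition, newCond metav1.Condition) bool {
        var before *metav1.Condition
        if c := meta.FindStatusCondition(*conditions, newCond.Type); c != nil {
            // Copy first: SetStatusCondition mutates the slice element
            // in place, which would invalidate the comparison below.
            copied := *c
            before = &copied
        }
        meta.SetStatusCondition(conditions, newCond)
        return before == nil || before.Status != newCond.Status || before.Reason != newCond.Reason
    }

This tracks state in the object itself, which is the "no local cache or tracking field" property the comment above is after.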

@soltysh (Contributor) commented Feb 25, 2021

👍 for per-pdb approach, that will solve it nicely.

@mortent force-pushed the AvoidErrOnNonScale branch from 9d675a2 to 5ad4280 on March 1, 2021 03:45
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 1, 2021
@mortent mortent changed the title Ignore non-scale pods in disruption controller instead of error Avoid sending events for every non-conformant pod in disruption controller Mar 1, 2021
@mortent (Member, Author) commented Mar 1, 2021

@soltysh @michaelgugino
I have updated the PR to simply remove the code that sends an event on every reconcile of a PDB. I'm not sure we should try to optimize it further: since the error included in the event might change (if the user fixes some but not all of the non-conforming pods), we would need to compare not only the type of error but the error messages themselves.

@wojtek-t (Member) commented Mar 1, 2021

> +1 for per-pdb approach, that will solve it nicely.

+1

@mortent (Member, Author) commented Mar 4, 2021

@soltysh I have updated the PR to just avoid creating events for every non-conformant pod. It will still create an event for the PDB on every reconcile that fails. Once #98127 is merged, the error message will be included in the condition when sync fails.

@michaelgugino (Contributor) commented

I think we should only emit events when a condition transitions from one value to another, or if the reason changes. One event per PDB per reconcile is a lot of useless events.

@soltysh (Contributor) commented Mar 4, 2021

> I think we should only emit events when a condition transitions from one value to another, or if the reason changes. One event per PDB per reconcile is a lot of useless events.

Controllers shouldn't count on seeing a change, but react to state (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md)

Review from @soltysh (Contributor):

/lgtm
/approve
/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 4, 2021
@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mortent, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@k8s-ci-robot k8s-ci-robot merged commit de9821c into kubernetes:master Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 29, 2021