feat: Eviction controller #942
base: main
Branch force-pushed from 0d52af4 to e910804.
Branch force-pushed from c1e80cd to 0b76225.
var crp placementv1beta1.ClusterResourcePlacement
if err := r.Client.Get(ctx, types.NamespacedName{Name: eviction.Spec.PlacementName}, &crp); err != nil {
	if !errors.IsNotFound(err) {
		return runtime.Result{}, err
Suggested change: controller.NewAPIServerError(
if evictionTargetBinding == nil {
	evictionTargetBinding = &crbList.Items[i]
} else {
	klog.V(2).InfoS(evictionInvalidMultipleCRB, "clusterResourcePlacementEviction", evictionName, "clusterResourcePlacement", eviction.Spec.PlacementName)
How can this happen? We never have two bindings pointing to the same cluster.
This check was added to cover the case handled here https://github.com/Azure/fleet/blob/main/pkg/controllers/rollout/controller.go#L224
var db placementv1alpha1.ClusterResourcePlacementDisruptionBudget
if err := r.Client.Get(ctx, types.NamespacedName{Name: crp.Name}, &db); err != nil {
	if !errors.IsNotFound(err) {
		return runtime.Result{}, err
Suggested change: controller.NewAPIServerError(
	}
	markEvictionExecuted(&eviction, fmt.Sprintf(evictionAllowedPDBSpecified, disruptionsAllowed, availableBindings, desiredBindings, totalBindings))
} else {
	markEvictionNotExecuted(&eviction, fmt.Sprintf(evictionBlockedPDBSpecified, disruptionsAllowed, availableBindings, desiredBindings, totalBindings))
Will this block the eviction permanently, given line 71?
disruptionsAllowed is always <= 0 when the eviction is blocked, so that number does not seem useful here.
if err := r.deleteClusterResourceBinding(ctx, evictionTargetBinding); err != nil {
	return runtime.Result{}, err
}
markEvictionExecuted(&eviction, fmt.Sprintf(evictionAllowedPDBSpecified, disruptionsAllowed, availableBindings, desiredBindings, totalBindings))
Do users care about the disruptionsAllowed number?
var desiredBindings int
switch crp.Spec.Policy.PlacementType {
case placementv1beta1.PickAllPlacementType:
	desiredBindings = len(crbList.Items)
The list includes bindings that are "noScheduled", so its length is not an accurate representation of the desired number of bindings.
Added some comments, PTAL 🙏
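A minimal sketch of the reviewer's point: for a PickAll placement, count only bindings the scheduler still wants when computing the desired number, so bindings on their way out do not inflate it. The state values are assumed to mirror fleet's Scheduled/Bound/Unscheduled binding states; the types and `desiredBindingsForPickAll` are hypothetical simplifications.

```go
package main

// bindingState is a hypothetical stand-in for the binding state field; the
// values are assumed to mirror fleet's Scheduled/Bound/Unscheduled states.
type bindingState string

const (
	stateScheduled   bindingState = "Scheduled"
	stateBound       bindingState = "Bound"
	stateUnscheduled bindingState = "Unscheduled"
)

// binding is a hypothetical, simplified binding with only its state.
type binding struct {
	Name  string
	State bindingState
}

// desiredBindingsForPickAll counts only Scheduled or Bound bindings;
// Unscheduled bindings are being removed and should not count toward the
// desired total.
func desiredBindingsForPickAll(bindings []binding) int {
	desired := 0
	for _, b := range bindings {
		if b.State == stateScheduled || b.State == stateBound {
			desired++
		}
	}
	return desired
}
```

With one Bound, one Scheduled, and one Unscheduled binding, this yields 2 rather than `len(crbList.Items)` = 3.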
if evictionTargetBinding == nil {
	evictionTargetBinding = &crbList.Items[i]
} else {
	klog.V(2).InfoS(evictionInvalidMultipleCRB, "clusterResourcePlacementEviction", evictionName, "clusterResourcePlacement", eviction.Spec.PlacementName)
Hi Arvind! I think our scheduler/rollout controller is designed not to create two bindings for the same cluster.
	}
}
var disruptionsAllowed int
if db.Spec.MaxUnavailable != nil {
Hi Arvind! The two fields are mutually exclusive, but the API definition does not have validation rules for this yet, and our webhooks offer only best-effort protection. It might be better to use a switch statement here (exactly one outcome) rather than multiple if checks (more than one can run).
We have API validation that addresses this concern: https://github.com/Azure/fleet/blob/main/apis/placement/v1alpha1/disruptionbudget_types.go#L30
But I agree that there should be only one outcome.
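A switch guarantees that exactly one branch computes the allowed disruptions. The sketch below assumes a simplified, integer-only budget (`budget` and `disruptionsAllowed` are hypothetical); the real budget fields may also accept percentages, as Kubernetes PodDisruptionBudget fields do, which this sketch does not handle.

```go
package main

// budget is a hypothetical, integer-only simplification of the disruption
// budget spec; at most one field is expected to be set.
type budget struct {
	MaxUnavailable *int
	MinAvailable   *int
}

// disruptionsAllowed returns how many more bindings may be disrupted and
// whether the budget constrains eviction at all. Because it is a switch,
// exactly one branch runs even if both fields slipped past validation
// (in that case MaxUnavailable wins, since it is checked first).
func disruptionsAllowed(b budget, desired, available int) (int, bool) {
	switch {
	case b.MaxUnavailable != nil:
		// Bindings that are already unavailable count against MaxUnavailable.
		unavailable := desired - available
		if unavailable < 0 {
			unavailable = 0
		}
		return *b.MaxUnavailable - unavailable, true
	case b.MinAvailable != nil:
		return available - *b.MinAvailable, true
	default:
		// Neither field set: the budget imposes no constraint.
		return 0, false
	}
}
```

The returned count is unclamped, so a negative value means the budget is already violated.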
}

// isEvictionAllowed calculates if eviction allowed based on available bindings and spec specified in placement disruption budget.
func isEvictionAllowed(desiredBindings int, bindings []placementv1beta1.ClusterResourceBinding, db placementv1alpha1.ClusterResourcePlacementDisruptionBudget) (bool, int, int) {
What if the user is trying to evict an unavailable binding?
I think we need an unavailable-binding policy similar to the Kubernetes unhealthy pod eviction policy: https://kubernetes.io/docs/tasks/run-application/configure-pdb/#unhealthy-pod-eviction-policy
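Kubernetes handles the analogous pod case with unhealthyPodEvictionPolicy (IfHealthyBudget or AlwaysAllow). A hypothetical binding-level equivalent might look like the sketch below; the policy type, its values, and `evictionAllowed` are illustrative assumptions, not part of the fleet API in this PR. `disruptionsAllowed` is assumed to be the unclamped value (current available minus required).

```go
package main

// unavailableBindingPolicy is a hypothetical analogue of Kubernetes'
// unhealthyPodEvictionPolicy, applied to bindings instead of pods.
type unavailableBindingPolicy string

const (
	// ifAvailableBudget mirrors IfHealthyBudget: an unavailable binding may be
	// evicted only while the budget is not already violated.
	ifAvailableBudget unavailableBindingPolicy = "IfAvailableBudget"
	// alwaysAllow mirrors AlwaysAllow: an unavailable binding may always be evicted.
	alwaysAllow unavailableBindingPolicy = "AlwaysAllow"
)

// evictionAllowed decides whether the target binding may be evicted.
func evictionAllowed(targetAvailable bool, disruptionsAllowed int, policy unavailableBindingPolicy) bool {
	if targetAvailable {
		// Evicting an available binding consumes one allowed disruption.
		return disruptionsAllowed > 0
	}
	switch policy {
	case alwaysAllow:
		return true
	default:
		// Evicting an unavailable binding does not lower availability further,
		// so allow it as long as the budget is currently satisfied.
		return disruptionsAllowed >= 0
	}
}
```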
reasonClusterResourcePlacementEvictionExecuted    = "ClusterResourcePlacementEvictionExecuted"
reasonClusterResourcePlacementEvictionNotExecuted = "ClusterResourcePlacementEvictionNotExecuted"

evictionInvalidMissingCRP = "Failed to find cluster resource placement targeted by eviction"
Hi Arvind! Just a nit: would you mind adding a Message suffix/prefix to these variables, to make things a bit clearer?
Also, bindings are generally considered internal APIs; to keep things clearer for users, it might make more sense to refer to clusters rather than bindings in the messages.
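A minimal sketch of these suggestions, assuming messages are built with fmt.Sprintf: constant names carry a Message suffix, the text names the member cluster instead of the internal binding, and the blocked message omits disruptionsAllowed, following the earlier comment that it is always <= 0 in that case. The constant names, wording, and `blockedMessage` helper are hypothetical.

```go
package main

import "fmt"

// Hypothetical message constants; the Message suffix marks them as
// user-facing condition text rather than condition reasons.
const (
	evictionInvalidMissingCRPMessage   = "Failed to find the ClusterResourcePlacement targeted by the eviction"
	evictionBlockedPDBSpecifiedMessage = "Eviction of cluster %q is blocked by the disruption budget: resources are available on %d of %d desired clusters"
)

// blockedMessage formats a user-facing message that refers to the member
// cluster by name instead of exposing the internal binding object.
func blockedMessage(clusterName string, available, desired int) string {
	return fmt.Sprintf(evictionBlockedPDBSpecifiedMessage, clusterName, available, desired)
}
```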
// SetupWithManager sets up the controller with the Manager.
func (r *Reconciler) SetupWithManager(mgr runtime.Manager) error {
	return runtime.NewControllerManagedBy(mgr).
		WithOptions(ctrl.Options{MaxConcurrentReconciles: 1}). // set the max number of concurrent reconciles
A nit: we might need to add a note here explaining why we have to limit MaxConcurrentReconciles to 1 for now.
Description of your changes
Fixes #
I have run make reviewable to ensure this PR is ready for review.
How has this code been tested
Unit tests (UTs) and integration tests (ITs).
Special notes for your reviewer
Changes to the rollout controller need to be in place before E2E tests can be added.