feat: Eviction controller #942

Status: Open (wants to merge 5 commits into main)
Arvindthiru (Contributor) commented on Oct 31, 2024

Description of your changes

Fixes #

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Unit tests (UTs) and integration tests (ITs).

Special notes for your reviewer

Changes to the rollout controller need to be in place before E2E tests can be added.

Arvindthiru marked this pull request as ready for review on November 13, 2024, 04:10.
var crp placementv1beta1.ClusterResourcePlacement
if err := r.Client.Get(ctx, types.NamespacedName{Name: eviction.Spec.PlacementName}, &crp); err != nil {
	if !errors.IsNotFound(err) {
		return runtime.Result{}, err
Contributor:

Suggested change: wrap the returned error with controller.NewAPIServerError.

if evictionTargetBinding == nil {
	evictionTargetBinding = &crbList.Items[i]
} else {
	klog.V(2).InfoS(evictionInvalidMultipleCRB, "clusterResourcePlacementEviction", evictionName, "clusterResourcePlacement", eviction.Spec.PlacementName)
Contributor:

How can this happen? We never have two bindings pointing to the same cluster.


var db placementv1alpha1.ClusterResourcePlacementDisruptionBudget
if err := r.Client.Get(ctx, types.NamespacedName{Name: crp.Name}, &db); err != nil {
	if !errors.IsNotFound(err) {
		return runtime.Result{}, err
Contributor:

Suggested change: wrap the returned error with controller.NewAPIServerError.

	}
	markEvictionExecuted(&eviction, fmt.Sprintf(evictionAllowedPDBSpecified, disruptionsAllowed, availableBindings, desiredBindings, totalBindings))
} else {
	markEvictionNotExecuted(&eviction, fmt.Sprintf(evictionBlockedPDBSpecified, disruptionsAllowed, availableBindings, desiredBindings, totalBindings))
Contributor:

Won't this stop the eviction forever, as per line 71?

Contributor:

disruptionsAllowed is always <= 0 when the eviction is blocked, so that number does not seem useful in the message.

if err := r.deleteClusterResourceBinding(ctx, evictionTargetBinding); err != nil {
	return runtime.Result{}, err
}
markEvictionExecuted(&eviction, fmt.Sprintf(evictionAllowedPDBSpecified, disruptionsAllowed, availableBindings, desiredBindings, totalBindings))
Contributor:

Do users care about the disruptionsAllowed number?

var desiredBindings int
switch crp.Spec.Policy.PlacementType {
case placementv1beta1.PickAllPlacementType:
	desiredBindings = len(crbList.Items)
Contributor:

The list also includes bindings that are "noScheduled", so its length is not an accurate representation of the desired number of bindings.
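A minimal sketch of the reviewer's point, assuming bindings carry a scheduling state and that unscheduled bindings should not count toward the desired total for a PickAll placement. The `Binding` type, the `BindingState` constants, and `desiredBindingsForPickAll` are hypothetical stand-ins, not the fleet API.

```go
package main

import "fmt"

// BindingState is a hypothetical stand-in for the scheduling state on a
// ClusterResourceBinding; the real field and constant names may differ.
type BindingState string

const (
	StateScheduled   BindingState = "Scheduled"
	StateBound       BindingState = "Bound"
	StateUnscheduled BindingState = "Unscheduled"
)

// Binding is a hypothetical, trimmed-down ClusterResourceBinding.
type Binding struct {
	TargetCluster string
	State         BindingState
}

// desiredBindingsForPickAll counts only bindings the scheduler still wants,
// skipping unscheduled ones, instead of using len(bindings).
func desiredBindingsForPickAll(bindings []Binding) int {
	desired := 0
	for _, b := range bindings {
		if b.State != StateUnscheduled {
			desired++
		}
	}
	return desired
}

func main() {
	bindings := []Binding{
		{TargetCluster: "member-1", State: StateBound},
		{TargetCluster: "member-2", State: StateScheduled},
		{TargetCluster: "member-3", State: StateUnscheduled},
	}
	// len(bindings) is 3, but only 2 bindings are still desired.
	fmt.Println(len(bindings), desiredBindingsForPickAll(bindings))
}
```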

michaelawyu (Contributor) left a review:

Added some comments, PTAL 🙏

if evictionTargetBinding == nil {
	evictionTargetBinding = &crbList.Items[i]
} else {
	klog.V(2).InfoS(evictionInvalidMultipleCRB, "clusterResourcePlacementEviction", evictionName, "clusterResourcePlacement", eviction.Spec.PlacementName)
Contributor:

Hi Arvind! I think our scheduler/rollout controller has been configured to not create two bindings for the same cluster.
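If the scheduler and rollout controller indeed never create two bindings for the same cluster, the lookup could simply return the first match. This is a hypothetical sketch; `Binding` and `findTargetBinding` are illustrative names, not code from the PR.

```go
package main

import "fmt"

// Binding is a hypothetical, trimmed-down ClusterResourceBinding.
type Binding struct {
	Name          string
	TargetCluster string
}

// findTargetBinding returns the binding that targets clusterName, or nil.
// It assumes at most one binding exists per cluster, so the first match wins.
func findTargetBinding(bindings []Binding, clusterName string) *Binding {
	for i := range bindings {
		if bindings[i].TargetCluster == clusterName {
			// Index into the slice so the pointer refers to the element itself.
			return &bindings[i]
		}
	}
	return nil
}

func main() {
	bindings := []Binding{
		{Name: "crp-1-member-1", TargetCluster: "member-1"},
		{Name: "crp-1-member-2", TargetCluster: "member-2"},
	}
	if b := findTargetBinding(bindings, "member-2"); b != nil {
		fmt.Println(b.Name)
	}
}
```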

	}
}
var disruptionsAllowed int
if db.Spec.MaxUnavailable != nil {
Contributor:

Hi Arvind! The two fields are mutually exclusive, and the API definition does not have validation rules for that yet. Since our webhooks offer best-effort protection only, it might be better for the code here to be a switch statement (exactly one outcome) rather than multiple if checks (several of which can run).

Arvindthiru (author):

We have API validation that addresses this concern: https://github.com/Azure/fleet/blob/main/apis/placement/v1alpha1/disruptionbudget_types.go#L30

But I do agree that there should only be one outcome.
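A minimal sketch of the single-outcome approach, assuming both budget fields hold absolute counts (the real fields may also accept percentages, which this sketch ignores). `DisruptionBudgetSpec` and `disruptionsAllowed` are hypothetical names; returning an error when neither field is set is an illustrative choice, not the PR's behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// DisruptionBudgetSpec is a hypothetical, simplified stand-in for the
// ClusterResourcePlacementDisruptionBudget spec (absolute counts only).
type DisruptionBudgetSpec struct {
	MaxUnavailable *int
	MinAvailable   *int
}

// disruptionsAllowed returns how many more bindings may be disrupted.
// The switch runs exactly one branch; if a malformed object slips past
// validation with both fields set, MaxUnavailable takes precedence.
func disruptionsAllowed(spec DisruptionBudgetSpec, desired, available int) (int, error) {
	switch {
	case spec.MaxUnavailable != nil:
		unavailable := desired - available
		return *spec.MaxUnavailable - unavailable, nil
	case spec.MinAvailable != nil:
		return available - *spec.MinAvailable, nil
	default:
		return 0, errors.New("disruption budget sets neither MaxUnavailable nor MinAvailable")
	}
}

func main() {
	one := 1
	allowed, err := disruptionsAllowed(DisruptionBudgetSpec{MaxUnavailable: &one}, 3, 3)
	fmt.Println(allowed, err)
}
```

An eviction would then proceed only when the returned value is greater than zero.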

}

// isEvictionAllowed calculates if eviction allowed based on available bindings and spec specified in placement disruption budget.
func isEvictionAllowed(desiredBindings int, bindings []placementv1beta1.ClusterResourceBinding, db placementv1alpha1.ClusterResourcePlacementDisruptionBudget) (bool, int, int) {
Contributor:

What if the user is trying to evict an unavailable binding?
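One possible answer, sketched under assumptions: if the target binding is already unavailable, evicting it cannot lower the available count, so the budget check could be skipped for it (similar in spirit to the AlwaysAllow unhealthy pod eviction policy for Kubernetes PodDisruptionBudgets). This is not necessarily what the PR does; `Binding`, `Available`, and `evictionAllowed` are hypothetical.

```go
package main

import "fmt"

// Binding is a hypothetical stand-in for a ClusterResourceBinding; Available
// approximates whatever availability condition the real controller checks.
type Binding struct {
	TargetCluster string
	Available     bool
}

// evictionAllowed reports whether evicting the binding on targetCluster keeps
// at least minAvailable bindings available. It assumes the caller has already
// confirmed that a binding for targetCluster exists.
func evictionAllowed(bindings []Binding, targetCluster string, minAvailable int) bool {
	available := 0
	targetAvailable := false
	for _, b := range bindings {
		if !b.Available {
			continue
		}
		available++
		if b.TargetCluster == targetCluster {
			targetAvailable = true
		}
	}
	if !targetAvailable {
		// Removing an already-unavailable binding does not lower the available
		// count, so this eviction cannot violate the budget.
		return true
	}
	return available-1 >= minAvailable
}

func main() {
	bindings := []Binding{
		{TargetCluster: "member-1", Available: true},
		{TargetCluster: "member-2", Available: true},
		{TargetCluster: "member-3", Available: false},
	}
	fmt.Println(evictionAllowed(bindings, "member-3", 2))
	fmt.Println(evictionAllowed(bindings, "member-1", 2))
}
```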


	reasonClusterResourcePlacementEvictionExecuted    = "ClusterResourcePlacementEvictionExecuted"
	reasonClusterResourcePlacementEvictionNotExecuted = "ClusterResourcePlacementEvictionNotExecuted"

	evictionInvalidMissingCRP = "Failed to find cluster resource placement targeted by eviction"
Contributor:

Hi Arvind! Just a nit: would you mind adding a Message suffix or prefix to these variables, just to make things a bit clearer?

Contributor:

Also, bindings are generally considered internal APIs. To keep things clearer on the user end, it might make more sense to use cluster references in the message.
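Both suggestions can be illustrated together: constants carry a Message suffix, and user-facing text names the member cluster rather than the internal binding. Only the text of `evictionInvalidMissingCRP` comes from the PR; `evictionBlockedMessage` and `blockedMessage` are hypothetical examples.

```go
package main

import "fmt"

const (
	// Existing message from the PR, renamed with a Message suffix as suggested.
	evictionInvalidMissingCRPMessage = "Failed to find cluster resource placement targeted by eviction"
	// Hypothetical message that references the member cluster instead of a binding.
	evictionBlockedMessage = "Eviction of placement %s from cluster %s is blocked by the disruption budget"
)

// blockedMessage is a hypothetical helper that fills in the user-facing names.
func blockedMessage(placement, cluster string) string {
	return fmt.Sprintf(evictionBlockedMessage, placement, cluster)
}

func main() {
	fmt.Println(evictionInvalidMissingCRPMessage)
	fmt.Println(blockedMessage("crp-1", "member-1"))
}
```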

// SetupWithManager sets up the controller with the Manager.
func (r *Reconciler) SetupWithManager(mgr runtime.Manager) error {
	return runtime.NewControllerManagedBy(mgr).
		WithOptions(ctrl.Options{MaxConcurrentReconciles: 1}). // set the max number of concurrent reconciles
Contributor:

A nit: we might need to add a note here explaining why we have to use a maximum of 1 concurrent reconcile for now.

3 participants