-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AUTH-543: Add optional operand deletion condition #1902
base: master
Are you sure you want to change the base?
Conversation
@liouk: This pull request references Jira Issue OCPBUGS-44937, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
2daf3c8
to
ee16aa0
Compare
@liouk: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
ee16aa0
to
a9ce87e
Compare
} | ||
}() | ||
|
||
if _, err := c.deploymentLister.Deployments(c.targetNamespace).Get(workloadName); err != nil && !apierrors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the Get()
call necessary before attempting the Delete()
call? What does the extra call buy us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Get()
happens on the lister (cache), which means that if the deployment has already been deleted, it'll save us an extra API call to Delete()
. This will be happening on every sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I totally missed that the Get()
would be on the cache - that makes sense :). Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be beneficial to add some unit tests for the new deletion behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit tests would end up using mocks for most of the stuff that the deletion does; but on second thought it might be beneficial to test the operator status, so I'll add some 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to import and use this change in one of the consumers (e.g. cluster-authentication-operator) and show that the CI is passing there.
a9ce87e
to
520c3d2
Compare
deploymentAvailableCondition = deploymentAvailableCondition. | ||
WithStatus(operatorv1.ConditionFalse). | ||
WithReason("DeletionError") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include a message in all conditions? I'm not familiar with how these conditions have been constructed in the past, but I have seen logs that say not setting the message in a condition will eventually be fatal:
W1203 04:13:07.671272 1 dynamic_operator_client.go:355] .status.conditions["APIServerDeploymentAvailable"].message is missing; this will eventually be fatal
Even though it would be a repetitive message in this case, with ^ in mind maybe it is worth it to future-proof this implementation as much as we can for when that does get flipped to fatal?
// | ||
// the "deletionConditionFn" will be used to check whether the workload specified by the | ||
// returned name which is part of targetNamespace must be deleted | ||
func NewControllerWithDeletion(instanceName, operatorNamespace, targetNamespace, targetOperandVersion, operandNamePrefix, conditionsPrefix string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there sufficient benefit of creating a 2nd constructor for this? We could just pass the deletion option as nil in the normal use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still applies, I think we do not need any change in the constructor. Let's drive such logic through the delegate.
if conditionMet, workload, err := c.deletionConditionFn(); err != nil { | ||
return err | ||
} else if conditionMet { | ||
return c.deleteWorkload(ctx, workload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the delegate be responsible for the lifecycle of the Deployment? I am not sure if it makes sense to fragment the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still applies, why do we have to manage the deletion in here?
return c.deleteWorkload(ctx, workload) | ||
} | ||
} | ||
|
||
if fulfilled, err := c.delegate.PreconditionFulfilled(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be better to let the delegate communicate everything as we do with the preconditions?
@@ -295,6 +295,46 @@ func (cs *APIServerControllerSet) WithWorkloadController( | |||
return cs | |||
} | |||
|
|||
func (cs *APIServerControllerSet) WithWorkloadControllerWithDeletion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we let delegate controller we do not need another method
@@ -356,6 +424,55 @@ func (c *Controller) updateOperatorStatus(ctx context.Context, previousStatus *o | |||
return nil | |||
} | |||
|
|||
func (c *Controller) deleteWorkload(ctx context.Context, workloadName string) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to split it from the updateOperatorStatus
reconciliation logic? It would be nicer to keep the concern in the same place and consider all 4 conditions.
I do not have a problem with calling a util function in updateOperatorStatus
if needed though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still applies, if we need to manage the status/conditions during scale down of the workload, it would be better to do it in a single place (updateOperatorStatus)
if _, getErr := c.deploymentLister.Deployments(c.targetNamespace).Get(workloadName); getErr != nil && !apierrors.IsNotFound(getErr) { | ||
deploymentAvailableCondition = deploymentAvailableCondition. | ||
WithStatus(operatorv1.ConditionFalse). | ||
WithReason("DeletionError") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this isn't a deletion error
deploymentAvailableCondition = deploymentAvailableCondition. | ||
WithStatus(operatorv1.ConditionTrue). | ||
WithReason("AsExpected") | ||
workloadDegradedCondition = workloadDegradedCondition. | ||
WithStatus(operatorv1.ConditionFalse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value of preserving these conditions (Available=True
, Degraded=False
) if no workload exists for them?
Or are we interested in measuring the availability of OIDC provider? Is it possible? And even then this probably isn't the right place to indicate it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to import and use this change in one of the consumers (e.g. cluster-authentication-operator) and show that the CI is passing there.
@liouk: This pull request references AUTH-543 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.19." or "openshift-4.19.", but it targets "openshift-4.18" instead. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@liouk: This pull request references AUTH-543 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.19." or "openshift-4.19.", but it targets "openshift-4.18" instead. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
520c3d2
to
8367a93
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: liouk The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
deploymentDegradedCondition := applyoperatorv1.OperatorCondition(). | ||
WithType(fmt.Sprintf("%sDeploymentDegraded", c.conditionsPrefix)) | ||
if removeConditions { | ||
jsonPatch := v1helpers.RemoveConditionsJSONPatch(previousStatus, []string{typeAvailable, typeDegraded, typeProgressing, typeWorkloadDegraded}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we cannot use SSA for removing conditions, but I am not sure if patch is better than v1helpers.UpdateStatus
here. Would be good to add an additional opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantage of the patch here in my opinion is that we're only adding to the patch the specific conditions that we want to remove, so it's more concise -- we won't have to manage the whole status object to perform the update, so maybe it's less prone to mistakes.
What would you think @bertinatto?
|
||
deploymentDegradedCondition := applyoperatorv1.OperatorCondition(). | ||
WithType(fmt.Sprintf("%sDeploymentDegraded", c.conditionsPrefix)) | ||
if removeConditions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check if workload == nil
to make sure the delegate has deleted (and not recreated) the workload
|
||
deploymentDegradedCondition := applyoperatorv1.OperatorCondition(). | ||
WithType(fmt.Sprintf("%sDeploymentDegraded", c.conditionsPrefix)) | ||
if removeConditions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even the workload should be gone and we should not get any errs, we should at least log them if they passed on by the delegate
deploymentAvailableCondition := applyoperatorv1.OperatorCondition().WithType(typeAvailable) | ||
workloadDegradedCondition := applyoperatorv1.OperatorCondition().WithType(typeWorkloadDegraded) | ||
deploymentDegradedCondition := applyoperatorv1.OperatorCondition().WithType(typeDegraded) | ||
deploymentProgressingCondition := applyoperatorv1.OperatorCondition().WithType(typeProgressing) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to still consider preconditions even when the workload will be deleted later. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the delete would happen during the delegate's sync, we shouldn't normally reach the point of removing the conditions if preconditions are failing. But you are right, we should safe-guard against this -- I'll add a check before removing conditions.
if conditionMet, workload, err := c.deletionConditionFn(); err != nil { | ||
return err | ||
} else if conditionMet { | ||
return c.deleteWorkload(ctx, workload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still applies, why do we have to manage the deletion in here?
// | ||
// the "deletionConditionFn" will be used to check whether the workload specified by the | ||
// returned name which is part of targetNamespace must be deleted | ||
func NewControllerWithDeletion(instanceName, operatorNamespace, targetNamespace, targetOperandVersion, operandNamePrefix, conditionsPrefix string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still applies, I think we do not need any change in the constructor. Let's drive such logic through the delegate.
@@ -356,6 +424,55 @@ func (c *Controller) updateOperatorStatus(ctx context.Context, previousStatus *o | |||
return nil | |||
} | |||
|
|||
func (c *Controller) deleteWorkload(ctx context.Context, workloadName string) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still applies, if we need to manage the status/conditions during scale down of the workload, it would be better to do it in a single place (updateOperatorStatus)
8367a93
to
f07c37b
Compare
@liouk: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).
This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as
Available=True
. To avoid breaking changes, I've added new setup functions to be used for wiring this particular case.This is needed in the scope of openshift/cluster-authentication-operator#740.