ROX-17712 prevent unnecessary central updates #1122
// This will prevent unnecessary operator reconciliation loops.

desiredCentral := existingCentral.DeepCopy()
desiredCentral.Spec = *central.Spec.DeepCopy()
What happens if an annotation was updated? It looks to me like changes to the metadata are no longer applied.
The current logic ignores them completely, and just takes the labels and annotations from the existing central:
existingCentral.Spec = central.Spec
if err := r.client.Update(ctx, &existingCentral); err != nil {
return errors.Wrapf(err, "updating central %s/%s", central.GetNamespace(), central.GetName())
}
}

return nil
}

func printCentralDiff(desired, actual *v1alpha1.Central) {
Do you think the log verbosity could be a problem?
When ~50 Centrals are reconciled during an upgrade, this log could get very verbose, because it prints a diff for every Central, including those without any errors.
Could the diff be made conditional so that it is not always printed?
Hey, yes I've added a feature flag for enabling the diffs.
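For readers following along, gating the diff behind a flag could look roughly like the sketch below. This is not the code from the PR: the environment variable name PRINT_CENTRAL_DIFF, the helper names, and the use of a flat map[string]string in place of the real Central type are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// diffFlagEnv is a hypothetical environment variable name, not the PR's actual flag.
const diffFlagEnv = "PRINT_CENTRAL_DIFF"

// diffLines returns one line per key whose value differs between the two maps.
// Keys are sorted so the output is stable. A missing key is treated like an
// empty value, which is a simplification for this sketch.
func diffLines(desired, actual map[string]string) []string {
	keys := make(map[string]struct{})
	for k := range desired {
		keys[k] = struct{}{}
	}
	for k := range actual {
		keys[k] = struct{}{}
	}
	sorted := make([]string, 0, len(keys))
	for k := range keys {
		sorted = append(sorted, k)
	}
	sort.Strings(sorted)

	var lines []string
	for _, k := range sorted {
		if desired[k] != actual[k] {
			lines = append(lines, fmt.Sprintf("%s: %q -> %q", k, actual[k], desired[k]))
		}
	}
	return lines
}

// printDiffIfEnabled prints the diff only when the flag is on and returns
// the number of lines it printed.
func printDiffIfEnabled(enabled bool, desired, actual map[string]string) int {
	if !enabled {
		return 0
	}
	lines := diffLines(desired, actual)
	for _, l := range lines {
		fmt.Println(l)
	}
	return len(lines)
}

func main() {
	enabled := strings.EqualFold(os.Getenv(diffFlagEnv), "true")
	desired := map[string]string{"replicas": "2", "logLevel": "info"}
	actual := map[string]string{"replicas": "1", "logLevel": "info"}
	n := printDiffIfEnabled(enabled, desired, actual)
	fmt.Printf("printed %d diff line(s)\n", n)
}
```

With the flag unset, nothing is computed or logged, so reconciling many Centrals at once stays quiet by default.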
printCentralDiff(wouldBeCentral, &existingCentral)

updatedCentral := existingCentral.DeepCopy()
updatedCentral.Spec = *central.Spec.DeepCopy()
Would it be possible to build a hash over the labels/annotations + spec and use the computed hash to detect if an update is necessary?
Not useful for printing a diff though.
Would be possible, but imho it's a less robust approach
Why is it less robust? 🤔
My take on it is that the apiserver must be thought of as some sort of black box. It might have various webhooks and other defaulting strategies that are impossible to account for. So the existingCentral might differ from the central because of apiserver side effects. By using a dry-run, those side effects are accounted for.
Yes, I got the point about dryRun and webhooks from the API server and agree.
With that being said, I don't see the reason for checking every property instead of building a hash over the same data in this function:
https://github.com/stackrox/acs-fleet-manager/pull/1122/files#diff-8b2c412ebc3fa158d46c2dfb0136cfef521abb4fb2dda0a77b403a32afd9c58eR466-R483
Are you suggesting that instead of DeepCompare, we would compare the hash of the would-be central and the existing central? What would be the advantage?
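To make the trade-off in this thread concrete, here is a minimal sketch comparing the two approaches. The centralState type and both helper names are hypothetical, and the spec is reduced to a flat map for brevity. Both approaches detect that something changed, but only the field-wise comparison can say which part changed, which matters when a diff should be printed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"reflect"
)

// centralState is a hypothetical subset of a Central: the fields the
// thread proposes to hash (labels, annotations, spec).
type centralState struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	Spec        map[string]string `json:"spec"`
}

// hashOf returns a SHA-256 hex digest of the JSON encoding of s.
// encoding/json emits map keys in sorted order, so equal states
// always produce equal digests.
func hashOf(s centralState) (string, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// changedParts compares field by field and reports which top-level
// parts differ, something a single hash cannot tell you.
func changedParts(a, b centralState) []string {
	var parts []string
	if !reflect.DeepEqual(a.Labels, b.Labels) {
		parts = append(parts, "labels")
	}
	if !reflect.DeepEqual(a.Annotations, b.Annotations) {
		parts = append(parts, "annotations")
	}
	if !reflect.DeepEqual(a.Spec, b.Spec) {
		parts = append(parts, "spec")
	}
	return parts
}

func main() {
	existing := centralState{
		Labels:      map[string]string{"app": "central"},
		Annotations: map[string]string{},
		Spec:        map[string]string{"replicas": "1"},
	}
	wouldBe := existing
	wouldBe.Spec = map[string]string{"replicas": "2"}

	h1, err := hashOf(existing)
	if err != nil {
		panic(err)
	}
	h2, err := hashOf(wouldBe)
	if err != nil {
		panic(err)
	}
	fmt.Println("hashes differ:", h1 != h2)
	fmt.Println("changed parts:", changedParts(existing, wouldBe))
}
```

Either check would have to run on the object returned by the dry-run to account for apiserver side effects, so the hash mainly changes how the comparison is expressed.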
glog.Infof("Update central %s/%s", central.GetNamespace(), central.GetName())
existingCentral.Spec = central.Spec
// perform a dry run to see if the update would change anything.
// This would apply the defaults and the mutating webhooks without actually updating the object.
Is this currently a problem with mutating webhooks?
I still do not fully understand this change after a second pass. The logic feels redundant with the existing last Central hash implementation, and the logic here seems too complicated just to save one unnecessary update.
Which problem is solved here by running a dry run? 🤔
The dry run will run the update through the api server, setting all default fields, applying (if any) mutating/validating webhooks, etc., without storing the update. The object returned will be as if it was actually updated. Only then is it safe to compare the desired state (after a "would-be" update) with the current state.
The problem with the central hash is that it contains fields that don't affect the actual Central CR, such as the status. So the fleetshard will still update the Central CR even if it doesn't actually need to change.
Also, every time fleetshard does so, it increments a "revision" annotation. But it turns out that in 95% of cases, the only thing that fleetshard updates is the "revision", without modifying any other field on the CR. This creates a lot of traffic for the operator. This is especially visible in the first minutes of deploying a Central, where there could be as many as 20-40 unnecessary reconciliation loops (without actual changes to the CR) triggered due to this.
So basically this logic checks whether or not the Central will change as a result of the update. And the dry-run is there to account for any side effects that could be applied either by the apiserver or through webhooks.
This is also how the helm operator detects if resources are changed or not, and if an upgrade needs to be performed or not. https://github.com/operator-framework/helm-operator-plugins/blob/c16a400954f7e43bb987b196e8ecf2f4d2d4ab0f/pkg/reconciler/reconciler.go#L720
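The dry-run check described in this thread can be sketched with a small stand-in for the API server. Everything in the block is hypothetical: fakeAPIServer, the two-field Spec, and the defaulting rule only simulate what a real apiserver and its webhooks might do. With controller-runtime, a dry run is typically requested by passing client.DryRunAll to Update; the sketch avoids that dependency so it stays self-contained.

```go
package main

import (
	"fmt"
	"reflect"
)

// Spec holds two illustrative fields; the real Central spec is much larger.
type Spec struct {
	Replicas int
	LogLevel string
}

// Central is a simplified, hypothetical stand-in for the Central CR.
type Central struct {
	Revision int
	Spec     Spec
}

// fakeAPIServer simulates an API server that applies defaults on update.
type fakeAPIServer struct {
	stored  Central
	updates int // number of persisted (non-dry-run) updates
}

// Update applies server-side defaulting and returns the resulting object.
// When dryRun is true, nothing is persisted.
func (s *fakeAPIServer) Update(c Central, dryRun bool) Central {
	if c.Spec.LogLevel == "" {
		c.Spec.LogLevel = "info" // simulated defaulting / mutating webhook
	}
	if !dryRun {
		s.stored = c
		s.updates++
	}
	return c
}

// reconcile persists an update only if the dry-run result differs from
// the stored object, and bumps the revision only in that case.
// It reports whether a real update was sent.
func reconcile(s *fakeAPIServer, desired Spec) bool {
	candidate := s.stored
	candidate.Spec = desired

	wouldBe := s.Update(candidate, true)
	if reflect.DeepEqual(wouldBe, s.stored) {
		return false // nothing would change, so skip the update entirely
	}

	candidate.Revision++
	s.Update(candidate, false)
	return true
}

func main() {
	s := &fakeAPIServer{stored: Central{Revision: 1, Spec: Spec{Replicas: 1, LogLevel: "info"}}}

	// LogLevel is omitted here; a naive comparison against the stored spec
	// would see a difference, but the dry run fills in the default first.
	fmt.Println("updated:", reconcile(s, Spec{Replicas: 1}))

	// A genuine change is still detected and persisted.
	fmt.Println("updated:", reconcile(s, Spec{Replicas: 2}))
	fmt.Println("revision:", s.stored.Revision, "updates:", s.updates)
}
```

The first call shows why the comparison happens after the dry run: the desired spec omits a defaulted field, yet no update is sent and the revision stays untouched.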
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ludydoo, SimonBaeumer.
This PR adds some logic to prevent fleetshard from updating Centrals if there are no changes, which prevents unnecessary operator reconciliation loops.