Explicitly Queue a Reconcile Request if a Shared Provider has Expired #241
Conversation
Thanks @ulucinar, LGTM. I left a nit comment for further discussion.
return managed.ExternalObservation{
	ResourceExists:   true,
	ResourceUpToDate: true,
nit: I do not have a strong opinion, but I think the case of a resource that has never been scheduled and that of a resource that has already been scheduled and reconciled several times (at least within the lifetime of the application) could be handled differently.
So, we may consider returning an error for a resource that has never been scheduled.
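For illustration, a minimal sketch of what this suggestion could look like; the runnerScheduler interface, the alreadyScheduled flag, and the error message are hypothetical names, not the actual upjet API:

package external

import (
	"context"

	"github.com/crossplane/crossplane-runtime/pkg/reconciler/managed"
	"github.com/crossplane/crossplane-runtime/pkg/resource"
	"github.com/pkg/errors"
)

// runnerScheduler stands in for the shared provider runner scheduler (hypothetical).
type runnerScheduler interface {
	Schedule(mg resource.Managed) error
}

// observe sketches the suggested behavior: only suppress a scheduling failure
// for a resource that has been scheduled (and reconciled) before.
func observe(_ context.Context, s runnerScheduler, mg resource.Managed, alreadyScheduled bool) (managed.ExternalObservation, error) {
	if err := s.Schedule(mg); err != nil {
		if !alreadyScheduled {
			// Never scheduled before: surface the error to the managed reconciler.
			return managed.ExternalObservation{}, errors.Wrap(err, "cannot schedule a shared provider runner")
		}
		// Scheduled and reconciled before: keep the MR Synced while the
		// expired runner is drained and replaced.
		return managed.ExternalObservation{ResourceExists: true, ResourceUpToDate: true}, nil
	}
	// ...continue with the actual observation (omitted).
	return managed.ExternalObservation{ResourceExists: true, ResourceUpToDate: true}, nil
}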
Thank you @sergenyalcin for considering this. I agree with your point, but I also feel a bit uncomfortable with continuing to produce unsynced MRs even in certain cases, since this change is a UX change.
As we have discussed, I also feel uncomfortable returning a direct success from the external client's Create function to the managed reconciler, but the situation is similar to what we do in the async mode of operation, i.e., we already return an immediate success by just starting a goroutine that may very well fail to initiate a creation request for the external resource.
I propose we continue with suppressing the unsynced state of MRs, and if folks complain about prolonged wait times, then we already have a solution for this: if you don't want to wait, give more memory to the native provider and increase the provider-ttl command-line parameter. By the way, this PR is not expected to prolong any wait times on its own. There is no change in the wait times when a runner has expired; we should just be preventing the MRs from becoming unsynced while the runner has expired and new scheduling requests cannot be admitted until the old one is drained and replaced. (One caveat here is the maximum delay configured for the rate limiter: if it is higher than the managed reconciler's, then wait times may change, as may other aspects of the rate limiter's behavior.)
Maybe we will just end up implementing a scheduler that will not deny requests coming to an expired runner but will fork a new one as needed, at the cost of running multiple runners in parallel and increasing the memory footprint. Let's try the approach implemented here first, as we already have the provider-ttl parameter.
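On the rate limiter caveat, a minimal sketch of how the requeue backoff's maximum delay could be bounded using client-go's workqueue rate limiter; the concrete delays below are illustrative assumptions, not what this PR configures:

package scheduler

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// requeueRateLimiter backs off requeues per item exponentially, with a maximum
// delay kept below the managed reconciler's default 10m poll period so an
// explicit requeue never waits longer than a regular poll would.
// The delays are illustrative, not the values used by this PR.
var requeueRateLimiter = workqueue.NewItemExponentialFailureRateLimiter(
	1*time.Second, // base delay for the first requeue
	5*time.Minute, // cap the exponential backoff under the 10m poll period
)

If the cap were instead set above the managed reconciler's own delays, the effective wait time observed by users could change, which is the caveat noted above.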
5451bc0 to 0950921

…ng errors - The external client will requeue at most 20 times before reporting an error

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
Thanks @ulucinar LGTM!
Description of your changes
Fixes #192
We currently return an error if a terraform.SharedProvider has already expired and an external client requests service from that expired shared provider. This causes the MR being reconciled by the external client to acquire a Synced: False status condition. It is a temporary situation that is resolved once the expired shared provider is drained and the terraform.SharedProviderScheduler replaces it. However, until the shared provider is replaced, MRs entering the Synced: False state are confusing.
Before the handler.EventHandler implementation from #231, we had to return an error to the managed reconciler to requeue a timely reconciliation request. Now we can explicitly requeue an exponentially backed-off reconciliation request so that the external client retries getting service from the provider in a timely manner, without having to wait for the poll period (default 10 min for the official providers) and without forcing the MR into the Synced: False state.
The external client will retry scheduling a shared runner up to 20 times without erroring, backing off exponentially after each failure. After these initial 20 retries, it will report the error to the managed reconciler and keep trying.
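As a rough illustration of the retry accounting described above (the counter, the requeue hook, and the error message are hypothetical stand-ins, not the actual implementation):

package external

import (
	"sync"

	"github.com/crossplane/crossplane-runtime/pkg/reconciler/managed"
	"github.com/pkg/errors"
)

// maxSchedulingRetries mirrors the 20-retry budget described above.
const maxSchedulingRetries = 20

type external struct {
	mu      sync.Mutex
	retries map[string]int    // per-MR count of failed scheduling attempts (illustrative)
	requeue func(name string) // asks the event handler to enqueue a backed-off reconcile request (illustrative)
}

// handleExpiredRunner sketches what happens when the shared runner has expired
// and a scheduling request is denied.
func (e *external) handleExpiredRunner(name string, schedulingErr error) (managed.ExternalObservation, error) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.retries[name]++
	// Explicitly requeue an exponentially backed-off reconcile request so we
	// do not have to wait for the poll period.
	e.requeue(name)
	if e.retries[name] > maxSchedulingRetries {
		// Retry budget exhausted: keep requeueing, but also report the error
		// to the managed reconciler so the MR surfaces the condition.
		return managed.ExternalObservation{}, errors.Wrap(schedulingErr, "cannot schedule a shared provider runner")
	}
	// Within the retry budget: report the MR as existing and up to date so it
	// does not flip to Synced: False while the expired runner is replaced.
	return managed.ExternalObservation{ResourceExists: true, ResourceUpToDate: true}, nil
}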
I have:
- Run make reviewable to ensure this PR is ready for review.
- Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested
Tested via crossplane-contrib/provider-upjet-aws#805.