-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Requeue the reconcile request on async update error #231
Conversation
e54d702
to
a5721c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @ulucinar. These are my comments. According to the results in the description, deletion times seem to be getting longer. How would you like to handle this?
Signed-off-by: Alper Rifat Ulucinar <[email protected]>
dfe3e46
to
945fff4
Compare
Signed-off-by: Alper Rifat Ulucinar <[email protected]>
Signed-off-by: Alper Rifat Ulucinar <[email protected]>
Did a set of measurements with this PR. While this PR mainly decreases the resource utilization in case of especially when the external connector (or client) fails fast in the
|
Did some refactoring and repeated the tests here. I think the experiments with 5 MRs have high variance but still the total experiment times are around the expected values. |
…omponents Signed-off-by: Alper Rifat Ulucinar <[email protected]>
Signed-off-by: Alper Rifat Ulucinar <[email protected]>
Signed-off-by: Alper Rifat Ulucinar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ulucinar LGTM!
Some performance test results. There are improvements with this PR. v0.37.0 AWS Monolithic Test Results (1) INFO[0683] Results - v0.37.0 - AWS - 100 ECR Repository (2) INFO[0776] Results - v0.37.0 - AWS - 100 ECR Repository (3) INFO[0783] Results - v0.37.0 - AWS - 100 ECR Repository AWS Monolithic Test Results from this branch (1) INFO[0351] Results - index.docker.io/ulucinar/provider-aws:v0.38.0-rc.0.43.g24291c953 - AWS - 100 ECR Repository (2) INFO[0361] Results - index.docker.io/ulucinar/provider-aws:v0.38.0-rc.0.43.g24291c953 - AWS - 100 ECR Repository (3) INFO[0370] Results - index.docker.io/ulucinar/provider-aws:v0.38.0-rc.0.43.g24291c953 - AWS - 100 ECR Repository |
Description of your changes
Fixes #176
Fixes #180
This PR employs the new
DesiredStatusChanged
predicate predicate from the crossplane-runtime as an event filter to prevent too-frequent reconciles on error conditions or on updates to the external create pending/failed annotations. It also allows the async callback functions to requeue a reconcile request with an exponential back-off, when the async operation fails.I have:
make reviewable
to ensure this PR is ready for review.backport release-x.y
labels to auto-backport this PR if necessary.How has this code been tested
Tested the PR with crossplane-contrib/provider-upjet-aws#789.
When, for example, the external connector's
Connect
call fails, the requeue request is exponentially backed-off with a cap of 1 min. Here are the sample timestamps from an experiment with aConnect
call failing:To test whether we exponentially back-off when we error in asynchronous callbacks, I provisioned an S3 bucket, put a file into it, and attempted to delete the bucket. Here are the timestamps belonging to the (failing) reconciliation attempts while deleting the
Bucket.s3
MR:, which again shows the reconcile requests are being exponentially backed-off.
We have also done some performance tests and those results can be found here.