Switch to exponential backoff while creating/deletion machines #483

Open
hardikdr opened this issue Jul 4, 2020 · 4 comments
Labels
area/robustness Robustness, reliability, resilience related effort/2d Effort for issue is around 2 days exp/intermediate Issue that requires some project experience kind/enhancement Enhancement, improvement, extension lifecycle/rotten Nobody worked on this for 12 months (final aging stage) needs/planning Needs (more) planning with other MCM maintainers priority/3 Priority (lower number equals higher priority)

Comments

@hardikdr
Member

hardikdr commented Jul 4, 2020

What would you like to be added:
When a machine creation or deletion request fails, MCM retries it continuously. This can put heavy load on the control cluster's API server and exhaust the cloud provider's API rate limits. We should back off exponentially when requests fail.

Why is this needed:

@hardikdr hardikdr added the kind/enhancement Enhancement, improvement, extension label Jul 4, 2020
@prashanth26 prashanth26 added area/performance Performance (across all domains, such as control plane, networking, storage, etc.) related exp/intermediate Issue that requires some project experience priority/critical Needs to be resolved soon, because it impacts users negatively size/s Size of pull request is small (see gardener-robot robot/bots/size.py) status/new Issue is new and unprocessed labels Aug 16, 2020
@prashanth26
Contributor

/assign @hardikdr @prashanth26
/priority blocker

@gardener-robot gardener-robot added priority/blocker Needs to be resolved now, because it breaks the service and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Sep 29, 2020
@hardikdr
Member Author

hardikdr commented Oct 8, 2020

/priority normal
We implemented constant backoff in #525. We should consider a more sophisticated exponential backoff mechanism; a proposal would be nice.
I mainly see two options:

  1. Backoff at the queue. An earlier attempt at this: machine-set queue: Back off failed Machine operations #510
  2. Backoff inside the reconcile function.

cc @zuzzas

@gardener-robot gardener-robot added priority/normal and removed priority/blocker Needs to be resolved now, because it breaks the service labels Oct 8, 2020
@hardikdr hardikdr added the priority/blocker Needs to be resolved now, because it breaks the service label Oct 8, 2020
@hardikdr hardikdr added priority/normal and removed priority/blocker Needs to be resolved now, because it breaks the service labels Oct 8, 2020
@zuzzas
Contributor

zuzzas commented Oct 8, 2020

Thanks to #525 we can now attach a RateLimitingInterface to the queue, and throttle Machines in CrashLoopBackoff.

  1. I'd take the backoff_manager concept from here.
  2. Create a throttling-by-CrashLoopBackoff function here.
  3. And attach the resulting RateLimitingInterface to the queue here.

Then it is a matter of replacing the Add calls with AddRateLimited calls, so that the new RateLimiter is actually triggered.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Dec 8, 2020
@gardener-robot gardener-robot added priority/3 Priority (lower number equals higher priority) effort/2d Effort for issue is around 2 days and removed priority/normal size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Mar 8, 2021
@prashanth26
Contributor

/title Switch to exponential backoff while creating/deletion machines

@gardener-robot gardener-robot changed the title Throttle the deletion or creation of the machines on failure. Switch to exponential backoff while creating/deletion machines Jul 21, 2021
@prashanth26 prashanth26 removed status/new Issue is new and unprocessed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jul 21, 2021
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jan 18, 2022
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jul 17, 2022
@himanshu-kun himanshu-kun added needs/planning Needs (more) planning with other MCM maintainers area/robustness Robustness, reliability, resilience related and removed lifecycle/rotten Nobody worked on this for 12 months (final aging stage) area/performance Performance (across all domains, such as control plane, networking, storage, etc.) related labels Feb 21, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 1, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jul 10, 2024
5 participants