Move drain logic into a separate controller #621

Open
1 of 5 tasks
Tracked by #724
prashanth26 opened this issue Jun 14, 2021 · 5 comments
Labels
area/dev-productivity Developer productivity related (how to improve development) component/mcm Machine Controller Manager (including Node Problem Detector, Cluster Auto Scaler, etc.) effort/1m Effort for issue is around 1 month kind/enhancement Enhancement, improvement, extension lifecycle/rotten Nobody worked on this for 12 months (final aging stage) needs/planning Needs (more) planning with other MCM maintainers priority/3 Priority (lower number equals higher priority)

Comments

@prashanth26
Contributor

prashanth26 commented Jun 14, 2021

How to categorize this issue?

/area dev-productivity
/kind enhancement
/priority 3

What would you like to be added:
We would like to move the drain logic out of the provider machine controller and into a separate controller. With this, draining would no longer be part of the core machine reconcile logic; a different controller would take care of it. The machine controller and drain controller can use the machine CRD status/labels to coordinate the handover.
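
A minimal sketch of how such a handover could look, assuming a label on the machine object is used as the coordination point. The `Machine` struct, the label key `example.invalid/drain-state`, and both reconcile functions are hypothetical names made up for illustration; they are not taken from the MCM code base.

```go
package main

import "fmt"

// Machine is a simplified, hypothetical stand-in for the machine CRD.
type Machine struct {
	Name              string
	DeletionRequested bool
	Labels            map[string]string
}

// Hypothetical label key and values used for the handover (assumptions).
const (
	drainStateLabel = "example.invalid/drain-state"
	drainPending    = "pending"
	drainDone       = "done"
)

// machineControllerReconcile never drains by itself. It requests a drain,
// returns immediately, and deletes the VM only once the drain is marked done.
func machineControllerReconcile(m *Machine) string {
	if !m.DeletionRequested {
		return "nothing to do"
	}
	if m.Labels == nil {
		m.Labels = map[string]string{}
	}
	switch m.Labels[drainStateLabel] {
	case "":
		m.Labels[drainStateLabel] = drainPending
		return "drain requested"
	case drainDone:
		return "delete VM"
	default:
		return "waiting for drain"
	}
}

// drainControllerReconcile picks up machines marked pending, runs the
// supplied drain function, and records completion on the machine.
func drainControllerReconcile(m *Machine, drain func(name string) error) error {
	if m.Labels[drainStateLabel] != drainPending {
		return nil
	}
	if err := drain(m.Name); err != nil {
		return err
	}
	m.Labels[drainStateLabel] = drainDone
	return nil
}

func main() {
	m := &Machine{Name: "machine-0", DeletionRequested: true}
	fmt.Println(machineControllerReconcile(m)) // requests the drain
	_ = drainControllerReconcile(m, func(string) error { return nil })
	fmt.Println(machineControllerReconcile(m)) // drain finished, VM can go
}
```

The point of the split is that the machine reconcile returns quickly in every branch, while the potentially long-running drain lives in its own loop.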

Why is this needed:
To make the machine reconcile loop more efficient.

@prashanth26 prashanth26 added the kind/enhancement Enhancement, improvement, extension label Jun 14, 2021
@gardener-robot gardener-robot added area/dev-productivity Developer productivity related (how to improve development) priority/3 Priority (lower number equals higher priority) labels Jun 14, 2021
@prashanth26 prashanth26 added roadmap/internal component/mcm Machine Controller Manager (including Node Problem Detector, Cluster Auto Scaler, etc.) effort/1m Effort for issue is around 1 month labels Jun 14, 2021
@amshuman-kr amshuman-kr added this to the 2021-Q4 milestone Jun 14, 2021
@vlerenc
Member

vlerenc commented Jun 15, 2021

/remove internal
@prashanth26 I hope it's OK if I remove the roadmap label. By "internal" we mean of central importance to Gardener as a whole. All teams have (team-)internal tasks, but those are not reported in the global roadmap, or else it would become too large. Something like improved-drain-of-pods-with-volumes-v2 is internal, and rightly so, because it has such a huge impact on everybody. That's the differentiation. I hope that's fine.

@prashanth26
Contributor Author

Sure Vedran. No problem, sounds good to me.

@danielfoehrKn

danielfoehrKn commented Nov 4, 2021

This is also important if the machineDrainTimeout is changed during an ongoing drain, e.g., in Gardener for a worker pool.
Currently, the only thing that helps is to edit the machine and restart the MCM.

@himanshu-kun
Contributor

Another case to handle:
If a machine drain is ongoing and a force-deletion:true label is added to the machine object, then force deletion of the machine should happen.
Currently, on shoot deletion, the force-deletion label is added, but it does not force delete the machine if the machine is already draining or stuck in draining, and thus shoot deletion is also stuck (until the drain timeout occurs, which could be days).
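
A rough sketch of the behaviour asked for here, assuming the drain loop re-reads the machine's labels before each pod eviction. The label key and value are taken from the comment above (the exact key and casing in MCM may differ), and `drainPods`, `getLabels`, and `evict` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Label key as written in the comment above; treat it as an assumption.
const forceDeletionLabel = "force-deletion"

type outcome string

const (
	drained      outcome = "drained"
	forceDeleted outcome = "force-deleted"
)

// drainPods evicts pods one at a time. Before each eviction it fetches the
// machine's latest labels via getLabels and stops draining as soon as the
// force-deletion label is set, so the caller can force delete the machine.
func drainPods(pods []string, getLabels func() map[string]string, evict func(pod string)) outcome {
	for _, p := range pods {
		if getLabels()[forceDeletionLabel] == "true" {
			return forceDeleted
		}
		evict(p)
	}
	return drained
}

func main() {
	labels := map[string]string{}
	evicted := 0
	result := drainPods(
		[]string{"pod-a", "pod-b", "pod-c"},
		func() map[string]string { return labels },
		func(string) {
			evicted++
			// Simulate the label being added while the drain is running.
			labels[forceDeletionLabel] = "true"
		},
	)
	fmt.Println(result, evicted)
}
```

In this simulation the label appears after the first eviction, so the loop stops before touching the remaining pods rather than waiting for the drain timeout.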

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jun 29, 2022
@himanshu-kun himanshu-kun modified the milestones: 2021-Q4, 2023-Q1 Aug 16, 2022
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Feb 12, 2023
@elankath
Contributor

Grooming decision: Too much refactoring is needed in the current code. We agree on drain being a separate controller. We should target this for the controller-runtime port.

@himanshu-kun himanshu-kun added needs/planning Needs (more) planning with other MCM maintainers and removed lifecycle/rotten Nobody worked on this for 12 months (final aging stage) labels Feb 22, 2023
@himanshu-kun himanshu-kun removed this from the 2023-Q1 milestone Feb 22, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 1, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jul 10, 2024

7 participants