
Optimise Pod Handling in Machine Roll-Out #120

Closed
vlerenc opened this issue Jul 20, 2018 · 8 comments · Fixed by #202
Labels
area/high-availability High availability related
component/mcm Machine Controller Manager (including Node Problem Detector, Cluster Auto Scaler, etc.)
kind/discussion Discussion (engaging others in deciding about multiple options)
priority/2 Priority (lower number equals higher priority)

Comments

@vlerenc
Member

vlerenc commented Jul 20, 2018

When we do a rolling update, we don't give Kubernetes any hints to schedule the drained pods on the new machines rather than the old ones, right? This in turn means a rolling update may hit a pod more than once. Is there a chance to "guide" the scheduler into placing the displaced pods on the new rather than the old machines, maybe with some temporary node labels, but ideally without modifying/patching the deployment/pod specs, which would be pretty messy? Maybe the easiest solution could be to actually cordon the old machines the moment the first batch of new machines joined the cluster (I am wary of doing it earlier, because that could mean that pods can't be scheduled anymore until new machines have joined the cluster). What do you think?
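For illustration only, a minimal client-go sketch of that cordon idea, assuming a hypothetical node label machine-set=old that marks the outgoing machines (MCM would identify them differently):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// cordonOldNodes marks every node carrying the (hypothetical) old-machine-set
// label as unschedulable, so the scheduler no longer places drained pods there.
func cordonOldNodes(ctx context.Context, clientset kubernetes.Interface, oldSetSelector string) error {
	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{LabelSelector: oldSetSelector})
	if err != nil {
		return err
	}
	for i := range nodes.Items {
		node := &nodes.Items[i]
		if node.Spec.Unschedulable {
			continue // already cordoned
		}
		node.Spec.Unschedulable = true
		if _, err := clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
			return fmt.Errorf("cordoning node %s: %w", node.Name, err)
		}
	}
	return nil
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)
	// "machine-set=old" is a made-up selector, used only for this sketch.
	if err := cordonOldNodes(context.Background(), clientset, "machine-set=old"); err != nil {
		panic(err)
	}
}
```

Note that a cordoned node rejects new pods outright, which is exactly why the timing relative to the first new machines joining matters.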

@vlerenc vlerenc added kind/discussion Discussion (engaging others in deciding about multiple options) component/machine-controller-manager area/high-availability High availability related labels Jul 20, 2018
@hardikdr
Member

At the moment we do not ensure that pods evicted during a rolling update do not land back on the old machines.
As you already suggested, it would be nice to cordon all the old-versioned machines once the rolling update starts.

@bergerx

bergerx commented Jul 21, 2018

cordon the old machines the moment the first batch of new machines joined the cluster

This assumes that all users have maxSurge>0, which could be dangerous. But it would still make sense after gardener/gardener#274 is merged, if I understand it correctly.

Another alternative could be applying a PreferNoSchedule taint to the old machines rather than cordoning them all at once. That seems like a safer option which could also cover the maxSurge=0 case, but I'm not entirely sure.
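For comparison, a rough client-go sketch of the taint-based variant; the taint key below is made up, the real one would be whatever MCM chooses:

```go
package rollout

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// rollingUpdateTaint is a hypothetical taint used only for this sketch.
var rollingUpdateTaint = corev1.Taint{
	Key:    "node.machine.example.io/rolling-update",
	Effect: corev1.TaintEffectPreferNoSchedule,
}

// taintNodePreferNoSchedule adds the soft taint to a single node, unless it is
// already present. Unlike cordoning, the scheduler may still fall back to this
// node if no other node fits.
func taintNodePreferNoSchedule(ctx context.Context, clientset kubernetes.Interface, nodeName string) error {
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, t := range node.Spec.Taints {
		if t.Key == rollingUpdateTaint.Key && t.Effect == rollingUpdateTaint.Effect {
			return nil // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, rollingUpdateTaint)
	_, err = clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```

Because PreferNoSchedule is only a soft preference, pods can still land on the old nodes when nothing else fits, which is why it should also be safe for the maxSurge=0 case.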

Also, I thought this would already be a solved problem and that we might benefit from the work of others, but it seems this specific case is not covered by most of the major Kubernetes providers.

It's also worth checking with the related Kubernetes SIGs.

@vlerenc
Member Author

vlerenc commented Jul 21, 2018

Yes, #274 was necessary for the case where the cluster is minimal and all resources are used up, which also means that once it's merged, maxSurge must be at least 1; but it has been at least 1 since the beginning.

We actually hope to find some time later to make the update behaviour more configurable by the end user, who knows the workload best. At the latest when we see very large clusters, that will become a necessity. We worked with BOSH two years ago and know that if you have thousands of machines, you need to work in large waves (significant percentages of the machines) and need hints from the workload owners, otherwise this can't work. At the moment we run with a low maxSurge>0, simply because we respect PDBs but need to terminate uncooperative pods/nodes after a certain grace period, which means that a low number is safe. But theoretically your workload could be slow to start, and then hard limits could cause too many pods to become unavailable at the same time.

Yes, the others have no great solutions either, and yes, PreferNoSchedule is the safer/better way. Thank you!

@gardener-robot-ci-1 gardener-robot-ci-1 added lifecycle/stale Nobody worked on this for 6 months (will further age) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Sep 19, 2018
@vlerenc vlerenc added the priority/critical Needs to be resolved soon, because it impacts users negatively label Oct 4, 2018
@vlerenc
Member Author

vlerenc commented Oct 4, 2018

@amshuman-kr @prashanth26 @hardikdr Can we implement something here to improve the shoot control plane availability (PreferNoSchedule)?

@hardikdr
Member

hardikdr commented Oct 5, 2018

Thanks, PreferNoSchedule sounds good to me. We could basically taint all the old nodes during the rolling update; the scheduler should then avoid placing pods on the old nodes unless no new ones are available (see the sketch below).

  • On a side note, we would then enable MCM to sync taints between machine objects and node objects, which would be a useful function anyway.
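As an illustration only (not the actual implementation), tainting all old nodes at the start of a rolling update could look roughly like this; the label selector and taint key are placeholders:

```go
package rollout

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// taintOldNodes applies a PreferNoSchedule taint to every node of the outgoing
// machine set, identified here by a placeholder label selector. In MCM this
// would be driven by the rolling-update logic itself; the sketch only shows
// the node-side effect.
func taintOldNodes(ctx context.Context, clientset kubernetes.Interface, oldSetSelector string) error {
	taint := corev1.Taint{
		Key:    "node.machine.example.io/rolling-update", // placeholder key
		Effect: corev1.TaintEffectPreferNoSchedule,
	}
	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{LabelSelector: oldSetSelector})
	if err != nil {
		return err
	}
	for i := range nodes.Items {
		node := &nodes.Items[i]
		alreadyTainted := false
		for _, t := range node.Spec.Taints {
			if t.Key == taint.Key {
				alreadyTainted = true
				break
			}
		}
		if alreadyTainted {
			continue
		}
		node.Spec.Taints = append(node.Spec.Taints, taint)
		if _, err := clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
			return fmt.Errorf("tainting node %s: %w", node.Name, err)
		}
	}
	return nil
}
```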

@prashanth26
Contributor

prashanth26 commented Oct 5, 2018

I have opened up this issue, which would allow us to specify taints on nodes and thereby help us add taints and tolerations to a node.

After that, we could add the PreferNoSchedule taint while doing a rolling update, as mentioned by @hardikdr above.
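Just to illustrate the pod side (the taint key below is made up): a workload that explicitly tolerates such a taint would not be steered away from the old nodes, since PreferNoSchedule only penalizes pods without a matching toleration:

```go
package rollout

import corev1 "k8s.io/api/core/v1"

// rollingUpdateToleration could be added to a pod spec to opt out of the soft
// repulsion during a machine roll-out; the key is a placeholder for whatever
// taint key gets chosen.
var rollingUpdateToleration = corev1.Toleration{
	Key:      "node.machine.example.io/rolling-update",
	Operator: corev1.TolerationOpExists,
	Effect:   corev1.TaintEffectPreferNoSchedule,
}
```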

Thanks & Regards,
Prashanth

@amshuman-kr

amshuman-kr commented Oct 7, 2018

PreferNoSchedule looks good to try.

@prashanth26
Contributor

Hi @ggaurav10 & @hardikdr ,

Could you review this PR when you get time? I have updated the test cases.

@gardener-robot-ci-1 gardener-robot-ci-1 removed the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jan 15, 2019
@ghost ghost added the component/mcm Machine Controller Manager (including Node Problem Detector, Cluster Auto Scaler, etc.) label Mar 7, 2020
@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Mar 8, 2021