Optimise Pod Handling in Machine Roll-Out #120
At the moment we do not ensure that pods evicted due to a rolling update do not get back to old machines.
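For illustration only: one way to keep displaced pods off machines that are about to be replaced is to taint them with `NoSchedule` before draining. The sketch below uses client-go; the taint key, package name, and function are assumptions for this example and not part of the project.

```go
package rollout

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// taintOldNode marks a node that is about to be rolled so that the scheduler
// no longer places evicted (or any new) pods on it. Note that pods tolerating
// the taint would still land there, so a taint alone may not be sufficient.
func taintOldNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, t := range node.Spec.Taints {
		if t.Key == "example.io/rolling-update" {
			return nil // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "example.io/rolling-update",
		Effect: corev1.TaintEffectNoSchedule,
	})
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```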
This could be a dangerous assumption that all users will have
Another alternative could be applying
Also, I thought this should be an already solved problem, and maybe we could benefit from the work of others. But it seems this specific case is not covered by most of the major Kubernetes providers:
Also worth checking in related Kubernetes SIGs.
Yes, #274 was necessary in case the cluster is minimal and all resources are used up, which also means that, when it's merged,
We actually hope to find some time later to make the update behaviour more configurable - by the end user, who knows the workload best. At the latest, when we see very large clusters, that will become a necessity. We used to work with Bosh 2 years ago and know that if you have thousands of machines, you need to work in large waves (significant percentages of the machines) and need hints from the workload owners, otherwise this can't work. At the moment we run with a low
Yes, the others have no great solutions either and yes,
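As a purely illustrative aside on the "waves as percentages" idea: a percentage-valued setting could be resolved against the current machine count the same way Deployments resolve maxSurge/maxUnavailable. The helper name below is hypothetical; only the intstr utility is an existing Kubernetes package.

```go
package rollout

import "k8s.io/apimachinery/pkg/util/intstr"

// waveSize resolves an integer- or percentage-valued rolling-update setting
// (e.g. "10%") against the total number of machines. Rounding up avoids a
// zero-sized wave on very small clusters.
func waveSize(setting intstr.IntOrString, totalMachines int) (int, error) {
	return intstr.GetValueFromIntOrPercent(&setting, totalMachines, true)
}
```

For example, `waveSize(intstr.FromString("10%"), 3000)` would yield waves of 300 machines.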
@amshuman-kr @prashanth26 @hardikdr Can we implement something here to improve the shoot control plane availability (
Thanks,
Hi @ggaurav10 & @hardikdr, could you review this PR when you get time? I have updated the test cases.
When we do a rolling update, we don't give Kubernetes any hints to schedule the drained pods on the new machines rather than the old ones, right? This in turn means a rolling update may hit a pod more than once. Is there a chance to "guide" the scheduler into placing the displaced pods on the new rather than the old machines, maybe with some temporary node labels, but ideally without modifying/patching the deployment/pod specs, which would be pretty messy? Maybe the easiest solution could be to actually `cordon` the old machines the moment the first batch of new machines has joined the cluster (I am wary to do it earlier because that could mean that pods can't be scheduled anymore until new machines have joined the cluster). What do you think?
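For what cordoning could look like in practice, here is a minimal client-go sketch that marks an old node unschedulable; it is meant to be called only once the first batch of new machines is Ready. The package and function names are illustrative, not the controller's actual implementation.

```go
package rollout

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// cordonNode sets spec.unschedulable on an old node so that displaced pods
// are scheduled onto the new machines instead. Call it only after the first
// batch of replacement machines has joined and is Ready, so that evicted
// pods always have somewhere to go.
func cordonNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	_, err := client.CoreV1().Nodes().Patch(ctx, nodeName,
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```

A strategic-merge patch is used here instead of a read-modify-update cycle so the cordon cannot be lost to a conflicting concurrent node update.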