Erratic serialized drain if there are large number of volumes attached per node #468
/priority critical
@ggaurav10 do you see any challenge in adding a minor delay after evicting the volume-based pods, to confirm that the detached volume is not flapping but gone for good? Also, how do you generally see the approach?
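For illustration, a minimal sketch of what such a delay-and-recheck could look like, built on the client-go node API. The function name, interval and recheck count below are made up for this sketch and are not part of the actual MCM drain code:

```go
package drainsketch

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// confirmVolumeGone re-checks node.Status.VolumesAttached a few times after a
// volume first disappears, so a transient flap is not mistaken for a detach.
// Hypothetical helper with illustrative interval/count values.
func confirmVolumeGone(client kubernetes.Interface, nodeName string, volName v1.UniqueVolumeName) (bool, error) {
	const (
		recheckInterval = 2 * time.Second
		recheckCount    = 3
	)
	for i := 0; i < recheckCount; i++ {
		time.Sleep(recheckInterval)
		node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		for _, av := range node.Status.VolumesAttached {
			if av.Name == volName {
				// The volume reappeared: the earlier disappearance was a flap.
				return false, nil
			}
		}
	}
	// The volume stayed absent across every re-check; treat it as detached for good.
	return true, nil
}
```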
TL;DR: Just thinking out loud: Just wondering if MCM should wait for that
We discussed today to pick this up later after the OOT for Azure is out. cc @AxiomSamarth.
Right, @hardikdr. Now with kupid we can steer where we want to have our ETCDs and how many of them.
To be fixed with #621
This problem is solved now since in the current drain code, we don't just wait for volume detach, but we also wait for volume attachment to another node. So, even if the volume transiently disappears from `node.Status.VolumesAttached` on the drained node, the drain does not move on until the volume is attached elsewhere.
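For illustration, a rough sketch of the behaviour described above: a volume is only treated as detached once it is gone from the drained node and reported on another node. The function name, polling interval and timeout are illustrative and not the actual drain implementation:

```go
package drainsketch

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForDetachAndReattach only succeeds once the volume is gone from the
// drained node AND reported on some other node, so a transient disappearance
// on the drained node alone is not treated as a completed detach.
func waitForDetachAndReattach(client kubernetes.Interface, drainedNode string, volName v1.UniqueVolumeName, timeout time.Duration) error {
	return wait.PollImmediate(5*time.Second, timeout, func() (bool, error) {
		nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			return false, nil // transient API error, keep polling
		}
		goneFromDrainedNode, attachedElsewhere := true, false
		for _, node := range nodes.Items {
			for _, av := range node.Status.VolumesAttached {
				if av.Name != volName {
					continue
				}
				if node.Name == drainedNode {
					goneFromDrainedNode = false // still (or again) on the drained node
				} else {
					attachedElsewhere = true
				}
			}
		}
		return goneFromDrainedNode && attachedElsewhere, nil
	})
}
```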
/close as per the explanation given by Tarun above
What happened:
In a provider like GCP or Azure (where a relatively large number of volumes are allowed to be attached per node), serialized eviction of pods with volumes while draining a node shows some erratic behaviour. Most pod evictions (and the corresponding volume detachments) take between 4s and 15s. But if there are a large number of volumes attached to a node (>= 40), sometimes (unpredictably) a bunch of pods are deleted (and their corresponding volumes reported as detached) within a matter of 5ms-10ms.
Though the drain logic thinks that the pods' volumes are detached in a matter of milliseconds, in reality these volumes are not fully detached, and this causes disproportionate delays in the attachment of the volumes and the startup of the replacement pods.
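For context, a minimal sketch of how I understand the serialized drain flow (names, types and intervals are illustrative, not the actual MCM drain code): each pod with volumes is evicted, and the next eviction only starts once the pod's volumes no longer appear in `node.Status.VolumesAttached`, so a premature "detached" signal lets the loop race ahead:

```go
package drainsketch

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// podWithVolumes pairs a pod with the attached-volume names it uses on the node.
// Illustrative type; resolving these names is glossed over here.
type podWithVolumes struct {
	pod     v1.Pod
	volumes []v1.UniqueVolumeName
}

// drainSerially evicts pods one by one, waiting after each eviction until the
// pod's volumes are no longer listed in node.Status.VolumesAttached.
// If that wait returns too early (volume only transiently gone), the next
// eviction starts before the previous detach has really completed.
func drainSerially(client kubernetes.Interface, nodeName string, pods []podWithVolumes) error {
	for _, p := range pods {
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: p.pod.Name, Namespace: p.pod.Namespace},
		}
		if err := client.PolicyV1().Evictions(p.pod.Namespace).Evict(context.TODO(), eviction); err != nil {
			return err
		}
		err := wait.PollImmediate(2*time.Second, 3*time.Minute, func() (bool, error) {
			node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
			if err != nil {
				return false, nil // transient API error, keep polling
			}
			for _, av := range node.Status.VolumesAttached {
				for _, vol := range p.volumes {
					if av.Name == vol {
						return false, nil // still attached to the node
					}
				}
			}
			return true, nil // none of the pod's volumes are listed any more
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```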
What you expected to happen:
The serialized eviction of pods should proceed normally irrespective of the number of pods with volumes per node.
How to reproduce it (as minimally and precisely as possible):
Steps:

1. Schedule a large number (>= 40) of pods with volumes onto a single node (for example using `nodeAffinity`, `taints` and `tolerations`).
2. Delete the `Machine` object backing the node on which the pods are hosted.
3. Observe the node status (`node.Status.VolumesAttached`) and the MCM logs (see the sketch below).
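To make the behaviour described below easier to observe during a drain, a tiny watcher along these lines can be run against the cluster. The node name and kubeconfig handling are placeholders; this is only a sketch and not part of MCM:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Polls node.Status.VolumesAttached every second and prints it, so the
// disappear/reappear flapping of volumes becomes visible during a drain.
func main() {
	const nodeName = "my-node" // placeholder: the node being drained
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	for {
		node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s attached=%d:", time.Now().Format(time.RFC3339), len(node.Status.VolumesAttached))
		for _, av := range node.Status.VolumesAttached {
			fmt.Printf(" %s", av.Name)
		}
		fmt.Println()
		time.Sleep(1 * time.Second)
	}
}
```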
Anything else we need to know:

MCM watches `node.Status.VolumesAttached` to check whether a volume has been detached after the corresponding pod has been evicted. But I have noticed inconsistency in the updating of `node.Status.VolumesAttached` when there are a large number of volumes attached per node. Sometimes, after eviction of the pod, the corresponding volume gets removed from `node.Status.VolumesAttached` too quickly, but then it reappears in the array, only to disappear again. Sometimes it even makes a few such disappearances and reappearances before going away for good. In this case, MCM would consider the volume to be detached at the first disappearance and would move on to the next pod eviction.
Environment:

provider: GCP or Azure
Approaches for resolution: