Maintenance sentinel #314
This is not meant for maintenance of kured itself, but for tasks done on the k8s node.
Thanks for clarifying, that makes sense. My 2¢ are that this adds additional functionality and responsibility to what kured has historically done. The title of the project is "kured - The Kubernetes Reboot Daemon", so having kured be able to do convenient cordon + drain is sort of orthogonal to that mission. @dholbach should we solicit some additional opinions and thoughts here?

Another problem I see with this approach is that the OS file system is not really the best deterministic vector to initiate a node's cordon + drain. By marking a sentinel file, and then relying upon a continuous process running as a privileged daemonset, you would also have to wait for that daemonset's container process to successfully cordon + drain before continuing with maintenance (cordon + drain isn't guaranteed to be successful and/or graceful). So you'd probably need yet another sentinel file.

I think the cleanest solution to this problem would be a generic "cordon + drain" daemonset (so, another tool, not kured) that is responsible for doing cordon + drain reconciliation, and then marking the OS so that maintenance tasks can be safely performed without impact to the cluster. Imagine a tool that does the following:

- watches for a sentinel file on the node (e.g. /var/run/cordon-drain-required) and, when it appears, cordons and drains the node
- once the cordon + drain has succeeded, marks the node as in maintenance (e.g. /var/run/k8s-node-maintenance) so that host-level work can proceed safely
- watches for an uncordon request (e.g. /var/run/uncordon-required) and, when it appears, uncordons the node and cleans up the sentinel files
With a tool that exhibits the behaviors above, you could then write maintenance scripts like the below:

#!/bin/bash
# this script assumes we are running as root
touch /var/run/cordon-drain-required
while true; do
    # if I'm in maintenance mode then that means I'm not actively in the cluster
    if [ -f /var/run/k8s-node-maintenance ]; then
        # maintenance tasks here
        break
    else
        sleep 5
    fi
done
# if we get here then maintenance was successful
# mark the node as no longer in maintenance mode
rm -f /var/run/k8s-node-maintenance
# request the node re-join the cluster if the maintenance doesn't require a reboot
if [ ! -f /var/run/reboot-required ]; then
    touch /var/run/uncordon-required
fi

@Michael-Sinz in case you have any thoughts on doing something like the above. @devigned @CecileRobertMichon just curious, is there a canonical cluster-api-ish way of doing the above already?
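For illustration only, a minimal sketch of what the cluster-side reconciler loop of such a tool could look like, assuming kubectl is available in the daemonset container, NODE_NAME is injected via the downward API, and the host's /var/run is hostPath-mounted at /host/var/run (all of these are assumptions, not part of any existing tool):

#!/bin/bash
# Hypothetical reconciler loop for the generic cordon + drain daemonset sketched above.
# Assumptions: kubectl is in the image, NODE_NAME comes from the downward API,
# and the host's /var/run is mounted at /host/var/run.
while true; do
    if [ -f /host/var/run/cordon-drain-required ] && [ ! -f /host/var/run/k8s-node-maintenance ]; then
        # cordon + drain, then tell the host that maintenance may proceed
        kubectl cordon "$NODE_NAME" \
            && kubectl drain "$NODE_NAME" --ignore-daemonsets --delete-emptydir-data \
            && touch /host/var/run/k8s-node-maintenance \
            && rm -f /host/var/run/cordon-drain-required
    fi
    if [ -f /host/var/run/uncordon-required ]; then
        # maintenance finished without a reboot; let the node rejoin the cluster
        kubectl uncordon "$NODE_NAME" \
            && rm -f /host/var/run/uncordon-required
    fi
    sleep 10
done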
The whole question of who "owns" a node's cordon state (and drain behavior - they are separate, but drain happens under cordon) is one that has plagued kubernetes administration for a while. We have seen other pods/services that take it upon themselves to cordon and uncordon nodes due to other signals (for example Azure infrastructure maintenance notifications), and that then interacts poorly with some other need to cordon a node, as that pod undoes what another pod is intending.

Having some universal solution to this, such that we can address the multi-master problem of different entities thinking it is time to cordon/drain or uncordon nodes, would be great. I don't know if writing to the host filesystem is the right place to do that. I would expect that we really want this to be an intention on the node in kubernetes, and that may get set by various things. The work on the node itself (below kubernetes) would then want to, as Jack showed, have some standard for knowing when it can or cannot move forward. I tend to prefer systems with active locks - that is, you have to prevent it from doing something, such that the failure mode is that maintenance will run (after all, it could be that maintenance is required to get back into operational readiness).

As Jack also correctly points out, drain may fail if pods on the node are in a critical state (outside of the pod disruption budget). In those conditions, you can only choose to loop back later and try again, or force pods that are critical to be evicted, which may have service impacts. The second case is useful if some pod has been preventing a drain from completing for a while and you really must complete the reboot/maintenance process (security patch/etc), but it is really a path of last resort.

We (my team) have worked hard to minimize the number of ways maintenance is done at the node level so that we get fewer multi-master problems, and generally push towards all real maintenance being part of the reboot cycle, as that addresses the concern of multiple different maintenance needs pending at the same time. (Which does happen.) This has reduced our places for cordon/drain to a few points: the auto-scaler scaling down nodes, the rebooter doing node reboots for maintenance reasons, and the zombie-killer which tries to resolve nodes that have lost forward progress. (Usually that is just cordon/drain in order to let kubernetes know that the node is unhealthy in a way it may not have noticed yet.) Even so, that still leaves us with 3, and I don't know how to lower that number. (The zombie-killer could be seen as a way to scale out a node that is unhealthy - so one could call that "like" scale down from the auto-scaler, but it is still its own agent.)
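To make the "loop back later and try again" option concrete, here is a rough sketch (NODE_NAME, the attempt count, and the sleep interval are placeholders, not from any existing tool):

#!/bin/bash
# Retry a drain that respects PodDisruptionBudgets; if eviction stays blocked,
# give up and come back later rather than forcing pods out.
NODE_NAME="${NODE_NAME:?set NODE_NAME to the node being maintained}"
for attempt in 1 2 3 4 5; do
    if kubectl drain "$NODE_NAME" --ignore-daemonsets --timeout=120s; then
        echo "drain succeeded on attempt $attempt"
        exit 0
    fi
    echo "drain blocked (likely a PodDisruptionBudget), waiting before retrying"
    sleep 300
done
# Path of last resort (may have service impact): bypass eviction entirely,
# e.g. kubectl drain --disable-eviction, only when the maintenance is mandatory.
exit 1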
Nice comments and thanks for the feedback. I totally understand your concerns about adding this feature to kured, but I had to try :)
It is a "hard" topic as to who owns the "platform" upon which kubernetes runs. It is sort of abstract, but the infrastructure is important as it provides the basis for kubernetes to do its work. Changes below kubernetes really are about updating that infrastructure. How to then make sure any assumptions/invariants are restored in the kubernetes space is the next tricky part.

An "easy" pattern is to assume that you have to reboot/restart almost anything, as you don't know the transitive closure of the dependencies that may have been impacted by the maintenance that was done. If you do know that exactly, you could avoid the reboot by just restarting the elements involved.

We had gone down the route that changing the invariants below kubernetes is best done by a graceful reboot. We place our servicing work either in the boot process or the shutdown process (usually in the boot process) and just signal the system to do the reboot when able. This simplifies the process and basically keeps things overall more stable (plus, reboots happen for other reasons anyway, so having that handling path solid is important, and leveraging it is good).
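As a hedged sketch of that pattern (the staging directory and helper script names are made up for illustration; only /var/run/reboot-required is kured's actual default sentinel):

#!/bin/bash
# Stage host-level maintenance to run during the next boot, then ask kured to
# perform the graceful cordon + drain + reboot. Assumes a boot-time unit (not
# shown) executes anything placed under /var/lib/staged-maintenance before the
# kubelet starts; that directory is illustrative, not a real convention.
install -m 0755 my-maintenance-task.sh /var/lib/staged-maintenance/
touch /var/run/reboot-required   # kured's default sentinel: reboot when the lock allows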
@m4r1u2, last week I bootstrapped a prototype (initially heavily re-using kured concepts + code) solution here: https://github.com/jackfrancis/kustodian

I wanted to do so to get a feel for whether or not incorporating it into kured (like, maybe making kured more generic and not reboot-specific) would be preferable. My conclusions now are kind of the same: I really like that kured follows the unix principle of doing one thing well.

cc @bboreham who expressed interest in node maintenance leasing in another thread

I would really like to find that something like this is already solved, is community-backed, and has production users. It seems like this is so fundamental that a generic solution already exists. But in case it's not, the above is sort of my current best effort at how we might generically solve this problem as a k8s community.
Very cool @jackfrancis. I will do some testing this week.
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days). |
I would love to have an additional sentinel where kured cordons & drains a node (and uncordons it afterwards).
The additional sentinel file could be:
/var/run/kured-maintenance
Use case:
If running attended upgrades or performing other tasks on the node, it is recommended to drain the node before starting.
The file must be created manually on the node (or by a script or program), and cleaned up at the end.
Removing the file will then trigger an uncordon of the node by kured.
touch /var/run/kured-maintenance
rm /var/run/kured-maintenance
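A rough sketch of how a maintenance script might use the proposed sentinel (this describes the proposal, not behavior kured has today, and the wait step is left open because the proposal does not define a "drain finished" signal):

#!/bin/bash
# Ask kured to cordon + drain this node (proposed behavior, not an existing feature)
touch /var/run/kured-maintenance
# ... wait until the node has actually been drained, then do the maintenance ...
apt-get update && apt-get -y upgrade   # example task only; substitute your own
# Ask kured to uncordon the node again (proposed behavior)
rm /var/run/kured-maintenance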