EC2 instance graceful restart #358
Is there a recommended way to initiate a graceful restart of EC2 instances from kured? For my particular use case, I'd like to de-register instances from ELBs before rebooting the OS, but I can't see a way to do this with the kured command line options.
Comments
Technically, the next release might help you (the main branch code can do it, but it's not yet released). What you're trying to achieve has never been publicly added to the project (and you might want to document it here, if you feel it's worth sharing). If someone has done it, they didn't share the recipe. Here is what I would do:
There may be better ways, but I don't know them, so I can't help more than that.
Thanks for your quick reply! Can you point me to the feature in the next release that might help? As a variant on your suggestion, I could also create a shell script and make it accessible to kured as an attached volume in the kubernetes manifest.
While it's probably possible, it's not the easiest. We currently wrap the rebootCommand to escape the container namespace, so by default the command is effectively your reboot command wrapped with `nsenter` so it runs on the host. Did you think of an init container, which would drop the necessary script on your hosts instead? That seems the easiest: it would be able to create the script easily, and kured could still trigger it easily.
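For illustration only, here is a rough sketch of the kind of script an init container could drop onto the host via a hostPath mount. The path, load balancer name, and drain wait are assumptions for this example, not anything kured ships:

```bash
#!/bin/sh
# Hypothetical /usr/local/bin/graceful-reboot.sh, copied onto the node by an
# init container. It de-registers this EC2 instance from a classic ELB, waits
# for connections to drain, then reboots the host.
set -eu

# Discover this instance's ID from the EC2 instance metadata service
# (IMDSv1 shown for brevity).
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# De-register from the load balancer (name is illustrative) and let it drain.
aws elb deregister-instances-from-load-balancer \
  --load-balancer-name my-elb \
  --instances "$INSTANCE_ID"
sleep 60

systemctl reboot
```

kured could then be pointed at that script (for example via its `--reboot-command` flag, if your version has it), with the caveat about host-side AWS credentials raised in the next comment.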
I hadn't thought of an init container, but that would work in terms of dropping the script on the host. One disadvantage I can think of is AWS permissions: if the script were to execute inside the container namespace, you could use a tool like kube2iam to set the AWS IAM role for execution, but I don't think that's an option on the host. Would you be open to a command line option to enable reboot-command execution in the container namespace (e.g. via a dedicated flag)?
I don't mind myself (though I would probably name that argument differently). We need to figure out whether it makes sense to also have the sentinel (check) in the same namespace or not (and therefore add another argument). Alternatively, we could simply expose the fact that we are wrapping the command with `nsenter`. I would love to discuss this in a next community meeting, tbh. In the meantime, feel free to propose a PR, and we can discuss it there :)
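For context, the namespace-escape wrapping being referred to looks roughly like the following; this is a sketch, and the exact invocation kured uses may differ:

```bash
# Run the configured reboot command inside the host's mount namespace from the
# privileged kured container (illustrative; kured's real wrapper may differ).
nsenter -m/proc/1/ns/mnt -- /bin/systemctl reboot
```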
Exposing the use of `nsenter` sounds reasonable to me. In addition, it would be useful to have the ability to run a script after the instance restarts (e.g. to add the instance back to its Auto Scaling group). Does it make sense to add this capability to kured, or does it make more sense to hack it into the init container script?
@amorey Exposing `nsenter` has its own drawbacks, though. As for a script after instance restart, that's effectively the role of your node's init system and/or kubernetes. I wouldn't do that in kured, myself.
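As a sketch of what such an init-system hook could look like outside of kured (the ASG name is illustrative, and it assumes the AWS CLI and suitable credentials are available on the host):

```bash
#!/bin/sh
# Hypothetical boot-time hook, run by the node's init system after a reboot,
# to attach the instance back to its Auto Scaling group.
set -eu
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws autoscaling attach-instances \
  --instance-ids "$INSTANCE_ID" \
  --auto-scaling-group-name my-asg
```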
All good questions. Let me know what you end up deciding. In the meantime, there should be workarounds for the execution context issues.
Makes sense. Thanks for the advice.
I just came across a kubernetes blog post from today about graceful node shutdown. If graceful node shutdown works as advertised then, in the future, kured should be able to offload node draining to kubernetes, and it might also take care of some of the issues I'm dealing with now. In the meantime, I can trigger ELB de-registration from within kubernetes by adding the "node.kubernetes.io/exclude-from-external-load-balancers" label to the node. Can you recommend a way to do this with kured? In addition, I need to set a grace period to wait until the ELB closes its connections. Is there an existing kured option I could use for that?
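For reference, doing that by hand looks roughly like this (the node name is illustrative):

```bash
# Exclude the node from external load balancers before the reboot...
kubectl label node ip-10-0-1-23.ec2.internal \
  node.kubernetes.io/exclude-from-external-load-balancers=true

# ...and drop the label once the node is back, so it gets re-registered.
kubectl label node ip-10-0-1-23.ec2.internal \
  node.kubernetes.io/exclude-from-external-load-balancers-
```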
@evrardjp I noticed that kured sets a node lock annotation which is removed after restart. If it were possible to set a node lock label that behaved in the same way then it would be possible to remove nodes from external load balancers gracefully before restart without having to use a custom script (using the "node.kubernetes.io/exclude-from-external-load-balancers" label). Would you be open to including this feature in kured?
Yes, I read this recently. Once kured supports only 1.21 and above, we can probably remove a bunch of code.
I saw your PR. It's awesome. I like it so far :) Let's discuss it there!
Previously, kured issued the system reboot command without first removing the node from any connected external load balancers (ELBs). This behavior caused downtime on restart because ELBs send traffic to kube-proxy pods running on nodes until the ELB health checks fail or the node is de-registered explicitly. This patch solves the problem by adding a "node.kubernetes.io/exclude-from-external-load-balancers" label to the node before restart, which tells the Kubernetes control plane to de-register the node from any connected ELBs. The node label is removed after restart, which causes the control plane to re-register the node with the ELBs. Close kubereboot#358
…ceful removal/addition from external load balancers

Previously, kured issued the system reboot command without first removing nodes from any connected external load balancers (ELBs). This behavior caused downtime on restart because ELBs send traffic to kube-proxy pods running on nodes until the ELB health checks fail or the node is de-registered explicitly. This patch solves the problem by adding a command line argument (`label-with-exclude-from-external-lbs`) that, when enabled, adds a "node.kubernetes.io/exclude-from-external-load-balancers" label to nodes undergoing kured reboot. This label tells the Kubernetes control plane to de-register the affected node from any connected ELBs. The node label is removed after restart, which causes the control plane to re-register the node with the ELBs. Close kubereboot#358
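Based on the PR description above, enabling the behaviour would look roughly like this; the new flag name is taken from the PR and may have been renamed (or not merged) in released versions:

```bash
# Hypothetical kured arguments; only --label-with-exclude-from-external-lbs is
# new here, the other flags are standard kured options.
kured \
  --period=1h \
  --reboot-sentinel=/var/run/reboot-required \
  --label-with-exclude-from-external-lbs
```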
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).
/remove stale