
How to Upgrade EKS Worker Nodes and EKS Cluster? #17

Closed
Jeeppler opened this issue Aug 1, 2018 · 44 comments

Comments

@Jeeppler

Jeeppler commented Aug 1, 2018

I was unable to find any documentation on how to upgrade worker nodes for a new Kubernetes version or in response to security issues. How will this work with EKS?

The other information I could not find is how to upgrade the EKS cluster to a new version of Kubernetes. For example, the current version provided by AWS is Kubernetes 1.10, but Kubernetes 1.11 is already available. What will the upgrade strategy for minor and major versions look like? I know you mentioned it in some of the talks, but nothing is documented as of now.

@nrdlngr
Contributor

nrdlngr commented Aug 14, 2018

Currently only Kubernetes version 1.10.3 is supported for Amazon EKS clusters and nodes. When EKS supports a new version of Kubernetes, the documentation will provide detailed instructions on how to upgrade existing clusters and worker nodes.

@nrdlngr
Contributor

nrdlngr commented Aug 14, 2018

Regarding security issues, Amazon EKS uses an Amazon Linux 2 AMI, and you can track known security issues at https://alas.aws.amazon.com/.

@Jeeppler
Author

@nrdlngr you answered my Kubernetes upgrade question for now. However, what about upgrading the worker nodes? Do the nodes have to be restarted for updates/upgrades? Does AWS update the node images automatically for customers? How does that part work?

It would be really helpful to have more information on how to update/upgrade the user space, the container runtime, and the kernel for EKS worker nodes. Having information about security vulnerabilities is just the first step in mitigating those issues.

@nrdlngr
Contributor

nrdlngr commented Aug 22, 2018

Amazon EKS provides new AMIs when we make changes to the OS configuration, but we do not modify existing customer instances. You are responsible for any necessary user space, kernel space, and container runtime upgrades for those nodes.

@nrdlngr nrdlngr closed this as completed Aug 22, 2018
@joshkurz

Any comment on when AWS will support a newer version of k8s? It is hard to use EKS in production if I cannot test how the upgrade process works.

@Jeeppler
Author

@nrdlngr you did not answer my question at all, and I have no idea why you closed this issue even though the topic of how to upgrade EKS is still open. Please do not close this issue until there is an official EKS cluster and node upgrade guide.

There is a similar question on StackOverflow: How do I install security updates on an Amazon Linux AMI EC2 instance?

  • Amazon Linux uses yum as its package manager. To update Amazon Linux (for security reasons), a simple yum update is enough (a small sketch follows this list). Apparently, Amazon Linux 2 supports Docker through yum as well (see: https://aws.amazon.com/amazon-linux-2/release-notes/#Docker_is_only_in_extras). The remaining question is:

    • How to apply kernel security patches? I assume the easiest way in EKS is to recycle the EC2 node to get the newest AMI. However, what I could not find is how often AWS publishes an updated version of the Amazon Linux AMI (versions with minor kernel releases).
  • Apparently, Amazon Linux 2 has not reached LTS status yet (see: https://aws.amazon.com/amazon-linux-2/lts-candidate-2-release-notes/). It seems AWS publishes a new version every 6 months, and the releases seem to use new LTS kernel releases (jumping from LTS kernel 4.9 in 2017.03 to LTS kernel 4.14 in 2018.03). Can we expect a real LTS release? What will the release schedule look like in the future? And is recycling the node enough to upgrade EKS nodes from one version of Amazon Linux 2 to another?

  • Recycling the node seems to be the easiest way to get upgrades for EKS nodes, as far as my understanding goes. However, are there any problems or concerns regarding persistent volume claims? And could AWS please provide some official documentation on how to recycle EKS nodes, both for upgrades and for security reasons?
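A minimal sketch of that in-place route (assuming SSH access to a node and stock Amazon Linux 2 tooling; needs-restarting comes from the yum-utils package):

# Apply only security-related updates on an Amazon Linux 2 worker node
sudo yum update -y --security

# Reboot only if a running component (e.g. the kernel) actually requires it
sudo yum install -y yum-utils
sudo needs-restarting -r || sudo reboot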

@nrdlngr
Contributor

nrdlngr commented Sep 7, 2018

Reopening until cluster and worker node upgrade guidance is available.

@nrdlngr nrdlngr reopened this Sep 7, 2018
@Jeeppler Jeeppler changed the title from "How to Upgrade Nodes and Cluster?" to "How to Upgrade EKS Worker Nodes and EKS Cluster?" Sep 7, 2018
@tsmgeek

tsmgeek commented Nov 5, 2018

I've added the following to the start of my user-data to make sure new machines have the latest updates before they are added to the cluster.

yum check-update
yumcheck_exit=$?
if [ $yumcheck_exit -eq 100 ]; then
  yum update -y
  shutdown -r now
  rm /var/lib/cloud/instances/*/sem/config_scripts_user
  exit $yumcheck_exit
fi

i.e. the full user-data script looks like this:

#!/bin/bash
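# If updates are available (yum check-update exits with 100), install them,
# remove the cloud-init semaphore so this user-data script runs again after the
# reboot, and restart the instance before it joins the cluster.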

yum check-update
yumcheck_exit=$?
if [ $yumcheck_exit -eq 100 ]; then
  yum update -y
  shutdown -r now
  rm /var/lib/cloud/instances/*/sem/config_scripts_user
  exit $yumcheck_exit
fi

set -o xtrace
/etc/eks/bootstrap.sh <cluster name> --kubelet-extra-args "--node-labels=spot=<is spot>"
/opt/aws/bin/cfn-signal --exit-code $? \
         --stack  <stack name> \
         --resource NodeGroup  \
         --region <region>

@carlosjgp

I'm a little bit concerned about the EKS CloudFormation documentation... a version change requires replacement

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-cluster.html#cfn-eks-cluster-version

:(

@fyow93

fyow93 commented Nov 27, 2018

I want to know how to upgrade EKS from eks.1 to eks.2, since eks.2 adds support for Horizontal Pod Autoscaling and the Kubernetes Metrics Server.

@Jeeppler
Author

@vulcan-lin when was EKS 2 released?

@tsmgeek

tsmgeek commented Nov 27, 2018

@vulcan-lin when was EKS 2 released?

https://aws.amazon.com/about-aws/whats-new/2018/08/introducing-amazon-eks-platform-version-2/

@tsmgeek

tsmgeek commented Nov 27, 2018

I want to know how to upgrade EKS from eks.1 to eks.2, since eks.2 adds support for Horizontal Pod Autoscaling and the Kubernetes Metrics Server.

https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html

@SathyaBhat

@vulcan-lin there's no way to initiate a manual upgrade to eks.2 (at least as of when I last checked with AWS Support). It should be rolled out over time during your maintenance windows.

@dhammond22222

With the new major Kubernetes security vulnerability, what is the best approach to request getting the control plane upgraded? It seems there isn't much documentation available.

Reference: https://elastisys.com/2018/12/04/kubernetes-critical-security-flaw-cve-2018-1002105/

@mellena1

mellena1 commented Dec 4, 2018

Agreed with @dhammond22222. AKS (azure) and GKE (google cloud) both had updates deployed within hours of the release of the patches. Amazon hasn't even made an announcement for EKS as far as I can tell. Definitely not a good look...

@ahadrana

ahadrana commented Dec 4, 2018

Amazon seems to be running v1.10.3-eks on the server, which is way behind 1.10.11. Definitely not good for those of us running production EKS clusters. On top of this, it is not clear whether it is possible to lock down ingress into the API server through a security group?

@ahadrana

ahadrana commented Dec 4, 2018

It seems that if API access is secured, then unauthorized users hopefully cannot exploit this privilege escalation bug. However, clarification from the EKS team on this issue is urgently required!

@nrdlngr
Contributor

nrdlngr commented Dec 4, 2018

AWS has issued a statement regarding CVE-2018-1002105 and Amazon EKS clusters: https://aws.amazon.com/security/security-bulletins/AWS-2018-020/

@Jeeppler
Author

Jeeppler commented Dec 6, 2018

Now we have eks.3 as the platform version. What is eks.3? Is it related to CVE-2018-1002105?

@fr-sgujrati

fr-sgujrati commented Dec 6, 2018

EKS is a joke. We moved from self-managed Kubernetes to EKS thinking that we wouldn't have to deal with the master any more. But after 3 months in production, while deploying a service, the master got 'confused' and refused to accept any kubectl commands. It was apparently due to an inconsistency in etcd. It took AWS 4 days to restore the cluster. One of their support engineers told us that if it is not a production cluster, the best solution is to recreate it!

On top of that, there is no easy way to get the master logs, and there is not enough documentation. Their EKS Linux AMI doesn't even have the NFS utilities installed, so we cannot use a ReadWriteMany PV without updating the nodes! We only figured this out recently when we needed to create a PV with ReadWriteMany.

AWS is way behind the Microsoft/Google managed Kubernetes offerings.

@nrdlngr
Contributor

nrdlngr commented Dec 6, 2018

@Jeeppler yes. The 1.10.3-eks.3 platform version was released in response to CVE-2018-1002105.

For more information on platform versions, see https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html.

@nrdlngr
Contributor

nrdlngr commented Dec 21, 2018

As of December 12, 2018, Amazon EKS now supports Kubernetes cluster version updates.
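For anyone scripting the control plane side, a minimal sketch with the AWS CLI (the cluster name and target version here are placeholders, not an excerpt from the docs):

# Kick off an in-place cluster version update and check on its progress
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.11
aws eks describe-update --name my-cluster --update-id <update-id-from-previous-command>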

We have also provided two worker node update options. Both of these options are appropriate for worker node Kubernetes version updates or security updates (for example, when we release a new AMI to address a Linux or Kubernetes vulnerability).

Thanks for being patient for these features and their supporting documentation!

@Jeeppler, does this satisfy your request?

@kbessas

kbessas commented Jan 9, 2019

@nrdlngr What if you manage your infrastructure with CloudFormation? The CloudFormation documentation still mentions a cluster replacement when the version property changes for an EKS cluster. Is the documentation you posted the only available way to do an in-place update of Kubernetes on EKS?

@Jeeppler
Author

Jeeppler commented Jan 12, 2019

@nrdlngr

I went through the upgrade process and want to provide some feedback. I also have some questions which are not answered by the available documents.

The guides assume the user is using your CloudFormation templates; I use Terraform.

For my taste, the documents you referenced are very detailed. The core takeaways for me are:

  1. Master nodes are upgraded in-place
  2. Replace worker nodes after master node upgrades
    • by simply terminating the nodes

One important part, which should be its own document (in my opinion), is upgrading kube-dns to coredns after upgrading from Kubernetes 1.10 to 1.11. The questions I was not able to answer from the document are: why is this necessary? What changed to make it necessary? What is the end of life of kube-dns?
Furthermore, I would like a separate document for the kube-dns to coredns migration which explains the steps regardless of whether I use a CloudFormation template, Terraform, or the AWS CLI.


However, there are two major questions I still do not have an answer to:

  1. What is the best practice to update/upgrade my Amazon Linux AMIs to get patches on a regular basis without having to terminate the entire worker node? (Most Linux distributions patch the system with the help of the package manager [e.g. apt-get dist-upgrade -y for Debian].)

  2. When and how often can we expect new major Kubernetes versions to become available on EKS? How long will Kubernetes versions be supported on EKS?

@netrounds-peterg

netrounds-peterg commented Jan 18, 2019

I have a few questions regarding cluster upgrades and thought this might be a good place to ask:

  1. Where are new cluster Kubernetes version releases announced? How can I be notified?
    Is it by polling the EKS Platform Versions page?

  2. Where are worker node AMI patch releases announced? How can I be notified?
    It appears to be by following the Amazon Linux Security Center or the table of IDs for the EKS-optimized AMI.

  3. Are worker node AMIs continuously security-patched? Say that I am running on a particular worker node AMI version X. Will X be security-patched and can I then simply terminate instances in my auto scaling group to have them replaced with patched versions of X, OR will I need to update the launch configuration for my auto-scaling group to make use of a newer AMI Y before cycling out instances?

  4. In order to have close-to-zero downtime, would this be a valid upgrade procedure to upgrade my cluster to a new version of Kubernetes, say 1.11 from 1.10?

    • update the control plane to 1.11
    • update the launch configuration of the autoscaling group to make use of a 1.11 edition of the worker node AMI
    • double the desired size of the autoscaling group to get replacement 1.11 nodes
    • cordon and drain the old 1.10 nodes to evacuate pods to the 1.11 nodes
    • reduce desiredSize by half to kick out the old 1.10 nodes

I have seen something similar to 4. described in different forums, but I don't find this procedure mentioned in the official docs, where migrating the workload to a new worker node group is the proposed method to "gracefully" update the cluster. It seems a bit involved, though.

I'm not sure which direction to choose. At the moment we're using a Terraform script (that basically follows the official AWS guide by setting up an autoscaling group via CloudFormation), and it would be nice to have an upgrade procedure that plays well with that, but I'm not sure if that is doable. Does anyone have experiences to share?
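For the cordon/drain part of step 4, a minimal sketch (assuming the old nodes can be identified by the kubelet version column of kubectl get nodes):

# Cordon and then drain every node still running a 1.10 kubelet
for node in $(kubectl get nodes --no-headers | awk '/v1\.10\./ {print $1}'); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
done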

@jayreimers

@netrounds-peterg

I upgraded our cluster using the same steps you noted in 4 without issue and with no downtime.

@Jeeppler
Author

@netrounds-peterg I can confirm that the steps you described in 4 work. This is exactly how I did it, with one exception: you do not have to double your cluster size. You can just slightly increase the capacity and terminate the old nodes. I have the same questions as you mentioned in 1-3.

Furthermore, I noticed that the docs do not explain very well how to upgrade kube-dns to coredns. Basically, in addition to your questions, I would add a 5th point: how to upgrade or patch the components that run in the EKS cluster and are installed by default? For Kubernetes version 1.11, those components are aws-node, kube-proxy, and coredns.
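For what it's worth, a quick way to check which versions of those components a cluster is currently running (just a sketch with plain kubectl; on a 1.10 cluster the DNS deployment is still kube-dns rather than coredns):

# Print the image (and therefore the version tag) of each default add-on
kubectl -n kube-system get daemonset aws-node -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl -n kube-system get daemonset kube-proxy -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'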

@netrounds-peterg

For bullet 3. above, I'm not sure it matters. I believe that one can either update worker nodes in place by SSHing to the nodes and running:

sudo yum update --security

or one can use a procedure similar to 4. to cycle out instances and have them replaced, since that should automatically apply new security patches. A quote from [1]:

Amazon Linux is configured to download and install security updates at launch time.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-ami-basics.html
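For reference, that launch-time behavior comes from cloud-init's repo_upgrade setting on Amazon Linux (documented on the page cited above). A minimal user-data sketch that makes the security-only default explicit (on EKS workers this would have to be combined with the bootstrap script, e.g. via MIME multi-part user-data, which is not shown here):

#cloud-config
# Apply only security updates from the enabled repositories at first boot
repo_upgrade: security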

@netrounds-peterg

When relying on Terraform for managing infrastructure, I noticed that to be able to do step 4. above gracefully, one cannot easily rely on the CloudFormation template [1] offered by the official Amazon guide. In that case, upgrading to a new Kubernetes version (typically by setting a kubernetes_version variable from 1.10 to 1.11 and re-running terraform apply) would cause the CloudFormation template to update its NodeImageId, which would make CloudFormation automatically recreate the nodes without Kubernetes' awareness (the CloudFormation template creates an Auto Scaling Group with a rolling update policy).

To better support graceful upgrades using procedure 4., I found it more useful to let Terraform create the Auto Scaling Group itself. An excellent guide is provided here.

[1] https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2019-01-09/amazon-eks-nodegroup.yaml

@Jeeppler
Author

@netrounds-peterg I use two modules: terraform-aws-eks and the AWS VPC Terraform module. The only thing I have to change in the EKS module is the version number, and then run terraform apply again. Afterwards, I increase the size of the autoscaling group and replace the nodes.

@netrounds-peterg

@Jeeppler I've considered using terraform-aws-eks but was a bit put off by a few issues related to upgrades:

Anything you've been affected by?

@Jeeppler
Author

@netrounds-peterg

  1. Workers on 1.10 even when EKS is provisioned as 1.11 terraform-aws-modules/terraform-aws-eks#246 -> The launch template is new. I did not use the launch template. This feature probably needs another development iteration.
  2. EKS - Kubernetes 1.11 Upgrade terraform-aws-modules/terraform-aws-eks#214 -> This is more a question regarding defaults, rather than an actual problem.

For me the upgrade procedure worked without hitting any issues.

@aleclerc

With CVE-2019-5736, upgrading the worker nodes is going to be very much needed. Having this process documented would be great.

@nrdlngr
Contributor

nrdlngr commented Feb 11, 2019

AWS has issued a statement regarding CVE-2019-5736 and Amazon EKS clusters: https://aws.amazon.com/security/security-bulletins/AWS-2019-002/

The latest patched AMIs are available here: https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html

Steps to replace workers with the new AMI are outlined here: https://docs.aws.amazon.com/eks/latest/userguide/update-workers.html
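If you maintain your own launch configurations (e.g. with Terraform), one way to find the newest patched AMI programmatically is an EC2 image query; the owner and name filter below are assumptions based on the published EKS AMI naming scheme:

# Most recently published EKS-optimized AMI for Kubernetes 1.11 in the current region
aws ec2 describe-images --owners amazon \
  --filters "Name=name,Values=amazon-eks-node-1.11-*" \
  --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
  --output text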

@nrdlngr
Contributor

nrdlngr commented Feb 11, 2019

@nrdlngr What if you manage your infrastructure with CloudFormation? The CloudFormation documentation still mentions a cluster replacement when the version property changes for an EKS cluster. Is the documentation you posted the only available way to do an in-place update of Kubernetes on EKS?

Today, Amazon EKS cluster Kubernetes version upgrades are not yet supported in CloudFormation; you must use the Amazon EKS console or APIs to upgrade a cluster that was created with CloudFormation.

We have a road map item for CloudFormation cluster upgrades here: aws/containers-roadmap#115

Feel free to +1 or comment on that issue to help the service team prioritize this feature request!

@nrdlngr
Contributor

nrdlngr commented Feb 11, 2019

@Jeeppler:
"One important part, which should be it's own document (in my opinion) is upgrading kube-dns to coredns after upgrading from Kubernetes 1.10 to 1.11. The questions, I am not able to answer from the document is why is this necessary? What changed to make this necessary? What is the end of life of kube-dns?
Furthermore, I would like to have separate document for the kube-dns to coredns, which explains the steps regardless if I use a CloudFormation template, Terraform or the AWS CLI."

We do have an independent topic for installing coredns on upgraded clusters here: https://docs.aws.amazon.com/eks/latest/userguide/coredns.html

As far as I can tell, this procedure should work on your cluster regardless of how it was created, but please let me know if I'm wrong (I'm not a Terraform expert).

I can't speak to the end of life for kube-dns (we don't own that project). But here is a great blog post about coredns that might answer some of your other questions: https://kubernetes.io/blog/2018/07/10/coredns-ga-for-kubernetes-cluster-dns/
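One small addition: before following that topic you can check which DNS add-on the cluster is actually running (on EKS both add-ons carry the k8s-app=kube-dns label, so this sketch should cover either case):

# Prints a kube-dns or coredns image, whichever is deployed
kubectl -n kube-system get pods -l k8s-app=kube-dns -o jsonpath='{.items[0].spec.containers[0].image}{"\n"}'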

@nrdlngr
Contributor

nrdlngr commented Feb 11, 2019

@netrounds-peterg:

  1. Where are new cluster Kubernetes version releases announced? How can I be notified?
    Is it by polling the EKS Platform Versions page?

Probably the best way to watch for Kubernetes version support is in our https://github.com/aws/containers-roadmap GitHub repo. For example, here are the open issues for 1.12 and 1.13:
aws/containers-roadmap#24
aws/containers-roadmap#30

2. Where are worker node AMI patch releases announced? How can I be notified?
It appears to be by following the Amazon Linux Security Center or the table of IDs for the EKS-optimized AMI.

Yes, security issues for Amazon EKS AMIs will always be posted in the Amazon Linux Security Center, because our AMIs are based on Amazon Linux 2.

3. Are worker node AMIs continuously security-patched? Say that I am running on a particular worker node AMI version X. Will X be security-patched and can I then simply terminate instances in my auto scaling group to have them replaced with patched versions of X, OR will I need to update the launch configuration for my auto-scaling group to make use of a newer AMI Y before cycling out instances?

As you noted in a later post, Amazon Linux instances apply existing security patches when they launch.

4. In order to have close-to-zero downtime, would this be a valid upgrade procedure to upgrade my cluster to a new version of Kubernetes, say 1.11 from 1.10?...

That looks like another reasonable approach.

@nrdlngr
Contributor

nrdlngr commented Feb 11, 2019

@Jeeppler:

What is the best practice to update/upgrade my Amazon Linux AMIs to get patches on a regular basis without having to terminate the entire worker node? (Most Linux distributions patch the system with the help of the package manager [e.g. apt-get dist-upgrade -y for Debian].)

As a general security best practice, we recommend that EKS customers update their configurations to launch new worker nodes from the latest AMI versions when they are released. However, each security issue is different, and as such they will have different remediation steps. For example, kernel vulnerabilities require a reboot after the update, so you might as well just replace the node with the new AMI at that point anyway.

Some issues have simpler remediation steps. For example, https://alas.aws.amazon.com/ALAS-2019-1156.html requires a simple yum update docker to fix. If you have a small number of worker nodes, you could log into each one and run that command, but if you have thousands of nodes, that approach is probably not the best.

So I think the best practice is to replace your worker nodes with our latest AMIs when they are released, but you could choose to review each AMI update on a case-by-case basis and decide for yourself whether that is the right approach or whether to update the instances manually with the yum package manager.
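As an illustration of the "many nodes, simple fix" case, a hedged sketch using SSM Run Command (it assumes the SSM agent is installed on the workers and that they carry the usual kubernetes.io/cluster/<cluster-name> instance tag; adjust the target to your tagging scheme):

# Run the single-package security update on every tagged worker node
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --parameters '{"commands":["yum update -y docker"]}'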

@nrdlngr
Contributor

nrdlngr commented Feb 11, 2019

I think I've answered the questions in this issue that are relevant to the original post and many of the follow up questions, so I'm going to close this issue now. If there are any unanswered questions regarding cluster or worker node updates, please feel free to open a new issue.

Thanks!

@nrdlngr nrdlngr closed this as completed Feb 11, 2019
@kylecompassion

  • update the control plane to 1.11
  • update the launch configuration of the autoscaling group to make use of a 1.11 edition of the worker node AMI
  • double the desired size of the autoscaling group to get replacement 1.11 nodes
  • cordon and drain the old 1.10 nodes to evacuate pods to the 1.11 nodes
  • reduce desiredSize by half to kick out the old 1.10 nodes

Will EKS always kick out the old 1.10 nodes when newer nodes are available? Is this known to be how EKS or K8s works?

@rastakajakwanna

rastakajakwanna commented Jul 23, 2019

I've ended up with this semi-automatic procedure. I hope it helps someone, or maybe it gets enhanced over time :) It is just a quick script until something more sophisticated is available... and it is still a work in progress and far from perfect, btw.

Step 0:

  • Upgrade the EKS control plane (AWS Console, CloudFormation, Terraform module upgrade, etc.) and propagate the latest AMI for the target Kubernetes version into your ASG launch configuration.
  • Make sure your kubectl is configured with the right EKS endpoint (aws eks update-kubeconfig --name <cluster_name>, or rewrite the following script and add the kubectl argument --kubeconfig <kubeconfig_file>). The script uses the first method automatically.

Step 1:
Worker node and base software (coredns, CNI plugin) migration script:

cat <<'EKS_UPGRADE_SCRIPT' > /tmp/eks_upgrade.sh
#!/bin/bash

[[ -z $1 ]] && { echo "==== Missing EKS cluster name. Example $0 my-cluster-name ===="; exit 1 ;}
k8s_cluster_name="$1"

set -euo pipefail

# Make sure we have the correct cluster selected
aws eks update-kubeconfig --name ${k8s_cluster_name}

# Get control plane version
k8s_ver="`kubectl version --short | egrep ^Server | awk '{print $3}' | sed -r 's/(v[0-9.]{1,8})+?.*/\1/g'`"
k8s_worker_ver="`kubectl get nodes | egrep ^ip | head -1 | awk '{print $5}'`"
echo "===> EKS K8S Plane version: $k8s_ver"
echo "===> EKS Worker version: $k8s_worker_ver"

# Get array of nodes and their instance_id
for n in `kubectl get nodes -o name`; do declare -A I; I["${n/node\//}"]="`aws ec2 describe-instances --output text | grep "${n/node\//}" | egrep ^INSTANCES | awk '{print $9}'`"; done;

# Upgrade CNI plugin to the latest in order to avoid pods unable to allocate IP on new nodes
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/aws-k8s-cni.yaml
# Sometimes this url is referenced (master branch instead of tag): https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/v1.5/aws-k8s-cni.yaml

# Get ASG names
for asg in `aws autoscaling describe-auto-scaling-groups --output text | egrep ^AUTOSCALING | grep ${k8s_cluster_name} | awk '{print $3}'`; do declare -A ASG_DESIRED; ASG_DESIRED["$asg"]="`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names $asg --output text | egrep ^AUTOSCALING | awk '{print $6}'`"; done

# Temporarily scale the ASGs up to double their size (make sure your auto scaling group has room for this: the max size must be at least double the desired capacity; otherwise you have to edit the procedure below).
for asgm in ${!ASG_DESIRED[@]}; do let "ASG_DESIRED[$asgm] += ASG_DESIRED[$asgm]"; aws autoscaling set-desired-capacity --auto-scaling-group-name ${asgm} --desired-capacity ${ASG_DESIRED[$asgm]}; done

# Let's taint outdated nodes first
for node in "${!I[@]}"; do kubectl taint nodes $node key=value:NoSchedule; sleep 1; done

# Drain and terminate outdated nodes
it="${#I[@]}"; in="1"
for node in "${!I[@]}"; do echo "===> Going to replace ${I[$node]} ($in/$it)"; sleep 2; echo "===> 1/3 Drain node"; kubectl drain $node --ignore-daemonsets --delete-local-data; echo "===> 2/3 Going to terminate node"; aws autoscaling terminate-instance-in-auto-scaling-group --should-decrement-desired-capacity --instance-id ${I[$node]}; while true; do kubectl get pods --all-namespaces; kubectl get nodes; read -p " 3/3 Are nodes and pods ready to continue with another node? (y/n) " ANSWER; [[ $ANSWER == y ]] && break; done ; echo "===> Moving forward ....."; let "in += 1"; done

# Patch kube-proxy
kubectl patch daemonset kube-proxy -n kube-system -p '{"spec": {"template": {"spec": {"containers": [{"image": "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:'''${k8s_ver}'''","name":"kube-proxy"}]}}}}'

# Set new coredns image
# The image tags below are hardcoded by Amazon per Kubernetes version and cannot be discovered automatically
# Kubernetes 1.13: 1.2.6
# Kubernetes 1.12: 1.2.2
# Kubernetes 1.11: 1.1.3
declare -A CDNS; CDNS[v1.11]="1.1.3"; CDNS[v1.12]="1.2.2"; CDNS[v1.13]="1.2.6"
kubectl set image --namespace kube-system deployment.apps/coredns coredns=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/coredns:v${CDNS[${k8s_ver%.*}]}
EKS_UPGRADE_SCRIPT

Step 2:
Make the script executable (chmod +x /tmp/eks_upgrade.sh) and execute the script.

I need to assume an admin role before I can proceed, so for me the execution looks like this: aws-vault exec <admin_profile_name> -- /tmp/eks_upgrade.sh <EKS cluster name, e.g. my-dev-cluster> (I am using aws-vault to protect my API secrets and keep my assumed role in a reusable session, basically).

Side note: this procedure and script are based on a compilation of the official procedures (https://docs.aws.amazon.com/eks/latest/userguide/migrate-stack.html and https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html), and I tried to keep the outage to a minimum.

@ndreno

ndreno commented Jul 31, 2019

Why is this issue closed? Every solution here is just tweaking and doing some custom IaaS work; there is nothing about a proper SaaS solution...

Please remember the sales punchline: "Amazon EKS runs the Kubernetes management infrastructure for you" <-- not true

@alexlopes

Hi everyone, same here, especially about PVCs. I am about to upgrade some nodes and I have PVCs attached to them. In previous upgrades I noticed EKS recreates EBS volumes on "ephemeral" nodes, but I am worried about the persistent volume claims. Has anyone experienced a similar case? (I can't find anything about it anywhere.)
