Chapter III. Install Kubernetes Cluster

Now it’s time to proceed with the installation of the actual cluster.

In order to do this, we'll use kubespray's Ansible playbooks, which involves two steps:

  1. Telling Ansible where the playbooks will run (in Ansible terms this is called "creating the inventory").

  2. Editing some options for the playbooks.

1. Create inventory for Ansible playbooks

If you're using Terraform to set up your infrastructure, this file should be generated automatically for you, at /ansible/<installation_name>/ansible_inventory.

If not, the inventory can also be created using a Python script (you might need to install python3 first: brew install python3), by running the set of commands described in the kubespray guide for building your own inventory.
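For reference, using kubespray's inventory builder looks roughly like the following (a sketch only; the IP addresses are placeholders and the exact paths depend on your kubespray version, so check the guide linked above):

declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
CONFIG_FILE=inventory/mycluster/hosts.ini python3 contrib/inventory_builder/inventory.py ${IPS[@]}

Run this from the root of your kubespray checkout, then point your Ansible run at the generated hosts file.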

Once you have your inventory ready, it’s time to let Ansible do the heavy lifting for you.

2. Running Ansible

Please make sure to go through the following steps before actually running Ansible.

We had quite a bit of trouble getting this to work, especially in combination with the use of bastion hosts.

2.1. Using Bastion Hosts

The kubespray deployment example also includes bastion hosts. The idea is that you SSH into these hosts and from there (and only from there) you're allowed to SSH into the rest of your cluster.

For this reason, we have bundled an example of our ssh-bastion.conf in /ansible/cytechmobile/ssh-bastion.conf. Please adapt it accordingly; a sketch of what such a config looks like follows the note below.

Note
We use Debian Jessie for our bastion hosts and Ubuntu for our k8s nodes, so note the different usernames.
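For illustration, a minimal ssh-bastion.conf could look something like this (a sketch only, not our actual file; the host pattern, bastion address and the admin/ubuntu usernames are placeholders to adapt to your AMIs):

# Cluster nodes are reached only through the bastion.
# Ubuntu nodes, hence the ubuntu user; the bastion runs Debian Jessie, hence a different user (admin here).
Host 10.*.*.*
    User ubuntu
    ProxyCommand ssh -W %h:%p admin@<bastion_public_ip>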

2.2. Flush cache

Add the --flush-cache option to your Ansible run after destroying and recreating EC2 nodes; Ansible keeps some caches that may interfere with your new setup.

2.3. Set ubuntu as bootstrap_os

Make sure to set bootstrap_os to ubuntu in your inventory's group_vars/all.yml file.

HINT: If you don’t and you’re deploying on Ubuntu 16.04, you’ll run into the /usr/bin/python not found error we explore below.
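In group_vars/all.yml this is a one-line setting (snippet only; leave the rest of the file as generated):

bootstrap_os: ubuntu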

2.4. Set cloud_provider

If you’re deploying on AWS, make sure to set the cloud_provider to aws in your group_vars/all.yml.
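Again in group_vars/all.yml (snippet only):

cloud_provider: aws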

3. Start Kubespray

Getting Ansible to set up your Kubernetes cluster is as simple as:

cd ansible/<installation_name> (1)
ansible-playbook \
  --inventory-file=<installation_name>/ansible_inventory \ (2)
  --become \ (3)
  --user=ubuntu \ (4)
  --flush-cache \
  ../kubespray/cluster.yml
  1. By running Ansible from this directory, it will pick up the ansible.cfg and ssh-bastion.conf, giving you access to your hosts.

  2. The inventory file (i.e. details about your hosts) that you created in the previous section.

  3. According to the Ansible help, this runs operations with become, i.e. with privilege escalation (sudo by default).

  4. The user to connect as on your Kubernetes cluster hosts. I am emphasizing this because I was initially confused about which user to specify when the bastion hosts have a different user than the Kubernetes hosts: use the cluster hosts' user here; the bastion user is taken from ssh-bastion.conf.

Note
Leave the efk attribute set to false, as we'll deploy our own E(F/L)K stack.

3.1. Get SSH keys and sort out your ~/.kube/config

There is currently an open issue about setting up kubectl locally by automating the process of pulling down the keys from the first master node.

As this is not available yet, we combined some of the solutions there and came up with a couple of plays that handle that for you.

  1. Please replace <YOUR_DOMAIN_NAME> in kubectl_setup.yml with your own domain name.

  2. On your workstation, go to the ./ansible/<installation_name> folder.

  3. Ensure you have added the SSH private key of the EC2 key pair you created to your ssh agent.

  4. Run the command below to add the connection details for the specific client's Kubernetes cluster to the ~/.kube/config.one file. (Repeat for all the different installations you have performed, so that ALL Kubernetes clusters, across all your installations, end up in one file):

    export KUBECONFIG=~/.kube/config.one
    
    ansible-playbook \
      --inventory-file=ansible_inventory  \
      --extra-vars=installation_name=<installation_name> \
      --user=ubuntu \
      --flush-cache \
      ../kubectl_setup.yml
  5. Then, make sure you switch to the right context:

      kubectl config use-context <installation_name>
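To confirm you're now pointed at the right cluster, you can list the contexts and nodes (standard kubectl commands):

kubectl config get-contexts
kubectl get nodes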

4. Troubleshooting

Some issues we came across during the process.

4.1. SSH "Unreachable" errors

These were pretty misleading, because we did have SSH access; testing access manually over SSH worked fine.

You might see errors like this:

fatal: [kubernetes-mcore-gluster1]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"10.139.92.61\". Make sure this host can be reached over ssh", "unreachable": true}
fatal: [kubernetes-mcore-gluster0]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"10.139.91.152\". Make sure this host can be reached over ssh", "unreachable": true}

We even enabled the Ansible debug logs (by using -vvv on the command line) and we copy-pasted the ssh command, which worked fine outside of Ansible.

A good way to verify that you are hitting the same problem is to use the Ansible ping module (available out of the box):

ansible gfs-cluster -m ping -i my_inventory/mcore_hosts -u ubuntu
kubernetes-mcore-gluster1 | FAILED! => {
    "changed": false,
    "failed": true,
    "module_stderr": "/bin/sh: 1: /usr/bin/python: not found\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE",
    "rc": 127
}
kubernetes-mcore-gluster0 | FAILED! => {
    "changed": false,
    "failed": true,
    "module_stderr": "/bin/sh: 1: /usr/bin/python: not found\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE",
    "rc": 127
}

The issue therefore was not related to SSH at all! In fact, the problem was that /usr/bin/python was not available on the hosts.

These hosts are started from the official Ubuntu 16.04 EC2 AMIs, where only python3 is available out of the box.

There are 2 solutions:

  1. Set the Ansible python interpreter to python3

E.g. like so (in your inventory file)

[all:vars]
ansible_python_interpreter=/usr/bin/python3

  2. Have Ansible install Python 2 for you before gathering facts. This is supported out of the box by the Kubespray repo, as long as you set the bootstrap_os variable to the correct value (see section 2.3 above).
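If you prefer to install Python 2 yourself before the first full run, a one-off raw command also works (a sketch, assuming apt-based hosts and the inventory and user from the examples above):

ansible all -i ansible_inventory -u ubuntu --become -m raw -a "apt-get update && apt-get install -y python-minimal"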

4.2. Kube scheduler failures

During some of the initial ansible runs, we got:

RUNNING HANDLER [kubernetes/master : Master | wait for kube-scheduler] ***********************************************************************************************************************************
Wednesday 06 September 2017  12:35:20 +0300 (0:00:00.073)       0:39:54.652 ***
FAILED - RETRYING: Master | wait for kube-scheduler (60 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (60 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (60 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (59 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (59 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (59 retries left).
...
fatal: [kubernetes-mcore-master2]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
FAILED - RETRYING: Master | wait for kube-scheduler (3 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (13 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (2 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (12 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (11 retries left).
fatal: [kubernetes-mcore-master1]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
FAILED - RETRYING: Master | wait for kube-scheduler (10 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (9 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (8 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (7 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (6 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (5 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (4 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (3 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (2 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
fatal: [kubernetes-mcore-master0]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}

The problem turned out to be that the EC2 instances did not have enough resources (we were testing whether t2.micro would be sufficient in terms of memory and compute).

The solution was to upgrade to t2.small.
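If you hit the same symptom, it's worth checking the masters' available memory and CPU before digging deeper (a quick sketch, reusing the bastion config and user from the earlier examples; the master IP is a placeholder):

ssh -F ssh-bastion.conf ubuntu@<master_private_ip> 'free -m && nproc'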


Wow! You have your Kubernetes cluster set up!! Congrats!! Now, let’s look at a few Additional HA Considerations.