This repository contains a set of playbooks to help facilitate the deployment of OpenShift 4.4 on RHV.
The playbooks/scripts in this repository should help you automate the vast majority of an OpenShift 4.4 UPI deployment on RHV. Be sure to read the requirements section below. My initial installation of OCP on RHV was a little cumbersome, so I opted to automate the majority of the installation to allow for iterative deployments.
The biggest challenge was the installation of the Red Hat Enterprise Linux CoreOS (RHCOS) nodes themselves and that is the focal point of the automation. The playbooks/scripts provided are essentially an automated walk through of the standard baremetal UPI installation instructions but tailored for RHV.
To automate the deployment of RHCOS, the standard boot ISO is modified so the installation automatically starts with specific kernel parameters. The parameters for each node type (bootstrap, masters and workers) are the same with the exception of coreos.inst.ignition_url
. To simplify the process, the boot ISO is made to reference a PHP script that offers up the correct ignition config based on the requesting hosts DNS name. This method allows the same boot ISO to be used for each node type.
Before provisioning the RHCOS nodes a lot of prep work needs to be completed. This includes creating the proper DNS entries for the environment, configuring a DHCP server, configuring a load balancer and configuring a web server to store ignition configs and other installation artifacts. Ansible playbooks are provided to automate much of this process.
- Deployment of RHCOS on RHV
- Creation of all SRV, A and PTR records in IdM
- Deployment of httpd Server for Installation Artifacts and Logic
- Deployment of HAProxy and Applicable Configuration
- Deployment of dhcpd and Applicable Static IP Assignment
- Ordered Starting (i.e. installation) of VMs
To leverage the automation in this guide you need to bring the following:
- RHV Environment (tested on 4.3.9)
- IdM Server with DNS Enabled
- Must have Proper Forward/Reverse Zones Configured
- RHEL 7 Server which will act as a Web Server, Load Balancer and DHCP Server
- Only Repository Requirement is
rhel-7-server-rpms
All hostnames must follow the following format:
- bootstrap.<base domain>
- master0.<base domain>
- masterX.<base domain>
- worker0.<base domain>
- workerX.<base domain>
- Bootstrap SSL Certificate is only valid for 24 hours
- etcd/master naming convention conforms to 0 based index (i.e. master0, master1, master2...not master1, master2, master3)
- qemu-guest-agent is not included in RHCOS (instructions for installed daemonset provided below)
Read through the Installing on baremetal installation documentation before proceeding.
Find a good working directory and clone this repository using the following command:
$ git clone https://github.com/sa-ne/openshift4-rhv-upi.git
Login to your IdM server and make sure a reverse zone is configured for your subnet. My lab has a subnet of 172.16.10.0
so the corresponding reverse zone is called 10.16.172.in-addr.arpa.
. Make sure a forward zone is configured as well. It should be whatever is defined in the base_domain
variable in your Ansible inventory file (rhv-upi.ocp.pwc.umbrella.local
in this example).
An example inventory file is included for Ansible (inventory-example.yml
). Use this file as a baseline. Make sure to configure the appropriate number of master/worker nodes for your deployment.
The following global variables will need to be modified (the default values are what I use in my lab, consider them examples):
Variable | Description |
---|---|
iso_name | The name of the custom ISO file in the RHV ISO domain |
base_domain | The base DNS domain. Not to be confused with the base domain in the UPI instructions. Our base_domain variable in this case is <cluster_name> .<base_domain> |
dhcp_server_dns_servers | DNS server assigned by DHCP server |
dhcp_server_gateway | Gateway assigned by DHCP server |
dhcp_server_subnet_mask | Subnet mask assigned by DHCP server |
dhcp_server_subnet | IP Subnet used to configure dhcpd.conf |
load_balancer_ip | This IP address of your load balancer (the server that HAProxy will be installed on) |
ipa_validate_certs | Enable or disable validation of the certificates for your IdM server (default: yes ) |
installation_directory | The directory that you will be using with openshift-install command for generating ignition files |
For the individual node configuration, be sure to update the hosts in the pg
hostgroup. Several parameters will need to be changed for each host including ip
, storage_domain
and network
. You can also specify mac_address
for each of the VMs in its network
section (if you don't, VMs will obtain their MAC address from cluster's MAC pool automatically). Match up your RHV environment with the inventory file.
Under the webserver
and loadbalancer
group include the FQDN of each host. Also make sure you configure the httpd_port
variable for the web server host. In this example, the web server that will serve up installation artifacts and load balancer (HAProxy) are the same host.
In the directory that contains your cloned copy of this git repo, create an Ansible vault called vault.yml as follows:
$ ansible-vault create vault.yml
The vault requires the following variables. Adjust the values to suit your environment.
---
rhv_hostname: "rhevm.pwc.umbrella.local"
rhv_username: "admin@internal"
rhv_password: "changeme"
ipa_hostname: "idm1.umbrella.local"
ipa_username: "admin"
ipa_password: "changeme"
The OpenShift Installer releases are stored here. Find the installer, right click on the "Download Now" button and select copy link. Then pull the installer using curl (be sure to quote the URL) as shown (linux client used as example):
$ curl -o openshift-install-linux-4.4.3.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-linux-4.4.3.tar.gz
Extract the archive and continue.
After you download the installer we need to create our ignition configs using the openshift-install
command. Create a file called install-config.yaml
similar to the one show below. This example shows 3 masters and 4 worker nodes.
apiVersion: v1
baseDomain: ocp.pwc.umbrella.local
compute:
- name: worker
replicas: 4
controlPlane:
name: master
replicas: 3
metadata:
name: rhv-upi
networking:
clusterNetworks:
- cidr: 10.128.0.0/14
hostPrefix: 23
networkType: OpenShiftSDN
serviceNetwork:
- 172.30.0.0/16
platform:
none: {}
fips: false
pullSecret: '{ ... }'
sshKey: 'ssh-rsa ... user@host'
You will need to modify baseDomain, pullSecret and sshKey (be sure to use your public key) with the appropriate values. Next, copy install-config.yaml
into your working directory (/home/chris/upi/rhv-upi
in this example) and run the OpenShift installer as follows to generate your Ignition configs.
Your pull secret can be obtained from the OpenShift start page.
$ ./openshift-install create ignition-configs --dir=/home/chris/upi/rhv-upi
Next we need the RHCOS image. These images are stored here. On our web server, download the RHCOS image (BIOS, not UEFI) to the document root (assuming /var/www/html
).
NOTE: You may be wondering about SELinux contexts since httpd is not installed. Fear not, our playbooks will handle that during the installation phase.
$ sudo mkdir -p /var/www/html
$ sudo curl -o /var/www/html/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz
Ignition files generated in the previous step will be copied to web server automatically as part of httpd
role. If you intend to skip that role, copy bootstrap.ign, master.ign and worker.ign from your working directory to /var/www/html
on your web server manually now.
We will use a bootable ISO to install RHCOS on our virtual machines. We need to pass several parameters to the kernel (see below). This can be cumbersome, so to speed things along we will generate a single boot ISO that can be used for bootstrap, master and worker nodes. During the installation, a playbook will install the PHP script on your web server. This script will serve up the appropriate ignition config based on the requesting servers DNS name.
Kernel Parameters
Note these parameters are for reference only. Specify the appropriate values for your environment in util/iso-generator.sh
and run the script to generate an ISO specific to your environment.
- coreos.inst=yes
- coreos.inst.install_dev=sda
- coreos.inst.image_url=http://example.com/rhcos-4.4.3-x86_64-metal.raw.gz
- coreos.inst.ignition_url=http://example.com/ignition-downloader.php
Next we need the RHCOS ISO installer (stored here). Download the ISO file as shown. Be sure to check the directory for the latest version.
$ curl -o /tmp/rhcos-4.4.3-x86_64-installer.x86_64.iso https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.4/latest/rhcos-4.4.3-x86_64-installer.x86_64.iso
A script is provided to recreate an ISO that will automatically boot with the appropriate kernel parameters. Locate the script and modify the variables at the top to suite your environment.
Most parameters can be left alone. You WILL need to change at least the KP_WEBSERVER
variable to point to the web server hosting your ignition configs and RHCOS image.
VERSION=4.4.3-x86_64
ISO_SOURCE=/tmp/rhcos-$VERSION-installer.x86_64.iso
ISO_OUTPUT=/tmp/rhcos-$VERSION-installer.x86_64-auto.iso
DIRECTORY_MOUNT=/tmp/rhcos-$VERSION-installer
DIRECTORY_WORKING=/tmp/rhcos-$VERSION-installer-auto
KP_WEBSERVER=lb.rhv-upi.ocp.pwc.umbrella.local:8888
KP_COREOS_IMAGE=rhcos-$VERSION-metal.x86_64.raw.gz
KP_BLOCK_DEVICE=sda
Running the script (make sure to do this as root) should produce similar output:
<<<<<<< HEAD
(rhv) 0 [email protected]@toaster:~ $ sudo ./util/iso-generator.sh
mount: /tmp/rhcos-4.3.0-x86_64-installer: WARNING: device write-protected, mounted read-only.
=======
(rhv) 0 [email protected]@toaster:~ $ sudo ./iso-generator.sh
mount: /tmp/rhcos-4.4.3-x86_64-installer: WARNING: device write-protected, mounted read-only.
>>>>>>> a8694ee2d2d1a85a14ca24fdc4ea7fd1716632e6
sending incremental file list
README.md
EFI/
EFI/fedora/
EFI/fedora/grub.cfg
images/
images/efiboot.img
images/initramfs.img
images/vmlinuz
isolinux/
isolinux/boot.cat
isolinux/boot.msg
isolinux/isolinux.bin
isolinux/isolinux.cfg
isolinux/ldlinux.c32
isolinux/libcom32.c32
isolinux/libutil.c32
isolinux/vesamenu.c32
sent 75,912,546 bytes received 295 bytes 151,825,682.00 bytes/sec
total size is 75,893,080 speedup is 1.00
Size of boot image is 4 sectors -> No emulation
13.43% done, estimate finish Tue Jun 4 20:28:18 2019
26.88% done, estimate finish Tue Jun 4 20:28:18 2019
40.28% done, estimate finish Tue Jun 4 20:28:18 2019
53.72% done, estimate finish Tue Jun 4 20:28:18 2019
67.12% done, estimate finish Tue Jun 4 20:28:18 2019
80.56% done, estimate finish Tue Jun 4 20:28:18 2019
93.96% done, estimate finish Tue Jun 4 20:28:18 2019
Total translation table size: 2048
Total rockridge attributes bytes: 2086
Total directory bytes: 8192
Path table size(bytes): 66
Max brk space used 1c000
37255 extents written (72 MB)
Copy the ISO to your ISO domain in RHV. After that you can cleanup the /tmp directory by doing rm -rf /tmp/rhcos*
. Make sure to update the iso_name
variable in your Ansible inventory file with the correct name (rhcos-4.4.3-x86_64-installer.x86_64-auto.iso
in this example).
At this point we have completed the staging process and can let Ansible take over.
To kick off the installation, simply run the provision.yml playbook as follows:
$ ansible-playbook -i inventory.yml --ask-vault-pass provision.yml
The order of operations for the provision.yml
playbook is as follows:
- Create DNS Entries in IdM
- Create VMs in RHV
- Create VMs
- Create Disks
- Create NICs
- Configure Load Balancer Host
- Install and Configure dhcpd
- Install and Configure HAProxy
- Install and Configure httpd
- Boot VMs
- Start bootstrap VM and wait for SSH
- Start master VMs and wait for SSH
- Start worker VMs and wait for SSH
Once the playbook completes (should several minutes) continue with the instructions.
If you already have your own DNS, DHCP or Load Balancer you can skip those portions of the automation by passing the appropriate --skip-tags
argument to the ansible-playbook
command.
Each step of the automation is placed in its own role. Each is tagged ipa, dhcpd and haproxy. If you have your own DHCP configured, you can skip that portion as follows:
$ ansible-playbook -i inventory.yml --ask-vault-pass --skip-tags dhcpd provision.yml
All three roles could be skipped using the following command:
$ ansible-playbook -i inventory.yml --ask-vault-pass --skip-tags dhcpd,ipa,haproxy provision.yml
Once the VMs boot RHCOS will be installed and nodes will automatically start configuring themselves. From this point we are essentially following the Baremetal UPI instructions starting with Creating the Cluster.
Run the following command to ensure the bootstrap process completes (be sure to adjust the --dir
flag with your working directory):
$ ./openshift-install --dir=/home/chris/upi/rhv-upi wait-for bootstrap-complete
INFO Waiting up to 30m0s for the Kubernetes API at https://api.rhv-upi.ocp.pwc.umbrella.local:6443...
INFO API v1.17.1+b9b84e0 up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO It is now safe to remove the bootstrap resources
Once this openshift-install command completes successfully, login to the load balancer and comment out the references to the bootstrap server in /etc/haproxy/haproxy.cfg
. There should be two references, one in the backend configuration backend_22623
and one in the backend configuration backend_6443
. Alternativaly, you can just run this utility playbook to achieve the same:
ansible-playbook -i inventory.yml bootstrap-cleanup.yml
Lastly, refer to the baremetal UPI documentation and complete Logging into the cluster and all remaining steps.
RHCOS includes the open-vm-tools
package by default but does not include qemu-guest-agent
. To work around this, we can run the qemu-ga
daemon via a container using a DaemonSet or by copying the qemu-ga
binary and requisite configuration to RHCOS using a MachineConfig.
Note: The DaemonSet option is not completely working. Guest memory is not reported to RHV. This is a work in progress.
The Dockerfile used to build the container is included in the qemu-guest-agent
directory in this git repository. You will need to download the latest qemu-guest-agent
RPM from access.redhat.com and place it alongside the Dockerfile to successfully build the container. The container was built using buildah as follows:
buildah bud -t qemu-guest-agent --format docker
However, building the container is for documentation purposes only. The DaemonSet is already configured to pull the qemu-guest-agent
container from quay.io.
To start the deployment we need to create the namespace, to do that run the following command:
oc create -f qemu-guest-agent/0-namespace.yaml
Next we need to create the service account:
oc create -f qemu-guest-agent/1-service-account.yaml
The pods will require the privileged SCC, so add the appropriate RBAC as follows:
oc create -f qemu-guest-agent/2-rbac.yaml
Finally deploy the DaemonSet by running:
oc create -f 3-daemonset.yaml
The following files were extracted from the qemu-guest-agent
rpm, base64 encoded and added to the ignition portion of the Machine Config.
- /etc/qemu-ga/fsfreeze-hook
- /etc/sysconfig/qemu-ga
- /etc/udev/rules.d/99-qemu-guest-agent.rules
- /usr/bin/qemu-ga
Since the /usr
filesystem on RHCOS is mounted read-only, the qemu-ga
binary was placed in /opt/qemu-guest-agent/bin/qemu-ga
. Left alone the qemu-guest-agent
service will fail to start because the qemu-ga
binary does not have the appropriate SELinux contexts. To work around this, an additional service named qemu-guest-agent-selinux
is added to force the appropriate contexts before the qemu-guest-agent
services starts. Both services are added via the systemd
portion of the ignition config.
To add the qemu-guest-agent
service to your worker nodes, simply run the following command:
oc create -f 50-worker-qemu-guest-agent.yaml
When applied, the Machine Config Operator will perform a rolling reboot of all worker nodes in your cluster.
Similarly, the qemu-guest-agent
service can be applied to your master nodes using the following command:
oc create -f 50-master-qemu-guest-agent.yaml
This guide will show you how to deploy OpenShift Container Storage (OCS) using a bare metal methodology and local storage.
For typical deployments, OCS will require three dedicated worker nodes with the following VM specs:
- 48GB RAM
- 12 vCPU
- 1 OSD Disk (300Gi)
- 1 MON Disk (10Gi)
The OSD disks can really be any size but 300Gi is used in this example. Also, an example inventory file (ocs/inventory-example-ocs.yaml
) that shows how to add multiple disks to a worker node is included in the root of this repository.
Before we begin an installation, we need to label our OCS nodes with the label cluster.ocs.openshift.io/openshift-storage
. Label each node with the following command:
$ oc label node workerX.rhv-upi.ocp.pwc.umbrella.local cluster.ocs.openshift.io/openshift-storage=''
OCS will use the default storage class (typically gp2
in AWS and VMware deployments) to create the PVCs used for the OSD and MON disks. Since our RHV deployment does not have an existing storage class we will use the Local Storage Operator to create two storage class with PVs backed by block devices on our OCS nodes.
To deploy the OCS operator, run the following command:
$ oc create -f ocs/ocs-operator.yaml
To deploy the Local Storage operator, run the following command (note that this will install the Local Storage Operator and supporting operators in the same namespace as the OCS operator):
$ oc create -f ocs/localstorage-operator.yaml
To verify the operators were successfully installed, run the following:
$ oc get csv -n openshift-storage
NAME DISPLAY VERSION REPLACES PHASE
awss3operator.1.0.1 AWS S3 Operator 1.0.1 awss3operator.1.0.0 Succeeded
local-storage-operator.4.2.15-202001171551 Local Storage 4.2.15-202001171551 Succeeded
ocs-operator.v4.2.1 OpenShift Container Storage 4.2.1 Succeeded
You should see phase Succeeded
for all operators.
Next we will create two storage classes using the Local Storage Operator. One for the OSD disks and another for the MON disks. Two storage classes are used as the OSDs require volumeMode: block
and the MONs require volumeMode: filesystem
.
Login to your worker nodes over SSH and verify the locations of your block devices (this can be done w/ the lsblk
command). In this example, the OSD disks on each node are located at /dev/sdb
and the MON disks are located at /dev/sdc
.
Modify the ocs/storageclass-mon.yaml
and ocs/storageclass-osd.yaml
files to suit your environment. Pay special attention to the nodeSelectors
and devicePaths
field.
Once you have the right values, create the storage classes as follows:
$ oc create -f ocs/storageclass-mon.yaml
$ oc create -f ocs/storageclass-osd.yaml
To verify, check for the appropriate storage classes and persistent volumes as follows (output may vary slightly):
$ oc get sc
NAME PROVISIONER AGE
localstorage-ocs-mon-sc kubernetes.io/no-provisioner 49m
localstorage-ocs-osd-sc kubernetes.io/no-provisioner 49m
$ oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY ... STORAGECLASS REASON AGE
local-pv-1e5a8670 300Gi RWO Delete ... localstorage-ocs-osd-sc 6h53m
local-pv-60e5cc95 300Gi RWO Delete ... localstorage-ocs-osd-sc 6h53m
local-pv-bc51d61e 300Gi RWO Delete ... localstorage-ocs-osd-sc 6h53m
local-pv-6b8a2749 10Gi RWO Delete ... localstorage-ocs-mon-sc 6h53m
local-pv-709ff523 10Gi RWO Delete ... localstorage-ocs-mon-sc 6h53m
local-pv-a8913854 10Gi RWO Delete ... localstorage-ocs-mon-sc 6h53m
Modify the file ocs/storagecluster.yaml
and adjust the storage requests accordingly. These requests must match the underlying PV sizes in the corresponding storage class.
To create the cluster, run the following command:
$ oc create -f ocs/storagecluster.yaml
The installation process should take approximately 5 minutes. Run oc get pods -n openshift-storage -w
to observe the process.
To verify the installation is complete, run the following:
$ oc get storagecluster storagecluster -ojson -n openshift-storage | jq .status
{
"cephBlockPoolsCreated": true,
"cephFilesystemsCreated": true,
"cephObjectStoreUsersCreated": true,
"cephObjectStoresCreated": true,
...
}
All fields should be marked true.
OCS provides RBD and CephFS backed storage classes for use within the cluster. We can leverage the CephFS storage class to create a PVC for the OpenShift registry.
Modify the file ocs/registry-cephfs-pvc.yaml
file and adjust the size of the claim. Then run the following to create the PVC:
$ oc create -f ocs/registry-cephfs-pvc.yaml
To reconfigure the registry to use our new PVC, run the following:
$ oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":"registry"}}}}'
Playbooks are also provided to remove VMs from RHV and DNS entries from IdM. To do this, run the retirement playbook as follows:
$ ansible-playbook -i inventory.yml --ask-vault-pass retire.yml