[FEATURE] A refactoring of the rollback code... (#47)
* A refactoring of the rollback code, which also necessitated a refactor of some other variables. Speed is considerably improved in all circumstances, and a couple of edge cases are fixed. Breaks the rollback interface: `predelete_role` now receives a list of VMs, 'hosts_to_remove', to delete, rather than a single VM (was 'host_to_redeploy'). This has the potential to massively increase redeploy speed if clustering affinity is configured.
  + A new VM tag/label 'lifecycle_state' is created, describing the lifecycle state of the VM. It is either 'current', 'retiring' or 'redeployfail'.
  + A cluster's VMs will now always have the same epoch suffix (even when adding to the cluster).
  + The '-e clean' functionality has changed; you can now either clean hosts in every 'lifecycle_state' ('-e clean=all'), or optionally just the VMs in one of the above states ('-e clean=retiring').
  + Redeploy will assert if the topology has changed (the number of VMs does not match reality).
  + A new global fact, 'cluster_hosts_state', is created that contains information on all running VMs with the derived cluster_name; i.e. the _state_ of the cluster.
  + Variables in 'cluster_hosts_state' are used instead of constantly querying the infrastructure, especially during redeploy.
  + Alternate redeploy scheme: '_scheme_addallnew_rmdisk_rollback':
    + A full mirror of the cluster is deployed.
    + If the process proceeds correctly:
      + `predeleterole` is called with a _list_ of the old VMs, in 'hosts_to_remove'.
      + The old VMs are stopped.
    + If the process fails for any reason, the old VMs are reinstated, and the new VMs stopped (rollback).
    + To delete the old VMs, either set '-e canary_tidy_on_success=true', or call redeploy.yml with '-e canary=tidy'.
  + The existing '_scheme_addnewvm_rmdisk_rollback' scheme is refactored to use the new variables. It is functionally similar, but does not terminate the VMs on success.
    + For each node in the cluster:
      + Create a new VM.
      + Run `predeleterole` on the previous node as a _list_ (for compatibility), in 'hosts_to_remove'.
      + Shut down the previous node.
    + If the process fails for any reason, the old VMs are reinstated, and any new VMs that were built are stopped (rollback).
    + To delete the old VMs, either set '-e canary_tidy_on_success=true', or call redeploy.yml with '-e canary=tidy'.

  Fixes #25

* Ensure that the 'release' tag/label is consistent within a cluster (e.g. during a scaling deploy); don't allow the user to set a different label, and if one is not specified on the command line, apply the existing label.
* Move the location of the release_version logic for redeploy.
* Fix canary_tidy_on_success to apply only when canary is "none" or "finish".
* + Add a short sleep to allow the DNS operation to complete. Possibly the records are not replicated when the Ansible module returns, but without a small sleep the dig command will sometimes fail and create a negative cache, which means the name won't resolve until the SOA TTL expires.
  + Remove `delegate_to: localhost` on the dig command, so that it can work if we are running through a bastion host.
  + If the dig command needs to check an external IP, use 8.8.8.8; otherwise it will default to resolving the cloud DNS and return the internal VPC IP, which will not validate against the ansible_host.
  + Add some sequence diagrams to show the redeploy lifecycle_state for _scheme_addallnew_rmdisk_rollback.
* + Enable redeploying to larger or smaller clusters.
  + Prevent running on a cluster built with an older version of clusterverse.
  + Add a new playbook, `clusterverse_label_upgrade_v1-v2.yml`, to add the necessary labels to an older cluster.
  + Add a skip_release_version_check option.
  + Make the external DNS resolver a variable.
* + Change cluster_hosts_flat to cluster_hosts_target.
  + Change nested logging output to print a useful trace.
* Fix for the DNS dig check in GCP: only add a '.' to the fqdn when there isn't already one at the end.
* Only allow canary=tidy to tidy (remove) powered-down VMs. Tidy is meant to clean up after a successful redeploy; if there are non-current machines still powered up, something is wrong.
* Fix for canary_tidy_on_success.
* Fix merge error in installing file/metricbeat.

Co-authored-by: Dougal Seeley <[email protected]>
Showing 79 changed files with 1,571 additions and 2,014 deletions.
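As a rough illustration of the new redeploy flow described in the commit message, the following is a sketch only: the user, key path and clusterid values are placeholders copied from the README examples further down, and the vault/credential arguments shown in those examples are omitted here.

```
# Sketch: full-mirror redeploy using the new scheme, then tidy away the old VMs.
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml \
  -e buildenv=sandbox -e clusterid=vtp_aws_euw1 \
  -e redeploy_scheme=_scheme_addallnew_rmdisk_rollback -e canary=none

# On success the replaced VMs are stopped rather than terminated; remove them
# either by adding -e canary_tidy_on_success=true to the run above, or with a
# follow-up tidy pass:
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml \
  -e buildenv=sandbox -e clusterid=vtp_aws_euw1 -e canary=tidy
```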
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
import os
import sys
# import argparse
@@ -48,14 +48,14 @@ The `cluster.yml` sub-role immutably deploys a cluster from the config defined a
### AWS:
```
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected]
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected] --tags=clusterverse_clean -e clean=true -e release_version=v1.0.1
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected] -e clean=true -e release_version=v1.0.1
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected] --tags=clusterverse_clean -e clean=_all_ -e release_version=v1.0.1
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected] -e clean=_all_
```
### GCP:
```
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected]
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected] --tags=clusterverse_clean -e clean=true
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected] -e clean=true
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected] --tags=clusterverse_clean -e clean=_all_ -e release_version=v1.0.
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected] -e clean=_all_
```

### Mandatory command-line variables:
@@ -67,7 +67,7 @@ ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster
+ `-e app_class=<proxy>` - Normally defined in `group_vars/<clusterid>/cluster_vars.yml`. The class of application (e.g. 'database', 'webserver'); becomes part of the fqdn
+ `-e release_version=<v1.0.1>` - Identifies the application version that is being deployed.
+ `-e dns_tld_external=<test.example.com>` - Normally defined in `group_vars/<clusterid>/cluster_vars.yml`.
+ `-e clean=true` - Deletes all existing VMs and security groups before creating
+ `-e clean=[current|retiring|redeployfail|_all_]` - Deletes VMs in `lifecycle_state`, or `_all_`, as well as networking and security groups
+ `-e do_package_upgrade=true` - Upgrade the OS packages (not good for determinism)
+ `-e reboot_on_package_upgrade=true` - After updating packages, performs a reboot on all nodes.
+ `-e prometheus_node_exporter_install=false` - Does not install the prometheus node_exporter
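For context, the new per-state clean introduced by this commit might be invoked like the sketch below; the user, key path and clusterid are placeholders following the README examples above, the vault argument is omitted, and 'retiring' is one of the lifecycle states named in the commit message.

```
# Sketch: remove only the VMs whose lifecycle_state label is 'retiring',
# rather than cleaning the whole cluster with -e clean=_all_.
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> cluster.yml \
  -e buildenv=sandbox -e clusterid=vtp_aws_euw1 \
  --tags=clusterverse_clean -e clean=retiring
```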
@@ -76,7 +76,7 @@ ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> cluster
+ `-e create_gce_network=true` - Create GCP network and subnetwork (probably needed if creating from scratch and using public network)

### Tags
+ `clusterverse_clean`: Deletes all VMs and security groups (also needs `-e clean=true` on command line)
+ `clusterverse_clean`: Deletes all VMs and security groups (also needs `-e clean=[current|retiring|redeployfail|_all_]` on command line)
+ `clusterverse_create`: Creates only EC2 VMs, based on the hosttype_vars values in group_vars/all/cluster.yml
+ `clusterverse_config`: Updates packages, sets hostname, adds hosts to DNS
@@ -88,18 +88,19 @@ The `redeploy.yml` sub-role will completely redeploy the cluster; this is useful

### AWS:
```
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected]
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml -e buildenv=sandbox -e clusterid=vtp_aws_euw1 [email protected] -e canary=none
```
### GCP:
```
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected]
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml -e buildenv=sandbox -e clusterid=vtp_gce_euw1 [email protected] -e canary=none
```

### Mandatory command-line variables:
+ `-e clusterid=<vtp_aws_euw1>` - A directory named `clusterid` must be present in `group_vars`. Holds the parameters that define the cluster; enables a multi-tenanted repository.
+ `-e buildenv=<sandbox>` - The environment (dev, stage, etc), which must be an attribute of `cluster_vars` defined in `group_vars/<clusterid>/cluster_vars.yml`
+ `-e canary=['start', 'finish', 'none']` - Specify whether to start or finish a canary deploy, or 'none' deploy
+ `-e canary=['start', 'finish', 'none', 'tidy']` - Specify whether to start or finish a canary deploy, or 'none' deploy

### Extra variables:
+ `-e 'redeploy_scheme'=<subrole_name>` - The scheme corresponds to one defined in
+ `-e redeploy_scheme=<subrole_name>` - The scheme corresponds to one defined in `roles/clusterverse/redeploy`
+ `-e canary_tidy_on_success=[true|false]` - Whether to run the tidy (remove the replaced VMs and DNS) on successful redeploy
+ `-e myhosttypes="master,slave"`- In redeployment you can define which host type you like to redeploy. If not defined it will redeploy all host types
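A hedged example combining the redeploy variables above; the user, key path, clusterid and the "master" host type are illustrative placeholders, and the vault argument from the README examples is omitted.

```
# Sketch: redeploy only the 'master' host type using the refactored per-node
# scheme, and remove the replaced VMs automatically on success
# (canary_tidy_on_success applies when canary is "none" or "finish").
ansible-playbook -u <username> --private-key=/home/<user>/.ssh/<rsa key> redeploy.yml \
  -e buildenv=sandbox -e clusterid=vtp_gce_euw1 \
  -e canary=none -e redeploy_scheme=_scheme_addnewvm_rmdisk_rollback \
  -e canary_tidy_on_success=true -e myhosttypes="master"
```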
@@ -0,0 +1,42 @@
---

- name: Clusterverse label upgrade v1-v2
  hosts: localhost
  connection: local
  gather_facts: true
  tasks:
    - import_role:
        name: 'clusterverse/_dependencies'

    - import_role:
        name: 'clusterverse/cluster_hosts'
        tasks_from: get_cluster_hosts_state.yml

    - block:
        - name: clusterverse_label_upgrade_v1-v2 | Add lifecycle_state and cluster_suffix label to AWS VM
          ec2_tag:
            aws_access_key: "{{cluster_vars[buildenv].aws_access_key}}"
            aws_secret_key: "{{cluster_vars[buildenv].aws_secret_key}}"
            region: "{{cluster_vars.region}}"
            resource: "{{ item.instance_id }}"
            tags:
              lifecycle_state: "current"
              cluster_suffix: "{{ item.name | regex_replace('^.*-(.*)$', '\\1') }}"
          with_items: "{{ hosts_to_relabel }}"
          when: cluster_vars.type == "aws"

        - name: clusterverse_label_upgrade_v1-v2 | Add lifecycle_state and cluster_suffix label to GCE VM
          gce_labels:
            resource_name: "{{item.name}}"
            project_id: "{{cluster_vars.project_id}}"
            resource_location: "{{item.regionzone}}"
            credentials_file: "{{gcp_credentials_file}}"
            resource_type: instances
            labels:
              lifecycle_state: "current"
              cluster_suffix: "{{ item.name | regex_replace('^.*-(.*)$', '\\1') }}"
            state: present
          with_items: "{{ hosts_to_relabel }}"
          when: cluster_vars.type == "gce"
      vars:
        hosts_to_relabel: "{{ cluster_hosts_state | json_query(\"[?!(tagslabels.cluster_suffix) || !(tagslabels.lifecycle_state)]\") }}"
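The commit message describes this new playbook (`clusterverse_label_upgrade_v1-v2.yml`) as the way to add the required labels to a cluster built with an older clusterverse. Presumably it is invoked like the other clusterverse playbooks; the sketch below is under that assumption, with placeholder user, key path and clusterid values and the vault argument omitted.

```
# Sketch: add the new lifecycle_state / cluster_suffix labels to the VMs of a
# cluster that was built with an older version of clusterverse.
ansible-playbook -u ubuntu --private-key=/home/<user>/.ssh/<rsa key> clusterverse_label_upgrade_v1-v2.yml \
  -e buildenv=sandbox -e clusterid=vtp_aws_euw1
```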