Changed autoscaling_group to plain EC2 VMs on AWS. (hitachienergy#2939)
- Replaced AWS auto_scaling_groups with plain EC2 VM creation.
- Added proper host sorting, as already implemented for the any and azure providers: hitachienergy#1076
- Synced up features with the Azure Terraform implementation
- Added support for the use_network_security_groups flag: hitachienergy#959
- Updated the DoD for bugs to reflect the changes made for hitachienergy#2832
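As the `cli/engine/providers/aws/APIProxy.py` diff below shows, cluster hosts are now discovered straight from EC2 by tags instead of through auto-scaling groups. The following is a minimal boto3 sketch of that lookup, not the project's actual code; the function name and the example values in the trailing comment are hypothetical, and it assumes boto3 is installed and AWS credentials are configured:

```python
import boto3


def running_instances_for(component_key: str, cluster_name: str, vpc_id: str):
    """Return running EC2 instances tagged for one component of one cluster."""
    ec2 = boto3.Session().resource('ec2')
    return ec2.instances.filter(
        Filters=[
            # Only instances that are actually up.
            {'Name': 'instance-state-name', 'Values': ['running']},
            # Limit the search to the cluster's VPC.
            {'Name': 'vpc-id', 'Values': [vpc_id]},
            # Tag scheme as used in the APIProxy.py diff below.
            {'Name': 'tag:component_key', 'Values': [component_key]},
            {'Name': 'tag:cluster_name', 'Values': [cluster_name]},
        ]
    )


# Hypothetical usage (requires real AWS resources and credentials):
# for instance in running_instances_for('kubernetes_node', 'mycluster', 'vpc-0123456789abcdef0'):
#     print(instance.private_ip_address)
```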
seriva authored and rafzei committed Feb 8, 2022
1 parent 8ff1ec1 commit 11465af
Showing 49 changed files with 315 additions and 529 deletions.
33 changes: 23 additions & 10 deletions .github/ISSUE_TEMPLATE/bug-report.md
@@ -35,13 +35,26 @@ Add any other context about the problem here.

**DoD checklist**

* [ ] Changelog updated (if affected version was released)
* [ ] COMPONENTS.md updated / doesn't need to be updated
* [ ] Automated tests passed (QA pipelines)
* [ ] apply
* [ ] upgrade
* [ ] Case covered by automated test (if possible)
* [ ] Idempotency tested
* [ ] Documentation updated / doesn't need to be updated
* [ ] All conversations in PR resolved
* [ ] Backport tasks created / doesn't need to be backported
- Changelog
- [ ] updated
- [ ] not needed
- COMPONENTS.md
- [ ] updated
- [ ] not needed
- Schema
- [ ] updated
- [ ] not needed
- Backport tasks
- [ ] created
- [ ] not needed
- Documentation
- [ ] added
- [ ] updated
- [ ] not needed
- [ ] Feature has automated tests
- [ ] Automated tests passed (QA pipelines)
- [ ] apply
- [ ] upgrade
- [ ] backup/restore
- [ ] Idempotency tested
- [ ] All conversations in PR resolved
23 changes: 14 additions & 9 deletions cli/engine/providers/aws/APIProxy.py
@@ -2,7 +2,7 @@

from cli.helpers.doc_list_helpers import select_single
from cli.helpers.objdict_helpers import dict_to_objdict
from cli.models.AnsibleHostModel import AnsibleHostModel
from cli.models.AnsibleHostModel import AnsibleOrderedHostModel


class APIProxy:
@@ -26,9 +26,7 @@ def get_ips_for_feature(self, component_key):
cluster_name = self.cluster_model.specification.name.lower()
look_for_public_ip = self.cluster_model.specification.cloud.use_public_ips
vpc_id = self.get_vpc_id()

ec2 = self.session.resource('ec2')
running_instances = ec2.instances.filter(
running_instances = self.session.resource('ec2').instances.filter(
Filters=[{
'Name': 'instance-state-name',
'Values': ['running']
@@ -37,21 +35,28 @@ def get_ips_for_feature(self, component_key):
'Values': [vpc_id]
},
{
'Name': 'tag:'+component_key,
'Values': ['']
'Name': 'tag:component_key',
'Values': [component_key]
},
{
'Name': 'tag:cluster_name',
'Values': [cluster_name]
}]
)

result = []
result: List[AnsibleOrderedHostModel] = []

for instance in running_instances:
hostname = ''
for tag in instance.tags:
if tag['Key'] == 'Name':
hostname = tag['Value']
if look_for_public_ip:
result.append(AnsibleHostModel(instance.public_dns_name, instance.public_ip_address))
result.append(AnsibleOrderedHostModel(hostname, instance.public_ip_address))
else:
result.append(AnsibleHostModel(instance.private_dns_name, instance.private_ip_address))
result.append(AnsibleOrderedHostModel(hostname, instance.private_ip_address))

result.sort()
return result

def get_image_id(self, os_full_name):
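The switch from `AnsibleHostModel` to `AnsibleOrderedHostModel` together with `result.sort()` makes the returned inventory order deterministic. A standalone sketch of the idea follows; the `OrderedHost` class is illustrative, not the actual model from `cli/models/AnsibleHostModel.py`:

```python
from functools import total_ordering


@total_ordering
class OrderedHost:
    """Host entry that sorts by hostname, mimicking an ordered host model."""

    def __init__(self, name: str, ip: str):
        self.name = name
        self.ip = ip

    def __eq__(self, other) -> bool:
        return self.name == other.name

    def __lt__(self, other) -> bool:
        # Sorting by the Name tag keeps the Ansible inventory stable across runs,
        # which matters once instances are no longer grouped by an auto-scaling group.
        return self.name < other.name


hosts = [OrderedHost("cluster-kubernetes-node-vm-1", "10.1.1.12"),
         OrderedHost("cluster-kubernetes-node-vm-0", "10.1.1.11")]
hosts.sort()
print([h.name for h in hosts])
# ['cluster-kubernetes-node-vm-0', 'cluster-kubernetes-node-vm-1']
```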
178 changes: 81 additions & 97 deletions cli/engine/providers/aws/InfrastructureBuilder.py

Large diffs are not rendered by default.

14 changes: 5 additions & 9 deletions cli/engine/providers/azure/InfrastructureBuilder.py
@@ -24,6 +24,11 @@ def __init__(self, docs, manifest_docs=[]):
self.docs = docs
self.manifest_docs = manifest_docs

# If there are no security groups Ansible provisioning will fail because
# SSH is not allowed then with public IPs on Azure.
if not(self.use_network_security_groups) and self.use_public_ips:
self.logger.warning('Use of security groups has been disabled and public IPs are used. Ansible run will fail because SSH will not be allowed.')

# Check if there is a hostname_domain_extension we already applied and we want to retain.
# The same as VM images we want to preserve hostname_domain_extension over versions.
self.hostname_domain_extension = self.cluster_model.specification.cloud.hostname_domain_extension
@@ -61,19 +66,10 @@ def run(self):
# Set property that controls cloud-init.
vm_config.specification['use_cloud_init_custom_data'] = cloud_init_custom_data.specification.enabled

# If there are no security groups Ansible provisioning will fail because
# SSH is not allowed then with public IPs on Azure.
if not(self.use_network_security_groups) and self.use_public_ips:
self.logger.warning('Use of security groups has been disabled and public IP are used. Ansible run will fail because SSH will not be allowed.')

# For now only one subnet per component.
if (len(component_value.subnets) > 1):
self.logger.warning('On Azure only one subnet per component is supported for now. Taking first and ignoring others.')

# Add message for ignoring availabiltity zones if present.
if 'availability_zone' in component_value.subnets[0]:
self.logger.warning('On Azure availability_zones are not supported yet. Ignoring definition.')

subnet_definition = component_value.subnets[0]
subnet = select_first(infrastructure, lambda item: item.kind == 'infrastructure/subnet' and
item.specification.address_prefix == subnet_definition['address_pool'])
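For context, a small sketch of the relocated guard; the class and flag wiring below are illustrative, not the project's code. The point is that the warning is now emitted once at construction time instead of once per component in `run()`:

```python
import logging

logging.basicConfig(level=logging.WARNING)


class AzureBuilderSketch:
    def __init__(self, use_network_security_groups: bool, use_public_ips: bool):
        self.logger = logging.getLogger(self.__class__.__name__)
        self.use_network_security_groups = use_network_security_groups
        self.use_public_ips = use_public_ips
        # Per the warning in the diff above: without NSGs and with public IPs,
        # SSH-based Ansible provisioning is expected to fail, so flag it early and only once.
        if not self.use_network_security_groups and self.use_public_ips:
            self.logger.warning('Use of security groups has been disabled and public IPs '
                                'are used. Ansible run will fail because SSH will not be allowed.')


AzureBuilderSketch(use_network_security_groups=False, use_public_ips=True)
```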
6 changes: 3 additions & 3 deletions docs/changelogs/CHANGELOG-2.0.md
@@ -4,6 +4,7 @@

### Added

- [#959](https://github.com/epiphany-platform/epiphany/issues/959) - Add usage of use_network_security_groups to disable NSG on AWS
- [#2701](https://github.com/epiphany-platform/epiphany/issues/2701) - Epicli prepare - generate files in separate directory
- [#2812](https://github.com/epiphany-platform/epiphany/issues/2812) - Extend K8s config validation

@@ -12,6 +13,7 @@
- [#2653](https://github.com/epiphany-platform/epiphany/issues/2653) - Epicli is failing in air-gapped infra mode
- [#1569](https://github.com/epiphany-platform/epiphany/issues/1569) - Azure unmanaged disks not supported by Epiphany but there is misleading setting in the default configuration
- [#2832](https://github.com/epiphany-platform/epiphany/issues/2832) - Make the DoD checklist clear
- [#2853](https://github.com/epiphany-platform/epiphany/issues/2853) - Change autoscaling_group approach in AWS provider in favor of plain VM creation.

- [#2669](https://github.com/epiphany-platform/epiphany/issues/2669) - Restarting the installation process can cause certificate problems if K8s was not fully configured

@@ -41,8 +43,6 @@

### Breaking changes

- Upgrade of Terraform components in issue [#2825](https://github.com/epiphany-platform/epiphany/issues/2825) will make running re-apply with infrastructure break on existing 1.x clusters. The advice is to deploy a new cluster and migrate data. If needed a manual upgrade path is described [here.](../home/howto/UPGRADE.md#terraform-upgrade-from-epiphany-1.x-to-2.x)
- Kubernetes container runtime changed. Dockershim and Docker are no longer on Kubernetes hosts.
- Filebeat docker input replaced by container input with new fields.
- Upgrade of Terraform components in issue [#2825](https://github.com/epiphany-platform/epiphany/issues/2825) and [#2853](https://github.com/epiphany-platform/epiphany/issues/2853) will make running re-apply with infrastructure break on existing 1.x clusters. The advice is to deploy a new cluster and migrate data. If needed a manual upgrade path is described [here.](../home/howto/UPGRADE.md#terraform-upgrade-from-epiphany-1.x-to-2.x)

### Known issues
34 changes: 10 additions & 24 deletions docs/home/ARM.md
@@ -279,66 +279,52 @@ specification:
count: 2
machine: kafka-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.5.0/24
- address_pool: 10.1.5.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-west-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
kubernetes_node:
count: 3
machine: kubernetes-node-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-west-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
load_balancer:
count: 1
machine: lb-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.7.0/24
- address_pool: 10.1.7.0/24
logging:
count: 2
machine: logging-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.3.0/24
- address_pool: 10.1.3.0/24
monitoring:
count: 1
machine: monitoring-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.4.0/24
- address_pool: 10.1.4.0/24
postgresql:
count: 1
machine: postgresql-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.6.0/24
- address_pool: 10.1.6.0/24
rabbitmq:
count: 2
machine: rabbitmq-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.8.0/24
- address_pool: 10.1.8.0/24
opendistro_for_elasticsearch:
count: 1
machine: opendistro-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.10.0/24
- address_pool: 10.1.10.0/24
repository:
count: 1
machine: repository-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.11.0/24
- address_pool: 10.1.11.0/24
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
13 changes: 3 additions & 10 deletions docs/home/howto/SECURITY_GROUPS.md
@@ -253,26 +253,19 @@ specification:
machine: repository-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.11.0/24
- address_pool: 10.1.11.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
kubernetes_node:
count: 2
machine: kubernetes-node-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
logging:
count: 0
monitoring:
71 changes: 2 additions & 69 deletions docs/home/howto/UPGRADE.md
@@ -428,6 +428,7 @@ From Epiphany 1.x to 2.x the Terraform stack received the following major update
- Terraform 0.12.6 to 1.1.3
- Azurerm provider 1.38.0 to 2.91.0
- AWS provider 2.26 to 3.71.0
- Removal of auto-scaling-groups in favor of plain EC2 instances on AWS.
These introduce some breaking changes which will require manual steps for upgrading existing 1.x clusters. As this is not straightforward we recommend deploying a new cluster on 2.x and migrating data instead.
@@ -521,72 +522,4 @@ General steps:
### AWS
Notes:
- If you made any manual changes to your cluster infrastructure outside of Terraform this might cause issues.
- Only run `terraform apply` if `terraform plan` shows your infrastructure does not match the configuration.
- Manual Terraform upgrade up to v1.0.x should be completed before running the `epicli apply` command with Epiphany 2.x.
- Terraform can be installed as a binary package or by using package managers, see more: https://learn.hashicorp.com/tutorials/terraform/install-cli
#### v0.12.6 => v0.13.x
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-13
General steps:
- Download the latest Terraform v0.13.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform 0.13upgrade
terraform plan
terraform apply (if needed)
```
#### v0.13.x => v0.14.x
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-14
General steps:
- Download the latest Terraform v0.14.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform plan
terraform apply (if needed)
```
#### v0.14.x => v1.0.x
Note: From v0.14.x we can upgrade straight to v1.0.x. No need to upgrade to v0.15.x first.
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-0
General steps:
- Download the latest Terraform v1.0.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform plan
terraform apply (if needed)
```
#### v1.0.x => v1.1.3
In this step we also force the upgrade from AWS provider 2.26 to 3.71.0 which requires a few more steps to resolve some pending issues.
At this point, the steps assume that you are already running the Epiphany 2.x image.
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-1
General steps:
- Run epicli to generate the new AWS provider Terraform scripts:
```shell
epicli apply -f data.yml
```
After the Terraform scripts are generated, `terraform init ...` will result in the following error:
`Error: Failed to query available provider packages`
- To fix the issue from the previous step, manually run the following from the epicli container in `build/clustername/terraform`:
```shell
terraform init -upgrade
```
- Now re-run epicli again:
```shell
epicli apply -f data.yml
```
The Terraform configuration for AWS deployments is not compatible between Epiphany 1.x and 2.x, and migration is not possible without destroying the environment. The only option is to deploy a new cluster and migrate the data.