Changed autoscaling_group to plain EC2 VMs on AWS. (hitachienergy#2939)
- Replaced AWS auto_scaling_groups with plain EC2 VM creation.
- Added proper host sorting, as already implemented for the any and azure providers: hitachienergy#1076
- Synced up features with the Azure Terraform implementation
- Added support for the use_network_security_groups flag: hitachienergy#959
- Updated the DoD for bugs to reflect the changes made for hitachienergy#2832
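As the `cli/engine/providers/aws/APIProxy.py` diff below shows, cluster hosts are now discovered straight from EC2 by tags instead of through auto-scaling groups. The following is a minimal boto3 sketch of that lookup, not the project's actual code; the function name and the example values in the trailing comment are hypothetical, and it assumes boto3 is installed and AWS credentials are configured:

```python
import boto3


def running_instances_for(component_key: str, cluster_name: str, vpc_id: str):
    """Return running EC2 instances tagged for one component of one cluster."""
    ec2 = boto3.Session().resource('ec2')
    return ec2.instances.filter(
        Filters=[
            # Only instances that are actually up.
            {'Name': 'instance-state-name', 'Values': ['running']},
            # Limit the search to the cluster's VPC.
            {'Name': 'vpc-id', 'Values': [vpc_id]},
            # Tag scheme as used in the APIProxy.py diff below.
            {'Name': 'tag:component_key', 'Values': [component_key]},
            {'Name': 'tag:cluster_name', 'Values': [cluster_name]},
        ]
    )


# Hypothetical usage (requires real AWS resources and credentials):
# for instance in running_instances_for('kubernetes_node', 'mycluster', 'vpc-0123456789abcdef0'):
#     print(instance.private_ip_address)
```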
seriva authored and rafzei committed Feb 8, 2022
1 parent 8ff1ec1 commit 11465af
Showing 49 changed files with 315 additions and 529 deletions.
33 changes: 23 additions & 10 deletions .github/ISSUE_TEMPLATE/bug-report.md
@@ -35,13 +35,26 @@ Add any other context about the problem here.

**DoD checklist**

* [ ] Changelog updated (if affected version was released)
* [ ] COMPONENTS.md updated / doesn't need to be updated
* [ ] Automated tests passed (QA pipelines)
* [ ] apply
* [ ] upgrade
* [ ] Case covered by automated test (if possible)
* [ ] Idempotency tested
* [ ] Documentation updated / doesn't need to be updated
* [ ] All conversations in PR resolved
* [ ] Backport tasks created / doesn't need to be backported
- Changelog
- [ ] updated
- [ ] not needed
- COMPONENTS.md
- [ ] updated
- [ ] not needed
- Schema
- [ ] updated
- [ ] not needed
- Backport tasks
- [ ] created
- [ ] not needed
- Documentation
- [ ] added
- [ ] updated
- [ ] not needed
- [ ] Feature has automated tests
- [ ] Automated tests passed (QA pipelines)
- [ ] apply
- [ ] upgrade
- [ ] backup/restore
- [ ] Idempotency tested
- [ ] All conversations in PR resolved
23 changes: 14 additions & 9 deletions cli/engine/providers/aws/APIProxy.py
@@ -2,7 +2,7 @@

from cli.helpers.doc_list_helpers import select_single
from cli.helpers.objdict_helpers import dict_to_objdict
from cli.models.AnsibleHostModel import AnsibleHostModel
from cli.models.AnsibleHostModel import AnsibleOrderedHostModel


class APIProxy:
@@ -26,9 +26,7 @@ def get_ips_for_feature(self, component_key):
cluster_name = self.cluster_model.specification.name.lower()
look_for_public_ip = self.cluster_model.specification.cloud.use_public_ips
vpc_id = self.get_vpc_id()

ec2 = self.session.resource('ec2')
running_instances = ec2.instances.filter(
running_instances = self.session.resource('ec2').instances.filter(
Filters=[{
'Name': 'instance-state-name',
'Values': ['running']
@@ -37,21 +35,28 @@ def get_ips_for_feature(self, component_key):
'Values': [vpc_id]
},
{
'Name': 'tag:'+component_key,
'Values': ['']
'Name': 'tag:component_key',
'Values': [component_key]
},
{
'Name': 'tag:cluster_name',
'Values': [cluster_name]
}]
)

result = []
result: List[AnsibleOrderedHostModel] = []

for instance in running_instances:
hostname = ''
for tag in instance.tags:
if tag['Key'] == 'Name':
hostname = tag['Value']
if look_for_public_ip:
result.append(AnsibleHostModel(instance.public_dns_name, instance.public_ip_address))
result.append(AnsibleOrderedHostModel(hostname, instance.public_ip_address))
else:
result.append(AnsibleHostModel(instance.private_dns_name, instance.private_ip_address))
result.append(AnsibleOrderedHostModel(hostname, instance.private_ip_address))

result.sort()
return result

def get_image_id(self, os_full_name):
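The switch from `AnsibleHostModel` to `AnsibleOrderedHostModel` together with `result.sort()` makes the returned inventory order deterministic. A standalone sketch of the idea follows; the `OrderedHost` class is illustrative, not the actual model from `cli/models/AnsibleHostModel.py`:

```python
from functools import total_ordering


@total_ordering
class OrderedHost:
    """Host entry that sorts by hostname, mimicking an ordered host model."""

    def __init__(self, name: str, ip: str):
        self.name = name
        self.ip = ip

    def __eq__(self, other) -> bool:
        return self.name == other.name

    def __lt__(self, other) -> bool:
        # Sorting by the Name tag keeps the Ansible inventory stable across runs,
        # which matters once instances are no longer grouped by an auto-scaling group.
        return self.name < other.name


hosts = [OrderedHost("cluster-kubernetes-node-vm-1", "10.1.1.12"),
         OrderedHost("cluster-kubernetes-node-vm-0", "10.1.1.11")]
hosts.sort()
print([h.name for h in hosts])
# ['cluster-kubernetes-node-vm-0', 'cluster-kubernetes-node-vm-1']
```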
178 changes: 81 additions & 97 deletions cli/engine/providers/aws/InfrastructureBuilder.py

Large diffs are not rendered by default.

14 changes: 5 additions & 9 deletions cli/engine/providers/azure/InfrastructureBuilder.py
@@ -24,6 +24,11 @@ def __init__(self, docs, manifest_docs=[]):
self.docs = docs
self.manifest_docs = manifest_docs

# If there are no security groups Ansible provisioning will fail because
# SSH is not allowed then with public IPs on Azure.
if not(self.use_network_security_groups) and self.use_public_ips:
self.logger.warning('Use of security groups has been disabled and public IPs are used. Ansible run will fail because SSH will not be allowed.')

# Check if there is a hostname_domain_extension we already applied and we want to retain.
# The same as VM images we want to preserve hostname_domain_extension over versions.
self.hostname_domain_extension = self.cluster_model.specification.cloud.hostname_domain_extension
@@ -61,19 +66,10 @@ def run(self):
# Set property that controls cloud-init.
vm_config.specification['use_cloud_init_custom_data'] = cloud_init_custom_data.specification.enabled

# If there are no security groups Ansible provisioning will fail because
# SSH is not allowed then with public IPs on Azure.
if not(self.use_network_security_groups) and self.use_public_ips:
self.logger.warning('Use of security groups has been disabled and public IP are used. Ansible run will fail because SSH will not be allowed.')

# For now only one subnet per component.
if (len(component_value.subnets) > 1):
self.logger.warning('On Azure only one subnet per component is supported for now. Taking first and ignoring others.')

# Add message for ignoring availabiltity zones if present.
if 'availability_zone' in component_value.subnets[0]:
self.logger.warning('On Azure availability_zones are not supported yet. Ignoring definition.')

subnet_definition = component_value.subnets[0]
subnet = select_first(infrastructure, lambda item: item.kind == 'infrastructure/subnet' and
item.specification.address_prefix == subnet_definition['address_pool'])
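For context, a small sketch of the relocated guard; the class and flag wiring below are illustrative, not the project's code. The point is that the warning is now emitted once at construction time instead of once per component in `run()`:

```python
import logging

logging.basicConfig(level=logging.WARNING)


class AzureBuilderSketch:
    def __init__(self, use_network_security_groups: bool, use_public_ips: bool):
        self.logger = logging.getLogger(self.__class__.__name__)
        self.use_network_security_groups = use_network_security_groups
        self.use_public_ips = use_public_ips
        # Per the warning in the diff above: without NSGs and with public IPs,
        # SSH-based Ansible provisioning is expected to fail, so flag it early and only once.
        if not self.use_network_security_groups and self.use_public_ips:
            self.logger.warning('Use of security groups has been disabled and public IPs '
                                'are used. Ansible run will fail because SSH will not be allowed.')


AzureBuilderSketch(use_network_security_groups=False, use_public_ips=True)
```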
6 changes: 3 additions & 3 deletions docs/changelogs/CHANGELOG-2.0.md
@@ -4,6 +4,7 @@

### Added

- [#959](https://github.com/epiphany-platform/epiphany/issues/959) - Add usage of use_network_security_groups to disable NSG on AWS
- [#2701](https://github.com/epiphany-platform/epiphany/issues/2701) - Epicli prepare - generate files in separate directory
- [#2812](https://github.com/epiphany-platform/epiphany/issues/2812) - Extend K8s config validation

@@ -12,6 +13,7 @@
- [#2653](https://github.com/epiphany-platform/epiphany/issues/2653) - Epicli is failing in air-gapped infra mode
- [#1569](https://github.com/epiphany-platform/epiphany/issues/1569) - Azure unmanaged disks not supported by Epiphany but there is misleading setting in the default configuration
- [#2832](https://github.com/epiphany-platform/epiphany/issues/2832) - Make the DoD checklist clear
- [#2853](https://github.com/epiphany-platform/epiphany/issues/2853) - Change autoscaling_group approach in AWS provider in favor of plain VM creation.

- [#2669](https://github.com/epiphany-platform/epiphany/issues/2669) - Restarting the installation process can cause certificate problems if K8s was not fully configured

@@ -41,8 +43,6 @@

### Breaking changes

- Upgrade of Terraform components in issue [#2825](https://github.com/epiphany-platform/epiphany/issues/2825) will make running re-apply with infrastructure break on existing 1.x clusters. The advice is to deploy a new cluster and migrate data. If needed a manual upgrade path is described [here.](../home/howto/UPGRADE.md#terraform-upgrade-from-epiphany-1.x-to-2.x)
- Kubernetes container runtime changed. Dockershim and Docker are no longer on Kubernetes hosts.
- Filebeat docker input replaced by container input with new fields.
- Upgrade of Terraform components in issue [#2825](https://github.com/epiphany-platform/epiphany/issues/2825) and [#2853](https://github.com/epiphany-platform/epiphany/issues/2853) will make running re-apply with infrastructure break on existing 1.x clusters. The advice is to deploy a new cluster and migrate data. If needed a manual upgrade path is described [here.](../home/howto/UPGRADE.md#terraform-upgrade-from-epiphany-1.x-to-2.x)

### Known issues
34 changes: 10 additions & 24 deletions docs/home/ARM.md
@@ -279,66 +279,52 @@ specification:
count: 2
machine: kafka-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.5.0/24
- address_pool: 10.1.5.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-west-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
kubernetes_node:
count: 3
machine: kubernetes-node-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-west-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
load_balancer:
count: 1
machine: lb-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.7.0/24
- address_pool: 10.1.7.0/24
logging:
count: 2
machine: logging-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.3.0/24
- address_pool: 10.1.3.0/24
monitoring:
count: 1
machine: monitoring-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.4.0/24
- address_pool: 10.1.4.0/24
postgresql:
count: 1
machine: postgresql-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.6.0/24
- address_pool: 10.1.6.0/24
rabbitmq:
count: 2
machine: rabbitmq-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.8.0/24
- address_pool: 10.1.8.0/24
opendistro_for_elasticsearch:
count: 1
machine: opendistro-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.10.0/24
- address_pool: 10.1.10.0/24
repository:
count: 1
machine: repository-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.11.0/24
- address_pool: 10.1.11.0/24
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
13 changes: 3 additions & 10 deletions docs/home/howto/SECURITY_GROUPS.md
@@ -253,26 +253,19 @@ specification:
machine: repository-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.11.0/24
- address_pool: 10.1.11.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
kubernetes_node:
count: 2
machine: kubernetes-node-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
logging:
count: 0
monitoring:
71 changes: 2 additions & 69 deletions docs/home/howto/UPGRADE.md
@@ -428,6 +428,7 @@ From Epiphany 1.x to 2.x the Terraform stack received the following major update
- Terraform 0.12.6 to 1.1.3
- Azurerm provider 1.38.0 to 2.91.0
- AWS provider 2.26 to 3.71.0
- Removal of auto-scaling-groups in favor of plain EC2 instances on AWS.
These introduce some breaking changes which will require manual steps for upgrading existing 1.x clusters. As this is not straightforward we recommend deploying a new cluster on 2.x and migrating data instead.
@@ -521,72 +522,4 @@ General steps:
### AWS
Notes:
- If you made any manual changes to your cluster infrastructure outside of Terraform this might cause issues.
- Only run `terraform apply` if `terraform plan` shows your infrastructure does not match the configuration.
- Manual Terraform upgrade up to v1.0.x should be completed before running the `epicli apply` command with Epiphany 2.x.
- Terraform can be installed as a binary package or by using package managers, see more: https://learn.hashicorp.com/tutorials/terraform/install-cli
#### v0.12.6 => v0.13.x
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-13
General steps:
- Download the latest Terraform v0.13.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform 0.13upgrade
terraform plan
terraform apply (if needed)
```
#### v0.13.x => v0.14.x
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-14
General steps:
- Download the latest Terraform v0.14.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform plan
terraform apply (if needed)
```
#### v0.14.x => v1.0.x
Note: From v0.14.x we can upgrade straight to v1.0.x. No need to upgrade to v0.15.x first.
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-0
General steps:
- Download the latest Terraform v1.0.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform plan
terraform apply (if needed)
```
#### v1.0.x => v1.1.3
In this step we also force the upgrade from AWS provider 2.26 to 3.71.0 which requires a few more steps to resolve some pending issues.
At this point, the steps assume that you are already running the Epiphany 2.x image.
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-1
General steps:
- Run epicli to generate the new AWS provider Terraform scripts:
```shell
epicli apply -f data.yml
```
After the Terraform scripts are generated, `terraform init ...` will result in the following error:
`Error: Failed to query available provider packages`
- To fix the issue from the previous step, manually run the following from the epicli container in `build/clustername/terraform`:
```shell
terraform init -upgrade
```
- Now re-run epicli again:
```shell
epicli apply -f data.yml
```
The Terraform configuration for AWS deployments is not compatible between Epiphany 1.x and 2.x, and migration is not possible without destroying the environment. The only option is to deploy a new cluster and migrate the data.