Changed autoscaling_group to plain EC2 VMs on AWS. #2939

Merged on Feb 8, 2022 (34 commits)
Commits
80ba4ea
Initial rewrite of auto-scaling-groups to plain ec2 vms
seriva Jan 28, 2022
5c361af
Added VM validation
seriva Jan 28, 2022
ecec916
Removed/fixed unittests
seriva Jan 28, 2022
945db62
Updated documentation.
seriva Jan 28, 2022
3de0800
Added host sorting
seriva Jan 28, 2022
c01b5cf
Minor validation fix.
seriva Jan 28, 2022
0d9e3e1
Fix terraform recreation.
seriva Jan 28, 2022
b9562f8
Fix for inventory ordering.
seriva Jan 28, 2022
cd69120
Added support for use_network_security_groups
seriva Jan 29, 2022
7b3a2a7
Updated changelog
seriva Jan 31, 2022
11221a3
Fixed typo in every TF template.
seriva Jan 31, 2022
00d5da0
Minor spacing fixes in tests.
seriva Jan 31, 2022
97360d3
Add tests for AWS host ordering.
seriva Jan 31, 2022
6bef2c8
Merge branch 'develop' into feature/2853
seriva Jan 31, 2022
a04ba36
Fix minor typo
seriva Feb 1, 2022
574e8a1
Update cli/engine/providers/aws/InfrastructureBuilder.py
seriva Feb 2, 2022
65e17af
Update cli/engine/providers/azure/InfrastructureBuilder.py
seriva Feb 2, 2022
3572726
Fixed minor typo.
seriva Feb 2, 2022
a610f64
Fixed typo.
seriva Feb 4, 2022
e6ff948
Fixed machine naming
seriva Feb 4, 2022
ca10171
Fixed names for security groups
seriva Feb 4, 2022
4b3d3e7
Minor typo fix.
seriva Feb 7, 2022
508a5c9
Add os volume name label
seriva Feb 7, 2022
8b30849
Synced tags across resources.
seriva Feb 7, 2022
7e29c25
Added support for disks
seriva Feb 7, 2022
3e9a5ae
Tagged datadisks
seriva Feb 7, 2022
d970e9c
Minor fix for device name
seriva Feb 7, 2022
472714a
Added name component for data disks
seriva Feb 7, 2022
d2dab59
Sync disk naming with Azure disk naming.
seriva Feb 7, 2022
1ed31a9
Fixed line indentations;)
seriva Feb 8, 2022
119eccd
Fixed to typo.
seriva Feb 8, 2022
1423b57
Removed whitespace.
seriva Feb 8, 2022
fa0d7fd
Updated DoD for bug reports.
seriva Feb 8, 2022
51e9332
Use index0 over index for datadisks
seriva Feb 8, 2022
33 changes: 23 additions & 10 deletions .github/ISSUE_TEMPLATE/bug-report.md
Expand Up @@ -35,13 +35,26 @@ Add any other context about the problem here.

**DoD checklist**

* [ ] Changelog updated (if affected version was released)
* [ ] COMPONENTS.md updated / doesn't need to be updated
* [ ] Automated tests passed (QA pipelines)
* [ ] apply
* [ ] upgrade
* [ ] Case covered by automated test (if possible)
* [ ] Idempotency tested
* [ ] Documentation updated / doesn't need to be updated
* [ ] All conversations in PR resolved
* [ ] Backport tasks created / doesn't need to be backported
- Changelog
- [ ] updated
- [ ] not needed
- COMPONENTS.md
- [ ] updated
- [ ] not needed
- Schema
- [ ] updated
- [ ] not needed
- Backport tasks
- [ ] created
- [ ] not needed
- Documentation
- [ ] added
- [ ] updated
- [ ] not needed
- [ ] Feature has automated tests
- [ ] Automated tests passed (QA pipelines)
- [ ] apply
- [ ] upgrade
- [ ] backup/restore
- [ ] Idempotency tested
- [ ] All conversations in PR resolved
23 changes: 14 additions & 9 deletions cli/engine/providers/aws/APIProxy.py
Expand Up @@ -2,7 +2,7 @@

from cli.helpers.doc_list_helpers import select_single
from cli.helpers.objdict_helpers import dict_to_objdict
from cli.models.AnsibleHostModel import AnsibleHostModel
from cli.models.AnsibleHostModel import AnsibleOrderedHostModel


class APIProxy:
Expand All @@ -26,9 +26,7 @@ def get_ips_for_feature(self, component_key):
cluster_name = self.cluster_model.specification.name.lower()
look_for_public_ip = self.cluster_model.specification.cloud.use_public_ips
vpc_id = self.get_vpc_id()

ec2 = self.session.resource('ec2')
running_instances = ec2.instances.filter(
running_instances = self.session.resource('ec2').instances.filter(
Filters=[{
'Name': 'instance-state-name',
'Values': ['running']
Expand All @@ -37,21 +35,28 @@ def get_ips_for_feature(self, component_key):
'Values': [vpc_id]
},
{
'Name': 'tag:'+component_key,
'Values': ['']
'Name': 'tag:component_key',
'Values': [component_key]
},
{
'Name': 'tag:cluster_name',
'Values': [cluster_name]
}]
)

result = []
result: List[AnsibleOrderedHostModel] = []

for instance in running_instances:
hostname = ''
for tag in instance.tags:
if tag['Key'] == 'Name':
hostname = tag['Value']
if look_for_public_ip:
result.append(AnsibleHostModel(instance.public_dns_name, instance.public_ip_address))
result.append(AnsibleOrderedHostModel(hostname, instance.public_ip_address))
else:
result.append(AnsibleHostModel(instance.private_dns_name, instance.private_ip_address))
result.append(AnsibleOrderedHostModel(hostname, instance.private_ip_address))

result.sort()
return result

def get_image_id(self, os_full_name):
Expand Down
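
The change to `get_ips_for_feature` above can be read as two steps: filter instances on a fixed `component_key` tag (instead of a per-component tag key with an empty value), then name each host from its `Name` tag and sort the result. The following is a minimal sketch of that idea, not the project's code. `build_instance_filters`, `OrderedHost`, `hosts_from_instances`, and the plain-dict instance shape are hypothetical names introduced for illustration; `OrderedHost` only stands in for `AnsibleOrderedHostModel`, whose implementation is not shown in this diff. The `vpc-id` filter name is an assumption, since that line is collapsed in the diff view.

```python
from dataclasses import dataclass
from typing import Dict, List


def build_instance_filters(vpc_id: str, component_key: str, cluster_name: str) -> List[Dict]:
    """Build EC2 filters in the shape passed to instances.filter(Filters=...)."""
    return [
        {'Name': 'instance-state-name', 'Values': ['running']},
        # Assumption: 'vpc-id' is the filter name; that line is collapsed in the diff.
        {'Name': 'vpc-id', 'Values': [vpc_id]},
        # New scheme: a fixed tag key whose value is the component name.
        {'Name': 'tag:component_key', 'Values': [component_key]},
        {'Name': 'tag:cluster_name', 'Values': [cluster_name]},
    ]


@dataclass(order=True)
class OrderedHost:
    """Hypothetical stand-in for AnsibleOrderedHostModel (not the project's class)."""
    name: str
    ip: str


def hosts_from_instances(instances: List[Dict], use_public_ip: bool) -> List[OrderedHost]:
    """Turn instance records into hosts sorted by name.

    Each instance is a plain dict with 'tags', 'public_ip' and 'private_ip'
    keys; this shape is hypothetical and only mimics the boto3 attributes.
    """
    hosts = []
    for instance in instances:
        # Take the hostname from the 'Name' tag, as the new code in the diff does.
        name = next((t['Value'] for t in instance.get('tags', []) if t['Key'] == 'Name'), '')
        ip = instance['public_ip'] if use_public_ip else instance['private_ip']
        hosts.append(OrderedHost(name, ip))
    # order=True makes the dataclass compare as the tuple (name, ip).
    hosts.sort()
    return hosts
```

Sorting here is plain string comparison, so a name ending in `-10` sorts before one ending in `-2`; whether the real model handles that differently cannot be told from this diff.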
178 changes: 81 additions & 97 deletions cli/engine/providers/aws/InfrastructureBuilder.py

Large diffs are not rendered by default.

14 changes: 5 additions & 9 deletions cli/engine/providers/azure/InfrastructureBuilder.py
Expand Up @@ -24,6 +24,11 @@ def __init__(self, docs, manifest_docs=[]):
self.docs = docs
self.manifest_docs = manifest_docs

# If there are no security groups Ansible provisioning will fail because
# SSH is not allowed then with public IPs on Azure.
if not(self.use_network_security_groups) and self.use_public_ips:
self.logger.warning('Use of security groups has been disabled and public IPs are used. Ansible run will fail because SSH will not be allowed.')

# Check if there is a hostname_domain_extension we already applied and we want to retain.
# The same as VM images we want to preserve hostname_domain_extension over versions.
self.hostname_domain_extension = self.cluster_model.specification.cloud.hostname_domain_extension
Expand Down Expand Up @@ -61,19 +66,10 @@ def run(self):
# Set property that controls cloud-init.
vm_config.specification['use_cloud_init_custom_data'] = cloud_init_custom_data.specification.enabled

# If there are no security groups Ansible provisioning will fail because
# SSH is not allowed then with public IPs on Azure.
if not(self.use_network_security_groups) and self.use_public_ips:
self.logger.warning('Use of security groups has been disabled and public IP are used. Ansible run will fail because SSH will not be allowed.')

# For now only one subnet per component.
if (len(component_value.subnets) > 1):
self.logger.warning('On Azure only one subnet per component is supported for now. Taking first and ignoring others.')

# Add message for ignoring availabiltity zones if present.
if 'availability_zone' in component_value.subnets[0]:
self.logger.warning('On Azure availability_zones are not supported yet. Ignoring definition.')

subnet_definition = component_value.subnets[0]
subnet = select_first(infrastructure, lambda item: item.kind == 'infrastructure/subnet' and
item.specification.address_prefix == subnet_definition['address_pool'])
Expand Down
4 changes: 3 additions & 1 deletion docs/changelogs/CHANGELOG-2.0.md
Expand Up @@ -4,13 +4,15 @@

### Added

- [#959](https://github.com/epiphany-platform/epiphany/issues/959) - Add usage of use_network_security_groups to disable NSG on AWS
- [#2701](https://github.com/epiphany-platform/epiphany/issues/2701) - Epicli prepare - generate files in separate directory

### Fixed

- [#2653](https://github.com/epiphany-platform/epiphany/issues/2653) - Epicli is failing in air-gapped infra mode
- [#1569](https://github.com/epiphany-platform/epiphany/issues/1569) - Azure unmanaged disks not supported by Epiphany but there is misleading setting in the default configuration
- [#2832](https://github.com/epiphany-platform/epiphany/issues/2832) - Make the DoD checklist clear
- [#2853](https://github.com/epiphany-platform/epiphany/issues/2853) - Change autoscaling_group approach in AWS provider in favor of plain VM creation.

### Updated

Expand All @@ -32,6 +34,6 @@

### Breaking changes

- Upgrade of Terraform components in issue [#2825](https://github.com/epiphany-platform/epiphany/issues/2825) will make running re-apply with infrastructure break on existing 1.x clusters. The advice is to deploy a new cluster and migrate data. If needed a manual upgrade path is described [here.](../home/howto/UPGRADE.md#terraform-upgrade-from-epiphany-1.x-to-2.x)
- Upgrade of Terraform components in issue [#2825](https://github.com/epiphany-platform/epiphany/issues/2825) and [#2853](https://github.com/epiphany-platform/epiphany/issues/2853) will make running re-apply with infrastructure break on existing 1.x clusters. The advice is to deploy a new cluster and migrate data. If needed a manual upgrade path is described [here.](../home/howto/UPGRADE.md#terraform-upgrade-from-epiphany-1.x-to-2.x)

### Known issues
34 changes: 10 additions & 24 deletions docs/home/ARM.md
Expand Up @@ -279,66 +279,52 @@ specification:
count: 2
machine: kafka-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.5.0/24
- address_pool: 10.1.5.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-west-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
kubernetes_node:
count: 3
machine: kubernetes-node-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-west-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
load_balancer:
count: 1
machine: lb-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.7.0/24
- address_pool: 10.1.7.0/24
logging:
count: 2
machine: logging-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.3.0/24
- address_pool: 10.1.3.0/24
monitoring:
count: 1
machine: monitoring-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.4.0/24
- address_pool: 10.1.4.0/24
postgresql:
count: 1
machine: postgresql-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.6.0/24
- address_pool: 10.1.6.0/24
rabbitmq:
count: 2
machine: rabbitmq-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.8.0/24
- address_pool: 10.1.8.0/24
opendistro_for_elasticsearch:
count: 1
machine: opendistro-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.10.0/24
- address_pool: 10.1.10.0/24
repository:
count: 1
machine: repository-machine-arm
subnets:
- availability_zone: eu-west-1a
address_pool: 10.1.11.0/24
- address_pool: 10.1.11.0/24
---
kind: infrastructure/virtual-machine
title: "Virtual Machine Infra"
Expand Down
13 changes: 3 additions & 10 deletions docs/home/howto/SECURITY_GROUPS.md
Expand Up @@ -253,26 +253,19 @@ specification:
machine: repository-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.11.0/24
- address_pool: 10.1.11.0/24
kubernetes_master:
count: 1
machine: kubernetes-master-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
kubernetes_node:
count: 2
machine: kubernetes-node-machine
configuration: default
subnets:
- availability_zone: eu-central-1a
address_pool: 10.1.1.0/24
- availability_zone: eu-central-1b
address_pool: 10.1.2.0/24
- address_pool: 10.1.1.0/24
logging:
count: 0
monitoring:
Expand Down
71 changes: 2 additions & 69 deletions docs/home/howto/UPGRADE.md
Expand Up @@ -428,6 +428,7 @@ From Epiphany 1.x to 2.x the Terraform stack received the following major update
- Terraform 0.12.6 to 1.1.3
- Azurerm provider 1.38.0 to 2.91.0
- AWS provider 2.26 to 3.71.0
- Removal of auto-scaling-groups in favor of plain EC2 instances on AWS.

These introduce some breaking changes which will require manual steps for upgrading existing 1.x clusters. As this is not straightforward, we recommend deploying a new cluster on 2.x and migrating data instead.

Expand Down Expand Up @@ -521,72 +522,4 @@ General steps:

### AWS

Notes:
- If you made any manual changes to your cluster infrastructure outside of Terraform this might cause issues.
- Only run `terraform apply` if `terraform plan` shows your infrastructure does not match the configuration.
- Manual Terraform upgrade up to v1.0.x should be completed before running the `epicli apply` command with Epiphany 2.x.
- Terraform can be installed as a binary package or by using package managers, see more: https://learn.hashicorp.com/tutorials/terraform/install-cli

#### v0.12.6 => v0.13.x

The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-13

General steps:
- Download the latest Terraform v0.13.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform 0.13upgrade
terraform plan
terraform apply (if needed)
```

#### v0.13.x => v0.14.x

The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-14

General steps:
- Download the latest Terraform v0.14.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform plan
terraform apply (if needed)
```

#### v0.14.x => v1.0.x

Note: From v0.14.x we can upgrade straight to v1.0.x. No need to upgrade to v0.15.x first.

The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-0

General steps:
- Download the latest Terraform v1.0.x: https://releases.hashicorp.com/terraform/
- Run the following sets of commands in the `build/clustername/terraform` folder and follow the steps if asked:
```shell
terraform init
terraform plan
terraform apply (if needed)
```
#### v1.0.x => v1.1.3

In this step we also force the upgrade from AWS provider 2.26 to 3.71.0 which requires a few more steps to resolve some pending issues.
At this point, the steps assume that you are already running Epiphany 2.x image.

The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-1

General steps:
- Run epicli to generate the new AWS provider Terraform scripts:
```shell
epicli apply -f data.yml
```
After the Terraform scripts generation `terraform init ...` will result in the following error:
`Error: Failed to query available provider packages`
- To fix the issue from previous step manually run from the epicli container in `build/clustername/terraform`:
```shell
terraform init -upgrade
```
- Now re-run epicli again:
```shell
epicli apply -f data.yml
```
The Terraform for AWS deployments between Epiphany 1.x and 2.x is not compatible, and migration is not possible without destruction of the environment. The only option is to deploy a new cluster and migrate the data.