Add a bastion host and place instances + kube apiserver behind it #1
Changes from all commits
@@ -0,0 +1,169 @@
resource "aws_autoscaling_group" "bastion" { | ||
name = "${var.cluster_name}-bastion ${aws_launch_configuration.bastion.name}" | ||
|
||
# count | ||
desired_capacity = "${var.bastion_count}" | ||
min_size = "${var.bastion_count}" | ||
max_size = "${var.bastion_count}" | ||
default_cooldown = 30 | ||
health_check_grace_period = 30 | ||
|
||
# network | ||
vpc_zone_identifier = ["${aws_subnet.private.*.id}"] | ||
|
||
# template | ||
launch_configuration = "${aws_launch_configuration.bastion.name}" | ||
|
||
# target groups to which instances should be added | ||
target_group_arns = [ | ||
"${aws_lb_target_group.bastion.id}" | ||
] | ||
|
||
min_elb_capacity = 1 | ||
|
||
lifecycle { | ||
# override the default destroy and replace update behavior | ||
create_before_destroy = true | ||
ignore_changes = ["image_id"] | ||
} | ||
|
||
tags = [{ | ||
key = "Name" | ||
value = "${var.cluster_name}-bastion" | ||
propagate_at_launch = true | ||
}] | ||
} | ||
|
||
resource "aws_launch_configuration" "bastion" { | ||
image_id = "${data.aws_ami.coreos.image_id}" | ||
[Review thread on image_id]
reviewer: Since this is not running Kubernetes or Docker, maybe use the AWS Linux AMI?
author: That would mean using an entirely different configuration strategy than the other machines. Most importantly, I copied the Container Linux configs from the controllers/workers. Configuring AWS Linux is different enough that I really don't think it's worth it.
reviewer: I do.
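For context, `data.aws_ami.coreos` is defined elsewhere in the module, not in this diff. A minimal sketch of what such a Container Linux AMI lookup typically looks like (the owner ID and filter values below are assumptions, not taken from this PR):

data "aws_ami" "coreos" {
  most_recent = true
  owners      = ["595879546273"]  # CoreOS's AWS account (assumed)

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "name"
    values = ["CoreOS-stable-*"]
  }
}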
  instance_type = "${var.bastion_type}"

  user_data = "${data.ct_config.bastion_ign.rendered}"

  # network
  security_groups = [
    "${aws_security_group.bastion_external.id}"
  ]

  lifecycle {
    # override the default destroy and replace update behavior
    create_before_destroy = true
  }
}

data "template_file" "bastion_config" {
  template = "${file("${path.module}/cl/bastion.yaml.tmpl")}"

  vars = {
    ssh_authorized_keys_list = "[ ${join(", ", var.ssh_authorized_keys)} ]"
  }
}

data "ct_config" "bastion_ign" {
  content      = "${data.template_file.bastion_config.rendered}"
  pretty_print = false
}

resource "aws_security_group" "bastion_external" {
  name_prefix = "${var.cluster_name}-bastion-external-"
  description = "Allows access to the bastion from the internet"

  vpc_id = "${aws_vpc.network.id}"

  tags {
    Name = "${var.cluster_name}-bastion-external"
  }

  ingress {
    protocol    = "tcp"
    from_port   = 22
    to_port     = 22
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
[Review thread on the egress rule]
reviewer: needed? isn't this default?
author: In AWS yes, but Terraform removes the default allow-all egress rule on security groups it manages (which I prefer), so it has to be declared explicitly.
reviewer: good to know, thanks
    protocol    = -1
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "bastion_internal" {
  name_prefix = "${var.cluster_name}-bastion-internal-"
  description = "Allows access to a host from the bastion"

  vpc_id = "${aws_vpc.network.id}"

  tags {
    Name = "${var.cluster_name}-bastion-internal"
  }

  ingress {
    protocol        = "tcp"
    from_port       = 22
    to_port         = 22
    security_groups = ["${aws_security_group.bastion_external.id}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb" "bastion" {
  name               = "${var.cluster_name}-bastion"
  load_balancer_type = "network"

  subnets = ["${aws_subnet.public.*.id}"]
}

resource "aws_lb_listener" "bastion" {
  load_balancer_arn = "${aws_lb.bastion.arn}"
  protocol          = "TCP"
  port              = "22"

  default_action {
    type             = "forward"
    target_group_arn = "${aws_lb_target_group.bastion.arn}"
  }
}

resource "aws_lb_target_group" "bastion" {
  name        = "${var.cluster_name}-bastion"
  vpc_id      = "${aws_vpc.network.id}"
  target_type = "instance"

  protocol = "TCP"
  port     = 22

  health_check {
    protocol = "TCP"
    port     = 22

    healthy_threshold   = 3
    unhealthy_threshold = 3

    interval = 10
  }
}

resource "aws_route53_record" "bastion" {
  depends_on = ["aws_autoscaling_group.bastion"]

  zone_id = "${var.dns_zone_id}"

  name = "${format("bastion.%s.%s.", var.cluster_name, var.dns_zone)}"
  type = "A"

  alias {
    name                   = "${aws_lb.bastion.dns_name}"
    zone_id                = "${aws_lb.bastion.zone_id}"
    evaluate_target_health = false
  }
}
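For context, `bastion_internal` only takes effect once it is attached to the private instances. The controller/worker definitions are outside this diff, but the attachment would look roughly like the sketch below (the resource and group names `aws_instance.controllers` and `aws_security_group.controller` are assumptions, not part of this PR):

resource "aws_instance" "controllers" {
  # ... other arguments omitted; illustrative sketch only ...

  vpc_security_group_ids = [
    "${aws_security_group.controller.id}",
    "${aws_security_group.bastion_internal.id}",  # allow SSH from the bastion
  ]
}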
@@ -0,0 +1,5 @@
---
passwd:
  users:
    - name: core
      ssh_authorized_keys: ${ssh_authorized_keys_list}
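The `ssh_authorized_keys_list` value is built in the `template_file` data source above from a list variable. A sketch of how that variable might be declared (the actual declaration lives outside this diff and may differ):

variable "ssh_authorized_keys" {
  type        = "list"
  description = "SSH public keys to authorize for user 'core' on the bastion"
}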
@@ -7,37 +7,61 @@ resource "aws_route53_record" "apiserver" {
   # AWS recommends their special "alias" records for ELBs
   alias {
-    name    = "${aws_elb.apiserver.dns_name}"
-    zone_id = "${aws_elb.apiserver.zone_id}"
+    name    = "${aws_lb.apiserver.dns_name}"
+    zone_id = "${aws_lb.apiserver.zone_id}"
     evaluate_target_health = true
   }
 }

-# Controller Network Load Balancer
-resource "aws_elb" "apiserver" {
-  name            = "${var.cluster_name}-apiserver"
-  subnets         = ["${aws_subnet.public.*.id}"]
-  security_groups = ["${aws_security_group.controller.id}"]
-
-  listener {
-    lb_port           = 443
-    lb_protocol       = "tcp"
-    instance_port     = 443
-    instance_protocol = "tcp"
+# Network Load Balancer for apiservers
+resource "aws_lb" "apiserver" {
[Review thread on the "apiserver" resource name]
reviewer: "apiserver" could create a lot of confusion with our actual API. Let's maybe rename to "k8s_apiserver"?
author: Definitely possible, but I think we need to save this discussion. I'd rather wait on changing other bits of typhoon that don't have to do with private networking if we have concerns about naming confusion.
+  name               = "${var.cluster_name}-apiserver"
+  load_balancer_type = "network"
+  internal           = true
+
+  subnets = ["${aws_subnet.private.*.id}"]
+}
+
+# Forward HTTP traffic to controllers
+resource "aws_lb_listener" "apiserver-https" {
+  load_balancer_arn = "${aws_lb.apiserver.arn}"
+  protocol          = "TCP"
+  port              = "443"
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.controllers.arn}"
+  }
+}

-  instances = ["${aws_instance.controllers.*.id}"]
+# Target group of controllers
+resource "aws_lb_target_group" "controllers" {
+  name        = "${var.cluster_name}-controllers"
+  vpc_id      = "${aws_vpc.network.id}"
+  target_type = "ip"
+
+  protocol = "TCP"
+  port     = 443

   # Kubelet HTTP health check
   health_check {
-    target              = "SSL:443"
-    healthy_threshold   = 2
-    unhealthy_threshold = 4
-    timeout             = 5
-    interval            = 6
+    protocol = "TCP"
+    port     = 443
+
+    # NLBs required to use same healthy and unhealthy thresholds
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # Interval between health checks required to be 10 or 30
+    interval = 10
   }
 }

+# Attach controller instances to apiserver NLB
+resource "aws_lb_target_group_attachment" "controllers" {
+  count = "${var.controller_count}"
+
-  idle_timeout                = 3600
-  connection_draining         = true
-  connection_draining_timeout = 300
+  target_group_arn = "${aws_lb_target_group.controllers.arn}"
+  target_id        = "${element(aws_instance.controllers.*.private_ip, count.index)}"
+  port             = 443
 }
@@ -17,7 +17,7 @@ resource "aws_internet_gateway" "gateway" {
tags = "${map("Name", "${var.cluster_name}")}" | ||
} | ||
|
||
resource "aws_route_table" "default" { | ||
resource "aws_route_table" "public" { | ||
vpc_id = "${aws_vpc.network.id}" | ||
|
||
route { | ||
|
@@ -30,7 +30,7 @@ resource "aws_route_table" "default" {
     gateway_id = "${aws_internet_gateway.gateway.id}"
   }

-  tags = "${map("Name", "${var.cluster_name}")}"
+  tags = "${map("Name", "${var.cluster_name}-public")}"
 }

 # Subnets (one per availability zone)

@@ -52,6 +52,60 @@ resource "aws_subnet" "public" {
 resource "aws_route_table_association" "public" {
   count = "${length(data.aws_availability_zones.all.names)}"
[Review thread on the count expression]
reviewer: define [this once rather than repeating the expression]?
author: This is repeated a few times and I'm looking to minimize the diff here. Similar to another comment, I'd like to keep private networking and general refactoring/modifications to typhoon separate.
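For illustration, the reviewer's suggestion would amount to defining the repeated expression once, roughly like this (the name `az_count` is hypothetical, not from this PR):

locals {
  # number of availability zones, reused wherever subnets and associations are counted
  az_count = "${length(data.aws_availability_zones.all.names)}"
}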

-  route_table_id = "${aws_route_table.default.id}"
+  route_table_id = "${aws_route_table.public.id}"
   subnet_id      = "${element(aws_subnet.public.*.id, count.index)}"
 }

+resource "aws_subnet" "private" {
[Review comment] @hulbert please review as well.
+  count = "${length(data.aws_availability_zones.all.names)}"
+
+  vpc_id            = "${aws_vpc.network.id}"
+  availability_zone = "${data.aws_availability_zones.all.names[count.index]}"
+
+  cidr_block                      = "${cidrsubnet(var.host_cidr, 4, count.index + 8)}"
+  ipv6_cidr_block                 = "${cidrsubnet(aws_vpc.network.ipv6_cidr_block, 8, count.index + 8)}"
+  assign_ipv6_address_on_creation = true
+
+  tags = "${map("Name", "${var.cluster_name}-private-${count.index}")}"
+}
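To make the `cidrsubnet` arithmetic concrete: assuming `host_cidr` defaults to 10.0.0.0/16 and the public subnets use indices 0..N-1 (both assumptions; neither is shown in this diff), newbits 4 yields /20 subnets and the `+ 8` offset keeps the private ranges clear of the public ones:

# Illustrative only, assuming host_cidr = "10.0.0.0/16":
#   public  subnets: cidrsubnet(host_cidr, 4, 0) = 10.0.0.0/20,   cidrsubnet(host_cidr, 4, 1) = 10.0.16.0/20,  ...
#   private subnets: cidrsubnet(host_cidr, 4, 8) = 10.0.128.0/20, cidrsubnet(host_cidr, 4, 9) = 10.0.144.0/20, ...
# The two ranges cannot overlap for up to 8 availability zones.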
|
||
resource "aws_route_table" "private" { | ||
vpc_id = "${aws_vpc.network.id}" | ||
|
||
route { | ||
cidr_block = "0.0.0.0/0" | ||
nat_gateway_id = "${aws_nat_gateway.nat.id}" | ||
} | ||
|
||
route { | ||
ipv6_cidr_block = "::/0" | ||
egress_only_gateway_id = "${aws_egress_only_internet_gateway.egress_igw.id}" | ||
} | ||
|
||
tags = "${map("Name", "${var.cluster_name}-private")}" | ||
} | ||
|
||
resource "aws_route_table_association" "private" { | ||
count = "${length(data.aws_availability_zones.all.names)}" | ||
|
||
route_table_id = "${aws_route_table.private.id}" | ||
subnet_id = "${element(aws_subnet.private.*.id, count.index)}" | ||
} | ||
|
||
|
||
resource "aws_eip" "nat" { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nat? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you may need the EIP to explicitly depend on the IGW There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you expand?
Backwards (gateways use an EIP and need it allocated first). The egress gateway is actually for ipv6 and the NAT gateway is for ipv4. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
why name the elastic ip nat? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
from my understanding an epi address is accessible via the igw and not the other way around There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh. Because it's going to be associated with the NAT gateway. I don't need to reference it anywhere else. The NAT gateway references it as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, missing a word up there. The NAT gateway consumes the EIP. The Internet gateway allows instances in the public subnet (including the NAT) to access the internet. I believe the confusing bit is that the internet gateway provides NAT for instances in the public subnet. And the NAT gateway and egress-only internet gateway provide routing for instances within the private subnet. |
+  vpc = true
+}
+
+resource "aws_nat_gateway" "nat" {
+  depends_on = [
+    "aws_internet_gateway.gateway",
+  ]
+
+  allocation_id = "${aws_eip.nat.id}"
+  subnet_id     = "${element(aws_subnet.public.*.id, 0)}"
+}
+
+resource "aws_egress_only_internet_gateway" "egress_igw" {
+  vpc_id = "${aws_vpc.network.id}"
+}
[Review thread on ignore_changes = ["image_id"]]
reviewer: why?
author: To avoid applying updates via AMI changes. Otherwise a new AMI means terraform will want to destroy everything and re-create it. We can (and should) automatically apply patches to existing instances instead of updating to new AMIs as a general practice. Typhoon recommends https://github.com/coreos/container-linux-update-operator
reviewer: This is a decision that goes way beyond the scope of this PR, and has not been our policy up until today. Let's not get in the habit of making policy changes snuck in as implementation details in a PR. The bastion instance is not part of the kubernetes cluster, so I don't see how the operator reference is relevant? AMI updates are infrequent, so I don't think doing a rolling update of the ASG (bringing instances down one by one and creating new ones) is bad practice, and it makes infrastructure more immutable.
author: More info, per our discussion just now. This is consistent with how the controllers and workers are managed in typhoon. The data source will return the latest stable AMI. Typhoon tells terraform to ignore these changes in order to not require a destroy/recreate any time there's an update. There's no way to defer these changes: any `terraform apply` will want to make them, and it's easy to be forced into recreating resources when you want to apply another change. We don't have to do in-place updates either. We can: […] Like some of the other things you've mentioned, totally valid, but I think we should land private networking and come back to it.
follow-up: CloudFormation provides additional functionality for updating autoscaling groups (including rolling updates). Terraform chose not to try to replicate this; more here: hashicorp/terraform#1552. One option (probably mentioned in that thread somewhere) is to use terraform to create a CloudFormation stack solely to take advantage of UpdatePolicy for ASGs.
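A rough sketch of that option, for illustration only (the resource name, template properties, and rolling-update settings below are assumptions, not something proposed concretely in this PR):

resource "aws_cloudformation_stack" "bastion_asg" {
  name = "${var.cluster_name}-bastion-asg"

  # CloudFormation owns the ASG so its UpdatePolicy can roll instances
  # whenever the launch configuration changes; batch/pause values are illustrative.
  template_body = <<EOF
{
  "Resources": {
    "BastionASG": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "LaunchConfigurationName": "${aws_launch_configuration.bastion.name}",
        "MinSize": "${var.bastion_count}",
        "MaxSize": "${var.bastion_count}",
        "VPCZoneIdentifier": ${jsonencode(aws_subnet.private.*.id)},
        "TargetGroupARNs": ["${aws_lb_target_group.bastion.arn}"]
      },
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MinInstancesInService": "0",
          "MaxBatchSize": "1",
          "PauseTime": "PT2M"
        }
      }
    }
  }
}
EOF
}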