-
Notifications
You must be signed in to change notification settings - Fork 0
Add a bastion host and place instances + kube apiserver behind it #1
Conversation
* it's the equivalent of a nat gateway, but for ipv6 * a nat gw will return an error if used as an ipv6 route
Can't use NLB because: * During bootstrap a single instance is in service * NLB cannot direct traffic to the originating instance * It preserves IPs Can't use ALB because: * Health check is HTTP/HTTPS * apiserver will reject health checks w/ 401 (unauthorized) * kubernetes/kubernetes#43784
Here is the subnet layout:
Changes still coming:
|
@eladidan Complete with a load balancer/ASG now, with |
* copy-secrets hangs for ~1 min waiting for an instance * it succeeds but it's more clear to wait for the asg before trying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bendrucker very superficail review. Let's meet and discuss this.
I'm concerned about how we're going to manage keys and users, and want to make sure we don't regress in terms of functionality with what we have with tunnelbox, though do not want to add tons of scope here - I'd like to align on long term vision here.
lifecycle { | ||
# override the default destroy and replace update behavior | ||
create_before_destroy = true | ||
ignore_changes = ["image_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid applying updates via AMI changes. Otherwise a new AMI means terraform will want to destroy everything and re-create it. We can (and should) automatically apply patches to existing instances instead of updating to new AMIs as a general practice.
Typhoon recommends https://github.com/coreos/container-linux-update-operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can (and should) automatically apply patches to existing instances instead of updating to new AMIs as a general practice.
this is a decision that goes way beyond the scope of this PR, and has not been our policy up until today. Let's not get in the habit of making policy changes snuck in as implementation details in a PR.
The bastion instance is not part of the kubernetes cluster, so I don't see how the operator reference is relevant?
AMI updates are infrequent so I don't think doing a rolling update of the ASG (bringing instances down 1 by 1 and creating new ones) is bad practice, and makes infrastructure more immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More info, per our discussion just now. This is consistent with how the controllers and workers are managed in typhoon. The data source will return the latest stable AMI.
Typhoon tells terraform to ignore these changes in order to not require a destroy/recreate any time there's an update. There's no way to defer these changes—any terraform apply
will want to make them and it's easy to get be forced into recreating resources when you want to apply another change.
We don't have to do in place updates either. We can:
- Deploy a new cluster when we want to use new AMIs. Changes are only ignored for the given instance of a module.
- Create a variable (map) containing AMI IDs per region and then updating that manually. That's the most consistent with aether/CloudFormation.
Like some of the other things you've mentioned, totally valid, but I think we should land private networking and come back to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think doing a rolling update of the ASG (bringing instances down 1 by 1 and creating new ones) is bad practice
CloudFormation provides additional functionality for updating autoscaling groups (including rolling updates). Terraform chose not to try to replicate this, more here: hashicorp/terraform#1552
One option (probably mentioned in that thread somewhere) is to use terraform to create a CloudFormation stack solely to take advantage of UpdatePolicy
for ASGs.
|
||
lifecycle { | ||
# override the default destroy and replace update behavior | ||
create_before_destroy = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_before_destroy
overrides the default terraform lifecycle for resource replacement. Normally terraform will destroy the old resource and then create a new one. Setting that attribute means it will instead launch the replacement resource before destroying the previous one.
In this case, a change requiring replacement of the ASG will be rolled out by creating a new ASG and then destroying the old one once the new one is created.
} | ||
|
||
resource "aws_launch_configuration" "bastion" { | ||
image_id = "${data.aws_ami.coreos.image_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is not running kubernetes or docker, maybe use AWS linux AMI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means using different entirely different configuration strategy than the other machines. Most importantly I copied the container linux configs from controllers/workers. Configuring AWS linux is different enough that I really don't think it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do.
We would want different auditing tools, and different dependencies installed on bastion host.
I understand the extra scope here, so willing to push back to different PR, but since purpose of this instance is vastly different, and is publically accessible over ssh (via NLB) then we should specialize the instance to that purpose
cidr_blocks = ["0.0.0.0/0"] | ||
} | ||
|
||
egress { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed? isn't this default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In AWS yes, but terraform overrides that (which I prefer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know, thanks
name = "${format("bastion.%s.%s.", var.cluster_name, var.dns_zone)}" | ||
type = "A" | ||
|
||
# AWS recommends their special "alias" records for ELBs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modify comment, LB not ELB
subnet_id = "${element(aws_subnet.public.*.id, count.index)}" | ||
} | ||
|
||
resource "aws_subnet" "private" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hulbert please review as well
@@ -52,6 +52,60 @@ resource "aws_subnet" "public" { | |||
resource "aws_route_table_association" "public" { | |||
count = "${length(data.aws_availability_zones.all.names)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define ${length(data.aws_availability_zones.all.names)}
as local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is repeated a few times and I'm looking to minimize the diff here. Similar to another comment I'd like to keep private networking and general refactoring/modifications to typhoon separate.
instance_port = 443 | ||
instance_protocol = "tcp" | ||
# Network Load Balancer for apiservers | ||
resource "aws_lb" "apiserver" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"apiserver" could create a lot of confusion with our actual API. Let's maybe rename to "k8s_apiserver" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely possible but I think we need to save this discussion. I'd rather wait on changing other bits of typhoon that don't have to do with private networking if we have concerns about naming confusion.
} | ||
|
||
|
||
resource "aws_eip" "nat" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need the EIP to explicitly depend on the IGW
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nat?
Can you expand?
I think you may need the EIP to explicitly depend on the IGW
Backwards (gateways use an EIP and need it allocated first). The egress gateway is actually for ipv6 and the NAT gateway is for ipv4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand?
why name the elastic ip nat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backwards (gateways use an EIP and need it allocated first). The egress gateway is actually for ipv6 and the NAT gateway is for ipv4.
from my understanding an epi address is accessible via the igw and not the other way around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Because it's going to be associated with the NAT gateway. I don't need to reference it anywhere else. The NAT gateway references it as allocation_id
and that "claims" it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missing a word up there. The NAT gateway consumes the EIP. The Internet gateway allows instances in the public subnet (including the NAT) to access the internet.
I believe the confusing bit is that the internet gateway provides NAT for instances in the public subnet. And the NAT gateway and egress-only internet gateway provide routing for instances within the private subnet.
|
||
resource "aws_egress_only_internet_gateway" "egress_igw" { | ||
vpc_id = "${aws_vpc.network.id}" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline
I'd love to have tunnelbox provide the primary day to day egress for engineers connecting to the cluster. Definitely a discussion for tomorrow. Tunnelbox works great for port forwarding but it doesn't seem straightforward to use it as a jump host for SSH. I think that would mean handling SSH certificate auth on the instances and getting replicating the complicated interactions with vault that tunnelbox relies on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bendrucker this looks awesome.
Assuming that the network diagram will go in separate PR, this looks gtg!
* using indented arrays means the template call means matching indentation
@eladidan Tested multiple keys and caught a bug there. The yaml file is indented so joining over |
This adds private networking to our typhoon fork. It makes the following changes relative to upstream typhoon:
This change set is the minimum diff from upstream to achieve private networking. Next steps:
0.0.0.0/0
. We can restrict them to our VPC CIDR block.