Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VPC endpoints #93

Merged
merged 23 commits into from
Jan 26, 2021
Merged

Add VPC endpoints #93

merged 23 commits into from
Jan 26, 2021

Conversation

jsf9k
Copy link
Member

@jsf9k jsf9k commented Jan 21, 2021

🗣 Description

This pull request:

  • Adds several VPC endpoints for various AWS services
  • Configures the operations and private subnets to make use of said VPC endpoints
  • Shares a single VPC route table among the private subnets. Since the private subnets do not each use a different NAT gateway in a different operations subnet, there is no need for them to have separate VPC route tables.

(It would be nice to remove the NAT gateway from the operations subnet, since the private subnets no longer require it to communicate with AWS services, but they still require it to pull the Docker images from Docker Hub for the Guacamole Docker composition.)

💭 Motivation and context

Resolves #94.

🧪 Testing

I deployed these changes to env0 in our staging COOL environment and verified that the cloud-init scripts, CloudWatch agent, and SSM agent are all running without errors on all private and operations instances. I also verified that the FreeIPA initialization for the guacamole instance still works.

I also redeployed with the external route removed from the operations subnet's route table and verified that everything still worked, with the exception of the downloading of the Docker images required for the Guacamole Docker composition. This serves as an ironclad check that all AWS API calls are being made to the VPC endpoints and none are being made to the public endpoints.

✅ Checklist

  • This PR has an informative and human-readable title.
  • Changes are limited to a single goal - eschew scope creep!
  • All future TODOs are captured in issues, which are referenced in code comments.
  • All relevant type-of-change labels have been added.
  • I have read the CONTRIBUTING document.
  • These code changes follow cisagov code standards.
  • All new and existing tests pass.

jsf9k added 10 commits January 21, 2021 09:28
Now that we are no longer using a separate NAT gateway for each
private subnet, the private subnets' route tables are all the same.
* Remove egress via 443 to 0.0.0.0/0
* Allow egress via 443 to the STS and SSM security groups
* Allow egress via 443 to the S3 gateway endpoint
These are necessary for the CloudWatch Agent to work.  This agent is responsible for forwarding our system logs to CloudWatch.
This is necessary because adding the VPC endpoints to the VPCs makes
the VPC-internal DNS for sts.us-east-1.awsamazon.com, for example,
resolve to the internal IP of the VPC endpoint instead of a public AWS
IP.
The operations subnet could still access S3 via the internet gateway,
but it may as well use the gateway endpoint since it has to use the
interface endpoints for all the other AWS services.

Also rename some route-related Terraform resources to avoid collision
between operations resources and private resources.
@jsf9k jsf9k added breaking change This issue or pull request involves changes to existing functionality improvement This issue or pull request will add or improve functionality, maintainability, or ease of use labels Jan 21, 2021
@jsf9k jsf9k self-assigned this Jan 21, 2021
@jsf9k jsf9k requested a review from dav3r January 21, 2021 19:17
jsf9k added 2 commits January 21, 2021 14:33
This reverts commit 6b091f0.

I realized that the guacamole instance still needs the NAT gateway and
external web access in order to download the Docker images used in the
guacamole docker composition.
This is necessary so that the guacamole instance can download the
Docker images required for the guacamole Docker composition.
@jsf9k jsf9k marked this pull request as ready for review January 21, 2021 21:29
@jsf9k jsf9k requested a review from felddy as a code owner January 21, 2021 21:29
@jsf9k jsf9k requested review from hillaryj and mcdonnnj January 21, 2021 21:29
@dav3r
Copy link
Member

dav3r commented Jan 21, 2021

  • Removes the NAT gateway from the operations subnet, since the private subnets no longer requires it to communicate with AWS services.

The description of this PR should be amended to remove this bullet, due to 597a541.

Also, doesn't this mean that a16b36a needs to be reverted as well?

@jsf9k
Copy link
Member Author

jsf9k commented Jan 22, 2021

  • Removes the NAT gateway from the operations subnet, since the private subnets no longer requires it to communicate with AWS services.

The description of this PR should be amended to remove this bullet, due to 597a541.

Also, doesn't this mean that a16b36a needs to be reverted as well?

I amended the PR description.

The commit a16b36a does not need to be reverted. As it turns out, the assessment VPCs were never set up like the shared services VPC with N public and N private subnets spread across AZs. In that case we have each private subnet route its external traffic to the public subnet in the same AZ, so it makes sense to have separate route tables for each private subnet. In this case the idea of separate route tables was implemented but not the more important idea of cross-AZ redundancy.

Sharp eyes!

jsf9k added 4 commits January 21, 2021 22:43
…S calls

STS used to be un-regioned, like S3, but now it is regioned.  This is
the one case where boto3 _does not_ do the right thing when you set
the region.  We have to set the region-specific endpint URL manually.

This is important since the STS VPC endpoint _only_ sets a local DNS
record to override the _local region's_ public STS endpoint.  If we
don't do this then boto3 will reach out to the _global_
https://sts.amazonaws.com URL, and that DNS entry will still point to
an external IP.

See this boto/boto3#1859 for more information about boto3's perverse
behavior in the case of STS.
…STS calls

STS used to be un-regioned, like S3, but now it is regioned.  The AWS
CLI by default uses the global endpoint URL https://sts.amazonaws.com
instead of the region-specific one, even when you specify the region.
Therefore, we have to set the region-specific endpint URL manually.

This is important since the STS VPC endpoint _only_ sets a local DNS
record to override the _local region's_ public STS endpoint.  If we
don't specify the endpoint URL then the AWS CLI will reach out to the
_global_ https://sts.amazonaws.com URL, and that DNS entry will still
point to an external IP.
sts_endpoint_sg_rules.tf Outdated Show resolved Hide resolved

# Normally we would use a for_each here, but the private subnets are
# _all in the same AZ_ in this case. Why even have multiple subnets
# if you're not going to spread them around? Harumph!
Copy link
Contributor

@hillaryj hillaryj Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this bit from "Blazing Saddles":
Harumph

Copy link
Contributor

@hillaryj hillaryj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion which is only a comment change, and otherwise: 🏆

Copy link
Member

@dav3r dav3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work and really great job on the comments! 🐎

...except for one comment that needs a slight revision.

private_routing.tf Outdated Show resolved Hide resolved
jsf9k and others added 3 commits January 22, 2021 10:27
I noticed that previously we were using the name "aws_region" in the
cloud-init templates, whereas I had used the name "region" in the code
that I added.  For consistency's sake, I changed the name in the code
that I added.
@@ -139,7 +139,7 @@ resource "aws_network_acl_rule" "private_egress_to_operations_via_vnc" {
network_acl_id = aws_network_acl.private[each.value].id
egress = true
protocol = "tcp"
rule_number = 120 + index(var.private_subnet_cidr_blocks, each.value)
rule_number = 130 + index(var.private_subnet_cidr_blocks, each.value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧙 Seems magical. Is this counting by 10 jazz useful? Does this break if index ≥ 10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it breaks if index >= 10. Our numbering of the ACL rules has always sucked. Do you have any thoughts as to how it can be improved?

subnets.tf Outdated Show resolved Hide resolved
Copy link
Member

@felddy felddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful. This makes a lot of sense. 💯
I had a question about preloading the Docker images into the Guac AMI so that we can remove the NAT gateway. We should discuss that. 🐳 🥑

My motto was always to keep swinging. Whether I was in a slump or feeling badly or having trouble off the field, the only thing to do was keep swinging. --Hank Aaron ⚾

Copy link
Member

@mcdonnnj mcdonnnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big lift and some strong work 💪💪💪
I had some suggestions/questions, but I was also curious if we have a standard on capitalization of protocols? We seem to use for example https and HTTPS without clear purpose.

# https://sts.amazonaws.com URL, and that DNS entry will still point
# to an external IP.
#
# See this link for more information about boto3's perverse behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boto3's perverse behavior

🤣

operations_acl_rules.tf Outdated Show resolved Hide resolved
operations_acl_rules.tf Outdated Show resolved Hide resolved
operations_acl_rules.tf Outdated Show resolved Hide resolved
operations_acl_rules.tf Outdated Show resolved Hide resolved
private_acl_rules.tf Outdated Show resolved Hide resolved
private_acl_rules.tf Outdated Show resolved Hide resolved
vpc_endpoints.tf Show resolved Hide resolved
desktop_gateway_sg_rules.tf Outdated Show resolved Hide resolved
@jsf9k
Copy link
Member Author

jsf9k commented Jan 26, 2021

I was also curious if we have a standard on capitalization of protocols? We seem to use for example https and HTTPS without clear purpose.

We do not have such a standard, but you are welcome to propose one. Capitalization is probably the way to go.

@jsf9k jsf9k requested a review from mcdonnnj January 26, 2021 15:02
@mcdonnnj
Copy link
Member

I was also curious if we have a standard on capitalization of protocols? We seem to use for example https and HTTPS without clear purpose.

We do not have such a standard, but you are welcome to propose one. Capitalization is probably the way to go.

If I'm proposing, then yes, I would vote for defaulting to capitalization for protocols.

@jsf9k
Copy link
Member Author

jsf9k commented Jan 26, 2021

I was also curious if we have a standard on capitalization of protocols? We seem to use for example https and HTTPS without clear purpose.

We do not have such a standard, but you are welcome to propose one. Capitalization is probably the way to go.

If I'm proposing, then yes, I would vote for defaulting to capitalization for protocols.

You could add something to this long-running PR.

@jsf9k jsf9k merged commit 08cf22e into develop Jan 26, 2021
@jsf9k jsf9k deleted the improvement/vpc-endpoints branch January 26, 2021 17:35
cisagovbot pushed a commit that referenced this pull request Jan 11, 2022
…re-commit_version

Update the `ansible-lint` Version in the pre-commit Configuration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change This issue or pull request involves changes to existing functionality improvement This issue or pull request will add or improve functionality, maintainability, or ease of use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use VPC gateway and interface endpoints to remove HTTPS access and resources where possible
5 participants