feature request: Disclose all external sites/URLs needed for deployment #1339

chrisdag · 2019-10-02T15:07:44Z

We run ParallelCluster at scale in hardened VPCs that are not granted unrestricted access to the internet. Each external destination needs to be documented and whitelisted in a firewall or on a proxy server.

Prior versions of cfncluster/parallelcluster just talked to APIs and pulled templates, scripts and data from s3:// so this was easy to support and configure security rules around.

But now deployments of the latest version are failing, apparently because the bootstrap process tries to pull artifacts or code from GitHub and our firewall kills those sessions. The end result is that our users see frustrating rollbacks on deployment.

As a longer term feature request can we ask that some sort of doc be created that lists the required external destinations for deployment success?

And shorter term if someone could refresh my memory on what parts of the source code I should check by hand to gather a list of hosts to whitelist on our firewall that would be great. From memory I think that some of the deployment and bootstrapping scripts are not actually in this project but are in a different project?

Thanks!

lukeseawalker · 2019-10-03T13:05:16Z

Hi @chrisdag,
we certainly understand the need to create a "private" cluster without public dependencies (or at least with minimal external internet access), and your request makes perfect sense.
I'm going to tag this ticket with the enhancement label.

For the shorter term request, the nodes are provisioned using the Chef recipes that you can find here.
But instead of going through all the code, I would suggest another way to list the external dependencies: create a cluster in a private subnet whose 0.0.0.0/0 route points to a proxy instance placed in a public subnet (one with internet access, i.e. a 0.0.0.0/0 route to an Internet Gateway). On this proxy instance, install a proxy server and watch its log during cluster creation to discover every endpoint that is called.

Thanks
L
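
For anyone trying the proxy approach described above, here is a minimal sketch of how the captured log could be reduced to a list of unique destinations. It assumes Squid's default "native" access.log layout (timestamp, elapsed, client, result/status, bytes, method, URL, ...). The function names and the sample lines are made up for illustration and are not taken from the logs discussed in this thread.

```python
from urllib.parse import urlsplit


def destination(line):
    """Return (host, port) for one native-format access.log line, or None."""
    fields = line.split()
    if len(fields) < 7:
        return None  # not a complete log line
    method, url = fields[5], fields[6]
    if method == "CONNECT":
        # HTTPS tunnels are logged as "host:port" rather than a full URL.
        host, _, port = url.rpartition(":")
        return (host, int(port)) if host and port.isdigit() else None
    parts = urlsplit(url)
    if not parts.hostname:
        return None
    default_port = 443 if parts.scheme == "https" else 80
    return (parts.hostname, parts.port or default_port)


def unique_destinations(lines):
    """Collect the distinct (host, port) pairs seen in the log, sorted."""
    found = {destination(line) for line in lines}
    found.discard(None)
    return sorted(found)


# Hypothetical sample lines (documentation IP ranges, invented timings).
SAMPLE = """\
1573759000.123    210 10.0.1.15 TCP_TUNNEL/200 4512 CONNECT github.com:443 - HIER_DIRECT/192.0.2.10 -
1573759001.456     35 10.0.1.15 TCP_MISS/200 1024 GET http://s3.amazonaws.com/ec2metadata/ec2-metadata - HIER_DIRECT/192.0.2.20 text/plain
1573759002.789    180 10.0.1.15 TCP_TUNNEL/200 2048 CONNECT github.com:443 - HIER_DIRECT/192.0.2.10 -
"""

if __name__ == "__main__":
    for host, port in unique_destinations(SAMPLE.splitlines()):
        print(f"{host}:{port}")
```

If the proxy uses a custom `logformat`, the field positions would need adjusting.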

chrisdag · 2019-11-14T19:19:19Z

Hi @lukeseawalker I finally got the time to do what you suggested -- I stood up a Squid logging proxy in a public subnet and deployed v2.4.1 through it and captured all of the logs.

The short summary is that if I strip out all of the OS patching, EPEL and Python PyPI traffic, the list of API and external destinations is super small right now:

cloudformation.<region>.amazonaws.com:443
autoscaling.<region>.amazonaws.com:443
dynamodb.<region>.amazonaws.com:443
ec2.<region>.amazonaws.com:443
queue.amazonaws.com:443
s3.amazonaws.com/ec2metadata/ec2-metadata:80
<region>-aws-parallelcluster.s3.amazonaws.com:443
github.com:443
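
As a rough illustration, the list above can be turned into concrete host/port pairs for one region, for example to feed a firewall or proxy allowlist. This is only a sketch: the entries are copied from the list as observed for v2.4.1, the helper names are hypothetical, and other versions or configurations may need different destinations.

```python
# Entries exactly as listed above; "<region>" is a placeholder.
ENDPOINT_TEMPLATES = [
    "cloudformation.<region>.amazonaws.com:443",
    "autoscaling.<region>.amazonaws.com:443",
    "dynamodb.<region>.amazonaws.com:443",
    "ec2.<region>.amazonaws.com:443",
    "queue.amazonaws.com:443",
    "s3.amazonaws.com/ec2metadata/ec2-metadata:80",
    "<region>-aws-parallelcluster.s3.amazonaws.com:443",
    "github.com:443",
]


def expand(template, region):
    """Fill in the region and split an entry into (host, port)."""
    entry = template.replace("<region>", region)
    location, _, port = entry.rpartition(":")
    host = location.split("/", 1)[0]  # drop any URL path after the host
    return host, int(port)


def allowlist(region):
    """Return the (host, port) pairs to allow for the given region."""
    return [expand(template, region) for template in ENDPOINT_TEMPLATES]


if __name__ == "__main__":
    for host, port in allowlist("us-east-1"):
        print(f"{host}:{port}")
```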

I'm able to ignore all the Patch, OS, Update and PyPI traffic because we use pre_install bootstrap scripts to override all software repos anyway with links that point to internal private mirrors or Nexus Repository managers (for PyPI).

From our perspective the offending entry was github.com as our firewall was clearly blocking that -- that ended up being the only destination responsible for our rollback and deploy failures.
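
A blocked destination like this could in principle be caught before deployment with a simple pre-flight check run from inside the VPC. The sketch below only tests whether a direct TCP connection can be opened; the function names are hypothetical. It would not reflect traffic that goes through an explicit proxy, and a firewall that filters after the TCP handshake (for example on TLS SNI) could still pass this check.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def blocked(destinations, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [
        (host, port)
        for host, port in destinations
        if not can_connect(host, port, timeout)
    ]
```

For example, `blocked([("github.com", 443)])` returning a non-empty list from a node in the cluster subnet would suggest the firewall rule is still missing.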

Other than that, the only unusual thing was that the request to the ec2-metadata link was a plain HTTP request made over TCP:80 -- one of the few non-HTTPS connections to an Amazon-operated destination.

I wrote a much longer blog post about this and have the full squid logs available over at https://bioteam.net/2019/11/aws-parallelcluster-private-deployment-in-hardened-vpcs/

sean-smith · 2019-11-14T23:22:56Z

@chrisdag Really enjoyed reading your blogpost. Excellent work!

Btw the GitHub call comes from pyenv. It appears we're calling it at runtime even though it's packaged with the AMI. See https://github.com/aws/aws-parallelcluster-cookbook/blob/develop/resources/install_pyenv.rb#L20 Sorry about that!

We owe you some 🍻

lukeseawalker · 2019-11-15T09:43:28Z

Hi @chrisdag,
thank you very much for the thorough investigation! It was very nice to read your blog post.

L

enrico-usai · 2024-05-23T14:24:20Z

Resolving this: we enabled cluster creation in subnets with no internet access as part of the 3.1.1 release and added a list of the required VPC endpoints, with instructions, to the official documentation.

lukeseawalker added the enhancement label Oct 3, 2019

enrico-usai closed this as completed May 23, 2024
