
feature request: Disclose all external sites/URLs needed for deployment #1339

Closed
chrisdag opened this issue Oct 2, 2019 · 5 comments

@chrisdag

chrisdag commented Oct 2, 2019

We run Parallelcluster at scale in hardened VPCs that are not granted unrestricted access to the internet. Each external destination needs to be documented and whitelisted in a firewall or on a proxy server.

Prior versions of cfncluster/parallelcluster just talked to APIs and pulled templates, scripts and data from s3:// so this was easy to support and configure security rules around.

But now we have deployments of the latest version failing because it looks like the bootstrap process is trying to pull artifacts or code from GitHub, and our firewall is killing those sessions. The end result is that our users see frustrating rollbacks on deployment.

As a longer-term feature request, can we ask that some sort of doc be created that lists the external destinations required for a successful deployment?

And in the shorter term, if someone could refresh my memory on which parts of the source code I should check by hand to gather a list of hosts to whitelist on our firewall, that would be great. From memory, I think some of the deployment and bootstrapping scripts are not actually in this project but in a different one?

Thanks!

@lukeseawalker
Contributor

Hi @chrisdag,
we certainly understand the need to create a "private" cluster without public dependencies (or at least with minimal external internet access), and your requests make perfect sense.
I'm going to tag this ticket with the enhancement label.

For the shorter term request, the nodes are provisioned using the Chef recipes that you can find here.
But rather than going through all the code, I would suggest another way to list the external dependencies: create a cluster in a private subnet whose 0.0.0.0/0 route points to a proxy instance placed in a public subnet (with internet access, i.e. a 0.0.0.0/0 route to an Internet Gateway). On this proxy instance, install a proxy server and watch its log during cluster creation to discover all the endpoints that get called.
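As a rough sketch of the log-review step, the Python below reduces a Squid access log to a count of unique destination host:port pairs. It assumes Squid's default native `access.log` format (method in the sixth field, URL in the seventh), where HTTPS tunnels are logged as `CONNECT host:port`; a different proxy or a custom `logformat` would need different parsing. The function names are hypothetical, and the sample log lines are made up for illustration.

```python
from urllib.parse import urlsplit


def parse_squid_destination(line):
    """Return (host, port) for one Squid native-format access.log line, or None."""
    # Native format fields: time elapsed client code/status bytes method URL user hierarchy type
    fields = line.split()
    if len(fields) < 7:
        return None
    method, url = fields[5], fields[6]
    if method == "CONNECT":
        # HTTPS tunnels are logged as host:port, with no scheme or path.
        host, _, port = url.rpartition(":")
        return (host, int(port)) if host and port.isdigit() else None
    parts = urlsplit(url)
    if not parts.hostname:
        return None
    # Fall back to the scheme's default port when the URL does not give one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return (parts.hostname, port)


def summarize_destinations(lines):
    """Count log lines per unique (host, port) destination."""
    counts = {}
    for line in lines:
        dest = parse_squid_destination(line)
        if dest is not None:
            counts[dest] = counts.get(dest, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines in Squid's native format, for illustration only.
    sample = [
        "1573000000.123 250 10.0.1.15 TCP_TUNNEL/200 5123 CONNECT github.com:443 - HIER_DIRECT/192.0.2.10 -",
        "1573000001.456 12 10.0.1.15 TCP_MISS/200 987 GET http://s3.amazonaws.com/ec2metadata/ec2-metadata - HIER_DIRECT/192.0.2.20 text/plain",
    ]
    for (host, port), n in sorted(summarize_destinations(sample).items()):
        print(f"{host}:{port}\t{n}")
```

On a real proxy you would pass an open log file (typically `/var/log/squid/access.log` on packaged Linux installs) to `summarize_destinations`, since iterating a file yields its lines.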

Thanks
L

@chrisdag
Author

Hi @lukeseawalker I finally got the time to do what you suggested -- I stood up a Squid logging proxy in a public subnet and deployed v2.4.1 through it and captured all of the logs.

The short summary is that if I strip out all of the OS patching, EPEL and Python PyPI traffic, the list of API and external destinations is very small right now:

  • cloudformation.<region>.amazonaws.com:443
  • autoscaling.<region>.amazonaws.com:443
  • dynamodb.<region>.amazonaws.com:443
  • ec2.<region>.amazonaws.com:443
  • queue.amazonaws.com:443
  • s3.amazonaws.com/ec2metadata/ec2-metadata:80
  • <region>-aws-parallelcluster.s3.amazonaws.com:443
  • github.com:443

I'm able to ignore all the patch, OS, update and PyPI traffic because we use pre_install bootstrap scripts to override all software repos anyway, pointing them at internal private mirrors or Nexus Repository managers (for PyPI).
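Checking a new deployment against the destinations listed above can be scripted. The sketch below is a hypothetical helper, not part of ParallelCluster: it fills the `<region>` placeholder for one region, treats the ec2-metadata object as host `s3.amazonaws.com` on port 80, and reports any observed destination that is neither on that list nor covered by suffixes already redirected to internal mirrors. The `ignored_suffixes` parameter is an assumed stand-in for the pre_install mirror overrides, and the list reflects only what was observed for v2.4.1.

```python
# Destinations reported in this thread for ParallelCluster v2.4.1.
# "<region>" is a placeholder; the ec2-metadata URL is reduced to host:port.
REPORTED_DESTINATIONS = [
    "cloudformation.<region>.amazonaws.com:443",
    "autoscaling.<region>.amazonaws.com:443",
    "dynamodb.<region>.amazonaws.com:443",
    "ec2.<region>.amazonaws.com:443",
    "queue.amazonaws.com:443",
    "s3.amazonaws.com:80",  # /ec2metadata/ec2-metadata, fetched over plain HTTP
    "<region>-aws-parallelcluster.s3.amazonaws.com:443",
    "github.com:443",
]


def expand_allowlist(region):
    """Return the reported destinations as (host, port) pairs for one region."""
    allowed = set()
    for entry in REPORTED_DESTINATIONS:
        host, _, port = entry.rpartition(":")
        allowed.add((host.replace("<region>", region), int(port)))
    return allowed


def unexpected_destinations(observed, region, ignored_suffixes=()):
    """List observed (host, port) pairs that are neither allowlisted nor ignored.

    ignored_suffixes is a hypothetical stand-in for traffic (OS repos, EPEL,
    PyPI) that pre_install scripts already redirect to internal mirrors.
    """
    allowed = expand_allowlist(region)
    return sorted(
        (host, port)
        for host, port in set(observed)
        if (host, port) not in allowed
        and not any(host.endswith(suffix) for suffix in ignored_suffixes)
    )
```

A dict of (host, port) counts can be passed directly as `observed`, since iterating a dict yields its keys.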

From our perspective the offending entry was github.com, as our firewall was clearly blocking it; that ended up being the only destination responsible for our rollbacks and deploy failures.

Other than that, the only unusual thing was that the request to the ec2-metadata link was a plain HTTP request over TCP:80, one of the few non-HTTPS connections to an Amazon-operated destination.

I wrote a much longer blog post about this and have the full squid logs available over at https://bioteam.net/2019/11/aws-parallelcluster-private-deployment-in-hardened-vpcs/

@sean-smith
Contributor

@chrisdag Really enjoyed reading your blog post. Excellent work!

Btw, the GitHub call comes from pyenv: it appears we're calling it at runtime even though it's packaged with the AMI. See https://github.com/aws/aws-parallelcluster-cookbook/blob/develop/resources/install_pyenv.rb#L20 Sorry about that!

We owe you some 🍻

@lukeseawalker
Contributor

Hi @chrisdag,
thank you very much for the thorough investigation! It was very nice to read your blog post.

L

@enrico-usai
Contributor

Resolving this: as part of the 3.1.1 release, we enabled cluster creation in subnets with no internet access and added a list of VPC endpoints, along with instructions, to the official documentation.

4 participants