Add support for Elastic Fabric Adapter #1593

Closed
iankouls-aws opened this issue May 9, 2023 · 0 comments · Fixed by #1594

Comments

@iankouls-aws
Contributor

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Explain how to create AWS EKS clusters with Elastic Fabric Adapter enabled, using Terraform.

Describe the solution you would like

Since Elastic Fabric Adapter is an AWS-specific network interface, share a clear example of using AWS EKS Blueprints to provision a cluster with an EFA-enabled node group whose proper operation can be verified. This solution should be well documented and reusable for use cases that require high-performance networking, such as distributed model training and HPC jobs.

Describe alternatives you have considered

As an alternative, I considered using the self-managed node group example provided in the terraform-aws-eks repository: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/self_managed_node_group/main.tf#L149
That example is limited and does not satisfy the requirements of the solution described above.

Additional context

In order to function properly, Elastic Fabric Adapter requires the following:

  1. EFA must be enabled through the launch template of the EC2 instance where the adapter is attached
  2. The EFA software must be installed on the EC2 instance
  3. The EC2 instance security group must allow all incoming and outgoing traffic to and from itself
  4. When multiple EC2 instances are used, they must all be within the same availability zone
  5. For best performance, all EC2 instances that communicate with each other via EFA should be clustered together using a Placement Group.

An AWS EKS Blueprints template that satisfies all of these requirements will simplify the provisioning of clusters with high-performance networking capabilities.
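
As a rough illustration only, the requirements above might map to Terraform AWS provider resources along the following lines. This is a minimal sketch, not the Blueprints implementation: every resource name, variable (`vpc_id`, `subnet_id`, `ami_id`), and the instance type is a hypothetical placeholder, and the EFA software (requirement 2) is assumed to be already present on the AMI.

```hcl
# Hypothetical inputs; none of these names come from EKS Blueprints.
variable "vpc_id" { type = string }
variable "subnet_id" { type = string } # one subnet, so all nodes share one availability zone (requirement 4)
variable "ami_id" { type = string }    # assumed to have the EFA software installed (requirement 2)

# Requirement 5: cluster placement group for low-latency placement
resource "aws_placement_group" "efa" {
  name     = "efa-example"
  strategy = "cluster"
}

# Requirement 3: allow all traffic to and from the security group itself
resource "aws_security_group" "efa" {
  name_prefix = "efa-example-"
  vpc_id      = var.vpc_id

  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }

  egress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }
}

# Requirement 1: enable EFA through the launch template
resource "aws_launch_template" "efa" {
  name_prefix   = "efa-example-"
  image_id      = var.ami_id
  instance_type = "c5n.18xlarge" # assumption: any EFA-capable instance type would do

  network_interfaces {
    device_index          = 0
    interface_type        = "efa"
    subnet_id             = var.subnet_id
    security_groups       = [aws_security_group.efa.id]
    delete_on_termination = true
  }

  placement {
    group_name = aws_placement_group.efa.name
  }
}
```

The sketch covers only the EFA-specific pieces. A working EKS node group would also need the usual cluster wiring (IAM role, bootstrap user data, additional egress rules for reaching the control plane and container registries), which is omitted here.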
