Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
What is the outcome that you are trying to reach?
Explain how to create AWS EKS clusters with Elastic Fabric Adapter enabled, using Terraform.
Describe the solution you would like
Since Elastic Fabric Adapter (EFA) is an AWS-specific network interface, share a clear example of using AWS EKS Blueprints to provision a cluster with a node group that has EFA enabled, and show how its proper operation can be verified. This solution should be well documented and reusable for use cases that require high-performance networking, such as distributed model training and HPC jobs.
In order to function properly, Elastic Fabric Adapter requires the following:
EFA must be enabled through the launch template of the EC2 instance where the adapter is attached
The EFA software must be installed on the EC2 instance
The EC2 instance security group must allow all inbound and outbound traffic to and from itself
When multiple EC2 instances are used, they must all be in the same Availability Zone
For best performance, all EC2 instances that communicate with each other via EFA should be clustered together using a placement group
An AWS EKS Blueprints template that satisfies all of these requirements will simplify the provisioning of clusters with high-performance networking capabilities.
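As a rough illustration of the requirements listed above, the following Terraform sketch wires them together using plain AWS provider resources. It is not an EKS Blueprints template: all resource names and variables here are hypothetical, the AMI is assumed to already contain the EFA software, and attaching the launch template to an EKS node group is left out.

```hcl
# Sketch only. Variable and resource names are illustrative assumptions.

variable "vpc_id" {
  type = string
}

# A single subnet keeps every node in the same Availability Zone.
variable "subnet_id" {
  type = string
}

# Assumed to be an AMI that already has the EFA software installed.
variable "ami_id" {
  type = string
}

# Must be an instance type that supports EFA.
variable "instance_type" {
  type = string
}

# Cluster placement group so that instances are placed close together.
resource "aws_placement_group" "efa" {
  name     = "efa-cluster"
  strategy = "cluster"
}

resource "aws_security_group" "efa" {
  name_prefix = "efa-"
  vpc_id      = var.vpc_id
}

# Allow all inbound traffic from members of the same security group.
resource "aws_security_group_rule" "efa_ingress_self" {
  type              = "ingress"
  protocol          = "-1"
  from_port         = 0
  to_port           = 0
  self              = true
  security_group_id = aws_security_group.efa.id
}

# Allow all outbound traffic to members of the same security group.
resource "aws_security_group_rule" "efa_egress_self" {
  type              = "egress"
  protocol          = "-1"
  from_port         = 0
  to_port           = 0
  self              = true
  security_group_id = aws_security_group.efa.id
}

# EFA is enabled through the network interface type in the launch template.
resource "aws_launch_template" "efa" {
  name_prefix   = "efa-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  placement {
    group_name = aws_placement_group.efa.name
  }

  network_interfaces {
    device_index          = 0
    interface_type        = "efa"
    subnet_id             = var.subnet_id
    security_groups       = [aws_security_group.efa.id]
    delete_on_termination = true
  }
}
```

A complete example would still need to reference this launch template from a node group and include a way to verify that EFA works on the nodes, which is the gap this issue asks to fill.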
Describe alternatives you have considered
As an alternative, I considered using the self-managed node group example provided in the terraform-aws-eks repository: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/self_managed_node_group/main.tf#L149
That example is limited and does not satisfy the requirements of the solution described above.