Merge pull request #480 from sauronalexander/main
Eureka module deployment for eureka
Showing 34 changed files with 1,276 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
### Description | ||
This deployment provisions the modules for the blog post: How to expansively train Robot Learning on AWS leveraging rewards functions generated by LLM | ||
|
||
In summary, it deploys the following components: | ||
- Networking | ||
  - Creates a new VPC and public/private subnets to host the EKS cluster and FSx | ||
- Bucket | ||
  - A data bucket used to store inputs and outputs | ||
- EKS cluster | ||
  - The core component, used to schedule and deploy training/simulation workloads | ||
- FSx | ||
  - Acts as the external drive for training. The file system is mounted into the training containers, and its data is synced to S3. | ||
- ECR | ||
  - Two ECR repositories: one stores the robotic training image and one stores the DCV image (DCV is a high-performance remote desktop streaming tool). | ||
- DCV components | ||
  - Builds a DCV image and creates the K8S resources that stream simulation/training applications running in EKS to a local dev environment. | ||
- Eureka | ||
  - The core component for training robotic simulations. It sets up the permissions needed to control running jobs and to talk to LLMs. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
name: eureka-data-bucket | ||
path: git::https://github.com/awslabs/idf-modules.git//modules/storage/buckets?ref=release/1.6.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
name: eks | ||
|
||
path: git::https://github.com/awslabs/idf-modules.git//modules/compute/eks?ref=release/1.6.0&depth=1 | ||
dataFiles: | ||
- filePath: git::https://github.com/awslabs/idf-modules.git//data/eks_dockerimage-replication/versions/1.29.yaml?ref=release/1.6.0 | ||
- filePath: git::https://github.com/awslabs/idf-modules.git//data/eks_dockerimage-replication/versions/default.yaml?ref=release/1.6.0 | ||
parameters: | ||
- name: vpc-id | ||
valueFrom: | ||
moduleMetadata: | ||
group: optionals | ||
name: networking | ||
key: VpcId | ||
- name: controlplane-subnet-ids | ||
valueFrom: | ||
moduleMetadata: | ||
group: optionals | ||
name: networking | ||
key: PrivateSubnetIds | ||
- name: dataplane-subnet-ids | ||
valueFrom: | ||
moduleMetadata: | ||
group: optionals | ||
name: networking | ||
key: PrivateSubnetIds | ||
- name: eks-admin-role-name | ||
value: Admin | ||
- name: eks-poweruser-role-name | ||
value: PowerUser | ||
- name: eks-read-only-role-name | ||
value: ReadOnly | ||
- name: eks-version | ||
value: "1.29" | ||
- name: eks-compute | ||
value: | ||
eks_nodegroup_config: | ||
- eks_ng_name: ng-gpu | ||
eks_node_quantity: 2 | ||
eks_node_max_quantity: 4 | ||
eks_node_min_quantity: 2 | ||
eks_node_disk_size: 100 | ||
eks_node_instance_type: "g5.2xlarge" | ||
use_gpu_ami: True | ||
eks_node_labels: | ||
usage: gpu | ||
eks_node_spot: False | ||
eks_secrets_envelope_encryption: False | ||
eks_api_endpoint_private: False | ||
- name: eks-addons | ||
value: | ||
deploy_aws_lb_controller: True # We deploy it unless set to False | ||
deploy_external_dns: False # We deploy it unless set to False | ||
deploy_aws_ebs_csi: False # We deploy it unless set to False | ||
deploy_aws_efs_csi: False # We deploy it unless set to False | ||
deploy_aws_fsx_csi: True # We deploy it unless set to False | ||
deploy_cluster_autoscaler: False # We deploy it unless set to False | ||
deploy_metrics_server: True # We deploy it unless set to False | ||
deploy_secretsmanager_csi: False # We deploy it unless set to False | ||
deploy_external_secrets: False | ||
deploy_cloudwatch_container_insights_metrics: True # We deploy it unless set to False | ||
deploy_cloudwatch_container_insights_logs: True | ||
cloudwatch_container_insights_logs_retention_days: 7 | ||
deploy_adot: False | ||
deploy_amp: False | ||
deploy_grafana_for_amp: False | ||
deploy_kured: False | ||
deploy_calico: False | ||
deploy_nginx_controller: | ||
value: False | ||
nginx_additional_annotations: | ||
nginx.ingress.kubernetes.io/whitelist-source-range: "100.64.0.0/10,10.0.0.0/8" | ||
deploy_kyverno: | ||
value: False | ||
kyverno_policies: | ||
validate: | ||
- block-ephemeral-containers | ||
- block-stale-images | ||
- block-updates-deletes | ||
- check-deprecated-apis | ||
- disallow-cri-sock-mount | ||
- disallow-custom-snippets | ||
- disallow-empty-ingress-host | ||
- disallow-helm-tiller | ||
- disallow-latest-tag | ||
- disallow-localhost-services | ||
- disallow-secrets-from-env-vars | ||
- ensure-probes-different | ||
- ingress-host-match-tls | ||
- limit-hostpath-vols | ||
- prevent-naked-pods | ||
- require-drop-cap-net-raw | ||
- require-emptydir-requests-limits | ||
- require-labels | ||
- require-pod-requests-limits | ||
- require-probes | ||
- restrict-annotations | ||
- restrict-automount-sa-token | ||
- restrict-binding-clusteradmin | ||
- restrict-clusterrole-nodesproxy | ||
- restrict-escalation-verbs-roles | ||
- restrict-ingress-classes | ||
- restrict-ingress-defaultbackend | ||
- restrict-node-selection | ||
- restrict-path | ||
- restrict-service-external-ips | ||
- restrict-wildcard-resources | ||
- restrict-wildcard-verbs | ||
- unique-ingress-host-and-path | ||
|
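The node group and add-on settings above can be sanity-checked before deployment. The following is a minimal sketch in plain Python with no AWS calls: the values are copied from the `eks` manifest, while the helper names `validate_nodegroup` and `enabled_addons` are hypothetical and not part of the EKS module.

```python
# Values copied from the eks manifest above; the helpers are illustrative only.
nodegroup = {
    "eks_ng_name": "ng-gpu",
    "eks_node_quantity": 2,
    "eks_node_max_quantity": 4,
    "eks_node_min_quantity": 2,
    "eks_node_instance_type": "g5.2xlarge",
}

addons = {
    "deploy_aws_lb_controller": True,
    "deploy_external_dns": False,
    "deploy_aws_ebs_csi": False,
    "deploy_aws_efs_csi": False,
    "deploy_aws_fsx_csi": True,
    "deploy_cluster_autoscaler": False,
    "deploy_metrics_server": True,
    "deploy_secretsmanager_csi": False,
    "deploy_external_secrets": False,
    "deploy_cloudwatch_container_insights_metrics": True,
    "deploy_cloudwatch_container_insights_logs": True,
}


def validate_nodegroup(ng):
    """Return True when the desired node count lies within the min/max bounds."""
    return ng["eks_node_min_quantity"] <= ng["eks_node_quantity"] <= ng["eks_node_max_quantity"]


def enabled_addons(flags):
    """Return the names of the add-ons whose flag is True, sorted alphabetically."""
    return sorted(name for name, on in flags.items() if on is True)


if __name__ == "__main__":
    print("node group bounds ok:", validate_nodegroup(nodegroup))
    for name in enabled_addons(addons):
        print("enabled:", name)
```

In this manifest the FSx CSI driver is among the enabled add-ons, which matches the FSx for Lustre file system that later modules mount into training pods.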
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
name: dcv-eks | ||
path: modules/visualization/dcv-eks | ||
parameters: | ||
- name: dcv-namespace | ||
value: dcv | ||
- name: dcv-nodeport | ||
value: 31980 | ||
- name: dcv-image-uri | ||
valueFrom: | ||
moduleMetadata: | ||
group: dcv-image | ||
name: dcv-image | ||
key: DCVImageUri | ||
- name: eks-cluster-admin-role-arn | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterAdminRoleArn | ||
- name: eks-cluster-name | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterName | ||
- name: eks-oidc-arn | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksOidcArn | ||
- name: eks-cluster-open-id-connect-issuer | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterOpenIdConnectIssuer | ||
- name: eks-cluster-security-group-id | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterSecurityGroupId | ||
- name: eks-node-role-arn | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksNodeRoleArn |
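Most parameters in these manifests take their value from another module's metadata through `valueFrom.moduleMetadata` (a `group`, a module `name`, and a `key`). The sketch below illustrates that lookup idea only. It is not the deployment tool's implementation, the `resolve` helper is hypothetical, and the metadata values are placeholders rather than real outputs of the `eks` or `dcv-image` modules.

```python
# Hypothetical metadata store keyed by (group, module name).
# The values are placeholders, not real module outputs.
metadata = {
    ("core", "eks"): {
        "EksClusterName": "example-cluster",
        "EksOidcArn": "arn:aws:iam::123456789012:oidc-provider/example",
    },
    ("dcv-image", "dcv-image"): {
        "DCVImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/dcv:latest",
    },
}

# A subset of the dcv-eks parameters above, written as Python dicts.
parameters = [
    {"name": "dcv-namespace", "value": "dcv"},
    {"name": "dcv-nodeport", "value": 31980},
    {
        "name": "dcv-image-uri",
        "valueFrom": {
            "moduleMetadata": {"group": "dcv-image", "name": "dcv-image", "key": "DCVImageUri"}
        },
    },
    {
        "name": "eks-cluster-name",
        "valueFrom": {
            "moduleMetadata": {"group": "core", "name": "eks", "key": "EksClusterName"}
        },
    },
]


def resolve(param, store):
    """Return a literal value, or look the value up in another module's metadata."""
    if "value" in param:
        return param["value"]
    ref = param["valueFrom"]["moduleMetadata"]
    return store[(ref["group"], ref["name"])][ref["key"]]


resolved = {p["name"]: resolve(p, metadata) for p in parameters}

if __name__ == "__main__":
    for name, value in resolved.items():
        print(f"{name} = {value}")
```

A missing group, module, or key raises `KeyError` in this sketch, which mirrors the practical point that a referenced module has to be deployed, and has to export that key, before a dependent module can use it.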
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
name: dcv-image | ||
path: modules/visualization/dcv-image | ||
parameters: | ||
- name: dcv-ecr-repository-name | ||
valueFrom: | ||
moduleMetadata: | ||
group: storage | ||
name: dcv | ||
key: EcrRepositoryName |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
name: robotic-training-on-eks | ||
toolchainRegion: us-east-1 | ||
groups: | ||
- name: optionals | ||
path: manifests/robotic-training-on-eks/optionals.yaml | ||
- name: buckets | ||
path: manifests/robotic-training-on-eks/buckets.yaml | ||
- name: core | ||
path: manifests/robotic-training-on-eks/core-modules.yaml | ||
- name: storage | ||
path: manifests/robotic-training-on-eks/storage.yaml | ||
- name: dcv-image | ||
path: manifests/robotic-training-on-eks/dcv-image.yaml | ||
- name: dcv-eks | ||
path: manifests/robotic-training-on-eks/dcv-eks.yaml | ||
- name: eureka | ||
path: manifests/robotic-training-on-eks/eureka.yaml | ||
targetAccountMappings: | ||
- alias: primary | ||
accountId: | ||
valueFrom: | ||
envVariable: PRIMARY_ACCOUNT | ||
default: true | ||
parametersGlobal: | ||
dockerCredentialsSecret: aws-addf-docker-credentials | ||
regionMappings: | ||
- region: us-east-1 | ||
default: true |
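Because modules read metadata from other groups, the order of `groups` in the deployment manifest matters if groups are deployed in the order listed (an assumption here, not something the diff states). The sketch below checks that every referenced group appears earlier in the list. The group order is copied from the manifest above. The mapping from module manifests to groups is inferred from the `group:` values in their `valueFrom` entries, since the diff does not show file names. The `ordering_violations` helper is hypothetical.

```python
# Group order as listed in the deployment manifest above.
group_order = ["optionals", "buckets", "core", "storage", "dcv-image", "dcv-eks", "eureka"]

# Groups that each group's modules read metadata from, taken from the
# valueFrom.moduleMetadata entries in this commit (group membership is inferred).
references = {
    "core": {"optionals"},
    "storage": {"optionals", "buckets"},
    "dcv-image": {"storage"},
    "dcv-eks": {"dcv-image", "core"},
    "eureka": {"core", "storage", "buckets"},
}


def ordering_violations(order, refs):
    """Return (group, dependency) pairs where the dependency is missing or not listed earlier."""
    position = {name: index for index, name in enumerate(order)}
    violations = []
    for group, deps in refs.items():
        for dep in sorted(deps):
            if dep not in position or position[dep] >= position[group]:
                violations.append((group, dep))
    return violations


if __name__ == "__main__":
    problems = ordering_violations(group_order, references)
    print("ordering ok" if not problems else f"out-of-order references: {problems}")
```

With the order used in this commit the check reports no violations.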
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
name: eureka | ||
path: modules/simulations/eureka | ||
parameters: | ||
- name: eks-cluster-admin-role-arn | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterAdminRoleArn | ||
- name: eks-cluster-name | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterName | ||
- name: eks-oidc-arn | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksOidcArn | ||
- name: eks-cluster-open-id-connect-issuer | ||
valueFrom: | ||
moduleMetadata: | ||
group: core | ||
name: eks | ||
key: EksClusterOpenIdConnectIssuer | ||
- name: application-ecr-name | ||
valueFrom: | ||
moduleMetadata: | ||
group: storage | ||
name: robotic-applications | ||
key: EcrRepositoryName | ||
- name: sqs-name | ||
value: "training-queue" | ||
- name: fsx-volume-handle | ||
valueFrom: | ||
moduleMetadata: | ||
group: storage | ||
name: fsx | ||
key: FSxLustreFileSystemId | ||
- name: fsx-mount-point | ||
valueFrom: | ||
moduleMetadata: | ||
group: storage | ||
name: fsx | ||
key: FSxLustreMountName | ||
- name: data-bucket-name | ||
valueFrom: | ||
moduleMetadata: | ||
group: buckets | ||
name: eureka-data-bucket | ||
key: ArtifactsBucketName |
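The `sqs-name` parameter names the queue that, according to the module README in this commit, a task controller uses to hand task configs to workers. The pattern can be sketched with Python's standard-library `queue.Queue` standing in for Amazon SQS. The task fields (`task_id`, `env`) and the function names are hypothetical, and nothing here calls AWS.

```python
import json
import queue


def enqueue_tasks(q, task_configs):
    """Controller side: serialize each task config and put it on the queue."""
    for config in task_configs:
        q.put(json.dumps(config))


def drain(q):
    """Worker side: pull messages until the queue is empty and return the decoded tasks."""
    handled = []
    while True:
        try:
            body = q.get_nowait()
        except queue.Empty:
            break
        handled.append(json.loads(body))
        q.task_done()
    return handled


# In-memory stand-in for the "training-queue" SQS queue.
training_queue = queue.Queue()
enqueue_tasks(
    training_queue,
    [{"task_id": 1, "env": "example-env"}, {"task_id": 2, "env": "example-env"}],
)
results = drain(training_queue)

if __name__ == "__main__":
    for task in results:
        print("handled task", task["task_id"])
```

Unlike this in-memory queue, a standard SQS queue provides at-least-once delivery with best-effort ordering, so real workers would need to tolerate duplicate or reordered messages.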
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
name: networking | ||
path: git::https://github.com/awslabs/idf-modules.git//modules/network/basic-cdk | ||
parameters: | ||
- name: internet-accessible | ||
value: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
name: fsx | ||
path: git::https://github.com/awslabs/idf-modules.git//modules/storage/fsx-lustre?ref=release/1.6.0&depth=1 | ||
parameters: | ||
- name: vpc_id | ||
valueFrom: | ||
moduleMetadata: | ||
group: optionals | ||
name: networking | ||
key: VpcId | ||
- name: private_subnet_ids | ||
valueFrom: | ||
moduleMetadata: | ||
group: optionals | ||
name: networking | ||
key: PublicSubnetIds | ||
- name: fs_deployment_type | ||
value: PERSISTENT_2 | ||
- name: data_bucket_name | ||
valueFrom: | ||
moduleMetadata: | ||
group: buckets | ||
name: eureka-data-bucket | ||
key: ArtifactsBucketName | ||
- name: import_path | ||
valueFrom: | ||
moduleMetadata: | ||
group: buckets | ||
name: eureka-data-bucket | ||
key: ArtifactsBucketName | ||
- name: storage_throughput | ||
value: 500 | ||
--- | ||
name: robotic-applications | ||
path: git::https://github.com/awslabs/idf-modules.git//modules/storage/ecr?ref=release/1.6.0 | ||
parameters: | ||
- name: image-tag-mutability | ||
value: "MUTABLE" | ||
--- | ||
name: dcv | ||
path: git::https://github.com/awslabs/idf-modules.git//modules/storage/ecr?ref=release/1.6.0 | ||
parameters: | ||
- name: image-tag-mutability | ||
value: "MUTABLE" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
|
||
# examples/eureka | ||
|
||
|
||
## Description | ||
This module sets up the environment for running robotic training and simulation. | ||
|
||
- It creates an FSx static provisioning k8s resource | ||
  - FSx is used as high-performance data storage for training inputs/outputs | ||
- It creates an IAM role for simulation | ||
  - Pods can assume this role to read data from S3 and FSx and to call LLMs in Amazon Bedrock. | ||
- It builds an application image | ||
  - This is a ROS2 image that contains the environment needed for training. | ||
- It creates an Amazon SQS message queue | ||
  - The queue carries the tasks sent by the task controller. The controller sends task configs to the queue, and workers pull them from the queue. | ||
|
||
|
||
## Inputs/Outputs | ||
|
||
### Input Parameters | ||
|
||
#### Required | ||
- `eks-cluster-admin-role-arn`: the role that creates the EKS cluster | ||
- `eks-cluster-name`: the name of the EKS cluster | ||
- `eks-oidc-arn`: full ARN of the OIDC provider | ||
- `eks-cluster-open-id-connect-issuer`: OIDC provider URI | ||
- `application-ecr-name`: the name of the ECR repository that stores images containing the simulation/training logic | ||
- `sqs-name`: the name of the SQS queue to create | ||
- `fsx-volume-handle`: file system ID of the FSx file system created by the dependency module | ||
- `fsx-mount-point`: mount point of the FSx file system created by the dependency module | ||
- `data-bucket-name`: the name of the bucket that stores all simulation/training data | ||
|
||
### Module Metadata Outputs | ||
|
||
- `IamRoleArn`: ARN of the IAM role that EKS pods assume to get the permissions needed to run simulation/training | ||
- `ApplicationImageUri`: URI of the application image, which contains the simulation/training logic and runs in EKS | ||
- `SqsUrl`: URL of the SQS queue where task controllers enqueue tasks and workers dequeue them | ||
|
||
#### Output Example | ||
|
||
```json | ||
{ | ||
"IamRoleArn": "arn:aws:iam::123456789012:role/addf-eureka-simulation-role", | ||
"ApplicationImageUri": "123456789012.dkr.ecr.us-west-2.amazonaws.com/robotic-applications:ubuntu-ros2", | ||
"SqsUrl": "https://sqs.us-west-2.amazonaws.com/123456789012/MyQueue" | ||
} | ||
``` |