Merge branch 'awslabs:main' into mlflow
ovaleanu authored Dec 12, 2023
2 parents 9a61a73 + 241a241 commit 3867a89
Showing 214 changed files with 11,013 additions and 4,071 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v4.5.0
     hooks:
       - id: trailing-whitespace
         args: ['--markdown-linebreak-ext=md']
@@ -10,7 +10,7 @@ repos:
       - id: detect-aws-credentials
         args: ['--allow-missing-credentials']
   - repo: https://github.com/antonbabenko/pre-commit-terraform
-    rev: v1.81.0
+    rev: v1.83.5
     hooks:
       - id: terraform_fmt
       - id: terraform_docs
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
![Data on EKS](website/static/img/doeks-logo-green.png)
# [Data on Amazon EKS (DoEKS)](https://awslabs.github.io/data-on-eks/)
(pronounced Do.eks)


[![plan-examples](https://github.com/awslabs/data-on-eks/actions/workflows/plan-examples.yml/badge.svg?branch=main)](https://github.com/awslabs/data-on-eks/actions/workflows/plan-examples.yml)

1 change: 1 addition & 0 deletions ai-ml/emr-spark-rapids/eks.tf
@@ -9,6 +9,7 @@ module "eks" {
  cluster_name    = local.name
  cluster_version = var.eks_cluster_version

  # WARNING: Avoid setting cluster_endpoint_public_access = true in preprod or prod accounts. This option is intended for sandbox accounts, where it simplifies cluster deployment and testing.
  cluster_endpoint_public_access = true # If true, the cluster API server is accessible from the internet. You can optionally limit the CIDR blocks that can access the public endpoint.

  vpc_id = module.vpc.vpc_id
62 changes: 62 additions & 0 deletions ai-ml/jark-stack/terraform/README.md
@@ -0,0 +1,62 @@
# JupyterHub, Argo, Ray, Kubernetes

Docs coming soon...

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.0.0 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 3.72 |
| <a name="requirement_helm"></a> [helm](#requirement\_helm) | >= 2.4.1 |
| <a name="requirement_http"></a> [http](#requirement\_http) | >= 3.3 |
| <a name="requirement_kubectl"></a> [kubectl](#requirement\_kubectl) | >= 1.14 |
| <a name="requirement_kubernetes"></a> [kubernetes](#requirement\_kubernetes) | >= 2.10 |
| <a name="requirement_random"></a> [random](#requirement\_random) | >= 3.1 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 3.72 |
| <a name="provider_kubernetes"></a> [kubernetes](#provider\_kubernetes) | >= 2.10 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_data_addons"></a> [data\_addons](#module\_data\_addons) | aws-ia/eks-data-addons/aws | ~> 1.1 |
| <a name="module_ebs_csi_driver_irsa"></a> [ebs\_csi\_driver\_irsa](#module\_ebs\_csi\_driver\_irsa) | terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks | ~> 5.20 |
| <a name="module_eks"></a> [eks](#module\_eks) | terraform-aws-modules/eks/aws | ~> 19.15 |
| <a name="module_eks_blueprints_addons"></a> [eks\_blueprints\_addons](#module\_eks\_blueprints\_addons) | aws-ia/eks-blueprints-addons/aws | ~> 1.2 |
| <a name="module_vpc"></a> [vpc](#module\_vpc) | terraform-aws-modules/vpc/aws | ~> 5.0 |

## Resources

| Name | Type |
|------|------|
| [kubernetes_annotations.disable_gp2](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/annotations) | resource |
| [kubernetes_config_map_v1.notebook](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/config_map_v1) | resource |
| [kubernetes_namespace_v1.jupyterhub](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace_v1) | resource |
| [kubernetes_secret_v1.huggingface_token](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/secret_v1) | resource |
| [kubernetes_storage_class.default_gp3](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/storage_class) | resource |
| [aws_eks_cluster_auth.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.27"` | no |
| <a name="input_huggingface_token"></a> [huggingface\_token](#input\_huggingface\_token) | Hugging Face Secret Token | `string` | `"DUMMY_TOKEN_REPLACE_ME"` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of the VPC and EKS Cluster | `string` | `"jark-stack"` | no |
| <a name="input_region"></a> [region](#input\_region) | region | `string` | `"us-west-2"` | no |
| <a name="input_secondary_cidr_blocks"></a> [secondary\_cidr\_blocks](#input\_secondary\_cidr\_blocks) | Secondary CIDR blocks to be attached to VPC | `list(string)` | <pre>[<br> "100.64.0.0/16"<br>]</pre> | no |
| <a name="input_vpc_cidr"></a> [vpc\_cidr](#input\_vpc\_cidr) | VPC CIDR. This should be a valid private (RFC 1918) CIDR range | `string` | `"10.1.0.0/21"` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_configure_kubectl"></a> [configure\_kubectl](#output\_configure\_kubectl) | Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
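
The `huggingface_token` input above defaults to a placeholder value. Terraform reads input variables from environment variables named `TF_VAR_<name>`, so one way to avoid deploying with the placeholder is a small pre-flight check before `terraform apply`. The sketch below is illustrative only: the function name `check_hf_token` is hypothetical and not part of this commit.

```shell
# Hypothetical pre-flight check (not part of the commit).
# Fails when TF_VAR_huggingface_token is unset, empty, or still set to the
# placeholder default documented in the Inputs table.
check_hf_token() {
  case "${TF_VAR_huggingface_token:-}" in
    ""|DUMMY_TOKEN_REPLACE_ME)
      echo "Set TF_VAR_huggingface_token to a real Hugging Face token" >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

A possible use is `export TF_VAR_huggingface_token=<your token>` followed by `check_hf_token && terraform apply`, so the apply only starts once a non-placeholder token is present.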
186 changes: 186 additions & 0 deletions ai-ml/jark-stack/terraform/addons.tf
@@ -0,0 +1,186 @@
#---------------------------------------------------------------
# GP3 Encrypted Storage Class
#---------------------------------------------------------------
resource "kubernetes_annotations" "disable_gp2" {
  annotations = {
    "storageclass.kubernetes.io/is-default-class" : "false"
  }
  api_version = "storage.k8s.io/v1"
  kind        = "StorageClass"
  metadata {
    name = "gp2"
  }
  force = true

  depends_on = [module.eks.eks_cluster_id]
}

resource "kubernetes_storage_class" "default_gp3" {
  metadata {
    name = "gp3"
    annotations = {
      "storageclass.kubernetes.io/is-default-class" : "true"
    }
  }

  storage_provisioner    = "ebs.csi.aws.com"
  reclaim_policy         = "Delete"
  allow_volume_expansion = true
  volume_binding_mode    = "WaitForFirstConsumer"
  parameters = {
    fsType    = "ext4"
    encrypted = true
    type      = "gp3"
  }

  depends_on = [kubernetes_annotations.disable_gp2]
}

#---------------------------------------------------------------
# IRSA for EBS CSI Driver
#---------------------------------------------------------------
module "ebs_csi_driver_irsa" {
  source                = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version               = "~> 5.20"
  role_name_prefix      = format("%s-%s-", local.name, "ebs-csi-driver")
  attach_ebs_csi_policy = true
  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
  tags = local.tags
}

#---------------------------------------------------------------
# EKS Blueprints Addons
#---------------------------------------------------------------
module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.2"

  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  cluster_version   = module.eks.cluster_version
  oidc_provider_arn = module.eks.oidc_provider_arn

  #---------------------------------------
  # Amazon EKS Managed Add-ons
  #---------------------------------------
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    coredns = {
      preserve = true
    }
    kube-proxy = {
      preserve = true
    }
    # VPC CNI uses worker node IAM role policies
    vpc-cni = {
      preserve = true
    }
  }

  #---------------------------------------
  # AWS Load Balancer Controller Add-on
  #---------------------------------------
  enable_aws_load_balancer_controller = true
  # turn off the mutating webhook for services because we are using
  # service.beta.kubernetes.io/aws-load-balancer-type: external
  aws_load_balancer_controller = {
    set = [{
      name  = "enableServiceMutatorWebhook"
      value = "false"
    }]
  }

  #---------------------------------------
  # Ingress Nginx Add-on
  #---------------------------------------
  enable_ingress_nginx = true
  ingress_nginx = {
    values = [templatefile("${path.module}/helm-values/ingress-nginx-values.yaml", {})]
  }

  helm_releases = {
    #---------------------------------------
    # NVIDIA Device Plugin Add-on
    #---------------------------------------
    nvidia-device-plugin = {
      description      = "A Helm chart for NVIDIA Device Plugin"
      namespace        = "nvidia-device-plugin"
      create_namespace = true
      chart            = "nvidia-device-plugin"
      chart_version    = "0.14.0"
      repository       = "https://nvidia.github.io/k8s-device-plugin"
      values           = [file("${path.module}/helm-values/nvidia-values.yaml")]
    }
  }
}

#---------------------------------------------------------------
# Data on EKS Kubernetes Addons
#---------------------------------------------------------------
module "data_addons" {
  source  = "aws-ia/eks-data-addons/aws"
  version = "~> 1.1" # ensure to update this to the latest/desired version

  oidc_provider_arn = module.eks.oidc_provider_arn

  #---------------------------------------------------------------
  # JupyterHub Add-on
  #---------------------------------------------------------------
  enable_jupyterhub = true
  jupyterhub_helm_config = {
    namespace        = kubernetes_namespace_v1.jupyterhub.id
    create_namespace = false
    values           = [file("${path.module}/helm-values/jupyterhub-values.yaml")]
  }

  #---------------------------------------------------------------
  # KubeRay Operator Add-on
  #---------------------------------------------------------------
  enable_kuberay_operator = true

  depends_on = [
    kubernetes_secret_v1.huggingface_token,
    kubernetes_config_map_v1.notebook
  ]
}


#---------------------------------------------------------------
# Additional Resources
#---------------------------------------------------------------

resource "kubernetes_namespace_v1" "jupyterhub" {
  metadata {
    name = "jupyterhub"
  }
}


resource "kubernetes_secret_v1" "huggingface_token" {
  metadata {
    name      = "hf-token"
    namespace = kubernetes_namespace_v1.jupyterhub.id
  }

  data = {
    token = var.huggingface_token
  }
}

resource "kubernetes_config_map_v1" "notebook" {
  metadata {
    name      = "notebook"
    namespace = kubernetes_namespace_v1.jupyterhub.id
  }

  data = {
    "dogbooth.ipynb" = file("${path.module}/src/notebook/dogbooth.ipynb")
  }
}
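
Reviewer sketch (not part of the commit): addons.tf removes the default-class annotation from `gp2` and marks `gp3` as the default StorageClass. After an apply, that outcome could be checked with a small shell helper like the one below. The function names `default_storage_class` and `check_default_is_gp3` are hypothetical, and the helper assumes `kubectl` is already configured for the cluster.

```shell
# Hypothetical verification helpers (not part of the commit).

# Prints the names of all StorageClasses annotated as the default class.
default_storage_class() {
  kubectl get storageclass \
    -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'
}

# Succeeds only when exactly one default class exists and it is gp3.
check_default_is_gp3() {
  _default=$(default_storage_class) || return 1
  [ "$_default" = "gp3" ]
}
```

If both `gp2` and `gp3` were still annotated as default, the helper would see two names and fail, which is the situation the `disable_gp2` resource is meant to prevent.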
74 changes: 74 additions & 0 deletions ai-ml/jark-stack/terraform/cleanup.sh
@@ -0,0 +1,74 @@
#!/bin/bash

read -p "Enter the region: " region
export AWS_DEFAULT_REGION=$region

echo "Destroying RayService..."

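# Delete the Ingress/SVC before removing the addons
# Assumption: SCRIPTDIR is not set anywhere in this script, so default it to the script's own directory
SCRIPTDIR="${SCRIPTDIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}"
TMPFILE=$(mktemp)
terraform -chdir="$SCRIPTDIR" output -raw configure_kubectl > "$TMPFILE"
# Run kubectl delete only if TMPFILE does not contain the string "No outputs found"
if [[ $(cat "$TMPFILE") != *"No outputs found"* ]]; then
  source "$TMPFILE"
  kubectl delete -f src/service/ray-service.yaml
else
  echo "No outputs found, skipping kubectl delete"
fi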


# List of Terraform modules to destroy in sequence
targets=(
  "module.data_addons"
  "module.eks_blueprints_addons"
  "module.eks"
  "module.vpc"
)

# Destroy modules in sequence
for target in "${targets[@]}"
do
  echo "Destroying module $target..."
  # pipefail makes the command substitution return terraform's exit status rather than tee's
  destroy_output=$(set -o pipefail; terraform destroy -target="$target" -var="region=$region" -auto-approve 2>&1 | tee /dev/tty)
  if [[ $? -eq 0 && $destroy_output == *"Destroy complete"* ]]; then
    echo "SUCCESS: Terraform destroy of $target completed successfully"
  else
    echo "FAILED: Terraform destroy of $target failed"
    exit 1
  fi
done

echo "Destroying Load Balancers..."

for arn in $(aws resourcegroupstaggingapi get-resources \
  --resource-type-filters elasticloadbalancing:loadbalancer \
  --tag-filters "Key=elbv2.k8s.aws/cluster,Values=jark-stack" \
  --query 'ResourceTagMappingList[].ResourceARN' \
  --output text); do
  aws elbv2 delete-load-balancer --load-balancer-arn "$arn"
done

echo "Destroying Target Groups..."
for arn in $(aws resourcegroupstaggingapi get-resources \
  --resource-type-filters elasticloadbalancing:targetgroup \
  --tag-filters "Key=elbv2.k8s.aws/cluster,Values=jark-stack" \
  --query 'ResourceTagMappingList[].ResourceARN' \
  --output text); do
  aws elbv2 delete-target-group --target-group-arn "$arn"
done

echo "Destroying Security Groups..."
for sg in $(aws ec2 describe-security-groups \
  --filters "Name=tag:elbv2.k8s.aws/cluster,Values=jark-stack" \
  --query 'SecurityGroups[].GroupId' --output text); do
  aws ec2 delete-security-group --group-id "$sg"
done

## Final destroy to catch any remaining resources
echo "Destroying remaining resources..."
# pipefail makes the command substitution return terraform's exit status rather than tee's
destroy_output=$(set -o pipefail; terraform destroy -var="region=$region" -auto-approve 2>&1 | tee /dev/tty)
if [[ $? -eq 0 && $destroy_output == *"Destroy complete"* ]]; then
  echo "SUCCESS: Terraform destroy of all modules completed successfully"
else
  echo "FAILED: Terraform destroy of all modules failed"
  exit 1
fi
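
Reviewer sketch (not part of the commit): the destroy steps in cleanup.sh capture Terraform's output through a pipe into `tee`. By default a shell pipeline reports the status of its last command, so a status check taken after such a pipe can pass even when the left-hand command failed. One way to sidestep this, assuming live streaming of the output is not essential, is to write the output to a log file and read the exit status directly. The helper name `run_logged` is hypothetical.

```shell
# Hypothetical helper (not part of the commit).
# Usage: run_logged <logfile> <command> [args...]
# Runs the command with stdout and stderr redirected to the log file,
# prints the log afterwards, and returns the command's own exit status.
run_logged() {
  _log=$1
  shift
  "$@" >"$_log" 2>&1
  _status=$?
  cat "$_log"
  return "$_status"
}
```

With this helper, a destroy step could be written along the lines of `run_logged "$LOG" terraform destroy -auto-approve && grep -q "Destroy complete" "$LOG"`, where `LOG` is a path of your choosing. The trade-off is that output appears only after the command finishes rather than while it runs.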