Improve Vertex mlops blueprint #1337

Merged
merged 11 commits into from
Apr 24, 2023
Changes from 5 commits
5 changes: 3 additions & 2 deletions blueprints/cloud-operations/network-dashboard/src/main.py
@@ -80,8 +80,9 @@ def do_discovery(resources):
resources[result.type][result.id][result.key] = result.data
else:
resources[result.type][result.id] = result.data
LOGGER.info('discovery end {}'.format(
{k: len(v) for k, v in resources.items() if not isinstance(v, str)}))
LOGGER.info('discovery end {}'.format({
k: len(v) for k, v in resources.items() if not isinstance(v, str)
}))


def do_init(resources, discovery_root, monitoring_project, folders=None,
2 changes: 1 addition & 1 deletion blueprints/data-solutions/bq-ml/README.md
@@ -98,5 +98,5 @@ module "test" {
prefix = "prefix"
}

# tftest modules=9 resources=47
# tftest modules=9 resources=48
```
2 changes: 1 addition & 1 deletion blueprints/data-solutions/data-playground/README.md
@@ -86,5 +86,5 @@ module "test" {
parent = "folders/467898377"
}
}
# tftest modules=8 resources=40
# tftest modules=8 resources=41
```
51 changes: 20 additions & 31 deletions blueprints/data-solutions/vertex-mlops/README.md
@@ -52,63 +52,52 @@ This blueprint can be used as a building block for setting up an end2end ML Ops

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_id](variables.tf#L101) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [notebooks](variables.tf#L73) | Vertex AI workbenchs to be deployed. Service Account runtime/instances deployed. | <code title="map&#40;object&#40;&#123;&#10; type &#61; string&#10; machine_type &#61; optional&#40;string, &#34;n1-standard-4&#34;&#41;&#10; internal_ip_only &#61; optional&#40;bool, true&#41;&#10; idle_shutdown &#61; optional&#40;bool, false&#41;&#10; owner &#61; optional&#40;string&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | ✓ | |
| [project_config](variables.tf#L100) | Provide 'billing_account_id' value if project creation is needed, uses existing 'project_id' if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object&#40;&#123;&#10; billing_account_id &#61; optional&#40;string&#41;&#10; parent &#61; optional&#40;string&#41;&#10; project_id &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [bucket_name](variables.tf#L18) | GCS bucket name to store the Vertex AI artifacts. | <code>string</code> | | <code>null</code> |
| [dataset_name](variables.tf#L24) | BigQuery Dataset to store the training data. | <code>string</code> | | <code>null</code> |
| [groups](variables.tf#L30) | Name of the groups ([email protected]) to apply opinionated IAM permissions. | <code title="object&#40;&#123;&#10; gcp-ml-ds &#61; string&#10; gcp-ml-eng &#61; string&#10; gcp-ml-viewer &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; gcp-ml-ds &#61; null&#10; gcp-ml-eng &#61; null&#10; gcp-ml-viewer &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [groups](variables.tf#L30) | Name of the groups ([email protected]) to apply opinionated IAM permissions. | <code title="object&#40;&#123;&#10; gcp-ml-ds &#61; optional&#40;string&#41;&#10; gcp-ml-eng &#61; optional&#40;string&#41;&#10; gcp-ml-viewer &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; gcp-ml-ds &#61; null&#10; gcp-ml-eng &#61; null&#10; gcp-ml-viewer &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [identity_pool_claims](variables.tf#L45) | Claims to be used by Workload Identity Federation (i.e.: attribute.repository/ORGANIZATION/REPO). If a not null value is provided, then google_iam_workload_identity_pool resource will be created. | <code>string</code> | | <code>null</code> |
| [labels](variables.tf#L51) | Labels to be assigned at project level. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [location](variables.tf#L57) | Location used for multi-regional resources. | <code>string</code> | | <code>&#34;eu&#34;</code> |
| [network_config](variables.tf#L63) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_link &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [notebooks](variables.tf#L73) | Vertex AI workbenchs to be deployed. | <code title="map&#40;object&#40;&#123;&#10; owner &#61; string&#10; region &#61; string&#10; subnet &#61; string&#10; internal_ip_only &#61; optional&#40;bool, false&#41;&#10; idle_shutdown &#61; optional&#40;bool&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [prefix](variables.tf#L86) | Prefix used for the project id. | <code>string</code> | | <code>null</code> |
| [project_create](variables.tf#L92) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [project_services](variables.tf#L106) | List of core services enabled on all projects. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;aiplatform.googleapis.com&#34;,&#10; &#34;artifactregistry.googleapis.com&#34;,&#10; &#34;bigquery.googleapis.com&#34;,&#10; &#34;cloudbuild.googleapis.com&#34;,&#10; &#34;compute.googleapis.com&#34;,&#10; &#34;datacatalog.googleapis.com&#34;,&#10; &#34;dataflow.googleapis.com&#34;,&#10; &#34;iam.googleapis.com&#34;,&#10; &#34;monitoring.googleapis.com&#34;,&#10; &#34;notebooks.googleapis.com&#34;,&#10; &#34;secretmanager.googleapis.com&#34;,&#10; &#34;servicenetworking.googleapis.com&#34;,&#10; &#34;serviceusage.googleapis.com&#34;&#10;&#93;">&#91;&#8230;&#93;</code> |
| [region](variables.tf#L126) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west4&#34;</code> |
| [repo_name](variables.tf#L132) | Cloud Source Repository name. null to avoid to create it. | <code>string</code> | | <code>null</code> |
| [sa_mlops_name](variables.tf#L138) | Name for the MLOPs Service Account. | <code>string</code> | | <code>&#34;sa-mlops&#34;</code> |
| [prefix](variables.tf#L94) | Prefix used for the project id. | <code>string</code> | | <code>null</code> |
| [region](variables.tf#L114) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west4&#34;</code> |
| [repo_name](variables.tf#L120) | Cloud Source Repository name. null to avoid to create it. | <code>string</code> | | <code>null</code> |
| [service_encryption_keys](variables.tf#L126) | Cloud KMS to use to encrypt different services. Key location should match service region. | <code title="object&#40;&#123;&#10; aiplatform &#61; optional&#40;string&#41;&#10; bq &#61; optional&#40;string&#41;&#10; notebooks &#61; optional&#40;string&#41;&#10; secretmanager &#61; optional&#40;string&#41;&#10; storage &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [github](outputs.tf#L33) | Github Configuration. | |
| [notebook](outputs.tf#L39) | Vertex AI managed notebook details. | |
| [project](outputs.tf#L44) | The project resource as return by the `project` module. | |
| [project_id](outputs.tf#L49) | Project ID. | |
| [github](outputs.tf#L30) | Github Configuration. | |
| [notebook](outputs.tf#L35) | Vertex AI notebooks ids. | |
| [project](outputs.tf#L43) | The project resource as return by the `project` module. | |

<!-- END TFDOC -->

## TODO

- Add support for User Managed Notebooks, SA permission option and non default SA for Single User mode.
- Improve default naming for local VPC and Cloud NAT

## Test

```hcl
module "test" {
source = "./fabric/blueprints/data-solutions/vertex-mlops/"
labels = {
"env" : "dev",
"team" : "ml"
"env" = "dev",
"team" = "ml"
}
bucket_name = "test-dev"
dataset_name = "test"
bucket_name = "gcs-test"
dataset_name = "bq-test"
identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
notebooks = {
"myworkbench" : {
"owner" : "[email protected]",
"region" : "europe-west4",
"subnet" : "default",
"myworkbench" = {
type = "USER_MANAGED"
}
}
prefix = "pref"
project_id = "test-dev"
project_create = {
prefix = "pref-dev"
project_config = {
billing_account_id = "000000-123456-123456"
parent = "folders/111111111111"
project_id = "test-dev"
}
}
# tftest modules=12 resources=57
# tftest modules=13 resources=65
```
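The README table above describes the new `project_config` and `notebooks` inputs only through their rendered types. As a rough sketch, assuming Terraform 1.3 or later (needed for `optional()` with defaults), the declarations could look like the following. The types are copied from the table; the descriptions are paraphrased, and the real `variables.tf` in this PR may differ.

```hcl
# Sketch reconstructed from the README variables table; not the
# actual variables.tf from the PR.
variable "project_config" {
  description = "Project settings. A project is created only when billing_account_id is set."
  type = object({
    billing_account_id = optional(string)
    parent             = optional(string)
    project_id         = string
  })
}

variable "notebooks" {
  description = "Vertex AI workbenches to deploy, keyed by name."
  type = map(object({
    type             = string
    machine_type     = optional(string, "n1-standard-4")
    internal_ip_only = optional(bool, true)
    idle_shutdown    = optional(bool, false)
    owner            = optional(string)
  }))
}
```

With a declaration along these lines, passing only `project_config = { project_id = "..." }` leaves `billing_account_id` null, which matches the `project_create = var.project_config.billing_account_id != null` check in `main.tf` and so reuses an existing project.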
10 changes: 5 additions & 5 deletions blueprints/data-solutions/vertex-mlops/ci-cd.tf
@@ -44,14 +44,11 @@ module "artifact_registry" {
project_id = module.project.project_id
location = var.region
format = "DOCKER"
# iam = {
# "roles/artifactregistry.admin" = ["group:[email protected]"]
# }
}

module "service-account-github" {
source = "../../../modules/iam-service-account"
name = "sa-github"
name = "${var.prefix}-sa-github"
project_id = module.project.project_id
iam = var.identity_pool_claims == null ? {} : { "roles/iam.workloadIdentityUser" = ["principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github_pool[0].name}/${var.identity_pool_claims}"] }
}
@@ -63,6 +60,9 @@ module "secret-manager" {
secrets = {
github-key = [var.region]
}
encryption_key = {
"${var.region}" = var.service_encryption_keys.secretmanager
}
iam = {
github-key = {
"roles/secretmanager.secretAccessor" = [
@@ -71,4 +71,4 @@ module "secret-manager" {
]
}
}
}
}
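The `encryption_key` map added to `secret-manager` uses a template string, `"${var.region}"`, as its key. HCL also accepts a parenthesized expression as an object key, which some readers find clearer. The standalone sketch below contrasts the two forms; `region` and `kms_key` are hypothetical names used only for illustration and are not variables from the blueprint.

```hcl
# Hypothetical standalone example, not part of the PR.
variable "region" {
  type    = string
  default = "europe-west4"
}

variable "kms_key" {
  type    = string
  default = null
}

locals {
  # Template-string key, the form used in the diff.
  by_template = { "${var.region}" = var.kms_key }

  # Parenthesized expression key; evaluates to the same key name.
  by_expression = { (var.region) = var.kms_key }
}
```

Without the parentheses, a bare `var.region = ...` would not be read as an expression key, so one of these two forms is needed whenever the key comes from a variable.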
89 changes: 53 additions & 36 deletions blueprints/data-solutions/vertex-mlops/main.tf
@@ -64,8 +64,7 @@ locals {
}
)

service_encryption_keys = var.service_encryption_keys
shared_vpc_project = try(var.network_config.host_project, null)
shared_vpc_project = try(var.network_config.host_project, null)

subnet = (
local.use_shared_vpc
@@ -109,20 +108,20 @@ module "gcs-bucket" {
location = var.region
storage_class = "REGIONAL"
versioning = false
encryption_key = try(local.service_encryption_keys.storage, null)
encryption_key = var.service_encryption_keys.storage
}

# Default bucket for Cloud Build to prevent error: "'us' violates constraint ‘gcp.resourceLocations’"
# https://stackoverflow.com/questions/53206667/cloud-build-fails-with-resource-location-constraint
module "gcs-bucket-cloudbuild" {
source = "../../../modules/gcs"
project_id = module.project.project_id
name = "${var.project_id}_cloudbuild"
name = "${var.prefix}_cloudbuild"
prefix = var.prefix
location = var.region
storage_class = "REGIONAL"
versioning = false
encryption_key = try(local.service_encryption_keys.storage, null)
encryption_key = var.service_encryption_keys.storage
}

module "bq-dataset" {
@@ -131,7 +130,7 @@ module "bq-dataset" {
project_id = module.project.project_id
id = var.dataset_name
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
encryption_key = var.service_encryption_keys.bq
}

module "vpc-local" {
@@ -190,19 +189,28 @@ module "cloudnat" {

module "project" {
source = "../../../modules/project"
name = var.project_id
parent = try(var.project_create.parent, null)
billing_account = try(var.project_create.billing_account_id, null)
project_create = var.project_create != null
name = var.project_config.project_id
parent = var.project_config.parent
billing_account = var.project_config.billing_account_id
project_create = var.project_config.billing_account_id != null
prefix = var.prefix
group_iam = local.group_iam
iam = {
"roles/aiplatform.user" = [module.service-account-mlops.iam_email]
"roles/aiplatform.user" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email
]
"roles/artifactregistry.reader" = [module.service-account-mlops.iam_email]
"roles/artifactregistry.writer" = [module.service-account-github.iam_email]
"roles/bigquery.dataEditor" = [module.service-account-mlops.iam_email]
"roles/bigquery.jobUser" = [module.service-account-mlops.iam_email]
"roles/bigquery.user" = [module.service-account-mlops.iam_email]
"roles/bigquery.dataEditor" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email
]
"roles/bigquery.jobUser" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email
]
"roles/bigquery.user" = [module.service-account-mlops.iam_email, module.service-account-notebook.iam_email]
"roles/cloudbuild.builds.editor" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
@@ -213,6 +221,8 @@ module "project" {
"roles/dataflow.worker" = [module.service-account-mlops.iam_email]
"roles/iam.serviceAccountUser" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email,
module.service-account-github.iam_email,
"serviceAccount:${module.project.service_accounts.robots.cloudbuild}"
]
"roles/monitoring.metricWriter" = [module.service-account-mlops.iam_email]
@@ -223,28 +233,41 @@ module "project" {
]
"roles/storage.admin" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
module.service-account-github.iam_email,
module.service-account-notebook.iam_email
]
}
labels = var.labels

org_policies = {
# Example of applying a project wide policy
# "compute.requireOsLogin" = {
# rules = [{ enforce = false }]
# }
}

service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
compute = [try(local.service_encryption_keys.compute, null)]
cloudbuild = [try(local.service_encryption_keys.storage, null)]
notebooks = [try(local.service_encryption_keys.compute, null)]
storage = [try(local.service_encryption_keys.storage, null)]
aiplatform = [var.service_encryption_keys.aiplatform]
bq = [var.service_encryption_keys.bq]
cloudbuild = [var.service_encryption_keys.storage]
notebooks = [var.service_encryption_keys.notebooks]
secretmanager = [var.service_encryption_keys.secretmanager]
storage = [var.service_encryption_keys.storage]
}
services = var.project_services


services = [
"aiplatform.googleapis.com",
"artifactregistry.googleapis.com",
"bigquery.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudbuild.googleapis.com",
"compute.googleapis.com",
"datacatalog.googleapis.com",
"dataflow.googleapis.com",
"iam.googleapis.com",
"ml.googleapis.com",
"monitoring.googleapis.com",
"notebooks.googleapis.com",
"secretmanager.googleapis.com",
"servicenetworking.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
]
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
@@ -254,11 +277,8 @@ module "project" {

module "service-account-mlops" {
source = "../../../modules/iam-service-account"
name = var.sa_mlops_name
name = "${var.prefix}-sa-mlops"
project_id = module.project.project_id
iam = {
"roles/iam.serviceAccountUser" = [module.service-account-github.iam_email]
}
}

resource "google_project_iam_member" "shared_vpc" {
@@ -268,11 +288,8 @@ resource "google_project_iam_member" "shared_vpc" {
member = "serviceAccount:${module.project.service_accounts.robots.notebooks}"
}


resource "google_sourcerepo_repository" "code-repo" {
count = var.repo_name == null ? 0 : 1
name = var.repo_name
project = module.project.project_id
}
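Throughout `main.tf`, this change replaces `try(local.service_encryption_keys.<name>, null)` with a direct `var.service_encryption_keys.<name>` reference. That is safe when every attribute of the variable is declared `optional`, because omitted attributes then evaluate to null instead of raising an error. The sketch below assumes the type shown in the README table and Terraform 1.3 or later. The `nullable = false` line and the `storage_key` local are illustrative assumptions that do not appear in the diff.

```hcl
# Sketch based on the README table; not the actual variables.tf.
variable "service_encryption_keys" {
  description = "Cloud KMS keys per service. Key location should match the service region."
  type = object({
    aiplatform    = optional(string)
    bq            = optional(string)
    notebooks     = optional(string)
    secretmanager = optional(string)
    storage       = optional(string)
  })
  # The empty default is converted to an object whose attributes are all null.
  default = {}
  # Assumption, not shown in the diff: reject an explicit null so that
  # attribute access below cannot fail.
  nullable = false
}

locals {
  # Hypothetical local: no try() needed, an unset key is simply null.
  storage_key = var.service_encryption_keys.storage
}
```

The same reasoning covers the `service_encryption_key_ids` lists in the `project` module, where each entry is either a key id or null.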


60 changes: 0 additions & 60 deletions blueprints/data-solutions/vertex-mlops/notebooks.tf

This file was deleted.
