Merge pull request #1337 from GoogleCloudPlatform/lcaggio/vertex-01
Improve Vertex mlops blueprint
Showing 13 changed files with 469 additions and 242 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -98,5 +98,5 @@ module "test" { | |
prefix = "prefix" | ||
} | ||
# tftest modules=9 resources=47 | ||
# tftest modules=9 resources=48 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,30 @@ | ||
# MLOps with Vertex AI | ||
|
||
## Introduction | ||
## Tagline | ||
|
||
Create a Vertex AI environment needed for MLOps. | ||
|
||
## Detailed | ||
|
||
This example implements the infrastructure required to deploy an end-to-end [MLOps process](https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf) using [Vertex AI](https://cloud.google.com/vertex-ai) platform. | ||
|
||
## GCP resources | ||
## Architecture | ||
|
||
The blueprint will deploy all the resources required for a fully functional MLOps environment, containing: | ||
|
||
- Vertex Workbench (for the experimentation environment) | ||
- GCP Project (optional) to host all the resources | ||
- Isolated VPC network and a subnet to be used by Vertex and Dataflow. Alternatively, an external Shared VPC can be configured using the `network_config` variable. | ||
- Firewall rule to allow the internal subnet communication required by Dataflow | ||
- Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow) | ||
- GCS buckets to host Vertex AI and Cloud Build Artifacts. By default the buckets will be regional and should match the Vertex AI region for the different resources (e.g. Vertex Managed Dataset) and processes (e.g. Vertex training) | ||
- BigQuery Dataset where the training data will be stored. This is optional, since the training data could be already hosted in an existing BigQuery dataset. | ||
- Artifact Registry Docker repository to host the custom images. | ||
- Service account (`mlops-[env]@`) with the minimum permissions required by Vertex AI and Dataflow (if this service is used inside of the Vertex AI Pipeline). | ||
- Service account (`github@`) to be used by Workload Identity Federation, to federate Github identity (Optional). | ||
- Secret to store the GitHub SSH key used to access the CI/CD code repo. | ||
1. Vertex Workbench (for the experimentation environment). | ||
1. GCP Project (optional) to host all the resources. | ||
1. Isolated VPC network and a subnet to be used by Vertex and Dataflow. Alternatively, an external Shared VPC can be configured using the `network_config` variable. | ||
1. Firewall rule to allow the internal subnet communication required by Dataflow. | ||
1. Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow). | ||
1. GCS buckets to host Vertex AI and Cloud Build Artifacts. By default the buckets will be regional and should match the Vertex AI region for the different resources (e.g. Vertex Managed Dataset) and processes (e.g. Vertex training). | ||
1. BigQuery Dataset where the training data will be stored. This is optional, since the training data could be already hosted in an existing BigQuery dataset. | ||
1. Artifact Registry Docker repository to host the custom images. | ||
1. Service account (`PREFIX-sa-mlops`) with the minimum permissions required by Vertex AI and Dataflow (if this service is used inside of the Vertex AI Pipeline). | ||
1. Service account (`PREFIX-sa-github@`) to be used by Workload Identity Federation, to federate the GitHub identity (optional). | ||
1. Secret Manager to store the GitHub SSH key used to access the CI/CD code repo. | ||
|
||
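The Shared VPC alternative mentioned in the list above can be sketched as follows. This is a minimal illustration, not part of the commit: it assumes the `network_config` attributes shown in the variables table (`host_project`, `network_self_link`, `subnet_self_link`) and the `project_config` and `notebooks` shapes from the usage example; the module name, host project, network, and subnet names are hypothetical placeholders.

```hcl
# Hypothetical caller that reuses an existing Shared VPC instead of letting
# the blueprint create its own isolated VPC, subnet, and Cloud NAT.
module "vertex_mlops_shared_vpc" {
  source = "./fabric/blueprints/data-solutions/vertex-mlops/"

  notebooks = {
    "myworkbench" = {
      type = "USER_MANAGED"
    }
  }

  prefix = "pref-dev"

  # billing_account_id is omitted, so an existing project is assumed.
  project_config = {
    project_id = "test-dev"
  }

  # Placeholder self links: replace with the real host project resources.
  network_config = {
    host_project      = "my-host-project"
    network_self_link = "https://www.googleapis.com/compute/v1/projects/my-host-project/global/networks/my-shared-vpc"
    subnet_self_link  = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west4/subnetworks/my-subnet"
  }
}
```

The subnet region in the self link is chosen to match the default `region` value (`europe-west4`); a different region would need a matching subnet.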
## Documentation | ||
|
||
![MLOps project description](./images/mlops_projects.png "MLOps project description") | ||
|
||
|
@@ -46,69 +52,81 @@ Please note that these groups are not suitable for production grade environments | |
## What's next? | ||
|
||
This blueprint can be used as a building block for setting up an end-to-end MLOps solution. As a next step, you can follow this [guide](https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build) to set up a Vertex AI pipeline and run it on the deployed infrastructure. | ||
|
||
## Usage | ||
|
||
Basic usage of this module is as follows: | ||
|
||
```hcl | ||
module "test" { | ||
source = "./fabric/blueprints/data-solutions/vertex-mlops/" | ||
notebooks = { | ||
"myworkbench" = { | ||
type = "USER_MANAGED" | ||
} | ||
} | ||
prefix = "pref-dev" | ||
project_config = { | ||
billing_account_id = "000000-123456-123456" | ||
parent = "folders/111111111111" | ||
project_id = "test-dev" | ||
} | ||
} | ||
# tftest modules=11 resources=60 | ||
``` | ||
|
||
<!-- BEGIN TFDOC --> | ||
|
||
## Variables | ||
|
||
| name | description | type | required | default | | ||
|---|---|:---:|:---:|:---:| | ||
| [project_id](variables.tf#L101) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | | | ||
| [notebooks](variables.tf#L69) | Vertex AI workbenches to be deployed. Service Account runtime/instances deployed. | <code title="map(object({ type = string machine_type = optional(string, "n1-standard-4") internal_ip_only = optional(bool, true) idle_shutdown = optional(bool, false) owner = optional(string) }))">map(object({…}))</code> | ✓ | | | ||
| [project_config](variables.tf#L96) | Provide 'billing_account_id' value if project creation is needed, uses existing 'project_id' if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object({ billing_account_id = optional(string) parent = optional(string) project_id = string })">object({…})</code> | ✓ | | | ||
| [bucket_name](variables.tf#L18) | GCS bucket name to store the Vertex AI artifacts. | <code>string</code> | | <code>null</code> | | ||
| [dataset_name](variables.tf#L24) | BigQuery Dataset to store the training data. | <code>string</code> | | <code>null</code> | | ||
| [groups](variables.tf#L30) | Name of the groups ([email protected]) to apply opinionated IAM permissions. | <code title="object({ gcp-ml-ds = string gcp-ml-eng = string gcp-ml-viewer = string })">object({…})</code> | | <code title="{ gcp-ml-ds = null gcp-ml-eng = null gcp-ml-viewer = null }">{…}</code> | | ||
| [identity_pool_claims](variables.tf#L45) | Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created. | <code>string</code> | | <code>null</code> | | ||
| [labels](variables.tf#L51) | Labels to be assigned at project level. | <code>map(string)</code> | | <code>{}</code> | | ||
| [location](variables.tf#L57) | Location used for multi-regional resources. | <code>string</code> | | <code>"eu"</code> | | ||
| [network_config](variables.tf#L63) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object({ host_project = string network_self_link = string subnet_self_link = string })">object({…})</code> | | <code>null</code> | | ||
| [notebooks](variables.tf#L73) | Vertex AI workbenches to be deployed. | <code title="map(object({ owner = string region = string subnet = string internal_ip_only = optional(bool, false) idle_shutdown = optional(bool) }))">map(object({…}))</code> | | <code>{}</code> | | ||
| [prefix](variables.tf#L86) | Prefix used for the project id. | <code>string</code> | | <code>null</code> | | ||
| [project_create](variables.tf#L92) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object({ billing_account_id = string parent = string })">object({…})</code> | | <code>null</code> | | ||
| [project_services](variables.tf#L106) | List of core services enabled on all projects. | <code>list(string)</code> | | <code title="[ "aiplatform.googleapis.com", "artifactregistry.googleapis.com", "bigquery.googleapis.com", "cloudbuild.googleapis.com", "compute.googleapis.com", "datacatalog.googleapis.com", "dataflow.googleapis.com", "iam.googleapis.com", "monitoring.googleapis.com", "notebooks.googleapis.com", "secretmanager.googleapis.com", "servicenetworking.googleapis.com", "serviceusage.googleapis.com" ]">[…]</code> | | ||
| [region](variables.tf#L126) | Region used for regional resources. | <code>string</code> | | <code>"europe-west4"</code> | | ||
| [repo_name](variables.tf#L132) | Cloud Source Repository name. Set to null to skip creating it. | <code>string</code> | | <code>null</code> | | ||
| [sa_mlops_name](variables.tf#L138) | Name for the MLOPs Service Account. | <code>string</code> | | <code>"sa-mlops"</code> | | ||
| [groups](variables.tf#L30) | Name of the groups ([email protected]) to apply opinionated IAM permissions. | <code title="object({ gcp-ml-ds = optional(string) gcp-ml-eng = optional(string) gcp-ml-viewer = optional(string) })">object({…})</code> | | <code>{}</code> | | ||
| [identity_pool_claims](variables.tf#L41) | Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created. | <code>string</code> | | <code>null</code> | | ||
| [labels](variables.tf#L47) | Labels to be assigned at project level. | <code>map(string)</code> | | <code>{}</code> | | ||
| [location](variables.tf#L53) | Location used for multi-regional resources. | <code>string</code> | | <code>"eu"</code> | | ||
| [network_config](variables.tf#L59) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object({ host_project = string network_self_link = string subnet_self_link = string })">object({…})</code> | | <code>null</code> | | ||
| [prefix](variables.tf#L90) | Prefix used for the project id. | <code>string</code> | | <code>null</code> | | ||
| [region](variables.tf#L110) | Region used for regional resources. | <code>string</code> | | <code>"europe-west4"</code> | | ||
| [repo_name](variables.tf#L116) | Cloud Source Repository name. Set to null to skip creating it. | <code>string</code> | | <code>null</code> | | ||
| [service_encryption_keys](variables.tf#L122) | Cloud KMS to use to encrypt different services. Key location should match service region. | <code title="object({ aiplatform = optional(string) bq = optional(string) notebooks = optional(string) secretmanager = optional(string) storage = optional(string) })">object({…})</code> | | <code>{}</code> | | ||
|
||
## Outputs | ||
|
||
| name | description | sensitive | | ||
|---|---|:---:| | ||
| [github](outputs.tf#L33) | Github Configuration. | | | ||
| [notebook](outputs.tf#L39) | Vertex AI managed notebook details. | | | ||
| [project](outputs.tf#L44) | The project resource as returned by the `project` module. | | | ||
| [project_id](outputs.tf#L49) | Project ID. | | | ||
| [github](outputs.tf#L30) | Github Configuration. | | | ||
| [notebook](outputs.tf#L35) | Vertex AI notebooks ids. | | | ||
| [project](outputs.tf#L43) | The project resource as returned by the `project` module. | | | ||
|
||
<!-- END TFDOC --> | ||
|
||
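The `service_encryption_keys` variable listed in the table above takes one optional Cloud KMS key per service. The sketch below shows one way a caller might set it; it is an illustration only. The KMS project, key ring, and key names are hypothetical placeholders, and reusing a single key for several services is an assumption rather than something the blueprint prescribes.

```hcl
locals {
  # Placeholder key ID in the standard Cloud KMS resource format. Per the
  # variable description, the key location should match the service region
  # (here the blueprint's default region, europe-west4).
  kms_key_regional = "projects/my-kms-project/locations/europe-west4/keyRings/my-keyring/cryptoKeys/my-key"
}

module "vertex_mlops_cmek" {
  source = "./fabric/blueprints/data-solutions/vertex-mlops/"

  notebooks = {
    "myworkbench" = {
      type = "USER_MANAGED"
    }
  }

  prefix = "pref-dev"

  project_config = {
    billing_account_id = "000000-123456-123456"
    parent             = "folders/111111111111"
    project_id         = "test-dev"
  }

  # Every attribute is optional; `bq` is left unset in this sketch because
  # the dataset location may differ from the regional key's location.
  service_encryption_keys = {
    aiplatform    = local.kms_key_regional
    notebooks     = local.kms_key_regional
    secretmanager = local.kms_key_regional
    storage       = local.kms_key_regional
  }
}
```

Attributes that are left out default to null, which is the same as passing the variable's default of `{}` for those services.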
## TODO | ||
|
||
- Add support for User Managed Notebooks, SA permission option and non default SA for Single User mode. | ||
- Improve default naming for local VPC and Cloud NAT | ||
|
||
## Test | ||
|
||
```hcl | ||
module "test" { | ||
source = "./fabric/blueprints/data-solutions/vertex-mlops/" | ||
labels = { | ||
"env" : "dev", | ||
"team" : "ml" | ||
"env" = "dev", | ||
"team" = "ml" | ||
} | ||
bucket_name = "test-dev" | ||
dataset_name = "test" | ||
bucket_name = "gcs-test" | ||
dataset_name = "bq-test" | ||
identity_pool_claims = "attribute.repository/ORGANIZATION/REPO" | ||
notebooks = { | ||
"myworkbench" : { | ||
"owner" : "[email protected]", | ||
"region" : "europe-west4", | ||
"subnet" : "default", | ||
"myworkbench" = { | ||
type = "USER_MANAGED" | ||
} | ||
} | ||
prefix = "pref" | ||
project_id = "test-dev" | ||
project_create = { | ||
prefix = "pref-dev" | ||
project_config = { | ||
billing_account_id = "000000-123456-123456" | ||
parent = "folders/111111111111" | ||
project_id = "test-dev" | ||
} | ||
} | ||
# tftest modules=12 resources=57 | ||
# tftest modules=13 resources=65 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -44,14 +44,11 @@ module "artifact_registry" { | |
project_id = module.project.project_id | ||
location = var.region | ||
format = "DOCKER" | ||
# iam = { | ||
# "roles/artifactregistry.admin" = ["group:[email protected]"] | ||
# } | ||
} | ||
|
||
module "service-account-github" { | ||
source = "../../../modules/iam-service-account" | ||
name = "sa-github" | ||
name = "${var.prefix}-sa-github" | ||
project_id = module.project.project_id | ||
iam = var.identity_pool_claims == null ? {} : { "roles/iam.workloadIdentityUser" = ["principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github_pool[0].name}/${var.identity_pool_claims}"] } | ||
} | ||
|
@@ -63,6 +60,9 @@ module "secret-manager" { | |
secrets = { | ||
github-key = [var.region] | ||
} | ||
encryption_key = { | ||
"${var.region}" = var.service_encryption_keys.secretmanager | ||
} | ||
iam = { | ||
github-key = { | ||
"roles/secretmanager.secretAccessor" = [ | ||
|
@@ -71,4 +71,4 @@ module "secret-manager" { | |
] | ||
} | ||
} | ||
} | ||
} |
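The `encryption_key` argument added to the `secret-manager` module above is a map keyed by region. As a standalone sketch of how that expression evaluates (the variable declarations below are simplified assumptions that mirror the names in the diff, not the blueprint's actual `variables.tf`), the interpolated key can also be written as a parenthesized expression:

```hcl
# Simplified stand-ins for the blueprint variables referenced in the diff.
variable "region" {
  type    = string
  default = "europe-west4"
}

variable "service_encryption_keys" {
  type = object({
    secretmanager = optional(string)
  })
  default = {}
}

locals {
  # Form used in the diff: the map key is produced by string interpolation.
  encryption_key_interpolated = {
    "${var.region}" = var.service_encryption_keys.secretmanager
  }

  # Alternative form: parentheses make HCL evaluate the key as an expression.
  encryption_key_parenthesized = {
    (var.region) = var.service_encryption_keys.secretmanager
  }
}

output "encryption_key" {
  # With the defaults above, the single map key is "europe-west4" and its
  # value is null because no Secret Manager key was supplied.
  value = local.encryption_key_parenthesized
}
```

The `optional()` type modifier used here requires Terraform 1.3 or later.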