Improve Data Playground example #738

Merged
merged 7 commits into from
Aug 9, 2022
4 changes: 2 additions & 2 deletions examples/data-solutions/README.md
@@ -36,6 +36,6 @@ This [example](./cloudsql-multiregion/) creates a [Cloud SQL instance](https://c

### Data Playground starter with Cloud Vertex AI Notebook and GCS

<a href="./data-playground/" title="Data Playground starter with Cloud Vertex AI Notebook and GCS"><img src="./data-playground/diagram.png" align="left" width="280px"></a>
This [example](./data-playground/) creates a [Vertex AI Notebook](https://cloud.google.com/vertex-ai/docs/workbench/introduction) running under a VPC network and a starter GCS bucket to store inputs and outputs of data experiments.
<a href="./data-playground/" title="Data Playground project with Cloud Vertex AI Notebook, BigQuery and GCS"><img src="./data-playground/diagram.png" align="left" width="280px"></a>
This [example](./data-playground/) creates a [Vertex AI Notebook](https://cloud.google.com/vertex-ai/docs/workbench/introduction) running on a VPC with a private IP and a dedicated Service Account. A GCS bucket and a BigQuery dataset are created to store inputs and outputs of data experiments.
<br clear="left">
72 changes: 48 additions & 24 deletions examples/data-solutions/data-playground/README.md
@@ -1,6 +1,6 @@
# Data Playground

This example creates a minimum viable template for a data experimentation project with the needed APIs enabled, basic VPC and Firewall set in place, GCS bucket and an AI notebook to get started.
This example creates a minimum viable architecture for a data experimentation project with the needed APIs enabled, VPC and Firewall set in place, a BigQuery dataset, a GCS bucket and an AI notebook to get started.

This is the high level diagram:

@@ -10,34 +10,58 @@ This is the high level diagram:

This sample creates several distinct groups of resources:

- projects
- Service Project configured for GCE instances and GCS buckets
- project
- networking
- VPC network
- One default subnet
  - VPC network with a default subnet and Cloud NAT
- Firewall rules for [SSH access via IAP](https://cloud.google.com/iap/docs/using-tcp-forwarding) and open communication within the VPC
- Vertex AI notebook
- One Jupyter lab notebook instance with public access
- GCS
- One bucket initial bucket
- Vertex AI Workbench notebook configured with a private IP and using a dedicated Service Account
- One GCS bucket
- One BigQuery dataset

## Deploy your environment
We assume the identity running the following steps has one of the following roles:

- `roles/resourcemanager.projectCreator`, if a new project will be created.
- `roles/owner` on the project, if you use an existing project.

Run Terraform init:
```
$ terraform init
```

Configure the Terraform variables in your `terraform.tfvars` file. You need to specify at least the following variables:
```
prefix = "prefix"
project_id = "data-001"
```

You can now run:
```
$ terraform apply
```

You can now connect to the Vertex AI notebook to perform your data analysis.
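If the deployment is scripted, the minimal `terraform.tfvars` described above can be generated instead of written by hand. The Python sketch below is illustrative only: `render_tfvars` is a hypothetical helper, not part of this example, and it handles only flat string variables such as `prefix` and `project_id`. It relies on the assumption that a `json.dumps` string literal is also a valid HCL string literal, which holds for plain ASCII values without `${` sequences or control characters.

```python
import json


def render_tfvars(values: dict[str, str]) -> str:
    """Render flat string variables as terraform.tfvars assignments.

    Hypothetical helper, not part of this example. It assumes plain ASCII
    values without "${" or control characters, for which a JSON string
    literal is also a valid HCL string literal.
    """
    lines = []
    for name, value in values.items():
        if not isinstance(value, str):
            raise TypeError(f"{name}: only string values are supported")
        lines.append(f"{name} = {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values from the README; replace them with your own.
    print(render_tfvars({"prefix": "prefix", "project_id": "data-001"}), end="")
```

Redirecting the script's output to `terraform.tfvars` yields one `name = "value"` assignment per line, matching the snippet shown earlier.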
<!-- BEGIN TFDOC -->

## Variables
| name | description | type | required | default |
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ----------- | -------- | ------------ |
| project\_id | Project id, references existing project if \`project\_create\` is null. | string | ✓ | |
| location | The location where resources will be deployed | string | | europe |
| region | The region where resources will be deployed. | string | | europe-west1 |
| project\_create | Provide values if project creation is needed, uses existing project if null. Parent format: folders/folder\_id or organizations/org\_id | object({…}) | | null |
| prefix | Unique prefix used for resource names. Not used for project if 'project\_create' is null. | string | | dp |
| service\_encryption\_keys | Cloud KMS to use to encrypt different services. Key location should match service region. | object({…}) | | null |
| vpc\_config | Parameters to create a simple VPC for the Data Playground | object({…}) | | {...} |

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [prefix](variables.tf#L36) | Unique prefix used for resource names. Not used for project if 'project_create' is null. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L22) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [location](variables.tf#L16) | The location where resources will be deployed. | <code>string</code> | | <code>&#34;EU&#34;</code> |
| [project_create](variables.tf#L27) | Provide values if project creation is needed, uses existing project if null. Parent format: folders/folder_id or organizations/org_id | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [region](variables.tf#L41) | The region where resources will be deployed. | <code>string</code> | | <code>&#34;europe-west1&#34;</code> |
| [vpc_config](variables.tf#L57) | Parameters to create a VPC. | <code title="object&#40;&#123;&#10; ip_cidr_range &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; ip_cidr_range &#61; &#34;10.0.0.0&#47;20&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |

## Outputs
| Name | Description |
| ----------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- |
| bucket | GCS Bucket URL. |
| project | Project id |
| vpc | VPC Network name |
| notebook | Vertex AI notebook name |

| name | description | sensitive |
|---|---|:---:|
| [bucket](outputs.tf#L15) | GCS Bucket URL. | |
| [dataset](outputs.tf#L20) | BigQuery dataset ID. | |
| [notebook](outputs.tf#L25) | Vertex AI notebook details. | |
| [project](outputs.tf#L33) | Project id | |
| [vpc](outputs.tf#L38) | VPC Network | |

<!-- END TFDOC -->
Binary file modified examples/data-solutions/data-playground/diagram.png
91 changes: 76 additions & 15 deletions examples/data-solutions/data-playground/main.tf
@@ -27,25 +27,33 @@ module "project" {
project_create = var.project_create != null
prefix = var.project_create == null ? null : var.prefix
services = [
"stackdriver.googleapis.com",
"compute.googleapis.com",
"storage-component.googleapis.com",
"storage.googleapis.com",
"servicenetworking.googleapis.com",
"bigquery.googleapis.com",
"bigquerystorage.googleapis.com",
"bigqueryreservation.googleapis.com",
"composer.googleapis.com",
"compute.googleapis.com",
"dataflow.googleapis.com",
"ml.googleapis.com",
"notebooks.googleapis.com",
"composer.googleapis.com"
"servicenetworking.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
]
policy_boolean = {
# "constraints/compute.requireOsLogin" = false
# Example of applying a project wide policy, mainly useful for Composer
}
service_encryption_key_ids = {
compute = [try(local.service_encryption_keys.compute, null)]
bq = [try(local.service_encryption_keys.bq, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}

service_config = {
disable_on_destroy = false,
disable_dependent_services = false
}
}
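Each entry in `service_encryption_key_ids` wraps `try(local.service_encryption_keys.<service>, null)` in a one-element list, so an unset key resolves to `[null]` rather than failing the plan. The Python sketch below emulates that resolution under two assumptions: that `local.service_encryption_keys` (not shown in this diff) mirrors `var.service_encryption_keys`, and that the project module tolerates the `[null]` case, which is also not visible here.

```python
from typing import Optional, TypedDict


class ServiceEncryptionKeys(TypedDict):
    """Shape of var.service_encryption_keys: one KMS key id per service."""

    bq: Optional[str]
    compute: Optional[str]
    storage: Optional[str]


def service_encryption_key_ids(
    keys: Optional[ServiceEncryptionKeys],
) -> dict[str, list[Optional[str]]]:
    """Emulate `[try(local.service_encryption_keys.<service>, null)]` per service.

    In Terraform, reading an attribute of a null object is an error, so try()
    falls back to null; reading from an empty dict with .get() gives the same
    None result here.
    """
    source = keys or {}
    return {service: [source.get(service)] for service in ("compute", "bq", "storage")}
```

With the variable left at its `null` default, every service maps to `[None]`; when only some keys are set, the others still fall back to `[None]`.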

###############################################################################
@@ -55,11 +63,11 @@ module "project" {
module "vpc" {
source = "../../../modules/net-vpc"
project_id = module.project.project_id
name = var.vpc_config.vpc_name
name = "${var.prefix}-vpc"
subnets = [
{
ip_cidr_range = var.vpc_config.ip_cidr_range
name = var.vpc_config.subnet_name
name = "${var.prefix}-subnet"
region = var.region
secondary_ip_range = {}
}
@@ -71,27 +79,73 @@ module "vpc-firewall" {
project_id = module.project.project_id
network = module.vpc.name
admin_ranges = [var.vpc_config.ip_cidr_range]
custom_rules = {
#TODO Remove and rely on 'ssh' tag once terraform-provider-google/issues/9273 is fixed
("${var.prefix}-iap") = {
description = "Enable SSH from IAP on Notebooks."
direction = "INGRESS"
action = "allow"
sources = []
ranges = ["35.235.240.0/20"]
targets = ["notebook-instance"]
use_service_accounts = false
rules = [{ protocol = "tcp", ports = [22] }]
extra_attributes = {}
}
}
}
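The custom rule admits tcp/22 from `35.235.240.0/20`, the range IAP TCP forwarding connects from, while `admin_ranges` is set to the subnet CIDR from `var.vpc_config`. The Python sketch below only checks which of these source ranges contain a given client address. It is a deliberate simplification: target tags (`notebook-instance`), protocols, ports and rule priorities are ignored, and `matching_source_ranges` is a hypothetical helper.

```python
import ipaddress

# Source range of the custom "<prefix>-iap" rule: the range IAP TCP forwarding
# connects from.
IAP_RANGE = "35.235.240.0/20"
# Default var.vpc_config.ip_cidr_range, passed to the module as admin_ranges.
SUBNET_RANGE = "10.0.0.0/20"


def matching_source_ranges(source_ip: str, ranges: list[str]) -> list[str]:
    """Return the ranges in `ranges` that contain `source_ip`.

    Deliberate simplification: only source addresses are compared; target
    tags, protocols, ports and rule priorities are ignored.
    """
    address = ipaddress.ip_address(source_ip)
    return [r for r in ranges if address in ipaddress.ip_network(r)]


if __name__ == "__main__":
    for ip in ("35.235.241.7", "10.0.3.4", "203.0.113.9"):
        print(ip, matching_source_ranges(ip, [IAP_RANGE, SUBNET_RANGE]))
```

An arbitrary internet address such as `203.0.113.9` matches neither range, which is why SSH to the notebook is expected to go through IAP.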

module "cloudnat" {
source = "../../../modules/net-cloudnat"
project_id = module.project.project_id
name = "${var.prefix}-default"
region = var.region
router_network = module.vpc.name
}

###############################################################################
# GCS #
# Storage #
###############################################################################

module "base-gcs-bucket" {
module "bucket" {
source = "../../../modules/gcs"
project_id = module.project.project_id
prefix = module.project.project_id
name = "base"
prefix = var.prefix
location = var.location
name = "data"
encryption_key = try(local.service_encryption_keys.storage, null) # Example assignment of an encryption key
}

module "dataset" {
source = "../../../modules/bigquery-dataset"
project_id = module.project.project_id
id = "${var.prefix}_data"
encryption_key = try(local.service_encryption_keys.bq, null) # Example assignment of an encryption key
}
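The dataset ID is built as `"${var.prefix}_data"` with an underscore, unlike the hyphenated names used elsewhere, because BigQuery dataset IDs may contain only letters, digits and underscores (up to 1,024 characters). Compute Engine names such as `${var.prefix}-vpc`, in turn, cannot contain underscores, so a prefix made only of lowercase letters and digits satisfies both. The Python sketch below mirrors the expression and rejects invalid results; `dataset_id` is a hypothetical helper.

```python
import re

# BigQuery dataset IDs: letters, digits and underscores only, at most 1,024 characters.
DATASET_ID_RE = re.compile(r"[A-Za-z0-9_]{1,1024}")


def dataset_id(prefix: str) -> str:
    """Build the ID as `"${var.prefix}_data"` does and reject invalid results."""
    candidate = f"{prefix}_data"
    if DATASET_ID_RE.fullmatch(candidate) is None:
        raise ValueError(f"invalid BigQuery dataset ID: {candidate!r}")
    return candidate


if __name__ == "__main__":
    print(dataset_id("tst"))
    try:
        dataset_id("my-team")
    except ValueError as err:
        print(err)
```

A hyphenated prefix such as `my-team` is valid for the VPC and bucket names but produces an invalid dataset ID.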

###############################################################################
# Vertex AI Notebook #
###############################################################################
# TODO: Add encryption_key to Vertex AI notebooks as well
# TODO: Add shared VPC support

module "service-account-notebook" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "notebook-sa"
iam_project_roles = {
(module.project.project_id) = [
"roles/bigquery.admin",
"roles/bigquery.jobUser",
"roles/bigquery.dataEditor",
"roles/bigquery.user",
"roles/storage.admin",
]
}
}

resource "google_notebooks_instance" "playground" {
name = "data-play-notebook"
name = "${var.prefix}-notebook"
location = format("%s-%s", var.region, "b")
machine_type = "e2-medium"
project = module.project.project_id
@@ -104,10 +158,17 @@ resource "google_notebooks_instance" "playground" {
install_gpu_driver = true
boot_disk_type = "PD_SSD"
boot_disk_size_gb = 110
disk_encryption = try(local.service_encryption_keys.compute != null, false) ? "CMEK" : "GMEK"
kms_key = try(local.service_encryption_keys.compute, null)

no_public_ip = false
no_public_ip = true
no_proxy_access = false

network = module.vpc.network.id
subnet = module.vpc.subnets[format("%s/%s", var.region, var.vpc_config.subnet_name)].id
subnet = module.vpc.subnets[format("%s/%s", var.region, "${var.prefix}-subnet")].id

service_account = module.service-account-notebook.email

#TODO Uncomment once terraform-provider-google/issues/9273 is fixed
# tags = ["ssh"]
}
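Several notebook arguments are derived from `var.prefix`, `var.region` and the optional compute key. The Python sketch below reproduces those expressions so the resulting values can be checked before a plan. It assumes that `module.vpc.subnets` is keyed by `"<region>/<subnet name>"`, as the lookup expression implies. `notebook_settings` is a hypothetical helper, and the example key string in the usage is a placeholder.

```python
from typing import Optional


def notebook_settings(
    prefix: str, region: str, compute_key: Optional[str]
) -> dict[str, Optional[str]]:
    """Mirror the derived arguments of google_notebooks_instance.playground."""
    return {
        "name": f"{prefix}-notebook",
        # format("%s-%s", var.region, "b"): the instance is pinned to zone "b".
        "location": f"{region}-b",
        # Lookup key into module.vpc.subnets, assumed to be "<region>/<subnet name>".
        "subnet_key": f"{region}/{prefix}-subnet",
        # Customer-managed key only when a compute key is provided.
        "disk_encryption": "CMEK" if compute_key is not None else "GMEK",
        "kms_key": compute_key,
    }


if __name__ == "__main__":
    print(notebook_settings("tst", "europe-west1", None))
```

With the defaults, the disk uses Google-managed encryption (`GMEK`). Passing a compute key switches it to `CMEK`, matching the `try(... != null, false)` condition.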
16 changes: 12 additions & 4 deletions examples/data-solutions/data-playground/outputs.tf
@@ -14,12 +14,20 @@

output "bucket" {
description = "GCS Bucket URL."
value = module.base-gcs-bucket.url
value = module.bucket.url
}

output "dataset" {
description = "BigQuery dataset ID."
value = module.dataset.id
}

output "notebook" {
description = "Vertex AI notebook"
value = resource.google_notebooks_instance.playground.name
description = "Vertex AI notebook details."
value = {
name = resource.google_notebooks_instance.playground.name
id = resource.google_notebooks_instance.playground.id
}
}

output "project" {
@@ -30,4 +38,4 @@ output "project" {
output "vpc" {
description = "VPC Network"
value = module.vpc.name
}
}
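The `notebook` output is now an object with `name` and `id` instead of a plain string. Scripts that consume it therefore need to read the nested fields, and `terraform output -raw notebook` no longer applies because `-raw` only supports primitive values. The Python sketch below parses the JSON printed by `terraform output -json`. The sample document is illustrative and its values are made up; `notebook_from_outputs` is a hypothetical helper.

```python
import json

# Illustrative `terraform output -json` fragment; the values are made up.
SAMPLE_OUTPUT = json.dumps(
    {
        "notebook": {
            "sensitive": False,
            "type": ["object", {"id": "string", "name": "string"}],
            "value": {
                "name": "tst-notebook",
                "id": "projects/sampleproject/locations/europe-west1-b/instances/tst-notebook",
            },
        }
    }
)


def notebook_from_outputs(raw: str) -> dict[str, str]:
    """Return the name and id of the notebook from `terraform output -json` text."""
    value = json.loads(raw)["notebook"]["value"]
    return {"name": value["name"], "id": value["id"]}


if __name__ == "__main__":
    print(notebook_from_outputs(SAMPLE_OUTPUT)["name"])
```

Against a real deployment, the input could come from `subprocess.run(["terraform", "output", "-json"], capture_output=True, text=True, check=True).stdout`.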
13 changes: 5 additions & 8 deletions examples/data-solutions/data-playground/variables.tf
@@ -16,7 +16,7 @@
variable "location" {
description = "The location where resources will be deployed."
type = string
default = "europe"
default = "EU"
}

variable "project_id" {
@@ -36,7 +36,6 @@ variable "project_create" {
variable "prefix" {
description = "Unique prefix used for resource names. Not used for project if 'project_create' is null."
type = string
default = "dp"
}

variable "region" {
@@ -48,21 +47,19 @@ variable "region" {
variable "service_encryption_keys" { # service encryption key
description = "Cloud KMS to use to encrypt different services. Key location should match service region."
type = object({
bq = string
compute = string
storage = string
})
default = null
}

variable "vpc_config" {
description = "Parameters to create a simple VPC for the Data Playground"
description = "Parameters to create a VPC."
type = object({
ip_cidr_range = string
subnet_name = string
vpc_name = string
})
default = {
ip_cidr_range = "10.0.0.0/20"
subnet_name = "default-subnet"
vpc_name = "data-playground-vpc"
}
}
}
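After this change, `vpc_config` carries only `ip_cidr_range`, and the VPC and subnet names are derived from `var.prefix`. The Python sketch below sanity-checks a candidate range and counts the addresses left for instances. It relies on two Google Cloud rules: four addresses in every subnet's primary IPv4 range are reserved, and /29 is the smallest accepted primary range. `usable_addresses` is a hypothetical helper.

```python
import ipaddress

# GCP reserves four addresses in each subnet's primary IPv4 range.
RESERVED_PER_SUBNET = 4
# Smallest primary IPv4 range GCP accepts for a subnet.
SMALLEST_PREFIX = 29


def usable_addresses(ip_cidr_range: str) -> int:
    """Validate a vpc_config.ip_cidr_range value and count addresses left for VMs."""
    network = ipaddress.ip_network(ip_cidr_range)  # ValueError if host bits are set
    if network.version != 4:
        raise ValueError("expected an IPv4 range")
    if network.prefixlen > SMALLEST_PREFIX:
        raise ValueError(f"{ip_cidr_range} is smaller than /{SMALLEST_PREFIX}")
    return network.num_addresses - RESERVED_PER_SUBNET


if __name__ == "__main__":
    print(usable_addresses("10.0.0.0/20"))
```

For the default `10.0.0.0/20`, this leaves 4,092 addresses, far more than the single notebook instance needs.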
@@ -17,6 +17,7 @@
module "test" {
source = "../../../../../examples/data-solutions/data-playground/"
project_id = "sampleproject"
prefix = "tst"
project_create = {
billing_account_id = "123456-123456-123456",
parent = "folders/467898377"
4 changes: 2 additions & 2 deletions tests/examples/data_solutions/data_playground/test_plan.py
@@ -22,5 +22,5 @@
def test_resources(e2e_plan_runner):
"Test that plan works and the numbers of resources is as expected."
modules, resources = e2e_plan_runner(FIXTURES_DIR)
assert len(modules) == 4
assert len(resources) == 23
assert len(modules) == 7
assert len(resources) == 34
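The new module count of 7 is consistent with the module calls visible in the updated `main.tf` hunks: project, vpc, vpc-firewall, cloudnat, bucket, dataset and service-account-notebook. The old value of 4 likewise matched project, vpc, vpc-firewall and base-gcs-bucket. That `e2e_plan_runner` counts exactly these root-level calls is an assumption, and the resource count of 34 is not derived here. The Python sketch below lists top-level `module` blocks with a regular expression; `module_names` is a hypothetical helper that reviewers could point at `main.tf`.

```python
import re

# Matches top-level `module "<name>" {` headers (no leading indentation).
MODULE_BLOCK_RE = re.compile(r'^module\s+"([^"]+)"\s*\{', re.MULTILINE)

# Module calls visible in the updated main.tf hunks of this PR.
NEW_MODULES = [
    "project",
    "vpc",
    "vpc-firewall",
    "cloudnat",
    "bucket",
    "dataset",
    "service-account-notebook",
]


def module_names(hcl_source: str) -> list[str]:
    """Return the names of top-level module blocks in HCL source, in order."""
    return MODULE_BLOCK_RE.findall(hcl_source)


if __name__ == "__main__":
    # Stand-in for main.tf: one empty-bodied block per module name.
    sample = "\n".join(f'module "{name}" {{\n  # ...\n}}' for name in NEW_MODULES)
    print(len(module_names(sample)))
```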