Add Minimal Data Platform blueprint #1362

Merged (8 commits) on May 8, 2023
Changes from 5 commits
2 changes: 1 addition & 1 deletion blueprints/README.md
@@ -6,7 +6,7 @@ Currently available blueprints:

- **apigee** - [Apigee Hybrid on GKE](./apigee/hybrid-gke/), [Apigee X analytics in BigQuery](./apigee/bigquery-analytics), [Apigee network patterns](./apigee/network-patterns/)
- **cloud operations** - [Active Directory Federation Services](./cloud-operations/adfs), [Cloud Asset Inventory feeds for resource change tracking and remediation](./cloud-operations/asset-inventory-feed-remediation), [Fine-grained Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Cloud DNS & Shared VPC design](./cloud-operations/dns-shared-vpc), [Delegated Role Grants](./cloud-operations/iam-delegated-role-grants), [Networking Dashboard](./cloud-operations/network-dashboard), [Managing on-prem service account keys by uploading public keys](./cloud-operations/onprem-sa-key-management), [Compute Image builder with Hashicorp Packer](./cloud-operations/packer-image-builder), [Packer example](./cloud-operations/packer-image-builder/packer), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to BigQuery](./cloud-operations/scheduled-asset-inventory-export-bq), [Configuring workload identity federation with Terraform Cloud/Enterprise workflows](./cloud-operations/terraform-cloud-dynamic-credentials), [TCP healthcheck and restart for unmanaged GCE instances](./cloud-operations/unmanaged-instances-healthcheck), [Migrate for Compute Engine (v5) blueprints](./cloud-operations/vm-migration), [Configuring workload identity federation to access Google Cloud resources from apps running on Azure](./cloud-operations/workload-identity-federation)
- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Minimal Data Platform](./data-solutions/data-platform-minimal), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
- **factories** - [The why and the how of Resource Factories](./factories), [Google Cloud Identity Group Factory](./factories/cloud-identity-group-factory), [Google Cloud BQ Factory](./factories/bigquery-factory), [Google Cloud VPC Firewall Factory](./factories/net-vpc-firewall-yaml), [Minimal Project Factory](./factories/project-factory)
- **GKE** - [Binary Authorization Pipeline Blueprint](./gke/binauthz), [Storage API](./gke/binauthz/image), [Multi-cluster mesh on GKE (fleet API)](./gke/multi-cluster-mesh-gke-fleet-api), [GKE Multitenant Blueprint](./gke/multitenant-fleet), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [GKE Autopilot](./gke/autopilot)
- **networking** - [Calling a private Cloud Function from On-premises](./networking/private-cloud-function-from-onprem), [Decentralized firewall management](./networking/decentralized-firewall), [Decentralized firewall validator](./networking/decentralized-firewall/validator), [Network filtering with Squid](./networking/filtering-proxy), [GLB and multi-regional daisy-chaining through hybrid NEGs](./networking/glb-hybrid-neg-internal), [Hybrid connectivity to on-premise services through PSC](./networking/psc-hybrid), [HTTP Load Balancer with Cloud Armor](./networking/glb-and-armor), [Hub and Spoke via VPN](./networking/hub-and-spoke-vpn), [Hub and Spoke via VPC Peering](./networking/hub-and-spoke-peering), [Internal Load Balancer as Next Hop](./networking/ilb-next-hop), [Network filtering with Squid with isolated VPCs using Private Service Connect](./networking/filtering-proxy-psc), On-prem DNS and Google Private Access, [PSC Producer](./networking/psc-hybrid/psc-producer), [PSC Consumer](./networking/psc-hybrid/psc-consumer), [Shared VPC with optional GKE cluster](./networking/shared-vpc-gke)
11 changes: 9 additions & 2 deletions blueprints/data-solutions/README.md
@@ -29,8 +29,15 @@ This [blueprint](./composer-2/) creates a [Cloud Composer](https://cloud.google.

### Data Platform Foundations

<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/images/overview_diagram.png" align="left" width="280px"></a>
This [blueprint](./data-platform-foundations/) implements a robust and flexible Data Foundation on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
<a href="./data-platform-foundations/" title="Data Platform"><img src="./data-platform-foundations/images/overview_diagram.png" align="left" width="280px"></a>
This [blueprint](./data-platform-foundations/) implements a robust and flexible Data Platform on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.

<br clear="left">

### Minimal Data Platform

<a href="./data-platform-minimal/" title="Minimal Data Platform"><img src="./data-platform-minimal/images/diagram.png" align="left" width="280px"></a>
This [blueprint](./data-platform-minimal/) implements a minimal Data Platform on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.

<br clear="left">
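For orientation, here is a minimal sketch of input values for this blueprint, using the variable names visible in this PR (`prefix`, `project_config`, `location`, `region`); values are placeholders, and `project_config.project_ids` is assumed to have defaults in `variables.tf`, which is not part of this diff:

```hcl
# Hypothetical terraform.tfvars for the Minimal Data Platform blueprint.
prefix = "myco-dp"
project_config = {
  billing_account_id = "012345-67890A-BCDEF0"
  parent             = "folders/123456789012"
}
location = "eu"
region   = "europe-west1"
```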

2 changes: 2 additions & 0 deletions blueprints/data-solutions/data-platform-foundations/README.md
@@ -2,6 +2,8 @@

This module implements an opinionated Data Platform Architecture that creates and sets up projects and related resources that compose an end-to-end data environment.

For a minimal Data Platform, please refer to the [Minimal Data Platform](../data-platform-minimal/) blueprint.

The code is intentionally simple, as it's intended to provide a generic initial setup and then allow easy customizations to complete the implementation of the intended design.

The following diagram is a high-level reference of the resources created and managed here:
77 changes: 77 additions & 0 deletions blueprints/data-solutions/data-platform-minimal/01-landing.tf
@@ -0,0 +1,77 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Landing project and resources.

locals {
iam_lnd = {
"roles/storage.objectCreator" = [module.land-sa-cs-0.iam_email]
"roles/storage.objectViewer" = [module.processing-sa-cmp-0.iam_email]
"roles/storage.objectAdmin" = [module.processing-sa-dp-0.iam_email]
}
}

module "land-project" {
source = "../../../modules/project"
parent = var.project_config.parent
billing_account = var.project_config.billing_account_id
project_create = var.project_config.billing_account_id != null
prefix = var.project_config.billing_account_id == null ? null : var.prefix
name = (
var.project_config.billing_account_id == null
? var.project_config.project_ids.landing
: "${var.project_config.project_ids.landing}${local.project_suffix}"
)
iam = var.project_config.billing_account_id != null ? local.iam_lnd : null
iam_additive = var.project_config.billing_account_id == null ? local.iam_lnd : null
services = [
"cloudkms.googleapis.com",
"cloudresourcemanager.googleapis.com",
"iam.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
]
service_encryption_key_ids = {
bq = [var.service_encryption_keys.bq]
storage = [var.service_encryption_keys.storage]
}
}
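
# Note: var.service_encryption_keys is read here (bq, storage) and elsewhere in
# this blueprint (compute, composer). A hypothetical value, for illustration only;
# the authoritative type lives in variables.tf, which is not part of this diff:
#
#   service_encryption_keys = {
#     bq       = "projects/kms-prj/locations/eu/keyRings/dp/cryptoKeys/key-bq"
#     storage  = "projects/kms-prj/locations/eu/keyRings/dp/cryptoKeys/key-storage"
#     compute  = "projects/kms-prj/locations/eu/keyRings/dp/cryptoKeys/key-compute"
#     composer = "projects/kms-prj/locations/eu/keyRings/dp/cryptoKeys/key-composer"
#   }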

# Cloud Storage

module "land-sa-cs-0" {
source = "../../../modules/iam-service-account"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-cs-0"
display_name = "Data platform GCS landing service account."
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers
]
}
}

module "land-cs-0" {
source = "../../../modules/gcs"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-cs-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
force_destroy = var.data_force_destroy
}
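
A note on the conditionals in `land-project` above: when `project_config.billing_account_id` is set, the blueprint creates the project and applies IAM bindings authoritatively; when it is `null`, the project is assumed to already exist and the same bindings are applied additively. A sketch of the two input shapes (values are placeholders; `project_ids` keys other than `landing` are assumed):

```hcl
# Mode 1: the blueprint creates projects (authoritative IAM, prefixed names).
project_config = {
  billing_account_id = "012345-67890A-BCDEF0"
  parent             = "folders/123456789012"
  project_ids        = { landing = "lnd" } # final name: "<prefix>-lnd<suffix>"
}

# Mode 2: projects already exist (additive IAM, names used verbatim).
project_config = {
  billing_account_id = null
  parent             = null
  project_ids        = { landing = "existing-landing-project-id" }
}
```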
122 changes: 122 additions & 0 deletions blueprints/data-solutions/data-platform-minimal/02-composer.tf
@@ -0,0 +1,122 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Cloud Composer resources.

locals {
env_variables = {
BQ_LOCATION = var.location
CURATED_BQ_DATASET = module.cur-bq-0.dataset_id
CURATED_GCS = module.cur-cs-0.url
CURATED_PRJ = module.cur-project.project_id
DP_KMS_KEY = var.service_encryption_keys.compute
DP_REGION = var.region
GCP_REGION = var.region
LAND_PRJ = module.land-project.project_id
LAND_GCS = module.land-cs-0.name
PHS_CLUSTER_NAME = module.processing-dp-historyserver.name
PROCESSING_GCS = module.processing-cs-0.name
PROCESSING_PRJ = module.processing-project.project_id
PROCESSING_SA_DP = module.processing-sa-dp-0.email
PROCESSING_SUBNET = local.processing_subnet
PROCESSING_VPC = local.processing_vpc
}
}

module "processing-sa-cmp-0" {
source = "../../../modules/iam-service-account"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-cmp-0"
display_name = "Data platform Composer service account"
iam = {
"roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
"roles/iam.serviceAccountUser" = [module.processing-sa-cmp-0.iam_email]
}
}

resource "google_composer_environment" "processing-cmp-0" {
count = var.composer_config.disable_deployment == true ? 0 : 1
project = module.processing-project.project_id
name = "${var.prefix}-prc-cmp-0"
region = var.region
config {
software_config {
airflow_config_overrides = try(var.composer_config.software_config.airflow_config_overrides, null)
pypi_packages = try(var.composer_config.software_config.pypi_packages, null)
env_variables = merge(
try(var.composer_config.software_config.env_variables, null), local.env_variables
)
image_version = var.composer_config.software_config.image_version
}
dynamic "workloads_config" {
for_each = (try(var.composer_config.workloads_config, null) != null ? { 1 = 1 } : {})

content {
scheduler {
cpu = var.composer_config.workloads_config.scheduler.cpu
memory_gb = var.composer_config.workloads_config.scheduler.memory_gb
storage_gb = var.composer_config.workloads_config.scheduler.storage_gb
count = var.composer_config.workloads_config.scheduler.count
}
web_server {
cpu = var.composer_config.workloads_config.web_server.cpu
memory_gb = var.composer_config.workloads_config.web_server.memory_gb
storage_gb = var.composer_config.workloads_config.web_server.storage_gb
}
worker {
cpu = var.composer_config.workloads_config.worker.cpu
memory_gb = var.composer_config.workloads_config.worker.memory_gb
storage_gb = var.composer_config.workloads_config.worker.storage_gb
min_count = var.composer_config.workloads_config.worker.min_count
max_count = var.composer_config.workloads_config.worker.max_count
}
}
}

environment_size = var.composer_config.environment_size

node_config {
network = local.processing_vpc
subnetwork = local.processing_subnet
service_account = module.processing-sa-cmp-0.email
enable_ip_masq_agent = true
tags = ["composer-worker"]
ip_allocation_policy {
cluster_secondary_range_name = var.network_config.composer_ip_ranges.pods_range_name
services_secondary_range_name = var.network_config.composer_ip_ranges.services_range_name
}
}
private_environment_config {
enable_private_endpoint = "true"
cloud_sql_ipv4_cidr_block = var.network_config.composer_ip_ranges.cloud_sql
master_ipv4_cidr_block = var.network_config.composer_ip_ranges.gke_master
cloud_composer_connection_subnetwork = var.network_config.composer_ip_ranges.connection_subnetwork
}
dynamic "encryption_config" {
for_each = (
var.service_encryption_keys.composer != null
? { 1 = 1 }
: {}
)
content {
kms_key_name = var.service_encryption_keys.composer
}
}
}
depends_on = [
google_project_iam_member.shared_vpc,
module.processing-project
]
}
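
The `dynamic "workloads_config"` block above is only rendered when a `workloads_config` object is passed in. A sketch of a `composer_config` value matching the attribute paths read in this file (the authoritative type is in `variables.tf`, not part of this diff; values are placeholders):

```hcl
composer_config = {
  disable_deployment = false
  environment_size   = "ENVIRONMENT_SIZE_SMALL"
  software_config = {
    image_version = "composer-2-airflow-2"
  }
  workloads_config = {
    scheduler  = { cpu = 0.5, memory_gb = 1.875, storage_gb = 1, count = 1 }
    web_server = { cpu = 0.5, memory_gb = 1.875, storage_gb = 1 }
    worker     = { cpu = 0.5, memory_gb = 1.875, storage_gb = 1, min_count = 1, max_count = 3 }
  }
}
```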
139 changes: 139 additions & 0 deletions blueprints/data-solutions/data-platform-minimal/02-dataproc.tf
@@ -0,0 +1,139 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Cloud Dataproc resources.

locals {
iam_prc = {
"roles/bigquery.jobUser" = [
module.processing-sa-dp-0.iam_email, local.groups_iam.data-engineers
]
}
processing_dp_subnet = (
local.use_shared_vpc
? var.network_config.subnet_self_links.orchestration
: values(module.processing-vpc.0.subnet_self_links)[0]
)
processing_dp_vpc = (
local.use_shared_vpc
? var.network_config.network_self_link
: module.processing-vpc.0.self_link
)
}

module "processing-cs-dp-history" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-cs-dp-history"
location = var.region
storage_class = "REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-sa-dp-0" {
source = "../../../modules/iam-service-account"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-dp-0"
display_name = "Dataproc service account"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers,
module.processing-sa-cmp-0.iam_email
],
"roles/iam.serviceAccountUser" = [
module.processing-sa-cmp-0.iam_email
]
}
}

module "processing-dp-staging-0" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-stg-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-dp-temp-0" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-tmp-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-dp-log-0" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-log-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-dp-historyserver" {
source = "../../../modules/dataproc"
project_id = module.processing-project.project_id
  name       = "history-server"
prefix = var.prefix
region = var.region
dataproc_config = {
cluster_config = {
staging_bucket = module.processing-dp-staging-0.name
temp_bucket = module.processing-dp-temp-0.name
gce_cluster_config = {
        # Use the local that already resolves Shared VPC vs. blueprint-managed VPC;
        # module.processing-vpc has count 0 when a Shared VPC is configured.
        subnetwork             = local.processing_dp_subnet
zone = "${var.region}-b"
service_account = module.processing-sa-dp-0.email
service_account_scopes = ["cloud-platform"]
internal_ip_only = true
}
worker_config = {
num_instances = 0
machine_type = null
min_cpu_platform = null
image_uri = null
}
software_config = {
override_properties = {
"dataproc:dataproc.allow.zero.workers" = "true"
"dataproc:job.history.to-gcs.enabled" = "true"
"spark:spark.history.fs.logDirectory" = (
"gs://${module.processing-dp-staging-0.name}/*/spark-job-history"
)
"spark:spark.eventLog.dir" = (
"gs://${module.processing-dp-staging-0.name}/*/spark-job-history"
)
"spark:spark.history.custom.executor.log.url.applyIncompleteApplication" = "false"
"spark:spark.history.custom.executor.log.url" = (
"{{YARN_LOG_SERVER_URL}}/{{NM_HOST}}:{{NM_PORT}}/{{CONTAINER_ID}}/{{CONTAINER_ID}}/{{USER}}/{{FILE_NAME}}"
)
}
}
endpoint_config = {
enable_http_port_access = "true"
}
encryption_config = {
kms_key_name = var.service_encryption_keys.compute
}
}
}
}
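
The `use_shared_vpc` flag referenced in the locals above is defined elsewhere in the blueprint (not in this diff). When a Shared VPC host is used, `network_config` has to carry the host network details; a sketch of that shape, with field names taken from the references in this file and in `02-composer.tf`, and placeholder values:

```hcl
network_config = {
  # Assumption: a non-null host project is what drives use_shared_vpc.
  host_project      = "my-host-project"
  network_self_link = "projects/my-host-project/global/networks/my-vpc"
  subnet_self_links = {
    orchestration = "projects/my-host-project/regions/europe-west1/subnetworks/orchestration"
  }
  composer_ip_ranges = {
    pods_range_name       = "pods"
    services_range_name   = "services"
    cloud_sql             = "10.20.10.0/24"
    gke_master            = "10.20.11.0/28"
    connection_subnetwork = null
  }
}
```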