Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 4 changed files with 138 additions and 84 deletions.
@@ -14,6 +14,7 @@ This module Manages a Google Cloud [Dataproc](https://cloud.google.com/dataproc)
 - [Additive IAM](#additive-iam)
 - [Variables](#variables)
 - [Outputs](#outputs)
+- [Fixtures](#fixtures)
 <!-- END TOC -->

 ## TODO
@@ -25,95 +26,169 @@ This module Manages a Google Cloud [Dataproc](https://cloud.google.com/dataproc)
 ### Simple

 ```hcl
-module "processing-dp-cluster-2" {
+module "dataproc-cluster" {
   source = "./fabric/modules/dataproc"
-  project_id = "my-project"
+  project_id = var.project_id
   name = "my-cluster"
-  region = "europe-west1"
+  region = var.region
 }
 # tftest modules=1 resources=1
 ```

 ### Cluster configuration on GCE

-To set cluster configuration use the 'dataproc_config.cluster_config' variable.
+To set cluster configuration use the 'dataproc_config.cluster_config' variable. If you don't want to use dedicated service account, remember to grant `roles/dataproc.worker` to Compute Default Service Account.

 ```hcl
+module "dataproc-service-account" {
+  source = "./fabric/modules/iam-service-account"
+  project_id = var.project_id
+  name = "dataproc-worker"
+  iam_project_roles = {
+    (var.project_id) = ["roles/dataproc.worker"]
+  }
+}
+module "firewall" {
+  source = "./fabric/modules/net-vpc-firewall"
+  project_id = var.project_id
+  network = var.vpc.name
+  ingress_rules = {
+    allow-ingress-dataproc = {
+      description = "Allow all traffic between Dataproc nodes."
+      targets = ["dataproc"]
+      sources = ["dataproc"]
+    }
+  }
+}
 module "processing-dp-cluster" {
   source = "./fabric/modules/dataproc"
-  project_id = "my-project"
+  project_id = var.project_id
   name = "my-cluster"
-  region = "europe-west1"
-  prefix = "prefix"
+  region = var.region
   dataproc_config = {
     cluster_config = {
       gce_cluster_config = {
-        subnetwork = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
-        zone = "europe-west1-b"
-        service_account = ""
-        service_account_scopes = ["cloud-platform"]
         internal_ip_only = true
+        service_account = module.dataproc-service-account.email
+        service_account_scopes = ["cloud-platform"]
+        subnetwork = var.subnet.self_link
+        tags = ["dataproc"]
+        zone = "${var.region}-b"
       }
     }
   }
+  depends_on = [
+    module.dataproc-service-account, # ensure all grants are done before creating the cluster
+  ]
 }
-# tftest modules=1 resources=1
+# tftest modules=3 resources=7
 ```

 ### Cluster configuration on GCE with CMEK encryption

 To set cluster configuration use the Customer Managed Encryption key, set `dataproc_config.encryption_config.` variable. The Compute Engine service agent and the Cloud Storage service agent need to have `CryptoKey Encrypter/Decrypter` role on they configured KMS key ([Documentation](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/customer-managed-encryption)).

 ```hcl
+module "dataproc-service-account" {
+  source = "./fabric/modules/iam-service-account"
+  project_id = var.project_id
+  name = "dataproc-worker"
+  iam_project_roles = {
+    (var.project_id) = ["roles/dataproc.worker", "roles/cloudkms.cryptoKeyEncrypterDecrypter"]
+  }
+}
+module "firewall" {
+  source = "./fabric/modules/net-vpc-firewall"
+  project_id = var.project_id
+  network = var.vpc.name
+  ingress_rules = {
+    allow-ingress-dataproc = {
+      description = "Allow all traffic between Dataproc nodes."
+      targets = ["dataproc"]
+      sources = ["dataproc"]
+    }
+  }
+}
 module "processing-dp-cluster" {
   source = "./fabric/modules/dataproc"
-  project_id = "my-project"
+  project_id = var.project_id
   name = "my-cluster"
-  region = "europe-west1"
-  prefix = "prefix"
+  region = var.region
   dataproc_config = {
     cluster_config = {
       gce_cluster_config = {
-        subnetwork = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
-        zone = "europe-west1-b"
-        service_account = ""
-        service_account_scopes = ["cloud-platform"]
         internal_ip_only = true
+        service_account = module.dataproc-service-account.email
+        service_account_scopes = ["cloud-platform"]
+        subnetwork = var.subnet.self_link
+        tags = ["dataproc"]
+        zone = "${var.region}-b"
       }
     }
     encryption_config = {
-      kms_key_name = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
+      kms_key_name = var.kms_key.id
     }
   }
+  depends_on = [
+    module.dataproc-service-account, # ensure all grants are done before creating the cluster
+  ]
 }
-# tftest modules=1 resources=1
+# tftest modules=3 resources=8
 ```

 ### Cluster configuration on GKE

-To set cluster configuration GKE use the 'dataproc_config.virtual_cluster_config' variable.
+To set cluster configuration GKE use the 'dataproc_config.virtual_cluster_config' variable. This example shows usage of [dedicated Service Account](https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-iam#custom_iam_configuration).

 ```hcl
+locals {
+  dataproc_namespace = "foobar"
+}
+module "dataproc-service-account" {
+  source = "./fabric/modules/iam-service-account"
+  project_id = var.project_id
+  name = "dataproc-worker"
+  iam = {
+    "roles/iam.workloadIdentityUser" = [
+      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/agent]",
+      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/spark-driver]",
+      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/spark-executor]"
+    ]
+  }
+  iam_project_roles = {
+    (var.project_id) = ["roles/dataproc.worker"]
+  }
+  depends_on = [
+    module.gke-cluster-standard, # granting workloadIdentityUser requires cluster/pool to be created first
+  ]
+}
 module "processing-dp-cluster" {
   source = "./fabric/modules/dataproc"
-  project_id = "my-project"
-  name = "my-gke-cluster"
-  region = "europe-west1"
-  prefix = "prefix"
+  project_id = var.project_id
+  name = "my-dataproc-cluster"
+  region = var.region
   dataproc_config = {
     virtual_cluster_config = {
       kubernetes_cluster_config = {
-        kubernetes_namespace = "foobar"
+        kubernetes_namespace = local.dataproc_namespace
         kubernetes_software_config = {
           component_version = {
-            "SPARK" : "3.1-dataproc-7"
+            "SPARK" : "3.1-dataproc-14"
           }
           properties = {
             "spark:spark.kubernetes.container.image" : "us-east4-docker.pkg.dev/cloud-dataproc/dpgke/sparkengine:dataproc-14"
+            "dataproc:dataproc.gke.agent.google-service-account" = module.dataproc-service-account.email
+            "dataproc:dataproc.gke.spark.driver.google-service-account" = module.dataproc-service-account.email
+            "dataproc:dataproc.gke.spark.executor.google-service-account" = module.dataproc-service-account.email
           }
         }
         gke_cluster_config = {
-          gke_cluster_target = "projects/my-project/locations/my-location/clusters/gke-cluster-name"
+          gke_cluster_target = module.gke-cluster-standard.id
           node_pool_target = {
             node_pool = "node-pool-name"
             roles = ["DEFAULT"]
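The GCE and CMEK sections in the hunk above each name an IAM prerequisite that their examples do not express in code. The GCE text says to grant `roles/dataproc.worker` to the Compute Engine default service account when no dedicated service account is used. The CMEK text says the Compute Engine and Cloud Storage service agents need `CryptoKey Encrypter/Decrypter` on the key. The following is a minimal sketch of those grants using the Google provider's `google_project` data source, `google_project_iam_member`, and `google_kms_crypto_key_iam_member`. It assumes `var.project_id` and `var.kms_key` are declared as in the examples. The resource names and map keys are hypothetical, and the sketch is not part of this commit.

```hcl
# Hypothetical sketch; assumes var.project_id and var.kms_key (with an .id
# attribute) are declared, as in the README examples.
data "google_project" "project" {
  project_id = var.project_id
}

locals {
  project_number = data.google_project.project.number
}

# GCE without a dedicated service account: grant Dataproc Worker to the
# Compute Engine default service account (PROJECT_NUMBER-compute@...).
resource "google_project_iam_member" "dataproc_worker_default_sa" {
  project = var.project_id
  role    = "roles/dataproc.worker"
  member  = "serviceAccount:${local.project_number}-compute@developer.gserviceaccount.com"
}

# CMEK: grant Encrypter/Decrypter on the key to the Compute Engine and
# Cloud Storage service agents. Static map keys keep for_each plannable.
resource "google_kms_crypto_key_iam_member" "dataproc_cmek" {
  for_each = {
    compute = "serviceAccount:service-${local.project_number}@compute-system.iam.gserviceaccount.com"
    storage = "serviceAccount:service-${local.project_number}@gs-project-accounts.iam.gserviceaccount.com"
  }
  crypto_key_id = var.kms_key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = each.value
}
```

If these grants are used, the cluster module would typically list them in `depends_on`, in the same way the examples above wait on `module.dataproc-service-account`.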
@@ -123,7 +198,7 @@ module "processing-dp-cluster" {
     }
   }
 }
-# tftest modules=1 resources=1
+# tftest modules=4 resources=6 fixtures=fixtures/gke-cluster-standard.tf e2e
 ```

 ## IAM
@@ -143,10 +218,9 @@ Refer to the [project module](../project/README.md#iam) for examples of the IAM
 ```hcl
 module "processing-dp-cluster" {
   source = "./fabric/modules/dataproc"
-  project_id = "my-project"
+  project_id = var.project_id
   name = "my-cluster"
-  region = "europe-west1"
-  prefix = "prefix"
+  region = var.region
   iam_by_principals = {
     "group:[email protected]" = [
       "roles/dataproc.viewer"
@@ -166,10 +240,9 @@ module "processing-dp-cluster" {
 ```hcl
 module "processing-dp-cluster" {
   source = "./fabric/modules/dataproc"
-  project_id = "my-project"
+  project_id = var.project_id
   name = "my-cluster"
-  region = "europe-west1"
-  prefix = "prefix"
+  region = var.region
   iam_bindings_additive = {
     am1-viewer = {
       member = "user:[email protected]"
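The IAM hunks above show `iam_by_principals` and `iam_bindings_additive`. The module's Variables table also documents an authoritative `iam_bindings` variable in `{KEY => {role = ROLE, members = [], condition = {}}}` format, where keys are arbitrary. The following is a minimal sketch that reuses the module arguments from the examples. The key `viewers` and the group address are hypothetical placeholders, and the optional `condition` is omitted.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  # Authoritative for each listed role: the members given here replace any
  # other members bound to that role on this cluster.
  # The key "viewers" and the group address are illustrative only.
  iam_bindings = {
    viewers = {
      role    = "roles/dataproc.viewer"
      members = ["group:dataproc-viewers@example.com"]
    }
  }
}
```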
@@ -185,24 +258,23 @@ module "processing-dp-cluster" {
 | name | description | type | required | default |
 |---|---|:---:|:---:|:---:|
 | [name](variables.tf#L191) | Cluster name. | <code>string</code> | ✓ | |
-| [project_id](variables.tf#L206) | Project ID. | <code>string</code> | ✓ | |
-| [region](variables.tf#L211) | Dataproc region. | <code>string</code> | ✓ | |
+| [project_id](variables.tf#L196) | Project ID. | <code>string</code> | ✓ | |
+| [region](variables.tf#L201) | Dataproc region. | <code>string</code> | ✓ | |
 | [dataproc_config](variables.tf#L17) | Dataproc cluster config. | <code title="object({ graceful_decommission_timeout = optional(string) cluster_config = optional(object({ staging_bucket = optional(string) temp_bucket = optional(string) gce_cluster_config = optional(object({ zone = optional(string) network = optional(string) subnetwork = optional(string) service_account = optional(string) service_account_scopes = optional(list(string)) tags = optional(list(string), []) internal_ip_only = optional(bool) metadata = optional(map(string), {}) reservation_affinity = optional(object({ consume_reservation_type = string key = string values = string })) node_group_affinity = optional(object({ node_group_uri = string })) shielded_instance_config = optional(object({ enable_secure_boot = bool enable_vtpm = bool enable_integrity_monitoring = bool })) })) master_config = optional(object({ num_instances = number machine_type = string min_cpu_platform = string image_uri = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) accelerators = optional(object({ accelerator_type = string accelerator_count = number })) })) worker_config = optional(object({ num_instances = number machine_type = string min_cpu_platform = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) image_uri = string accelerators = optional(object({ accelerator_type = string accelerator_count = number })) })) preemptible_worker_config = optional(object({ num_instances = number preemptibility = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) })) software_config = optional(object({ image_version = optional(string) override_properties = map(string) optional_components = optional(list(string)) })) security_config = optional(object({ kerberos_config = object({ cross_realm_trust_admin_server = optional(string) cross_realm_trust_kdc = optional(string) cross_realm_trust_realm = optional(string) cross_realm_trust_shared_password_uri = optional(string) enable_kerberos = optional(string) kdc_db_key_uri = optional(string) key_password_uri = optional(string) keystore_uri = optional(string) keystore_password_uri = optional(string) kms_key_uri = string realm = optional(string) root_principal_password_uri = string tgt_lifetime_hours = optional(string) truststore_password_uri = optional(string) truststore_uri = optional(string) }) })) autoscaling_config = optional(object({ policy_uri = string })) initialization_action = optional(object({ script = string timeout_sec = optional(string) })) encryption_config = optional(object({ kms_key_name = string })) lifecycle_config = optional(object({ idle_delete_ttl = optional(string) auto_delete_time = optional(string) })) endpoint_config = optional(object({ enable_http_port_access = string })) dataproc_metric_config = optional(object({ metrics = list(object({ metric_source = string metric_overrides = optional(list(string)) })) })) metastore_config = optional(object({ dataproc_metastore_service = string })) })) virtual_cluster_config = optional(object({ staging_bucket = optional(string) auxiliary_services_config = optional(object({ metastore_config = optional(object({ dataproc_metastore_service = string })) spark_history_server_config = optional(object({ dataproc_cluster = string })) })) kubernetes_cluster_config = object({ kubernetes_namespace = optional(string) kubernetes_software_config = object({ component_version = map(string) properties = optional(map(string)) }) gke_cluster_config = object({ gke_cluster_target = optional(string) node_pool_target = optional(object({ node_pool = string roles = list(string) node_pool_config = optional(object({ autoscaling = optional(object({ min_node_count = optional(number) max_node_count = optional(number) })) config = object({ machine_type = optional(string) preemptible = optional(bool) local_ssd_count = optional(number) min_cpu_platform = optional(string) spot = optional(bool) }) locations = optional(list(string)) })) })) }) }) })) })">object({…})</code> | | <code>{}</code> |
 | [iam](variables-iam.tf#L24) | IAM bindings in {ROLE => [MEMBERS]} format. | <code>map(list(string))</code> | | <code>{}</code> |
 | [iam_bindings](variables-iam.tf#L31) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map(object({ members = list(string) role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
 | [iam_bindings_additive](variables-iam.tf#L46) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map(object({ member = string role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
 | [iam_by_principals](variables-iam.tf#L17) | Authoritative IAM binding in {PRINCIPAL => [ROLES]} format. Principals need to be statically defined to avoid cycle errors. Merged internally with the `iam` variable. | <code>map(list(string))</code> | | <code>{}</code> |
 | [labels](variables.tf#L185) | The resource labels for instance to use to annotate any related underlying resources, such as Compute Engine VMs. | <code>map(string)</code> | | <code>{}</code> |
-| [prefix](variables.tf#L196) | Optional prefix used to generate project id and name. | <code>string</code> | | <code>null</code> |
-| [service_account](variables.tf#L216) | Service account to set on the Dataproc cluster. | <code>string</code> | | <code>null</code> |

 ## Outputs

 | name | description | sensitive |
 |---|---|:---:|
 | [bucket_names](outputs.tf#L19) | List of bucket names which have been assigned to the cluster. | |
 | [http_ports](outputs.tf#L24) | The map of port descriptions to URLs. | |
-| [id](outputs.tf#L29) | Fully qualified cluster id. | |
-| [instance_names](outputs.tf#L34) | List of instance names which have been assigned to the cluster. | |
-| [name](outputs.tf#L43) | The name of the cluster. | |
+| [id](outputs.tf#L30) | Fully qualified cluster id. | |
+| [name](outputs.tf#L45) | The name of the cluster. | |

+## Fixtures
+
+- [gke-cluster-standard.tf](../../tests/fixtures/gke-cluster-standard.tf)
 <!-- END TFDOC -->
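The Outputs table lists `id` (fully qualified cluster id) and `name` among the module's outputs. The following short sketch shows how a root module might re-export them. It assumes the cluster module is instantiated as `processing-dp-cluster`, as in the examples. The output names `dataproc_cluster_id` and `dataproc_cluster_name` are hypothetical.

```hcl
# Re-export the Dataproc cluster's fully qualified id and name from the root
# module; assumes a module block named "processing-dp-cluster" exists.
output "dataproc_cluster_id" {
  description = "Fully qualified Dataproc cluster id."
  value       = module.processing-dp-cluster.id
}

output "dataproc_cluster_name" {
  description = "Dataproc cluster name."
  value       = module.processing-dp-cluster.name
}
```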