diff --git a/docs/gke/correctness-test.md b/docs/gke/correctness-test.md index c2a43f4efe3..b8582878f52 100644 --- a/docs/gke/correctness-test.md +++ b/docs/gke/correctness-test.md @@ -64,177 +64,13 @@ from the Kustomization directory: kubectl apply -k src/main/k8s/dev/kingdom ``` -## Configure event data source - -There are two data sources that can be used for -[test events](../../src/main/proto/wfa/measurement/api/v2alpha/event_templates/testing/test_event.proto): - -1. Synthetic generator - - Events are generated according to - [simulator synthetic data specifications](../../src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto), - consisting of a single `SyntheticPopulationSpec` and a - `SyntheticEventGroupSpec` for each `EventGroup`. There are default - specifications included, but you can replace these with your own after - before you apply the K8s Kustomization. - -2. BigQuery table - - Events are read from a Google Cloud BigQuery table. See the section below on - how to populate the table. - -### Populate BigQuery table - -The BigQuery table can be populated with synthetic event data generated using -the -[`uk-pilot-synthetic-data-gen` script](https://github.com/world-federation-of-advertisers/uk-pilot-synthetic-data-gen). - -The `dev` configuration expects a table named `labelled_events` in a dataset -named `demo` in the `us-central1` region. The table can be created in the -[Google Cloud Console](https://console.cloud.google.com/bigquery), specifying -the generated CSV file with automatic schema detection. - -![image-step-4-1](step-4-1.png)![image-step-4-1](step-4-2.png) - -You will need to ensure that the simulator service account has access to this -table. See -[Granting BigQuery table access](cluster-config.md#granting-bigquery-table-access). 
- ## Deploy EDP simulators -The correctness test assumes that you have six Event Data Provider (EDP) -simulators running, each acting as a different fake `DataProvider`. - -### Initial setup - -1. Create a K8s cluster - - The simulators can run in their own cluster. You can use the Google Cloud - SDK to create a new one, substituting your own - [Use least privilege service account](https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster#use_least_privilege_sa) - address: - - ```shell - gcloud container clusters create simulators \ - --service-account="gke-cluster@halo-cmm-demo.iam.gserviceaccount.com" \ - --num-nodes=4 --enable-autoscaling --min-nodes=4 --max-nodes=8 \ - --machine-type=e2-small - ``` - - Point your KUBECONFIG to this cluster: - - ```shell - gcloud container clusters get-credentials simulators - ``` - -1. Create a `simulator` K8s service account - - The underlying IAM service account must be able to create BigQuery jobs and - access the `labelled_events` BigQuery table. See the - [configuration guide](cluster-config.md#workload-identity) for details. - -### Build and push simulator image - -If you aren't using pre-built release images, you can build the image yourself -from source and push them to a container registry. For example, if you're using -the [Google Container Registry](https://cloud.google.com/container-registry), -you would specify `gcr.io` as your container registry and your Cloud project -name as your image repository prefix. - -The build target to use depends on the event data source. 
Assuming a project -named `halo-cmm-demo` and an image tag `build-0001`, run the following to build -and push the image: - -* Synthetic generator - - ```shell - bazel run -c opt //src/main/docker:push_synthetic_generator_edp_simulator_runner_image \ - --define container_registry=gcr.io \ - --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 - ``` - -* BigQuery - - ```shell - bazel run -c opt //src/main/docker:push_bigquery_edp_simulator_runner_image \ - --define container_registry=gcr.io \ - --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 - ``` - -### Generate K8s Kustomization - -Run the following, substituting your own values: - -* Synthetic generator - - ```shell - bazel build //src/main/k8s/dev:synthetic_generator_edp_simulators.tar \ - --define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \ - --define=duchy_public_api_target=public.worker1.dev.halo-cmm.org:8443 \ - --define=mc_name=measurementConsumers/TGWOaWehLQ8 \ - --define=edp1_name=dataProviders/HRL1wWehTSM \ - --define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \ - --define=edp2_name=dataProviders/djQdz2ehSSE \ - --define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \ - --define=edp3_name=dataProviders/SQ99TmehSA8 \ - --define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \ - --define=edp4_name=dataProviders/TBZkB5heuL0 \ - --define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \ - --define=edp5_name=dataProviders/HOCBxZheuS8 \ - --define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \ - --define=edp6_name=dataProviders/VGExFmehRhY \ - --define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \ - --define container_registry=gcr.io \ - --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 - ``` - - The resulting archive will contain `SyntheticEventGroupSpec` messages in - text format under 
`src/main/k8s/dev/synthetic_generator_config_files/`. - These can be replaced in order to customize the synthetic generator. - -* BigQuery - - ```shell - bazel build //src/main/k8s/dev:bigquery_edp_simulators.tar \ - --define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \ - --define=duchy_public_api_target=public.worker1.dev.halo-cmm.org:8443 \ - --define=mc_name=measurementConsumers/TGWOaWehLQ8 \ - --define=edp1_name=dataProviders/HRL1wWehTSM \ - --define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \ - --define=edp2_name=dataProviders/djQdz2ehSSE \ - --define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \ - --define=edp3_name=dataProviders/SQ99TmehSA8 \ - --define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \ - --define=edp4_name=dataProviders/TBZkB5heuL0 \ - --define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \ - --define=edp5_name=dataProviders/HOCBxZheuS8 \ - --define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \ - --define=edp6_name=dataProviders/VGExFmehRhY \ - --define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \ - --define container_registry=gcr.io \ - --define=google_cloud_project=halo-cmm-demo \ - --define=bigquery_dataset=demo \ - --define=bigquery_table=labelled_events \ - --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 - ``` - -Extract the generated archive to some directory. - -### Apply K8s Kustomization - -From the Kustomization directory, run - -* Synthetic generator - - ```shell - kubectl apply -k src/main/k8s/dev/synthetic_generator_edp_simulators - ``` - -* BigQuery - - ```shell - kubectl apply -k src/main/k8s/dev/bigquery_edp_simulators - ``` +See the [simulator deployment guide](simulator-deployment.md). The test assumes +that there are valid events in the range `[2021-03-15, 2021-03-17]`. 
The +synthetic generator variant assumes that the event message type is +`wfa.measurement.api.v2alpha.event_templates.testing.TestEvent`, and the +BigQuery variant assumes the event message type is `halo_cmm.uk.pilot.Event`. ## Run the correctness test @@ -263,8 +99,9 @@ Run the following, substituting your own values: --define=bigquery_table=labelled_events ``` -The test generally takes around 6 minutes to complete, since that is how long -the MPC protocol takes to finish. Eventually, you should see logs like this +The time the test takes depends on the size of the data set. With the default +synthetic generator configuration, this is about an hour. Eventually, you should +see logs like this: ``` Jan 27, 2022 12:47:01 AM org.wfanet.measurement.loadtest.frontend.FrontendSimulator process diff --git a/docs/gke/simulator-deployment.md b/docs/gke/simulator-deployment.md new file mode 100644 index 00000000000..bb7c9d2bd6e --- /dev/null +++ b/docs/gke/simulator-deployment.md @@ -0,0 +1,187 @@ +# Deploying EDP simulators on GKE + +The event data provider (EDP) simulator can be used to simulate a `DataProvider` +that fulfills event `Requisition`s. + +## Background + +The configuration for the [`dev` environment](../../src/main/k8s/dev) can be +used as the basis for deploying CMMS components using Google Kubernetes Engine +(GKE) on another Google Cloud project. + +## Before You Start + +See [Machine Setup](machine-setup.md). + +## Configure event data source + +There are two data sources that can be used: + +1. Synthetic generator + + Events are generated according to + [simulator synthetic data specifications](../../src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto), + consisting of a single `SyntheticPopulationSpec` and a + `SyntheticEventGroupSpec` for each `EventGroup`. There are default + specifications included, but you can replace these with your own + before you apply the K8s Kustomization.
+
+   This data source supports any event message type.
+
+2. BigQuery table
+
+   Events are read from a Google Cloud BigQuery table. See the section below on
+   how to populate the table.
+
+   This data source currently only supports the `halo_cmm.uk.pilot.Event`
+   message type.
+
+### Populate BigQuery table
+
+The BigQuery table schema has the following columns:
+
+* `date`
+  * Type: `DATE`
+* `publisher_id`
+  * Type: `INTEGER`
+* `vid`
+  * Type: `INTEGER`
+* `digital_video_completion_status`
+  * Type: `STRING`
+  * Values:
+    * `0% - 25%`
+    * `25% - 50%`
+    * `50% - 75%`
+    * `75% - 100%`
+    * `100%`
+* `viewability`
+  * Type: `STRING`
+  * Values:
+    * `viewable_0_percent_to_50_percent`
+    * `viewable_50_percent_to_100_percent`
+    * `viewable_100_percent`
+
+The `dev` configuration expects a table named `labelled_events` in a dataset
+named `demo` in the `us-central1` region. The table can be created in the
+[Google Cloud Console](https://console.cloud.google.com/bigquery) by uploading a
+CSV file with automatic schema detection enabled.
+
+The
+[`uk-pilot-synthetic-data-gen` script](https://github.com/world-federation-of-advertisers/uk-pilot-synthetic-data-gen)
+may be helpful in generating a CSV file with test events.
+
+## Provision Google Cloud Project infrastructure
+
+This can be done using Terraform. See [the guide](terraform.md) to use the
+example configuration for the simulators.
+
+Applying the Terraform configuration will create a new cluster. You can use the
+`gcloud` CLI to obtain credentials so that you can access the cluster via
+`kubectl`. For example:
+
+```shell
+gcloud container clusters get-credentials simulators
+```
+
+## Build and push container image (optional)
+
+If you aren't using pre-built release images, you can build the image yourself
+from source and push it to a container registry.
For example, if you're using +the [Google Container Registry](https://cloud.google.com/container-registry), +you would specify `gcr.io` as your container registry and your Cloud project +name as your image repository prefix. + +The build target to use depends on the event data source. Assuming a project +named `halo-cmm-demo` and an image tag `build-0001`, run the following to build +and push the image: + +* Synthetic generator + + ```shell + bazel run -c opt //src/main/docker:push_synthetic_generator_edp_simulator_runner_image \ + --define container_registry=gcr.io \ + --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 + ``` + +* BigQuery + + ```shell + bazel run -c opt //src/main/docker:push_bigquery_edp_simulator_runner_image \ + --define container_registry=gcr.io \ + --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 + ``` + +## Generate K8s Kustomization + +Run the following, substituting your own values: + +* Synthetic generator + + ```shell + bazel build //src/main/k8s/dev:synthetic_generator_edp_simulators.tar \ + --define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \ + --define=duchy_public_api_target=public.worker1.dev.halo-cmm.org:8443 \ + --define=mc_name=measurementConsumers/TGWOaWehLQ8 \ + --define=edp1_name=dataProviders/HRL1wWehTSM \ + --define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \ + --define=edp2_name=dataProviders/djQdz2ehSSE \ + --define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \ + --define=edp3_name=dataProviders/SQ99TmehSA8 \ + --define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \ + --define=edp4_name=dataProviders/TBZkB5heuL0 \ + --define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \ + --define=edp5_name=dataProviders/HOCBxZheuS8 \ + --define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \ + --define=edp6_name=dataProviders/VGExFmehRhY \ + 
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \ + --define container_registry=gcr.io \ + --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 + ``` + + The resulting archive will contain `SyntheticEventGroupSpec` messages in + text format under `src/main/k8s/dev/synthetic_generator_config_files/`. + These can be replaced in order to customize the synthetic generator. + +* BigQuery + + ```shell + bazel build //src/main/k8s/dev:bigquery_edp_simulators.tar \ + --define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \ + --define=duchy_public_api_target=public.worker1.dev.halo-cmm.org:8443 \ + --define=mc_name=measurementConsumers/TGWOaWehLQ8 \ + --define=edp1_name=dataProviders/HRL1wWehTSM \ + --define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \ + --define=edp2_name=dataProviders/djQdz2ehSSE \ + --define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \ + --define=edp3_name=dataProviders/SQ99TmehSA8 \ + --define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \ + --define=edp4_name=dataProviders/TBZkB5heuL0 \ + --define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \ + --define=edp5_name=dataProviders/HOCBxZheuS8 \ + --define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \ + --define=edp6_name=dataProviders/VGExFmehRhY \ + --define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \ + --define container_registry=gcr.io \ + --define=google_cloud_project=halo-cmm-demo \ + --define=bigquery_dataset=demo \ + --define=bigquery_table=labelled_events \ + --define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001 + ``` + +Extract the generated archive to some directory. 
+ +## Apply K8s Kustomization + +From the Kustomization directory, run: + +* Synthetic generator + + ```shell + kubectl apply -k src/main/k8s/dev/synthetic_generator_edp_simulators + ``` + +* BigQuery + + ```shell + kubectl apply -k src/main/k8s/dev/bigquery_edp_simulators + ``` diff --git a/src/main/k8s/dev/README.md b/src/main/k8s/dev/README.md index 95b7e3823c6..03e50b624da 100644 --- a/src/main/k8s/dev/README.md +++ b/src/main/k8s/dev/README.md @@ -1,5 +1,4 @@ -# Dev Environment Configuration +# Dev Configuration -Configuration for the `dev` environment in the `halo-cmm-dev` Google Cloud project. - -This configuration can be adapted and used as the basis for other Google Cloud projects. +Configuration for Halo development and testing environments. This configuration +can be adapted and used as the basis for other environments. diff --git a/src/main/terraform/gcloud/examples/simulators/README.md b/src/main/terraform/gcloud/examples/simulators/README.md new file mode 100644 index 00000000000..572c8676eeb --- /dev/null +++ b/src/main/terraform/gcloud/examples/simulators/README.md @@ -0,0 +1,19 @@ +# EDP Simulators + +This example illustrates how to configure the infrastructure for EDP simulators in a Google Cloud Project. + +## Resources + +* [Common resources](../../modules/common) +* IAM service account + * IAM membership allowing the Kubernetes service account to impersonate it + * Optional IAM membership for BigQuery table access +* GKE [cluster](../../modules/cluster) with application-level secret encryption in the specified location + * Default node pool + * Spot VM node pool + +## Preconditions + +* The Google Cloud Project has APIs enabled for the above resources. +* The account running Terraform has permissions to manage the above resources. +* [Default values](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/provider_reference#provider-default-values-configuration) are specified in the environment for the `google` provider.
diff --git a/src/main/terraform/gcloud/examples/simulators/main.tf b/src/main/terraform/gcloud/examples/simulators/main.tf new file mode 100644 index 00000000000..8fd36825d3a --- /dev/null +++ b/src/main/terraform/gcloud/examples/simulators/main.tf @@ -0,0 +1,56 @@ +provider "google" {} + +data "google_client_config" "default" {} + +locals { + cluster_location = var.cluster_location == null ? data.google_client_config.default.zone : var.cluster_location + key_ring_location = var.key_ring_location == null ? data.google_client_config.default.region : var.key_ring_location +} + +module "common" { + source = "../../modules/common" + + key_ring_name = var.key_ring_name + key_ring_location = local.key_ring_location +} + +module "simulators_cluster" { + source = "../../modules/cluster" + + name = var.cluster_name + location = local.cluster_location + secret_key = module.common.cluster_secret_key +} + +data "google_container_cluster" "simulators" { + name = var.cluster_name + location = local.cluster_location + + # Defer reading of cluster resource until it exists. 
+ depends_on = [module.simulators_cluster] +} + +module "simulators_default_node_pool" { + source = "../../modules/node-pool" + + name = "default" + cluster = data.google_container_cluster.simulators + service_account = module.common.cluster_service_account + machine_type = "e2-standard-2" + max_node_count = 2 +} + +module "simulators_spot_node_pool" { + source = "../../modules/node-pool" + + name = "spot" + cluster = data.google_container_cluster.simulators + service_account = module.common.cluster_service_account + machine_type = "c2-standard-4" + max_node_count = 3 + spot = true +} + +module "simulators" { + source = "../../modules/simulators" +} diff --git a/src/main/terraform/gcloud/examples/simulators/variables.tf b/src/main/terraform/gcloud/examples/simulators/variables.tf new file mode 100644 index 00000000000..ba13f03cf9e --- /dev/null +++ b/src/main/terraform/gcloud/examples/simulators/variables.tf @@ -0,0 +1,49 @@ +# Copyright 2023 The Cross-Media Measurement Authors +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +variable "cluster_name" { + description = "Name of the cluster." + type = string + default = "simulators" + nullable = false +} + +variable "cluster_location" { + description = "Location of Kubernetes clusters. Defaults to provider zone." + type = string + default = null +} + +variable "key_ring_name" { + description = "Name of the KMS key ring." 
+ type = string + default = "halo-cmms" + nullable = false +} + +variable "key_ring_location" { + description = "Location of the KMS key ring. Defaults to provider region." + type = string + default = null +} + +variable "bigquery_table" { + description = "`google_bigquery_table` containing labeled test events." + type = object({ + dataset_id = string + id = string + }) + nullable = true + default = null +} diff --git a/src/main/terraform/gcloud/examples/simulators/versions.tf b/src/main/terraform/gcloud/examples/simulators/versions.tf new file mode 100644 index 00000000000..070240678eb --- /dev/null +++ b/src/main/terraform/gcloud/examples/simulators/versions.tf @@ -0,0 +1,22 @@ +# Copyright 2023 The Cross-Media Measurement Authors +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +terraform { + required_providers { + google = { + source = "hashicorp/google" + version = "~> 5.4.0" + } + } +} diff --git a/src/test/kotlin/org/wfanet/measurement/integration/k8s/BigQueryCorrectnessTest.kt b/src/test/kotlin/org/wfanet/measurement/integration/k8s/BigQueryCorrectnessTest.kt index 4a6884f412e..3ecefbcf558 100644 --- a/src/test/kotlin/org/wfanet/measurement/integration/k8s/BigQueryCorrectnessTest.kt +++ b/src/test/kotlin/org/wfanet/measurement/integration/k8s/BigQueryCorrectnessTest.kt @@ -113,6 +113,7 @@ class BigQueryCorrectnessTest : AbstractCorrectnessTest(measurementSystem) { MEASUREMENT_CONSUMER_SIGNING_CERTS.trustedCertificates, eventQuery, ProtocolConfig.NoiseMechanism.CONTINUOUS_GAUSSIAN, + filterExpression = FILTER_EXPRESSION, ) } @@ -124,6 +125,7 @@ class BigQueryCorrectnessTest : AbstractCorrectnessTest(measurementSystem) { } companion object { + private const val FILTER_EXPRESSION = "video.completed_25_percent_plus == true" private val RPC_DEADLINE_DURATION = Duration.ofSeconds(30) private val CONFIG_PATH = Paths.get("src", "test", "kotlin", "org", "wfanet", "measurement", "integration", "k8s")