Commit

refactor!: Delete BigQuery and CSV EDP simulator variants (#1884)
Closes #1881
SanjayVas authored Oct 29, 2024
1 parent f388774 commit c65c61f
Showing 29 changed files with 54 additions and 1,329 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/build-test.yml
@@ -80,8 +80,6 @@ jobs:
build --define edp6_name=dataProviders/foo6
build --define edp6_cert_name=dataProviders/foo6/certificates/bar6
build --define google_cloud_project=example-project
build --define bigquery_dataset=example-dataset
build --define bigquery_table=events
EOF
- name: Check lockfile
1 change: 0 additions & 1 deletion .github/workflows/scan-images.yml
@@ -75,7 +75,6 @@ jobs:
- panel-exchange/gcloud-example-daemon
- panel-exchange/aws-example-daemon
- simulator/synthetic-generator-edp
- simulator/bigquery-edp
- reporting/v2/postgres-internal-server
- duchy/postgres-update-schema
- duchy/gcloud-postgres-update-schema
2 changes: 0 additions & 2 deletions build/variables.bzl
@@ -127,8 +127,6 @@ SIMULATOR_K8S_SETTINGS = struct(
edp5_cert_name = "$(edp5_cert_name)",
edp6_name = "$(edp6_name)",
edp6_cert_name = "$(edp6_cert_name)",
bigquery_dataset = "$(bigquery_dataset)",
bigquery_table = "$(bigquery_table)",
)

# Settings for Grafana Kubernetes deployments.
36 changes: 10 additions & 26 deletions docs/gke/correctness-test.md
@@ -67,37 +67,21 @@ kubectl apply -k src/main/k8s/dev/kingdom
## Deploy EDP simulators

See the [simulator deployment guide](simulator-deployment.md). The test assumes
that there are valid events in the range `[2021-03-15, 2021-03-17]`. The
synthetic generator variant assumes that the event message type is
`wfa.measurement.api.v2alpha.event_templates.testing.TestEvent`, and the
BigQuery variant assumes the event message type is `halo_cmm.uk.pilot.Event`.
that there are valid events in the range `[2021-03-15, 2021-03-17]`. The test
assumes that the event message type is
`wfa.measurement.api.v2alpha.event_templates.testing.TestEvent`.

## Run the correctness test

Run the following, substituting your own values:

* Synthetic generator

```shell
bazel test //src/test/kotlin/org/wfanet/measurement/integration/k8s:SyntheticGeneratorCorrectnessTest
--test_output=streamed \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/Rcn7fKd25C8 \
--define=mc_api_key=W9q4zad246g
```

* BigQuery

```shell
bazel test //src/test/kotlin/org/wfanet/measurement/integration/k8s:BigQueryCorrectnessTest
--test_output=streamed \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/Rcn7fKd25C8 \
--define=mc_api_key=W9q4zad246g \
--define=google_cloud_project=halo-cmm-demo \
--define=bigquery_dataset=demo \
--define=bigquery_table=labelled_events
```
```shell
bazel test //src/test/kotlin/org/wfanet/measurement/integration/k8s:SyntheticGeneratorCorrectnessTest \
--test_output=streamed \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/Rcn7fKd25C8 \
--define=mc_api_key=W9q4zad246g
```

The time the test takes depends on the size of the data set. With the default
synthetic generator configuration, this is about an hour. Eventually, you should
182 changes: 43 additions & 139 deletions docs/gke/simulator-deployment.md
@@ -15,60 +15,13 @@ See [Machine Setup](machine-setup.md).

## Configure event data source

There are two data sources that can be used:

1. Synthetic generator

Events are generated according to
[simulator synthetic data specifications](../../src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto),
consisting of a single `SyntheticPopulationSpec` and a
`SyntheticEventGroupSpec` for each `EventGroup`. There are default
specifications included, but you can replace these with your own after
before you apply the K8s Kustomization.

This data source supports any event message type.

2. BigQuery table

Events are read from a Google Cloud BigQuery table. See the section below on
how to populate the table.

This data source currently only supports the `halo_cmm.uk.pilot.Event`
message type.

### Populate BigQuery table

The BigQuery table schema has the following columns:

* `date`
* Type: `DATE`
* `publisher_id`
* Type: `INTEGER`
* `vid`
* Type: `INTEGER`
* `digital_video_completion_status`
* Type: `STRING`
* Values:
* `0% - 25%`
* `25% - 50%`
* `50% - 75%`
* `75% - 100%`
* `100%`
* `viewability`
* Type: `STRING`
* Values:
* `viewable_0_percent_to_50_percent`
* `viewable_50_percent_to_100_percent`
* `viewable_100_percent`

The `dev` configuration expects a table named `labelled_events` in a dataset
named `demo` in the `us-central1` region. The table can be created in the
[Google Cloud Console](https://console.cloud.google.com/bigquery), specifying a
CSV file with automatic schema detection.

The
[`uk-pilot-synthetic-data-gen` script](https://github.com/world-federation-of-advertisers/uk-pilot-synthetic-data-gen)
may be helpful in generating a CSV file with test events.
Events are generated according to
[simulator synthetic data specifications](../../src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto),
consisting of a single `SyntheticPopulationSpec` and a `SyntheticEventGroupSpec`
for each `EventGroup`. There are default specifications included, but you can
replace these with your own before you apply the K8s Kustomization.

This data source supports any event message type.

## Provision Google Cloud Project infrastructure

@@ -83,7 +36,7 @@ Applying the Terraform configuration will create a new cluster. You can use the
gcloud container clusters get-credentials simulators
```

## Build and push container image (optional)
## Build and push container image (not recommended)

If you aren't using pre-built release images, you can build the image yourself
from source and push it to a container registry. For example, if you're using
@@ -95,99 +48,50 @@ The build target to use depends on the event data source. Assuming a project
named `halo-cmm-demo` and an image tag `build-0001`, run the following to build
and push the image:

* Synthetic generator

```shell
bazel run -c opt //src/main/docker:push_synthetic_generator_edp_simulator_runner_image \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

* BigQuery

```shell
bazel run -c opt //src/main/docker:push_bigquery_edp_simulator_runner_image \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```
```shell
bazel run -c opt //src/main/docker:push_synthetic_generator_edp_simulator_runner_image \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

## Generate K8s Kustomization

Run the following, substituting your own values:

* Synthetic generator

```shell
bazel build //src/main/k8s/dev:synthetic_generator_edp_simulators.tar \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=worker1_id=worker1
--define=worker1_public_api_target=public.worker1.dev.halo-cmm.org:8443 \
--define=worker2_id=worker2
--define=worker2_public_api_target=public.worker2.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/TGWOaWehLQ8 \
--define=edp1_name=dataProviders/HRL1wWehTSM \
--define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \
--define=edp2_name=dataProviders/djQdz2ehSSE \
--define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \
--define=edp3_name=dataProviders/SQ99TmehSA8 \
--define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \
--define=edp4_name=dataProviders/TBZkB5heuL0 \
--define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \
--define=edp5_name=dataProviders/HOCBxZheuS8 \
--define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \
--define=edp6_name=dataProviders/VGExFmehRhY \
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

The resulting archive will contain `SyntheticEventGroupSpec` messages in
text format under `src/main/k8s/dev/synthetic_generator_config_files/`.
These can be replaced in order to customize the synthetic generator.

* BigQuery

```shell
bazel build //src/main/k8s/dev:bigquery_edp_simulators.tar \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=worker1_id=worker1
--define=worker1_public_api_target=public.worker1.dev.halo-cmm.org:8443 \
--define=worker2_id=worker2
--define=worker2_public_api_target=public.worker2.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/TGWOaWehLQ8 \
--define=edp1_name=dataProviders/HRL1wWehTSM \
--define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \
--define=edp2_name=dataProviders/djQdz2ehSSE \
--define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \
--define=edp3_name=dataProviders/SQ99TmehSA8 \
--define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \
--define=edp4_name=dataProviders/TBZkB5heuL0 \
--define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \
--define=edp5_name=dataProviders/HOCBxZheuS8 \
--define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \
--define=edp6_name=dataProviders/VGExFmehRhY \
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \
--define container_registry=gcr.io \
--define=google_cloud_project=halo-cmm-demo \
--define=bigquery_dataset=demo \
--define=bigquery_table=labelled_events \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```
```shell
bazel build //src/main/k8s/dev:synthetic_generator_edp_simulators.tar \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=worker1_id=worker1 \
--define=worker1_public_api_target=public.worker1.dev.halo-cmm.org:8443 \
--define=worker2_id=worker2 \
--define=worker2_public_api_target=public.worker2.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/TGWOaWehLQ8 \
--define=edp1_name=dataProviders/HRL1wWehTSM \
--define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \
--define=edp2_name=dataProviders/djQdz2ehSSE \
--define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \
--define=edp3_name=dataProviders/SQ99TmehSA8 \
--define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \
--define=edp4_name=dataProviders/TBZkB5heuL0 \
--define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \
--define=edp5_name=dataProviders/HOCBxZheuS8 \
--define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \
--define=edp6_name=dataProviders/VGExFmehRhY \
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

The resulting archive will contain `SyntheticEventGroupSpec` messages in text
format under `src/main/k8s/dev/synthetic_generator_config_files/`. These can be
replaced in order to customize the synthetic generator.

Extract the generated archive to some directory.
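
As a rough illustration of this extraction step, a small shell helper might look like the following. The function name `extract_kustomization`, the `bazel-bin/...` archive location (Bazel's usual output layout), and the destination directory are assumptions for illustration; none of them come from the guide itself.

```shell
# Hypothetical helper (not part of the repository): unpack a Kustomization
# archive into a destination directory, creating the directory if needed.
extract_kustomization() {
  archive="$1"   # path to the .tar produced by `bazel build`
  dest="$2"      # directory to extract into
  mkdir -p "$dest"
  tar -xf "$archive" -C "$dest"
}

# Example usage (assumed paths, not executed here):
#   extract_kustomization \
#     bazel-bin/src/main/k8s/dev/synthetic_generator_edp_simulators.tar \
#     "$HOME/simulators"
```

Extracting into a fresh directory keeps the generated files separate from the source tree, so any `SyntheticEventGroupSpec` files you replace there do not end up in version control by accident.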

## Apply K8s Kustomization

From the Kustomization directory, run

* Synthetic generator

```shell
kubectl apply -k src/main/k8s/dev/synthetic_generator_edp_simulators
```

* BigQuery

```shell
kubectl apply -k src/main/k8s/dev/bigquery_edp_simulators
```
```shell
kubectl apply -k src/main/k8s/dev/synthetic_generator_edp_simulators
```
10 changes: 0 additions & 10 deletions src/main/docker/images.bzl
@@ -91,11 +91,6 @@ COMMON_IMAGES = [
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/panelmatchresourcesetup:panel_match_resource_setup_runner_image",
repository = _PREFIX + "/loadtest/panel-match-resource-setup",
),
struct(
name = "csv_edp_simulator_runner_image",
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider:csv_edp_simulator_runner_image",
repository = _PREFIX + "/simulator/csv-edp",
),
struct(
name = "synthetic_generator_edp_simulator_runner_image",
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider:synthetic_generator_edp_simulator_runner_image",
@@ -141,11 +136,6 @@ GKE_IMAGES = [
image = "//src/main/kotlin/org/wfanet/measurement/duchy/deploy/gcloud/job/mill/shareshuffle:gcs_honest_majority_share_shuffle_mill_job_image",
repository = _PREFIX + "/duchy/honest-majority-share-shuffle-mill",
),
struct(
name = "bigquery_edp_simulator_runner_image",
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider:bigquery_edp_simulator_runner_image",
repository = _PREFIX + "/simulator/bigquery-edp",
),
struct(
name = "duchy_gcloud_postgres_update_schema_image",
image = "//src/main/kotlin/org/wfanet/measurement/duchy/deploy/gcloud/postgres/tools:update_schema_image",
25 changes: 0 additions & 25 deletions src/main/k8s/dev/BUILD.bazel
@@ -383,31 +383,6 @@ EDP_SIMULATOR_TAGS = {
"google_cloud_project": GCLOUD_SETTINGS.project,
}

cue_dump(
name = "bigquery_edp_simulator_gke",
srcs = ["bigquery_edp_simulator_gke.cue"],
cue_tags = dict(EDP_SIMULATOR_TAGS.items() + {
"bigquery_dataset": SIMULATOR_K8S_SETTINGS.bigquery_dataset,
"bigquery_table": SIMULATOR_K8S_SETTINGS.bigquery_table,
}.items()),
tags = ["manual"],
deps = [":edp_simulator_gke"],
)

kustomization_dir(
name = "bigquery_edp_simulators",
testonly = True,
srcs = [
"resource_requirements.yaml",
":bigquery_edp_simulator_gke",
],
generate_kustomization = True,
tags = ["manual"],
deps = [
"//src/main/k8s/testing/secretfiles:kustomization",
],
)

cue_dump(
name = "synthetic_generator_edp_simulator_gke",
srcs = ["synthetic_generator_edp_simulator_gke.cue"],
46 changes: 0 additions & 46 deletions src/main/k8s/dev/bigquery_edp_simulator_gke.cue

This file was deleted.

2 changes: 1 addition & 1 deletion src/main/k8s/edp_simulator.cue
@@ -110,7 +110,7 @@ import "list"
"\(deployment._name)": {
_app_label: deployment.spec.template.metadata.labels.app
_egresses: {
// Need to be able to access Kingdom and BigQuery.
// Need to be able to access Kingdom.
any: {}
}
}